Running queries report when RDS load is high

While working on RDS database, several times we observe that RDS slowdowns due to high CPU load which can cause due to lots of queries hitting to DB and run at the same time.  surely you can setup a monitoring system which can send you alerts related to CPU load. But manually checking the db, finding running queries is little bit lengthy task when the db is in production. isn’t it?  Instead of  manual work, we can setup  or tweak monitoring system like Nagios to send reports with running database queries when Cpu load is high on RDS.

 This post exactly shows how to do it…! As we all know that Nagios is capable of event handling so it can send alerts depends of the service state change. So while monitoring RDS cpu load, when Nagios detects that cpu load is more than the defined value, it will login to  RDS instance, collect the running queries and send it as a report with a alert email. It will be easy for  administrator to find out faulty queries and perform the remaining task.

To achieve this, we need working Nagios setup and  Nagios AWS Cloud Watch metric plugin which can be found at this link ( click on link)

There are 4 files,

a)  check_cloudwatch_status.rb  :  The ruby script that retrieve the metric from AWS EC2, ELB or RDS.

b) check_cloudwatch_status.cfg  : The definition of the nagios command.

c) sample_service.cfg  :  Sample of use for retrieving CPU usage of an instance on Amazon EC2.

d) encrypt_credentials.rb :   Ruby script that encrypts Amazon Web Services Credentials.

 check_cloudwatch_status.rb  is the main monitoring script and you need  ruby  and rubygems  packages installed on your system. Once you install these softwares, dont forget to run “gem install amazon-ec2”.  This script uses amazon-ec2 to call instance api’s on AWS.  Now we will do the real work.

a)  Log on to your AWS account with admin account and download  your security keys  using IAM.

b) Copy  AWS  .pem key in directory where you downloaded cloudwatch metric files  as it will be necessary to create credentials to connect the instance you wish to monitor.

c) Edit encrypt_credentials.rb and check_cloudwatch_status.rb.Replace “/etc/cloutomate/cloutomate.pem ”  with  “/pathto/.pem”  ( your .pem file path ) .

d) Create credentials as below example command,

RUBYOPT=rubygems ruby encrypt_credentials.rb -A XXXXXXXXXXXXXXXXXXXX -S XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX -f ec2_credentials.cfg  ( Where -A is your User ID Key and -S is your security key.)  You should see your keys are stored in encrypted format inside  ec2_credentials.cfg.

 e)  Unfortunately check_cloudwatch_status.rb has some coding bugs which I saw when I initially ran this script to check my aws instance so to  fix it , edit check_cloudwatch_status.rb  and replace  all,  + e   with  #{e}” .

f) This version of check_cloudwatch_status.rb  call apis  on region 1 for example, us-east-1, us-west-1, So if you planning to monitor instance on region 2, add urls  in check_cloudwatch_status.rb  for respective regions for example,  “ec2.us-west-2.amazonaws.com“.

g) check_cloudwatch_status.rb  has multiple options to check  multiple metrics on aws instance, such as  ec2, rds ,elb etc.  You can list them by  hitting  “ruby check_cloudwatch_status.rb”.  Here is the example command of monitoring Cpu usage of  ec2 instance,

RUBYOPT=rubygems ruby lib/nagios/check_cloudwatch_status.rb -a ec2-XXX-XXX-XXX-XXX.compute-1.amazonaws.com -i i-XXXXXXXX -f ec2_credentials_1.cfg -C CPUUtilization -c 90 -w 75  ( where  -a is instance public dns url, -i  is a instance id ) Output you get is , CloudWatch Metric: CPUUtilization, Average: 16.81, Maximum: 22.93, Minimum: 11.56|metric_average=16.81 metric_maximum=22.93 metric_minimum=11.56

 At this point  our initial setup is good and next part is integrating it with Nagios.

You can refer check_cloudwatch_status.cfg and  sample_service.cfg to get an idea about Cloudwatch metic Nagios setup.   For this setup, assuming there is a working nagios setup, I am writing steps for  adding RDS Cpu utilization monitoring with email alerts containing running database queries when nagios state changes.

1)  Create RDS instance if it is not already created.  Connect to RDS database  with mysql client and create  user which will have access over all other users activities.

2)  Run check_cloudwatch_status.rb script on command line to check Cpu utilization of RDS database ,

RUBYOPT=rubygems ruby lib/nagios/check_cloudwatch_status.rb -a db1-XXX-XXX-XXX-XXX.compute-1.amazonaws.com -i db1 -f ec2_credentials_1.cfg -D CPUUtilization -c 90 -w 75  ( -D is a metric option for RDS )   and you should get result as shown above.

 3) Install ssmtp package and configure your email settings in /etc/ssmtp dir by editing revalises and ssmtp.conf files  ( ssmtp is a sendmail client).

4)  Nagios configuration Part  : Change dir to Nagios config dir  example  :  /usr/local/nagios/etc/objects.

a) Edit  command.cfg  and add

define command{
                 command_name notify-mysql-by-email
command_line    /usr/bin/printf "To: $CONTACTEMAIL$nFrom: nagios@fqdnnSubject: RDS - ** mysql CPU Load Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ *nn `mysqladmin -h $HOSTADDRESS$ -u dbuser -ppassword processlist`n"|/usr/sbin/ssmtp -t

}

  b) Edit contacts.cfg and add

define contact{
        contact_name                    mysqlcpu
        alias                           mysqlcpu
        use                             generic-contact
        service_notification_commands   notify-mysql-by-email
        email                           user1@example.com
}

define contactgroup{
        contactgroup_name     mysql
        members               mysqlcpu
}

 c)  Create  check_cloudwatch_status.cfg and add

define command {
        command_name    check_cloudwatch_status_rds
        command_line    /pathto/ruby  /pathto/check_cloudwatch_status.rb -a $HOSTADDRESS$ -i $ARG1$ -f $ARG2$ -D $ARG3$ --warning $ARG4$ --critical $ARG5$
}

 d) Create rds-service.cfg  and add,

define host {
        use                       generic-host
        host_name                 db1.XXXXXXXX.us-west-2.rds.amazonaws.com
        address                   db1.XXXXXXXX.us-west-2.rds.amazonaws.com
}
define service {
use                              generic-service
host_name                        db1.XXXXXXXX.us-west-2.rds.amazonaws.com
service_description              Cpu Usage
check_command                    check_cloudwatch_status_rds!db1!/pathto/ec2_credentials_1.cfg!CPUUtilization!1!2
#max_check_attempts               3
normal_check_interval            3
notifications_enabled            1
contact_groups                   mysql
}

 e)  Edit  /usr/local/nagios/etc/nagios.cfg file  and  add

cfg_file=/usr/local/nagios/etc/objects/check_cloudwatch_status.cfg
cfg_file=/usr/local/nagios/etc/objects/rds-service.cfg

 f) Restart the nagios service and if there is no configuration error,   You can see the new service added on the  nagios web front end.

 Next time, when CPU load service state changes, you start with  getting email reports  with  RDS  running queries.

Thats it,  You can add similar services for other AWS services.  Have Fun.

 

Comments are closed.

Subscribe to Our Newsletter

Join our community of DevOps enthusiast - Get free tips, advice, and insights from our industry leading team of AWS experts.