Maatkit’s mk-loadavg
tool is a helpful way to gather information about infrequent conditions on your database server (or any other server, really). We wrote it at Percona to help with those repeated cases of things like “every two weeks, my database stops processing queries for 30 seconds, but it’s not locked up and during this time there is nothing happening.” That’s pretty much impossible to catch in action, and these conditions can take months to resolve without the aid of good tools.
In this blog post I’ll illustrate a very simple usage of mk-loadavg
to help in solving a much smaller problem: find out what is happening on the database server during periods of CPU spikes that happen every so often.
First, set everything up.
- Start a screen session:
screen -S loadmonitoring
. If you don’t have screen, you can run mk-loadavg as a daemon, but it’s much better to use screen in my opinion. - Get
mk-loadavg
. For purposes of this blog post, I’m going to get the latest trunk code, because I know a bug or two has been fixed since the last release.wget http://www.maatkit.org/trunk/mk-loadavg
- Create a directory to hold the collected information in files.
mkdir collected
Next let’s set up a script that mk-loadavg
can use to gather some information when it detects a high CPU condition. Save the contents of this script as “collect-stats.sh”. The script will gather about 30 seconds worth of statistics. It uses a simple sentinel file /tmp/gatherinfo
to prevent multiple occurrences from gathering statistics at the same time. (This is intentionally simple for demo purposes.)
#!/bin/bash
if [ -f /tmp/gatherinfo ]; then exit 0; fi
touch /tmp/gatherinfo
d=`date +%F-%T`
echo "gathering info for $d"
ps -eaf >> collected/$d-ps 2>&1 &
top -bn1 > collected/$d-top 2>&1 &
mysql -e 'show innodb status\G show full processlist\G' >> collected/$d-innodbstatus 2>&1 &
vmstat 1 30 >collected/$d-vmstat 2>&1 &
iostat -dx 1 30 >collected/$d-iostat 2>&1 &
mysqladmin ext -i1 -c30 > collected/$d-mysqladmin 2>&1 &
sleep 30
ps -eaf >> collected/$d-ps 2>&1 &
mysql -e 'show innodb status\G show full processlist\G' >> collected/$d-innodbstatus 2>&1 &
rm /tmp/gatherinfo
Now make the script executable: chmod +x collect-stats.sh
. At this point we’re ready to start working. Let’s fire the stats-collection script when the system’s user CPU goes above 40%.
perl mk-loadavg --watch "Server:vmstat:us:>:40" --interval 1 --execute collect-stats.sh
If the CPU goes over 40%, you'll get a bunch of files in the collected
directory, with helpful information to diagnose the problem. This example usage is pretty similar to a real-life one I set up recently. It enabled me to take a methodical approach to the problem:
- From the
top
output I was able to identify that MySQL was causing the spike. - I then looked at the
SHOW STATUS
output to see what the database server was doing, using mext as a helper. - From
Select_full_scan
andHandler_read_rnd_next
I isolated table scans as a problem. - From the saved
SHOW PROCESSLIST
I found problem queries and optimized them.
You would be right if you said there are much better tools for finding problem queries -- but remember two things: 1) sometimes clients ask for the lightweight, short-term solution that can be set up in about 5 minutes and checked the next day; and 2) when it is unclear that queries are the problem, setting up only a query monitor is stabbing in the dark and will not get results.
In addition to watching vmstat
to measure system CPU usage, mk-loadavg
can watch many other things, such as the MySQL server's SHOW PROCESSLIST
, parsing values from SHOW INNODB STATUS
, and so on.
Related posts:
- A metric for MySQL load average If you wer
- Maatkit version 3119 released Your month
- An alternative to the MySQL Query Analyzer MySQL just
Related posts brought to you by Yet Another Related Posts Plugin.
PlanetMySQL Voting:
Vote UP /
Vote DOWN"
No comments:
Post a Comment