What it monitors: pretty much every system stat you can imagine (and some you haven’t)
Where to get it: it’s probably pre-installed on your system; if not, try the sysstat package (the same one that includes iostat)
Why you’d want to use it:
- you need an answer fast, but maybe don’t have access to the “enterprise”[1] monitoring (or there isn’t any…)
- you’re doing system testing and want a command-line tool that’s easy to configure and run in discrete timeframes.
Why you wouldn’t want to use it:
- you want data you can easily throw into a graphing or analysis program; the data produced by sar isn’t readily machine-readable
- you’re looking for a near-real-time long-term monitoring solution. In that case, just go ahead and set up munin or collectd.
Because it’s lightweight and so readily available, it’s a good tool to have in your toolbox. Plus, it’ll tell you things like fan speed and temperature, and I’m just a sucker for environmental monitoring[2].
How to use it:
Basically the way it works is: sadc collects the data (cron usually runs it through the sa1 wrapper script), and sar then transforms it into human-readable text (the sa2 wrapper script does this for the daily report). Confused? Yeah. Different references call the pieces different things; I just put it all under the “sar” umbrella.
– configure it in /etc/default/sysstat (just set ENABLED="true")
I also set SA1_OPTIONS="-S DISK -S POWER" so I get my environmental stats. :)
– start/restart/stop it from /etc/init.d/sysstat
– the crontab to manage the data collection rate is /etc/cron.d/sysstat. Default poll cycle is 10 minutes.
– collected data is in /var/log/sysstat (/var/log/sa on CentOS); saXX are your data files, sarXX are your text files (rollup will happen daily.)
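For reference, here is roughly what the relevant lines of /etc/default/sysstat look like with the settings above applied. This is a sketch of the Debian/Ubuntu layout described here; other distros and newer sysstat releases may keep these settings in a different file or under different variable names, so treat it as an assumption and check your own system.

```shell
# /etc/default/sysstat (layout assumed from the steps above)

# Turn on the cron-driven data collection.
ENABLED="true"

# Extra options passed through to the collector: add disk and power
# (fan/temperature) statistics to what gets recorded.
SA1_OPTIONS="-S DISK -S POWER"
```

After editing, restart via /etc/init.d/sysstat so the change takes effect.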
You can also run it from the command line like so:
sar -A -o sardata 60 20 > /dev/null &
-A = collect all data
-o = output to the following file
60 = do a poll every 60 seconds
20 = this many times (IOW I’ll get 20 minutes of one-minute samples)
… redirect the rest of the output to /dev/null and run in the background (aka “don’t annoy me”)
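If you'd rather think in "how long do I want to record" than "how many samples", the interval/count arithmetic can be wrapped up. A minimal sketch follows; the function names sar_sample_count and sar_capture are made up for illustration and are not part of sysstat.

```shell
# Hypothetical helper: number of samples needed to cover a duration.
# $1 = duration in seconds, $2 = polling interval in seconds.
sar_sample_count() {
    # Integer division, rounded up so a partial last interval is still covered.
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical wrapper around the one-liner above.
# $1 = output file, $2 = duration in seconds, $3 = interval in seconds.
sar_capture() {
    count=$(sar_sample_count "$2" "$3")
    # Same shape as the one-liner: all stats, binary output file,
    # terminal output discarded, run in the background.
    sar -A -o "$1" "$3" "$count" > /dev/null &
}
```

With these, sar_capture sardata 1200 60 works out to the same 20 one-minute samples as the command above.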
Once you’ve captured the data, you need to transform it to text so you can read it. This is where the options get many and varied (they can be different on different distros, so check those man pages.) It took just a few hours of experimentation to familiarize myself with the available options and pick the ones I liked the best.
The basic format is:
sar [options] -f [datafile]
I recommend checking out:
sar -A -f [datafile] > outputfile
…just to see what’s available.
Options I particularly like:
-b for I/O stats
-dp activity per block device; pretty-print the block names (make sure you have -S DISK in your config, or the -A option)
-m “FAN,TEMP” power management (make sure you have -S POWER in your config, or use the -A option)
-n “DEV,EDEV” network stats, including errors
-u CPU usage (or -P “ALL” for CPU usage per processor)
-w context switching
-W pages swapped
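To run through that whole list against one data file in a single pass, something like the following works as a starting point. It is only a sketch: sar_favorites and the DRY_RUN switch are names invented here, and the option spellings are copied from the list above, so anything your distro's sar doesn't accept will just produce an error for that one report.

```shell
# Hypothetical helper: run each of the reports listed above against one data file.
# $1 = data file (e.g. one written with "sar -o").
# Set DRY_RUN=1 to print only the headings, as a preview of what would run.
sar_favorites() {
    datafile=$1
    for opts in "-b" "-d -p" "-m FAN,TEMP" "-n DEV,EDEV" "-u" "-P ALL" "-w" "-W"; do
        echo "== sar $opts -f $datafile =="
        [ -n "${DRY_RUN:-}" ] && continue
        # $opts is left unquoted on purpose so "-d -p" splits into two arguments.
        sar $opts -f "$datafile"
    done
}
```

For example, sar_favorites sardata > outputfile gives one text file with a labelled section per report.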
So, there are a couple of things you can do here:
- use the data stored in the system sar files for a quick (text) view into recent system stats; e.g., say I want to look at recent disk activity, I run sar -dp -f on one of the system-collected data files and eye-grep for interesting details.
- run your own collection to grab data for special circumstances; like, I’m running a 2-hour benchmark and want to collect pretty fine-grained stats, so I’ll run sar every 10 seconds for a 3-hour period surrounding the test.
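For the first case, the only fiddly bit is picking the right saXX file. A small sketch, assuming XX is the two-digit day of the month (the usual default, but some setups name the files differently) and using a made-up function name, sa_datafile:

```shell
# Hypothetical helper: path of the system-collected data file for a given day.
# $1 = directory holding the saXX files (/var/log/sysstat, or /var/log/sa on CentOS)
# $2 = day of the month (defaults to today)
sa_datafile() {
    day=${2:-$(date +%d)}
    # Drop a leading zero so printf does not try to read "08" as octal.
    day=${day#0}
    printf '%s/sa%02d\n' "$1" "$day"
}
```

Then recent disk activity is just sar -dp -f "$(sa_datafile /var/log/sysstat)", and the 15th of the month is sar -dp -f "$(sa_datafile /var/log/sysstat 15)".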
Note: the data files are binary and generally tied to the sysstat version that wrote them, so you have to do this text transform locally, or on a machine with an identical build.
Once it’s in text, well…you can read it ;) Unfortunately, as I mentioned before, it’s not readily machine-parsable; you can’t just e.g. dump it into a database without some munging. I like sar2rrd to pump out rrdtool graphs; there’s an Excel solution as well[4], and of course, you can always Roll Your Own [tm].
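As a taste of what the "munging" looks like if you do roll your own, here is a minimal sketch that turns a sar text report into comma-separated lines. The name sar_text_to_csv is invented for this example, and it assumes 24-hour timestamps so that the time is a single field (running sar with LC_ALL=C typically gives you that); with AM/PM timestamps the columns will be off by one.

```shell
# Hypothetical helper: convert a sar text report (on stdin) to comma-separated lines.
# Assumes 24-hour timestamps so that every row has the time in one field.
sar_text_to_csv() {
    awk -v OFS=',' '
        NF == 0          { next }   # skip blank lines
        $1 == "Linux"    { next }   # skip the banner line at the top
        $1 == "Average:" { next }   # skip the summary rows
        { $1 = $1; print }          # rebuild the record with commas between fields
    '
}
```

Usage would be along the lines of LC_ALL=C sar -u -f sardata | sar_text_to_csv > cpu.csv. Column headers come through as ordinary rows, so reports that repeat the header will need those duplicates dropped before loading the result anywhere.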
1 – Do NOT get me started.
2 – Story time: A while back (like, a looong while back) I discovered that you could get the temperature for certain Cisco routers and switches. One of the senior engineers made fun of me[3] – “Why would anyone need to monitor temperature? That’s dumb.” I did it anyway, and had the last laugh when I found one of the switches at an unmanned colo was seriously overheating.
3 – Yes, he was kind of a jerk.
4 – googling that is an exercise I leave to the reader, as I am no longer as enamored of Excel as I once was.