13 June, 2014
Since “what should I monitor in my database” has come up in conversation several times lately, I thought I’d put this here where I (theoretically) won’t lose it. I’ll save for later the discussion of where to get this info and which tools give me which stats :)
At a minimum: server CPU, memory, I/O, and network usage, plus all “slow” queries logged.
CPU usage, per-proc if available
Memory usage, including swap
Disk usage (in terms of space; pay special attention to database partitions)
Network stats, including errors (if you have a Cisco network & are friends with the network team, NetFlow data is cool to have)
If I could have everything I wanted: everything from vmstat, plus the extended iostat data (iostat -x)
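To put numbers on the "memory usage, including swap" item, here is a minimal sketch that parses Linux's /proc/meminfo (a Linux-only assumption; parse_meminfo and memory_summary are hypothetical names, not part of any tool mentioned here). It treats used memory as MemTotal minus MemAvailable and used swap as SwapTotal minus SwapFree.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most values are in kB; a few (e.g. HugePages_*) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(info):
    """Return (mem_used_kb, swap_used_kb) from parsed meminfo values."""
    # MemAvailable only exists on Linux 3.14+; fall back to MemFree without it.
    available = info.get("MemAvailable", info.get("MemFree", 0))
    mem_used = info["MemTotal"] - available
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return mem_used, swap_used
```

On a live box you'd call memory_summary(parse_meminfo(open("/proc/meminfo").read())) on whatever schedule your collector runs.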
number of connections
commits vs rollbacks
table size (plus bloat, if we can find a good query for it)
index size (same)
If I could have everything I wanted:
everything from pg_stat_database, pg_stat_sys_*, pg_statio_*, and pg_stat_activity
Activity logs configured as outlined here.
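Connection counts and commits vs rollbacks both live in pg_stat_database (the numbackends, xact_commit, and xact_rollback columns). The transaction counters are cumulative since the last stats reset, so a rollback ratio is more meaningful as a delta between two samples. A minimal sketch, assuming you collect the samples yourself (via psql, a driver, or your monitoring agent); DbSample and rollback_ratio are hypothetical names.

```python
from dataclasses import dataclass

# Query to run on a schedule; the columns are real pg_stat_database columns,
# while the sampling approach around it is an assumption.
STATS_QUERY = """
SELECT datname, numbackends, xact_commit, xact_rollback
FROM pg_stat_database
WHERE datname = current_database();
"""


@dataclass
class DbSample:
    numbackends: int    # current connections to this database
    xact_commit: int    # cumulative commits since stats reset
    xact_rollback: int  # cumulative rollbacks since stats reset


def rollback_ratio(earlier, later):
    """Fraction of transactions rolled back between two samples."""
    commits = later.xact_commit - earlier.xact_commit
    rollbacks = later.xact_rollback - earlier.xact_rollback
    total = commits + rollbacks
    if total <= 0:
        # No transactions in the window (or the counters were reset).
        return 0.0
    return rollbacks / total
```

A sudden jump in the ratio is usually more interesting than its absolute value, so graph it alongside numbackends.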
Then there’s a whole class of things that fall under “How long does it take to…”: do a backup, restore a backup, etc.
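For the "how long does it take to…" class, even a crude wall-clock wrapper gives you a baseline to track over time. A minimal sketch using only the standard library; time_command is a hypothetical helper, and which backup or restore command you hand it is up to you.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return (elapsed_seconds, returncode)."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    result = subprocess.run(argv)
    elapsed = time.monotonic() - start
    return elapsed, result.returncode
```

For example, time_command(["pg_dump", "-Fc", "-f", "/tmp/mydb.dump", "mydb"]) would time a custom-format dump of a database called mydb (a hypothetical name and path); log the result each time and you have a trend.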
18 October, 2013
We had one of those truly amazing meetings at PDXPUG this week. Along with the ideas that came out of this meeting (such as leveraging Calagator for optimal scheduling of new user groups, and this), Matt Smiley schooled a bunch of us on some basic Unix utilities. Recorded here so I don’t forget them; these are version-dependent, so YMMV.
-S prevents line wrapping; then you can use the arrow keys to scroll through your output. This is super-handy when viewing wide, tabular output.
– ctrl-m sorts by mem
– s lets you choose the refresh rate
– await is the value to use for disk latency
– svctm is not :) (it’s a calculated value rather than an actual measurement). The sar man page notes that this field is not to be trusted and will be removed in the future.
– collect ongoing stats: iostat -x -t -k 1 100
-x = extended stats
-t = include timestamps
-k = measurements in kB :)
1 = one second intervals
100 = take 100 reports
Your first (and possibly second) set of data collected from this can be thrown out, as it contains the cumulative stats since the system started. The same caveat applies if you take a single timepoint (iostat with no interval): all you get is the average since boot.
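Because that first report is the since-boot average, anything that consumes captured iostat -x output should drop it. A minimal sketch that splits the output into reports and pulls out await per device, skipping the first report. It assumes a C-style locale (dot decimals) and a sysstat version that still prints a combined await column; newer releases may show only r_await and w_await, in which case this returns no await values. The function names are hypothetical.

```python
def parse_iostat_x(text):
    """Split `iostat -x` output into a list of reports.

    Each report is a dict mapping device name -> {column name: value}.
    """
    reports = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            header = None  # a blank line ends the device table
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]  # column names after "Device" / "Device:"
            reports.append({})
            continue
        if header is not None and len(fields) == len(header) + 1:
            device, *values = fields
            reports[-1][device] = dict(zip(header, map(float, values)))
    return reports


def await_per_device(text):
    """Return {device: await_ms} for every report after the first.

    The first report covers the time since boot, so it is discarded.
    """
    result = []
    for report in parse_iostat_x(text)[1:]:
        result.append(
            {dev: cols["await"] for dev, cols in report.items() if "await" in cols}
        )
    return result
```

Capture with something like iostat -x -t -k 1 100 > iostat.out, then feed the file's contents to await_per_device.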
I also learned about a couple of monitoring tools I need to check out: saidar and Datadog.
22 March, 2013
Next up in my occasional monitoring tools review series: another oldie-but-goodie, readily available tool, nmon.
What it monitors: system stats
Where to get it: it’s probably pre-installed on your system. If not, get it from SourceForge.
Why you’d want (or not) to use it: Pretty much the same reasons you’d want to use sar, as I discussed previously.
I’ve (casually) used the interactive interface, and until a few weeks ago, thought that was all there was to this tool. Not so. There’s an option (-f) you can use to save a single data poll to a file, in “spreadsheet format”. You can also specify an interval and a number of polls to take:
nmon -f -s 60 -c 60
= poll once a minute for an hour.
nmon will create a file for you, with a default name of [server]-timestamp.nmon, or you can specify your own filename with -F.
To generate graphs, there are two Excel spreadsheets you can download from the wiki. I tried the nmon Analyzer Spreadsheet (the newer of the two). The docs recommend “keep the number of snapshots to around 300”. I agree. The graphs look a lot nicer with fewer data points in them. However, Excel graphs just aren’t as pretty as rrdtool graphs.
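The -s/-c arithmetic is simple (run length = interval × count), and the ~300-snapshot recommendation lets you work backwards from how long you want to record. A small sketch; nmon_args is a hypothetical helper that only builds the argument list and doesn't run nmon itself.

```python
import math


def nmon_args(duration_seconds, target_snapshots=300):
    """Build an `nmon -f` argument list spanning roughly duration_seconds
    with about target_snapshots data points."""
    # Interval in whole seconds, never less than 1.
    interval = max(1, math.ceil(duration_seconds / target_snapshots))
    # Enough snapshots to cover the requested duration at that interval.
    count = math.ceil(duration_seconds / interval)
    return ["nmon", "-f", "-s", str(interval), "-c", str(count)]
```

nmon_args(3600, 60) reproduces the once-a-minute-for-an-hour example, and nmon_args(86400) suggests a 288-second interval for a day-long capture. You could pass the list to subprocess.run to start the capture.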
There’s an nmon2rrd tool, but it was compiled for AIX so I didn’t try it out.
Of the two, if I’m looking for on-the-spot visualization of system performance, nmon wins it. For storage and later review of the data, I’d go with sar + sar2rrd.pl over nmon + the Excel spreadsheet. The graphs are prettier and easier to read with sar.
15 March, 2013
What it monitors: pretty much every system stat you can imagine (and some you haven’t)
Where to get it: it’s probably pre-installed on your system; if not, try the sysstat package (the same one that includes iostat)
Why you’d want to use it:
- you need an answer fast, but maybe don’t have access to the “enterprise” monitoring (or there isn’t any…)
- you’re doing system testing and want a command-line tool that’s easy to configure and run in discrete timeframes.
Why you wouldn’t want to use it:
- you want data you can easily throw into a graphing or analysis program; the data produced by sar isn’t readily machine-readable
- you’re looking for a near-real-time long-term monitoring solution. In that case, just go ahead and set up munin or collectd.
Because it’s lightweight and so readily available, it’s a good tool to have in your toolbox. Plus, it’ll tell you things like fan speed and temperature, and I’m just a sucker for environmental monitoring.
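To illustrate the "not readily machine-readable" point, here's roughly what it takes to turn sar -u text output into rows. A minimal sketch that assumes the usual layout (a header row containing CPU and %idle, one row per sample, then an Average: line) and handles timestamps printed with or without AM/PM; other sar reports, locales, and sysstat versions may lay things out differently. parse_sar_cpu is a hypothetical name.

```python
def parse_sar_cpu(text):
    """Parse `sar -u` text output into a list of dicts, one per sample row."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("Average"):
            continue  # skip blank lines and the summary line
        # Timestamps may be "10:00:01 AM" (two tokens) or "10:00:01" (one).
        if len(fields) > 1 and fields[1] in ("AM", "PM"):
            fields = [fields[0] + " " + fields[1]] + fields[2:]
        if "CPU" in fields and "%idle" in fields:
            header = fields[1:]  # column names after the timestamp
            continue
        if header is not None and len(fields) == len(header) + 1:
            row = {"time": fields[0]}
            for name, value in zip(header, fields[1:]):
                # The CPU column holds "all" or a CPU number; the rest are percentages.
                row[name] = value if name == "CPU" else float(value)
            rows.append(row)
    return rows
```

Once the rows are dicts, handing them to a CSV writer or a graphing script is straightforward; it's getting there that takes the massaging.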