Since “what should I monitor in my database” has come up in conversation several times lately, I thought I’d put this here where I (theoretically) won’t lose it. I’ll save for later the discussion of where to get this info and which tools give me which stats :)
Bare minimum:
server CPU, memory, I/O, network usage, and all “slow” queries logged.
More extensive:
System stats:
CPU usage, per-proc if available
Memory usage, including swap
disk usage (in terms of space – pay special attention to database partitions)
disk I/O
disk busy
network stats, including errors (if you have a Cisco network & are friends with the network team, netflow data is cool to have)
If I could have everything I wanted: everything from vmstat and iostat extended data
Pg stats:
number of connections
transactions
idle transactions
commits vs rollbacks
locks
checkpoint frequency
database size
table size (plus bloat, if we can find a good query for it)
index size (same)
If I could have everything I wanted:
everything from pg_stat_database, pg_stat_sys_*, pg_statio_*, and pg_stat_user_activity
Activity logs configured as outlined here.
Then there’s a whole class of things that fall under “How long does it take to…”: do a backup, restore a backup, etc.