27 June, 2016

Slow Query Checklist

by gorthx

Every slow query situation has its own quirks.  Here’s my generic template for troubleshooting them.

First, I start with the holy trinity of IT Troubleshooting Questions:

  1. What are you trying to do?
  • What data do you want?
  • What is the end result you want?
  2. How does your result differ from what you expected?
  • “I thought I could do a 7-way join on unindexed fields in tables with over 3 million records each in less than 2 seconds. I’m pretty disappointed with Postgres’s performance.”
  3. What changed?
  • Has it always been this slow?
  • Did this happen suddenly, or did it slowly degrade?
  • If suddenly, what changed? (“This started right after our last deploy.”)

20 June, 2016

RDS: log vacuum messages

by gorthx

Last year I commented that “autovacuum log messages are still missing in [RDS] 9.4”.

Amazon fixed that for versions 9.4.5 and up with the custom GUC rds.force_autovacuum_logging_level.

I discovered this week that it’s also available on earlier versions of RDS – I have it enabled on a 9.3.10 instance. The default value in the config is one of the debug levels; just change it to ‘log’. No restart required (assuming you already have logging configured… and you do, don’t you?)
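
If you manage your parameter groups from the cli, the change looks something like this (the parameter group name here is made up – use whichever group your instance is attached to):

aws rds modify-db-parameter-group \
--db-parameter-group-name my-pg-params \
--parameters "ParameterName=rds.force_autovacuum_logging_level,ParameterValue=log,ApplyMethod=immediate"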

Works as expected:

2016-06-16 18:36:41 UTC::@:[31403]:LOG: automatic vacuum of table "my_db.public.my_dimension": index scans: 1
 pages: 0 removed, 29457 remain
 tuples: 3454 removed, 429481 remain, 0 are dead but not yet removable
 buffer usage: 64215 hits, 8056 misses, 22588 dirtied
 avg read rate: 1.018 MB/s, avg write rate: 2.855 MB/s
 system usage: CPU 0.10s/0.88u sec elapsed 61.80 sec

This makes me very happy – no more cron job to monitor vacuums!

30 March, 2016

This month’s RDS tip, a pgloader exercise, and bonus recruiter spam

by gorthx

1. Several months back, AWS re-worked the web console. You can get sparkline-style graphs for CPU usage, memory, storage, and database connections at the top level, instead of having to drill down to the Cloudwatch metrics for the instance.

I find this really handy – but for one quirk with the graphs:
[screenshot: database connections graph, with a red threshold line near the top]

There’s white space to the right of the red line, so I interpreted the red line to mean “trouble is brewing & I should do something about this”.

Turns out that red line is the limit; there’s nowhere else to go. Whoops!

2. This week I finally had a reason to try out pgloader: One of my analysts needed some help loading some ugly fixed-width data[1].

Installing was super-easy on my mac (`brew install pgloader`). I worked through the examples before starting to work with my actual dataset[2].

The data to be loaded came bundled as several text files: the data, plus three or four additional files describing the layout[3]. I wrote a truly glorious string of cut, sed, paste, and awk to create a pgloader control file that would work. And it did!

Except:

field1 | field2 | field3 
--------+--------+----------
 [null] | D | IMITRI
 [null] | G | ABRIELLE
 [null] | M | ARK
 [null] | S | ELENA

Uh, what?

The data descriptions had character counts starting at 1, and pgloader expects them to start at 0 (as they should).  (For extra fun, the first column in all records of this dataset was nothing but spaces.)
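
For reference, the fixed-width portion of a pgloader control file looks roughly like this (file name, table name, and field positions are all made up) – note the positions start at 0:

LOAD FIXED
     FROM 'my_ugly_data.txt'
          (
             field1 from 0 for 10,
             field2 from 10 for 1,
             field3 from 11 for 25 [trim right whitespace]
          )
     INTO postgresql:///my_db?tablename=my_table;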

3. This week’s hilarious recruiter spam:
“Hey there Gabrielle! I was doing my homework on sites like Meetup, GitHub, etc. and I noticed your Java skills.”

I’m pretty sure you didn’t.



[1] – Is that redundant?
[2] – 600 columns of fixed-width data?  Who does that?!
[3] – Why couldn’t it all be in one file? Again: who does this?![4]
[4] – People who hate DBAs, that’s who.

3 February, 2016

RDS OS update

by gorthx

I have some more info about the AWS RDS OS update I mentioned last week.

The announcement states that this update requires an outage. In my experience, it’ll be the usual failover time for a Multi-AZ instance (a minute or so), and 8-10 minutes for non-Multi-AZ instances. According to my event logs, the entire update process takes 20-25 minutes.

Since I had n > 1 instances to handle, I preferred to review & schedule updates via the cli, so I could write a quick bash wrapper for each piece & handle them in bulk (there’s a sketch of one below).

Here’s the command:

aws rds describe-pending-maintenance-actions

To review upcoming maintenance for only specific instances, pass it a comma-separated list of instance names via the --filters switch, like so:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1,mydb2

Here’s an example of what you’ll get back:

PENDINGMAINTENANCEACTIONS arn:aws:rds:[az]:[acct_no]:db:[dbname]
PENDINGMAINTENANCEACTIONDETAILS system-update 2016-02-26T19:41:22Z 2016-02-26T19:41:22Z Performance improvements and security updates

(The timestamps are in UTC, so you may have to do some math.)

Neither of those timestamps was in my maintenance window, so I dug into the docs a bit further.

There are three possible dates available:
– AutoAppliedAfterDate – AWS will apply the maintenance in the next maintenance window after this date
– ForcedApplyDate – AWS will apply the maintenance at this time, regardless of maintenance window (eeeek!)
– CurrentApplyDate – when AWS will apply it, if you’ve opted in

Without field headers, I can’t really figure out which two of the possible three dates are displayed. That “ForcedApplyDate” is a bit concerning, so I used the ‘output to JSON’ option to get some more info:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output json

(excerpt):

"PendingMaintenanceActionDetails": [
  {
    "Action": "system-update",
    "Description": "Performance improvements and security updates",
    "CurrentApplyDate": "2016-02-26T19:41:22Z",
    "AutoAppliedAfterDate": "2016-02-26T19:41:22Z",
    "OptInStatus": ""
  }
],

So that clears that up, a little. Now to get this stuff scheduled!

aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \
--opt-in-type next-maintenance
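
The bash wrapper I mentioned doesn’t have to be anything fancy; here’s a sketch (region, account number, and instance names are all made up):

#!/bin/bash
# opt each instance in to the system-update for its next maintenance window
region=us-east-1
account=123456789012
for db in mydb1 mydb2 mydb3; do
    aws rds apply-pending-maintenance-action \
        --resource-identifier arn:aws:rds:${region}:${account}:db:${db} \
        --apply-action system-update \
        --opt-in-type next-maintenance
done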

Since I now know which fields I want, I can use ‘output as text’ to verify my scheduling action, filtered with the --query switch:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output text \
--query 'PendingMaintenanceActions[*].{OptIn:PendingMaintenanceActionDetails[*].OptInStatus,Date:PendingMaintenanceActionDetails[*].CurrentApplyDate}'

Output:

DATE 2016-01-22T17:04:00Z
OPTIN next-maintenance

(If you are wondering “how in Sam Hill did you create that query filter from that JSON output?”, wonder no more: here’s a link to the docs. Getting the --query filter correct usually takes some back & forth with reviewing the JSON output, so don’t be discouraged if it takes a few tries.)

27 January, 2016

Scale 14x recap

by gorthx

I went to SCALE mainly for PostgreSQL@Scale – Joe Conway’s grown this annual add-on into a two-day, two-track event.

The new venue is a huge improvement, even though the walk from the hotel was a bit longer than I expected. There are a lot more accessible-on-foot food options.

For my session, I gave an updated version of last year’s “RDS Postgres: A Journey down the Amazon”.

If you’re using RDS and aren’t keeping up with your monitoring, please be aware of the current required OS update.  You can view & schedule required updates via the web console; here are the corresponding cli commands:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb,mydb1,mydb2
aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \
--opt-in-type next-maintenance

I find JSON and text output more useful for these types of tasks; I’ll cover that in a later post.

Accessing RDS Pg logs is kind of a sticking point for some people, me included. (I _really_ want an easy way to get them into Splunk.)
Here’s Denish Patel’s work using the rds cli, and a quick bash script I threw together, which uses the aws cli.
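
The core of that kind of script is something like this (instance name made up; note that by default download-db-log-file-portion returns only the latest portion of each file):

# list the instance's log files, then download each one
for f in $(aws rds describe-db-log-files \
    --db-instance-identifier mydb \
    --query 'DescribeDBLogFiles[*].LogFileName' --output text); do
    aws rds download-db-log-file-portion \
        --db-instance-identifier mydb \
        --log-file-name "$f" \
        --output text > "$(basename $f)"
done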

Based on the quick headcount I took at the beginning of the session, attendees were about 2/3 devs, 1/3 DBAs. We also had a couple of people from the Amazon RDS team; it’s always nice to know I have backup in the audience! ;)

A couple of folks had questions after my talk and I didn’t catch up with them in the booth. If that is you, feel free to leave a comment here, or email me.

Other talks I attended:
Jim Mlodgenski’s Debugging PL/PgSQL contained some good tips on use of RAISE NOTICE and what not to do. I installed pl_profiler and started using it the next afternoon.

Xof’s JSON Home Improvement included advice on when to use JSON vs JSONB, or maybe even neither :) If you’ve been reading my blog for any length of time, you know that these kinds of discussions just end in me making plans for more benchmarking…

Peter Geoghegan’s UPSERT use cases covered a good range of material, from beginner level to “We can do WHAT now?”, and made me realize I can always improve my SQL skills.

I’ve been hearing about ToroDB but hadn’t attended one of Alvaro’s talks yet. Very basically, it’s a Pg-backed Mongo interface. They take the JSON data and explode it out into sub-documents. Then there’s a table that stores the structure _in JSON_. It both blew my mind & had me smacking my forehead: “of course!”

It is in alpha, so this applies.

Slides for talks are (or will be) on the Pg wiki. We do rely on speakers to upload their own slides, so if you are after something in particular, please ask & I will remind them. :)

I’m already looking forward to next year, and hope I get to go ice skating then!

20 January, 2016

Postgres FDW, one year later

by gorthx

I’ve been using Postgres’ foreign data wrappers for about a year now.

Stuff I’ve learned/done:
– Those of you on 9.5 can use the new IMPORT FOREIGN SCHEMA (example after the list below). If you’re not quite there yet, Leo and Regina’s function to script table creation has been saving me a ton of time.

I modified it slightly (gist here):
– I name my foreign servers after their source database, and organize the foreign tables into schemas named for the source db/foreign server, so I tweaked the function to handle that
– added a parameter to specify updateable true/false, since I explicitly set that for each table
– changed the output to my preferred formatting.
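
The 9.5 import mentioned above is a one-liner per foreign server; something like this (server and schema names are made up – per my convention, the local schema is named for the foreign server):

IMPORT FOREIGN SCHEMA public
FROM SERVER my_fdw_server
INTO my_fdw_server;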

Updating a server definition is much easier than I expected:

BEGIN;
ALTER SERVER my_fdw_server
OPTIONS (
  SET dbname 'new_db_name'
, SET host 'new_host_name'
);
COMMIT;

You can do the same thing with the table definitions;  no need to drop & re-create them.
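
For example, pointing an existing foreign table at a renamed source table is just (names made up):

ALTER FOREIGN TABLE my_fdw_server.table1
OPTIONS (SET table_name 'table1_new');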

– Handling a serial field on the source table is interesting.
Say I have a table that looks like this:

CREATE TABLE table1 (
id serial primary key
, name text
, color text
);

I create a foreign table on that, including all fields (exercise left to the reader). Then I try to insert into it as I normally would, expecting Pg to handle the incrementing of the id field:

INSERT INTO table1 (name, color)
VALUES ('bob', 'blue');
ERROR: null value in column "id" violates not-null constraint

I don’t want to pass in a value for id, because that negates the entire purpose of having a serial datatype.

My solution was to leave the id field out of the foreign table definition.
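
So the definition ends up looking something like this (server name made up); the sequence on the source table fills in id when I insert:

CREATE FOREIGN TABLE table1 (
name text
, color text
)
SERVER my_fdw_server
OPTIONS (schema_name 'public', table_name 'table1');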

Watch your datatypes, as CREATE FOREIGN TABLE doesn’t do any verification. “Fun” things can happen when you define your foreign table with timestamp without tz (to match the source table), then finally get approval to correct that on the source table, and don’t also change it in your foreign table definition.

28 September, 2015

PgOpen recap

by gorthx

We’ve wrapped up another PostgresOpen! The official numbers aren’t in yet, but we were up a significant percentage from last year. We had so many last-minute registrants and walk-ups that we ran out of extra badges.

Even better, I got to attend most of the talks on my list.

I can’t pick a favorite, but judging by the amount of notes I took, I learned the most in Joe Celko’s talk on measurement scales. The concepts and nomenclature were vaguely familiar from my long-ago education in the sciences, and it was interesting to tie them in with what I’m doing now.

Grant gave an excellent introduction to RDS Pg. I even learned some things. If you’re new to RDS Postgres, you should take this tutorial next time it’s offered. I continue to be impressed with the RDS team members; not only are they a whip-smart group, they’re also very personable and kind.

I learned about \ef (allows you to edit functions in place) in Keith Fiske’s Don’t Forget the Elephant, and was also reminded of how far Postgres has come, even just since 9.0. So many useful features!

Stella’s SQL Guru talk gave me some good pointers on window functions.

Slides for most of the talks are available on the wiki, and videos are on the way.

Between the social events and prepping for the Tour des Fleurs, I did not have time to see much of Dallas. I’ll likely run the TDF again next year, if the conf coincides with it. LMK if you’d like to join. The course is mostly flat; there’s one small hill.

Great to see everyone and hope to see you again next year!

30 August, 2015

My “Must-see” List for PgOpen

by gorthx

PostgresOpen is coming up in just a few weeks! (Disclosure: I am on the conference committee.) We are still working out the room assignments; your conference badge/printed program will have the right info.

The committee is very excited to have snagged Lacey Williams Henschel to give our keynote, on Open Source & Higher Ed. I’m looking forward to meeting her! (The real reason I’m at the reg desk is to try to meet everyone ;) )

As in previous years, four tutorials are scheduled for the day before the conference proper. As anyone who’s talked to me in the past year could guess, I’m most excited about Grant McAlister’s RDS tutorial. There’s always more to learn about this quickly maturing platform.

My picks for the rest of the conference are as follows:
Thursday sessions:
– Kevin Kempter’s Monitoring Postgres

– Joe Conway’s R talk

– Sarah Conway’s security talk was SRO last year; this year she’s back to focus on security in RDS. (In the same time slot, Jim Mlodgenski’s FDW talk is a good intro to that subject.)

– I’ve been using SQL for a while now :koff: but I’m still not that great at some things like window functions, so I’ll be in Stella Nisenbaum’s SQL Guru talk.

– Then I’ll finish out my day with David Steele’s Audit Logging.

Friday sessions:
– I’ve been enjoying the “History of Postgres”-type talks we’ve had at a few confs lately, and am looking forward to Keith Fiske’s version, Don’t Forget the Elephant.

– Shaun Thomas’s Highly Available Postgres.

– Of course I’ll be in Peter Don’t-even-try-to-pronounce-it Geoghegan’s UPSERT use cases.

– I caught Magnus Hagander’s Tardis talk in NYC earlier this year. If you haven’t had the chance, you should check it out!

– And last, Phil Vacca’s Text Search.

We will be having a lightning talk session, as in previous years. Signups will open soon; watch the conference twitter stream for announcements.

It’s not too late to register, but our hotel room rate and reservation block expire on Sept 2, so act fast! If you’re a member of a PUG, ask your group leader for the conference discount code. (If you’re a PUG leader and haven’t received the code, please email the conference committee.)

6 July, 2015

More AWS + redis fun

by gorthx

Part 1.

1. I “upgraded” one of my smaller test clusters to a t2.medium, which has better specs than the m1.small it was on previously. It came up in a weird, inaccessible state, and when I started troubleshooting it, I noticed backups weren’t configured. It turns out backup and restore are not supported on cache.t1.* and cache.t2.* instance types. Reference: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ManagingSnapshots.html (Update: duh, “t” stands for “testing”, as in “don’t use this in production.”)

2. We started getting OOM errors on one of our clusters, another m1.small. The dataset was only 900M, so I was a bit mystified. Apparently, when you configure redis to be persistent (i.e. you won’t lose your data if it restarts – at least that’s the way it’s supposed to work :koff:), it can actually take up to twice the memory of the dataset.

Reference: http://redis.io/topics/admin, see bold text: “If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log Redis may use up to 2 times the memory normally used.”

Familiarize yourself with these:
http://redis.io/topics/memory-optimization
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html#BestPractices.BGSAVE

The bad news: INFO (at least on 2.6.13) doesn’t tell you the max memory configured. Nor is it available via describe-cache-clusters or describe-cache-parameters; you have to infer it from the instance class. Kind of a bummer!
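
You can at least grab the node type from the cli and then look up its memory limit in the ElastiCache docs:

aws elasticache describe-cache-clusters \
--cache-cluster-id gabrielles-redis \
--query 'CacheClusters[*].CacheNodeType' \
--output text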

Of course, another option is not to store data you like in an in-memory database, but that’s a discussion for another time.

3. Taking a final snapshot for a cluster is now supported!
aws elasticache delete-cache-cluster \
--cache-cluster-id gabrielles-redis \
--final-snapshot-identifier gabrielles-redis-hinky-2015-07-01

25 May, 2015

LFNW Conference Report

by gorthx

At the end of April, Mark Wong & I headed up to Bellingham for LinuxFest. This was my first time at the conference in several years, and though the expo hall was a bit smaller than I remembered, they had 1849 registered attendees! That is a truly excellent number for a community-run conference in a far corner of the US.

Somehow we had three Postgres booths: PgUS, 2nd Quadrant, and SEAPUG. This caused some confusion among the attendees. As in past years, booth visitors ran the gamut between experienced folks, random people who dropped in to see what was going on, and That Guy Who Just Wants to Argue^W Discuss Why MySQL Is Better [tm]. I hadn’t actually seen him in a while, it was good to catch up. :koff:

Talks I attended:
Emily Dunham’s Thinking in git, where I learned about git commit --dry-run and git pull --rebase (which makes it look like you were smart & pulled when you were supposed to.)

Robert Bernier’s Welcome To Total Security gave a fun historical overview of Postgres, and very good coverage of both beginner and more advanced information.

Frances Hocutt’s Why are these people following me. The big lesson for me from this talk was “Just because the abuse isn’t aimed at you, doesn’t mean you don’t suffer from it”, as I am currently dealing with a similar situation. Also: “Assume good faith” doesn’t mean “ignore evidence of bad faith”.

Eric Worden’s start_date, end_date: Calculate! included a pretty appalling example of a temporal table design, and how to fix it. This just convinced me even more that range types are The Right Way to handle these requirements.

On a personal note: I’ve been doing Pg-only confs for the past few years and made a rookie presenter mistake with my talk: I assumed a level of familiarity with Postgres that my audience didn’t have. I’ll need to adjust for that next time.
