10 October, 2016

PgOpen 2016 recap

by gorthx

It’s been a few weeks now since we wrapped up the most recent PostgresOpen in Dallas.

Marianne Bellotti’s keynote was a fantastic look into the current state of government IT.  I hope we get to see her at more conferences!

Other quick highlights:
– 2ndQ has a running beta of WordPress on Postgres! (Simon’s keynote)
– Early decisions by the Postgres team (e.g. to spend time working on subselects) directly resulted in my choosing Postgres over MySQL a few years later. (From Greg Stark’s “A Look at the Elephant’s Tail”)
– Keith Fiske (“Managing OS Provided PostgreSQL Packages”) gave a rundown of the vagaries of using Pg packages on different OSes, something that has occasionally driven me around the bend.

I got the most bang for my buck from the combination of Magnus Hagander’s “A Look at the Elephant’s Trunk” and Denish Patel’s “Advanced Postgres Monitoring” (slide deck here). Denish has a thorough list of metrics you should collect, and there are some changes coming in 9.6 to pg_stat_activity that may break current monitoring scripts that use it.
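
The headline change there, as I understand it: 9.6 drops pg_stat_activity’s boolean waiting column in favor of wait_event_type and wait_event, so a classic check like the first query below needs to become something like the second:

-- pre-9.6:
SELECT pid, query FROM pg_stat_activity WHERE waiting;

-- 9.6 (heavyweight lock waits, the old meaning of "waiting"):
SELECT pid, query FROM pg_stat_activity WHERE wait_event_type = 'Lock';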

(Also: I am VERY EXCITED about pg_stat_progress_vacuum!)
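
(In case you haven’t met it yet: it’s a new view in 9.6 that finally lets you watch a vacuum chew through a table, something along these lines:)

SELECT pid, relid::regclass AS table_name, phase,
       heap_blks_scanned, heap_blks_total
FROM pg_stat_progress_vacuum;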

(And I learned that NumPy is pronounced num-pie and not “rhymes with lumpy”.)

Bonus speaker tip: Many years ago I was in a talk in which the presenter asked “Does everybody understand [x]?” When no one responded, he continued “Just in case we have some people here who are too shy to raise their hands, [x] is…” and gave a brief explanation, enough for newer folks to follow along for the rest of the session. This is an excellent inclusion tactic. Scott Meade demonstrated another way to handle this situation: “So we’re at level set, [x] refers to …”

Thanks to our attendees, speakers, and sponsors for making this conference what it is. The committee is meeting this week to make plans for next year.  Watch the conference website and twitter feed for updates.

31 August, 2016

My Picks for PgOpen 2016

by gorthx

We’re back in Dallas for our 6th year of PostgresOpen!  This year, we’re out at the Westin Galleria, which I’m told has an ice rink. (Is it too late for someone to bring the Slonik costume?)

In addition to choosing the talks for the program, part of my conference committee duties involves being a room host. I won’t get to see everything I want to, but here are five that are at the top of my “wish list”:

  • Denish Patel’s Advanced Postgres monitoring – always looking for ways to do this better
  • *Of course* I will be in Grant McAlister’s RDS talk
  • Shaun Thomas clearly and simply explains even advanced topics, and I usually come out of the room thinking “BOY am I glad I don’t have to deal with THAT situation”. This year it’s Elephant Herd as a Service
  • Sounds like there are some “interesting” stories behind Ilya Kosmodemiansky’s PostgreSQL Worst Practices
  • And we are really excited to have Alexander Korotkov and Oleg Bartunov join our conference to talk about RUM indexes!

A couple of other sessions I recommend:

Check out Dimitri Fontaine’s You’d Better Have Tested Backups; it’s pretty hair-raising.

Want to contribute back to Postgres, but don’t know where to start? Review some patches! David Steele will give you some pointers.

We will be having Lightning Talks, as usual. If you want to get the jump on everyone else, you can sign up here.

What’s new this year: instead of an auction, we’re having a golf tournament.  I’ve never golfed before and am looking forward to giving it a try. Come join us, it’ll be a hoot. (There is a dress code so pack accordingly if you think you might want to play.)

See you in Dallas in a few weeks! Stop by the reg desk and say hello.

22 August, 2016

Easy clock dimension table

by gorthx

We’re currently redesigning our data warehouse, and we’re experimenting with a clock (or time, or time-of-day) dimension to represent the time of day, separate from the date dimension we already use. This table should contain a record for each minute of a day, allowing us to easily determine business hours in various timezones, etc etc.

I’m not entirely sure it’s going to work for us just yet; we have a lot of timestamp data, and it seems like it’s more trouble than it’s worth1. But it never hurts to proof-of-concept something, right? (Except for all those times the proof-of-concept ends up being the production system…)

Anyway, somebody commented that “you could have the analyst whip up the dataset in just a few hours” … :sideeye: or, I could do it myself in Postgres in just a few minutes:

CREATE TABLE clock_dimension (
 id serial primary key
 , full_timestamp time
 , hour_of_day_24 integer
 , hour_of_day_12 integer
 , am_pm text
 , minute_of_hour integer
 , business_hours_weekday boolean
 , business_hours_weekend boolean
);

-- fill it with values, one row per minute
INSERT INTO clock_dimension (full_timestamp)
SELECT generate_series('2016-08-05 00:00'::timestamp, '2016-08-05 23:59'::timestamp, '1 minute')::time;

-- then fill in the derived columns
UPDATE clock_dimension SET
hour_of_day_24 = date_part('hour', full_timestamp)
, hour_of_day_12 = CASE
    WHEN date_part('hour', full_timestamp) > 12
      THEN date_part('hour', full_timestamp) - 12
    WHEN date_part('hour', full_timestamp) = 0
      THEN 12  -- the midnight hour is 12 am
    ELSE date_part('hour', full_timestamp)
  END
, am_pm = CASE
    WHEN date_part('hour', full_timestamp) >= 12
      THEN 'pm'
    ELSE 'am'
  END
, minute_of_hour = date_part('minute', full_timestamp)
, business_hours_weekday = CASE
    WHEN full_timestamp BETWEEN '08:00' AND '21:00'
      THEN true
    ELSE false
  END
, business_hours_weekend = false;

Sample data:

 id  | full_timestamp | hour_of_day_24 | hour_of_day_12 | am_pm | minute_of_hour | business_hours_weekday | business_hours_weekend 
 901 | 15:00:00       |             15 |              3 | pm    |              0 | t                      | f
 902 | 15:01:00       |             15 |              3 | pm    |              1 | t                      | f
 903 | 15:02:00       |             15 |              3 | pm    |              2 | t                      | f
 904 | 15:03:00       |             15 |              3 | pm    |              3 | t                      | f
 905 | 15:04:00       |             15 |              3 | pm    |              4 | t                      | f
 906 | 15:05:00       |             15 |              3 | pm    |              5 | t                      | f
 907 | 15:06:00       |             15 |              3 | pm    |              6 | t                      | f
 908 | 15:07:00       |             15 |              3 | pm    |              7 | t                      | f
 909 | 15:08:00       |             15 |              3 | pm    |              8 | t                      | f
 910 | 15:09:00       |             15 |              3 | pm    |              9 | t                      | f
 911 | 15:10:00       |             15 |              3 | pm    |             10 | t                      | f
 912 | 15:11:00       |             15 |              3 | pm    |             11 | t                      | f
 913 | 15:12:00       |             15 |              3 | pm    |             12 | t                      | f
 914 | 15:13:00       |             15 |              3 | pm    |             13 | t                      | f
 915 | 15:14:00       |             15 |              3 | pm    |             14 | t                      | f
 916 | 15:15:00       |             15 |              3 | pm    |             15 | t                      | f
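
(Bonus: if the separate UPDATE pass offends you, the whole load collapses into a single INSERT … SELECT. A sketch, same columns and business rules as above:)

INSERT INTO clock_dimension
  (full_timestamp, hour_of_day_24, hour_of_day_12, am_pm,
   minute_of_hour, business_hours_weekday, business_hours_weekend)
SELECT t
  , extract(hour FROM t)
  , CASE WHEN extract(hour FROM t) > 12 THEN extract(hour FROM t) - 12
         WHEN extract(hour FROM t) = 0 THEN 12
         ELSE extract(hour FROM t) END
  , CASE WHEN extract(hour FROM t) >= 12 THEN 'pm' ELSE 'am' END
  , extract(minute FROM t)
  , t BETWEEN '08:00' AND '21:00'
  , false
FROM (SELECT generate_series('2016-08-05 00:00'::timestamp,
                             '2016-08-05 23:59'::timestamp,
                             '1 minute')::time AS t) AS minutes;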

1 – see also http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/

1 August, 2016

Upgrade Ubuntu 15.04 to 16.04

by gorthx

I use Ubuntu on my home machines, and tend to rely on the GUI Software Updater for upgrades & such.  Life happens and I left one of them on 15.04 past EOL, and the Software Updater didn’t provide an upgrade path directly to 16.04.  I had to upgrade to 15.10 first, which is also EOL.

Here’s what I pieced together from various forums around the net.  (I reconstructed this after the fact from my command-line history; I’ll take better notes next time.)

apt-get update
apt-get upgrade # make sure system is as up-to-date as possible
apt-get autoremove # clean up some leftover crap

apt-get dist-upgrade # recommended next step on one of the forums,
                     # but didn't actually do anything
                     # and I probably could have skipped this

do-release-upgrade -d # supposedly this should have worked without the -d
                      # but it didn't
                      # -d means "latest development release"

And booyah, I’m on 15.10 and could continue on to 16.04.
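
(If you want to double-check where you landed before the next hop – this is standard Ubuntu tooling, nothing fancy:)

lsb_release -a        # should now report 15.10
do-release-upgrade    # and on to 16.04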


27 June, 2016

Slow Query Checklist

by gorthx

Every slow query situation has its own quirks.  Here’s my generic template for troubleshooting them.

First, I start with the holy trinity of IT Troubleshooting Questions:

  1. What are you trying to do?
  • What data do you want?
  • What is the end result you want?
  2. How does your result differ from what you expected?
  • “I thought I could do a 7-way join on unindexed fields in tables with over 3 million records each in less than 2 seconds.  I’m pretty disappointed with Postgres’s performance.”
  3. What changed?
  • Has it always been this slow?
  • Did this happen suddenly, or did it slowly degrade?
  • If suddenly, what changed?  (“This started right after our last deploy.”)


20 June, 2016

RDS: log vacuum messages

by gorthx

Last year I commented that “autovacuum log messages are still missing in [RDS] 9.4”.

Amazon fixed that for versions 9.4.5 and up with the custom GUC rds.force_autovacuum_logging_level.

I discovered this week that it’s also available on earlier versions of RDS – I have it enabled on a 9.3.10 instance. The default value in the config is one of the debug levels; just change it to ‘log’. No restart required (assuming you already have logging configured… and you do, don’t you?)
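
If you’d rather set it via the cli, it lives in your instance’s parameter group; something like this should do it (the parameter group name is made up, obviously):

aws rds modify-db-parameter-group \
--db-parameter-group-name my-pg-params \
--parameters "ParameterName=rds.force_autovacuum_logging_level,ParameterValue=log,ApplyMethod=immediate"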

Works as expected:

2016-06-16 18:36:41 UTC::@:[31403]:LOG: automatic vacuum of table "my_db.public.my_dimension": index scans: 1
 pages: 0 removed, 29457 remain
 tuples: 3454 removed, 429481 remain, 0 are dead but not yet removable
 buffer usage: 64215 hits, 8056 misses, 22588 dirtied
 avg read rate: 1.018 MB/s, avg write rate: 2.855 MB/s
 system usage: CPU 0.10s/0.88u sec elapsed 61.80 sec

This makes me very happy – no more cron job to monitor vacuums!

30 March, 2016

This month’s RDS tip, a pgloader exercise, and bonus recruiter spam

by gorthx

1. Several months back, AWS re-worked the web console. You can get sparkline-style graphs for CPU usage, memory, storage, and database connections at the top level, instead of having to drill down to the Cloudwatch metrics for the instance.

I find this really handy – but for one quirk with the graphs:

There’s white space to the right of the red line, so I interpreted the red line to mean “trouble is brewing & I should do something about this”.

Turns out that red line is the limit; there’s nowhere else to go. Whoops!

2. This week I finally had a reason to try out pgloader: One of my analysts needed some help loading some ugly fixed-width data1.

Installing was super-easy on my mac (`brew install pgloader`).  I worked through the examples before starting to work with my actual dataset2.

The data to be loaded came bundled as several text files: the data, plus three or four additional files describing the layout.  I wrote a truly glorious string of cut, sed, paste, and awk to create a pgloader control file that would work. And it did!


field1 | field2 | field3 
 [null] | D | IMITRI
 [null] | G | ABRIELLE
 [null] | M | ARK
 [null] | S | ELENA

Uh, what?

The data descriptions had character counts starting at 1, and pgloader expects them to start at 0 (as they should).  (For extra fun, the first column in all records of this dataset was nothing but spaces.)
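
If you haven’t seen a pgloader control file for fixed-width data, it’s shaped roughly like this – field names and widths invented for illustration, so check the docs for the real syntax:

LOAD FIXED
     FROM 'ugly_data.txt'
          (
            field1 from 0 for 8,   -- offsets count from 0, not 1!
            field2 from 8 for 1,
            field3 from 9 for 20
          )
     INTO postgresql:///mydb?target_table
     WITH truncate;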

3. This week’s hilarious recruiter spam:
“Hey there Gabrielle! I was doing my homework on sites like Meetup, GitHub, etc. and I noticed your Java skills.”

I’m pretty sure you didn’t.

1 – Is that redundant?
2 – 600 columns of fixed-width data?  Who does that?!
3 – Why couldn’t it all be in one file? Again: who does this?!4
4 – People who hate DBAs, that’s who.

3 February, 2016

RDS OS update

by gorthx

I have some more info on the AWS RDS OS update I mentioned last week.

The announcement states that this update will take an outage. In my experience, it’ll be the usual failover time for a Multi-AZ instance (a minute or so), and 8-10 minutes for non-Multi-AZ instances. According to my event logs, the entire update process takes 20-25 minutes.

Since I had n > 1 instances to handle, I preferred to review & schedule updates via the cli, because I could then write a quick bash wrapper for each piece & handle them in bulk.

Here’s the command:

aws rds describe-pending-maintenance-actions

To review upcoming maintenance for specific instances only, pass a comma-separated list of instance names to the --filters switch, like so:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1,mydb2

Here’s an example of what you’ll get back:

PENDINGMAINTENANCEACTIONS arn:aws:rds:[az]:[acct_no]:db:[dbname]
PENDINGMAINTENANCEACTIONDETAILS system-update 2016-02-26T19:41:22Z 2016-02-26T19:41:22Z Performance improvements and security updates

(The timestamps are in UTC, so you may have to do some math.)

Neither of those timestamps were in my maintenance window, so I dug into the docs  a bit further.

There are three possible dates available:
AutoAppliedAfterDate – AWS will apply the maintenance in the next maintenance window after this date
ForcedApplyDate – AWS will apply the maintenance at this time, regardless of maintenance window (eeeek!)
CurrentApplyDate – when AWS will apply it, if you’ve opted in.

Without field headers, I can’t really tell which two of the three possible dates are displayed. That “ForcedApplyDate” is a bit concerning, so I used the ‘output to JSON’ option to get some more info:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output json


"PendingMaintenanceActionDetails": [
    {
        "Action": "system-update",
        "Description": "Performance improvements and security updates",
        "CurrentApplyDate": "2016-02-26T19:41:22Z",
        "AutoAppliedAfterDate": "2016-02-26T19:41:22Z",
        "OptInStatus": ""
    }
]

So that clears that up, a little. Now to get this stuff scheduled!

aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \ 
--opt-in-type next-maintenance
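
And since the whole point was handling these in bulk, that drops into a loop easily enough (region and account number are faked here):

for db in mydb1 mydb2; do
  aws rds apply-pending-maintenance-action \
  --resource-identifier arn:aws:rds:us-east-1:123456789012:db:${db} \
  --apply-action system-update \
  --opt-in-type next-maintenance
done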

Since I now know which fields I want, I can use ‘output as text’ to verify my scheduling action, filtered with the --query switch:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output text \
--query 'PendingMaintenanceActions[*].{OptIn:PendingMaintenanceActionDetails[*].OptInStatus,Date:PendingMaintenanceActionDetails[*].CurrentApplyDate}'


DATE 2016-01-22T17:04:00Z
OPTIN next-maintenance

(If you are wondering “how in Sam Hill did you create that query filter from that JSON output?”, wonder no more: here’s a link to the docs.  Getting the --query filter correct usually takes some back & forth with reviewing the JSON output, so don’t be discouraged if it takes a few tries.)

27 January, 2016

Scale 14x recap

by gorthx

I went to SCALE mainly for PostgreSQL@Scale – Joe Conway’s grown this annual add-on into a two-day, two-track event.

The new venue is a huge improvement, even though the walk from the hotel was a bit longer than I expected. There are a lot more accessible-on-foot food options.

For my session, I gave an updated version of last year’s “RDS Postgres: A Journey down the Amazon”.

If you’re using RDS and aren’t keeping up with your monitoring, please be aware of the current required OS update.  You can view & schedule required updates via the web console; here are the corresponding cli commands:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb,mydb1,mydb2
aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \ 
--opt-in-type next-maintenance

I find JSON and text output more useful for these types of tasks; I’ll cover that in a later post.

Accessing RDS Pg logs is kind of a sticking point for some people, me included. (I _really_ want an easy way to get them into Splunk.)
Here’s Denish Patel’s work using the rds cli, and a quick bash script I threw together, which uses the aws cli.
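
If you want to roll your own, the aws cli version boils down to something like this (the instance name is a placeholder):

# list the instance's log files, then pull each one down whole
for logfile in $(aws rds describe-db-log-files \
    --db-instance-identifier mydb \
    --query 'DescribeDBLogFiles[*].LogFileName' --output text); do
  aws rds download-db-log-file-portion \
  --db-instance-identifier mydb \
  --log-file-name ${logfile} \
  --starting-token 0 --output text > $(basename ${logfile})
done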

Based on the quick headcount I took at the beginning of the session, attendees were about 2/3 devs, 1/3 DBAs. We also had a couple of people from the Amazon RDS team; it’s always nice to know I have backup in the audience! ;)

A couple of folks had questions after my talk and I didn’t catch up with them in the booth. If that is you, feel free to leave a comment here, or email me.

Other talks I attended:
Jim Mlodgenski’s Debugging PL/PgSQL contained some good tips on use of RAISE NOTICE and what not to do. I installed pl_profiler and started using it the next afternoon.

Xof’s JSON Home Improvement included advice on when to use JSON vs JSONB, or maybe even neither :) If you’ve been reading my blog for any length of time, you know that these kinds of discussions just end in me making plans for more benchmarking…

Peter Geoghegan’s UPSERT use cases covered a good range of material from the basics to “We can do WHAT now?” and made me realize I can always improve my SQL skills.

I’ve been hearing about ToroDB  but hadn’t attended one of Alvaro’s talks yet. Very basically, it’s a Pg-backed Mongo interface. They take the json data and explode it out into sub-documents. Then there’s a table that stores the structure _in json_. It both blew my mind & had me smacking my forehead “of course!”

It is in alpha, so this applies.

Slides for talks are (or will be) on the Pg wiki.  We do rely on speakers to load their own slides, so if you are after something in particular, please ask & I will remind them. :)

I’m already looking forward to next year, and hope I get to go ice skating then!

20 January, 2016

Postgres FDW, one year later

by gorthx

I’ve been using Postgres’ foreign data wrappers for about a year now.

Stuff I’ve learned/done:
– Those of you on 9.5 can use the new IMPORT FOREIGN SCHEMA (quick example below). If you’re not quite there yet, Leo and Regina’s function to script table creation has been saving me a ton of time.

I modified it slightly (gist here):
– I name my foreign servers after their source database, and organize the foreign tables into schemas named for the source db/foreign server, so I tweaked the function to handle that
– added a parameter to specify updateable true/false, since I explicitly set that for each table
– changed the output to my preferred formatting.
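
For the 9.5 crowd, the promised IMPORT FOREIGN SCHEMA example – a minimal sketch using the naming convention I just described (the local schema has to exist first):

-- create foreign table definitions for everything in the source db's public schema
IMPORT FOREIGN SCHEMA public
FROM SERVER my_fdw_server
INTO my_fdw_server;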

Updating a server definition is much easier than I expected:

ALTER SERVER my_fdw_server
OPTIONS (SET dbname 'new_db_name'
, SET host 'new_host_name');

You can do the same thing with the table definitions;  no need to drop & re-create them.
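
For example, if the table gets renamed on the source, something like this should do it (names are placeholders; use ADD instead of SET if you never specified the option in the first place):

ALTER FOREIGN TABLE my_fdw_server.my_foreign_table
OPTIONS (SET table_name 'new_source_table_name');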

– Handling a serial field on the source table is interesting.
Say I have a table that looks like this:

CREATE TABLE table1 (
id serial primary key
, name text
, color text
);

I create a foreign table on that, including all fields (exercise left to the reader). Then try to update it as I normally would, allowing Pg to handle the incrementing of the id field:

INSERT INTO table1 (name, color)
VALUES ('bob', 'blue');
ERROR: null value in column "id" violates not-null constraint

I don’t want to pass in a value for id, because that negates the entire purpose of having a serial datatype.

My solution was to leave the id field out of the foreign table definition.
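
That is, something like this (following my schema-per-server convention from above); the source table’s sequence then populates id on insert:

CREATE FOREIGN TABLE my_fdw_server.table1 (
name text
, color text
) SERVER my_fdw_server
OPTIONS (table_name 'table1');

INSERT INTO my_fdw_server.table1 (name, color)
VALUES ('bob', 'blue');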

Watch your datatypes, as CREATE FOREIGN TABLE doesn’t do any verification. “Fun” things can happen when you define your foreign table with timestamp without tz (to match the source table), then finally get approval to correct that on the source table, and don’t also change it in your foreign table definition.

Tags: ,