Archive for ‘PostgreSQL’

10 October, 2016

PgOpen 2016 recap

by gorthx

It’s been a few weeks now since we wrapped up the most recent PostgresOpen in Dallas.

Marianne Bellotti’s keynote was a fantastic and very interesting look into the current state of government IT.  I hope we get to see her at more conferences!

Other quick highlights:
– 2ndQ has a running beta of WordPress on Postgres! (Simon’s keynote)
– Early decisions by the Postgres team (e.g. to spend time working on subselects) directly resulted in my choosing Postgres over MySQL a few years later. (From Greg Stark’s “A Look at the Elephant’s Tail”)
– Keith Fiske (“Managing OS Provided PostgreSQL Packages”) gave a rundown of the vagaries of using Pg packages on different OSes, something that has occasionally driven me around the bend.

I got the most bang for my buck from the combination of Magnus Hagander’s “A Look at the Elephant’s Trunk” and Denish Patel’s “Advanced Postgres Monitoring” (slide deck here). Denish has a thorough list of metrics you should collect, and there are some changes coming in 9.6 to pg_stat_activity that may break current monitoring scripts that use it.
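To give one concrete example (this is my reading of the release notes, so verify against your own setup): the boolean waiting column is gone in 9.6, replaced by wait_event_type and wait_event, so a check like this needs rewriting:

-- pre-9.6: flag backends blocked on a lock
SELECT pid, query FROM pg_stat_activity WHERE waiting;

-- 9.6: the closest equivalent
SELECT pid, query FROM pg_stat_activity WHERE wait_event_type = 'Lock';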

(Also: I am VERY EXCITED about pg_stat_progress_vacuum!)

(And I learned that NumPy is pronounced num-pie and not “rhymes with lumpy”.)

Bonus speaker tip: Many years ago I was in a talk in which the presenter asked “Does everybody understand [x]?” When no one responded, he continued “Just in case we have some people here who are too shy to raise their hands, [x] is…” and gave a brief explanation, enough for newer folks to follow along for the rest of the session. This is an excellent inclusion tactic. Scott Meade demonstrated another way to handle this situation: “So we’re at level set, [x] refers to …”

Thanks to our attendees, speakers, and sponsors for making this conference what it is. The committee is meeting this week to make plans for next year.  Watch the conference website and twitter feed for updates.

31 August, 2016

My Picks for PgOpen 2016

by gorthx

We’re back in Dallas for our 6th year of PostgresOpen!  This year, we’re out at the Westin Galleria, which I’m told has an ice rink. (Is it too late for someone to bring the Slonik costume?)

In addition to choosing the talks for the program, my conference committee duties include being a room host. I won’t get to see everything I want, but here are five talks at the top of my “wish list”:

  • Denish Patel’s Advanced Postgres monitoring – always looking for ways to do this better
  • *Of course* I will be in Grant McAlister’s RDS talk
  • Shaun Thomas clearly and simply explains even advanced topics, and I usually come out of the room thinking “BOY am I glad I don’t have to deal with THAT situation”. This year, it’s Elephant Herd as a Service
  • Sounds like there are some “interesting” stories behind Ilya Kosmodemiansky’s PostgreSQL Worst Practices
  • And we are really excited to have Alexander Korotkov and Oleg Bartunov join our conference to talk about RUM indexes!

A couple of other sessions I recommend:

Check out Dimitri Fontaine’s You’d Better Have Tested Backups; it’s pretty hair-raising.

Want to contribute back to Postgres, but don’t know where to start? Review some patches! David Steele will give you some pointers.

We will be having Lightning Talks, as usual. If you want to get the jump on everyone else, you can sign up here.

What’s new this year: instead of an auction, we’re having a golf tournament.  I’ve never golfed before and am looking forward to giving it a try. Come join us, it’ll be a hoot. (There is a dress code so pack accordingly if you think you might want to play.)

See you in Dallas in a few weeks! Stop by the reg desk and say hello.

22 August, 2016

Easy clock dimension table

by gorthx

We’re currently redesigning our data warehouse, and we’re experimenting with a clock (or time, or time-of-day) dimension to represent the time of day, separate from the date dimension we already use. This table should contain a record for each minute of a day, allowing us to easily determine things like business hours in various timezones.

I’m not entirely sure it’s going to work for us just yet; we have a lot of timestamp data, and it seems like it’s more trouble than it’s worth [1]. But it never hurts to proof-of-concept something, right? (Except for all those times the proof-of-concept ends up being the production system…)

Anyway, somebody commented that “you could have the analyst whip up the dataset in just a few hours” … :sideeye: or, I could do it myself in Postgres in just a few minutes:

CREATE TABLE clock_dimension (
 id serial primary key
 , full_timestamp time
 , hour_of_day_24 integer
 , hour_of_day_12 integer
 , am_pm text
 , minute_of_hour integer
 , business_hours_weekday boolean
 , business_hours_weekend boolean
 );

-- fill it with values
INSERT INTO clock_dimension (full_timestamp)
SELECT generate_series('2016-08-05 00:00'::timestamp, '2016-08-05 23:59'::timestamp, '1 minute')::time;
UPDATE clock_dimension SET
  hour_of_day_24 = date_part('hour', full_timestamp)
  , hour_of_day_12 = CASE
      WHEN date_part('hour', full_timestamp) > 12
        THEN date_part('hour', full_timestamp) - 12
      WHEN date_part('hour', full_timestamp) = 0
        THEN 12
      ELSE date_part('hour', full_timestamp)
    END
  , am_pm = CASE
      WHEN date_part('hour', full_timestamp) >= 12
        THEN 'pm'
      ELSE 'am'
    END
  , minute_of_hour = date_part('minute', full_timestamp)
  , business_hours_weekday = CASE
      WHEN full_timestamp BETWEEN '08:00' AND '21:00'
        THEN true
      ELSE false
    END
  , business_hours_weekend = false
;
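For the record, the separate UPDATE pass isn’t strictly necessary; everything could be computed in a single INSERT ... SELECT. Here’s an untested sketch of the same logic:

INSERT INTO clock_dimension
    (full_timestamp, hour_of_day_24, hour_of_day_12, am_pm,
     minute_of_hour, business_hours_weekday, business_hours_weekend)
SELECT
    ts::time
    , date_part('hour', ts)::integer
    , CASE
        WHEN date_part('hour', ts) > 12 THEN date_part('hour', ts)::integer - 12
        WHEN date_part('hour', ts) = 0 THEN 12
        ELSE date_part('hour', ts)::integer
      END
    , CASE WHEN date_part('hour', ts) >= 12 THEN 'pm' ELSE 'am' END
    , date_part('minute', ts)::integer
    , ts::time BETWEEN '08:00' AND '21:00'
    , false
FROM generate_series('2016-08-05 00:00'::timestamp,
                     '2016-08-05 23:59'::timestamp,
                     '1 minute') AS s(ts);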

Sample data:

 id  | full_timestamp | hour_of_day_24 | hour_of_day_12 | am_pm | minute_of_hour | business_hours_weekday | business_hours_weekend 
-----+----------------+----------------+----------------+-------+----------------+------------------------+------------------------
 901 | 15:00:00       |             15 |              3 | pm    |              0 | t                      | f
 902 | 15:01:00       |             15 |              3 | pm    |              1 | t                      | f
 903 | 15:02:00       |             15 |              3 | pm    |              2 | t                      | f
 904 | 15:03:00       |             15 |              3 | pm    |              3 | t                      | f
 905 | 15:04:00       |             15 |              3 | pm    |              4 | t                      | f
 906 | 15:05:00       |             15 |              3 | pm    |              5 | t                      | f
 907 | 15:06:00       |             15 |              3 | pm    |              6 | t                      | f
 908 | 15:07:00       |             15 |              3 | pm    |              7 | t                      | f
 909 | 15:08:00       |             15 |              3 | pm    |              8 | t                      | f
 910 | 15:09:00       |             15 |              3 | pm    |              9 | t                      | f
 911 | 15:10:00       |             15 |              3 | pm    |             10 | t                      | f
 912 | 15:11:00       |             15 |              3 | pm    |             11 | t                      | f
 913 | 15:12:00       |             15 |              3 | pm    |             12 | t                      | f
 914 | 15:13:00       |             15 |              3 | pm    |             13 | t                      | f
 915 | 15:14:00       |             15 |              3 | pm    |             14 | t                      | f
 916 | 15:15:00       |             15 |              3 | pm    |             15 | t                      | f



[1] See also http://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/

27 June, 2016

Slow Query Checklist

by gorthx

Every slow query situation has its own quirks.  Here’s my generic template for troubleshooting them.

First, I start with the holy trinity of IT Troubleshooting Questions:

  1. What are you trying to do?
     • What data do you want?
     • What is the end result you want?
  2. How does your result differ from what you expected?
     • “I thought I could do a 7-way join on unindexed fields in tables with over 3 million records each in less than 2 seconds.  I’m pretty disappointed with Postgres’s performance.”
  3. What changed?
     • Has it always been this slow?
     • Did this happen suddenly, or did it slowly degrade?
     • If suddenly, what changed?  (“This started right after our last deploy.”)


20 June, 2016

RDS: log vacuum messages

by gorthx

Last year I commented that “autovacuum log messages are still missing in [RDS] 9.4”.

Amazon fixed that for versions 9.4.5 and up with the custom GUC rds.force_autovacuum_logging_level.

I discovered this week that it’s also available on earlier versions of RDS – I have it enabled on a 9.3.10 instance. The default value in the config is one of the debug levels; just change it to ‘log’. No restart required (assuming you already have logging configured… and you do, don’t you?)
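To turn it on via the cli, modify your instance’s (non-default) parameter group; something like this, with the parameter group name swapped for your own:

aws rds modify-db-parameter-group \
  --db-parameter-group-name my-pg93-params \
  --parameters "ParameterName=rds.force_autovacuum_logging_level,ParameterValue=log,ApplyMethod=immediate"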

Works as expected:

2016-06-16 18:36:41 UTC::@:[31403]:LOG: automatic vacuum of table "my_db.public.my_dimension": index scans: 1
 pages: 0 removed, 29457 remain
 tuples: 3454 removed, 429481 remain, 0 are dead but not yet removable
 buffer usage: 64215 hits, 8056 misses, 22588 dirtied
 avg read rate: 1.018 MB/s, avg write rate: 2.855 MB/s
 system usage: CPU 0.10s/0.88u sec elapsed 61.80 sec

This makes me very happy – no more cron job to monitor vacuums!

3 February, 2016

RDS OS update

by gorthx

I have some more info about the AWS RDS OS update I mentioned last week.

The announcement states that this update will take an outage. In my experience, it’ll be the usual failover time for a Multi-AZ instance (a minute or so), and 8-10 minutes for non-Multi-AZ instances. According to my event logs, the entire update process takes 20-25 minutes.

Since I had n > 1 instances to handle, I preferred to review & schedule updates via the cli; that way I could write a quick bash wrapper for each piece & handle them in bulk (there’s a sketch at the end of this post).

Here’s the command:

aws rds describe-pending-maintenance-actions

To review upcoming maintenance for only specific instances, pass a comma-separated list of instance names to the --filters switch, like so:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1,mydb2

Here’s an example of what you’ll get back:

PENDINGMAINTENANCEACTIONS arn:aws:rds:[az]:[acct_no]:db:[dbname]
PENDINGMAINTENANCEACTIONDETAILS system-update 2016-02-26T19:41:22Z 2016-02-26T19:41:22Z Performance improvements and security updates

(The timestamps are in UTC, so you may have to do some math.)

Neither of those timestamps was in my maintenance window, so I dug into the docs a bit further.

There are three possible dates available:
AutoAppliedAfterDate – AWS will apply the maintenance in the next maint window after this date
ForcedApplyDate – AWS will apply the maintenance at this time, regardless of maintenance window (eeeek!)
CurrentApplyDate – When AWS will apply it, if you’ve opted in.

Without field headers, I can’t really figure out which two of the possible three dates are displayed. That “ForcedApplyDate” is a bit concerning, so I used the ‘output to JSON’ option to get some more info:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output json

(excerpt):

"PendingMaintenanceActionDetails": [
  {
    "Action": "system-update",
    "Description": "Performance improvements and security updates",
    "CurrentApplyDate": "2016-02-26T19:41:22Z",
    "AutoAppliedAfterDate": "2016-02-26T19:41:22Z",
    "OptInStatus": ""
  }
],

So that clears that up, a little. Now to get this stuff scheduled!

aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \
--opt-in-type next-maintenance

Since I now know which fields I want, I can use ‘output as text’ to verify my scheduling action, filtered with the --query switch:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output text \
--query 'PendingMaintenanceActions[*].{OptIn:PendingMaintenanceActionDetails[*].OptInStatus,Date:PendingMaintenanceActionDetails[*].CurrentApplyDate}'

Output:

DATE 2016-01-22T17:04:00Z
OPTIN next-maintenance

(If you are wondering “how in Sam Hill did you create that query filter from that JSON output?”, wonder no more: here’s a link to the docs.  Getting the --query filter correct usually takes some back & forth with reviewing the JSON output, so don’t be discouraged if it takes a few tries.)
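And the promised bash wrapper: a minimal sketch that opts a list of instances in to the next maintenance window (the ARNs are placeholders):

#!/bin/bash
# Opt each instance in to the pending system-update
# at its next maintenance window.
for arn in \
  "arn:aws:rds:us-east-1:123456789012:db:mydb1" \
  "arn:aws:rds:us-east-1:123456789012:db:mydb2"
do
  aws rds apply-pending-maintenance-action \
    --resource-identifier "$arn" \
    --apply-action system-update \
    --opt-in-type next-maintenance
done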

27 January, 2016

Scale 14x recap

by gorthx

I went to SCALE mainly for PostgreSQL@Scale – Joe Conway’s grown this annual add-on into a two-day, two-track event.

The new venue is a huge improvement, even though the walk from the hotel was a bit longer than I expected. There are a lot more accessible-on-foot food options.

For my session, I gave an updated version of last year’s “RDS Postgres: A Journey down the Amazon”.

If you’re using RDS and aren’t keeping up with your monitoring, please be aware of the current required OS update.  You can view & schedule required updates via the web console; here are the corresponding cli commands:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb,mydb1,mydb2
aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \
--opt-in-type next-maintenance

I find JSON and text output more useful for these types of tasks; I’ll cover that in a later post.

Accessing RDS Pg logs is kind of a sticking point for some people, me included. (I _really_ want an easy way to get them into Splunk.)
Here’s Denish Patel’s work using the rds cli, and a quick bash script I threw together, which uses the aws cli.
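If you just need to pull a log down by hand, the aws cli can do it; a sketch (instance and log file names are placeholders; use describe-db-log-files to find your real ones):

# list available log files for the instance
aws rds describe-db-log-files --db-instance-identifier mydb

# download one in full
aws rds download-db-log-file-portion \
  --db-instance-identifier mydb \
  --log-file-name error/postgresql.log.2016-01-27-18 \
  --starting-token 0 \
  --output text > postgresql.log.2016-01-27-18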

Based on the quick headcount I took at the beginning of the session, attendees were about 2/3 devs, 1/3 DBAs. We also had a couple of people from the Amazon RDS team; it’s always nice to know I have backup in the audience! ;)

A couple of folks had questions after my talk and I didn’t catch up with them in the booth. If that is you, feel free to leave a comment here, or email me.

Other talks I attended:
Jim Mlodgenski’s Debugging PL/PgSQL contained some good tips on use of RAISE NOTICE and what not to do. I installed pl_profiler and started using it the next afternoon.

Xof’s JSON Home Improvement included advice on when to use JSON vs JSONB, or maybe even neither :) If you’ve been reading my blog for any length of time, you know that these kinds of discussions just end in me making plans for more benchmarking…

Peter Geoghegan’s UPSERT use cases  covered a good range of material from beginning to “We can do WHAT now?” and made me realize I can always improve my SQL skills.

I’ve been hearing about ToroDB  but hadn’t attended one of Alvaro’s talks yet. Very basically, it’s a Pg-backed Mongo interface. They take the json data and explode it out into sub-documents. Then there’s a table that stores the structure _in json_. It both blew my mind & had me smacking my forehead “of course!”

It is in alpha, so this applies.

Slides for talks are (or will be) on the Pg wiki.  We do rely on speakers to load their own slides, so if you are after something in particular, please ask & I will remind them. :)

I’m already looking forward to next year, and hope I get to go ice skating then!

20 January, 2016

Postgres FDW, one year later

by gorthx

I’ve been using Postgres’ foreign data wrappers for about a year now.

Stuff I’ve learned/done:
– Those of you on 9.5 can use the new IMPORT FOREIGN SCHEMA (there’s a sketch after this list). If you’re not quite there yet, Leo and Regina’s function to script table creation has been saving me a ton of time.

I modified it slightly (gist here):
– I name my foreign servers after their source database, and organize the foreign tables into schemas named for the source db/foreign server, so I tweaked the function to handle that
– added a parameter to specify updatable true/false, since I explicitly set that for each table
– changed the output to my preferred formatting.
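For reference, the 9.5 import looks something like this (server, schema, and table names are made up, following my naming convention above):

-- the local schema (named for the foreign server) must already exist:
CREATE SCHEMA IF NOT EXISTS my_fdw_server;

IMPORT FOREIGN SCHEMA public
  LIMIT TO (table1, table2)
  FROM SERVER my_fdw_server
  INTO my_fdw_server;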

Updating a server definition is much easier than I expected:

BEGIN;
ALTER SERVER my_fdw_server
  OPTIONS (
    SET dbname 'new_db_name'
    , SET host 'new_host_name'
  );
COMMIT;

You can do the same thing with the table definitions; no need to drop & re-create them.
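For instance, if the source table gets renamed (names here are made up):

ALTER FOREIGN TABLE my_fdw_server.table1
  OPTIONS (SET table_name 'table1_renamed');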

– Handling a serial field on the source table is interesting.
Say I have a table that looks like this:

CREATE TABLE table1 (
id serial primary key
, name text
, color text
);

I create a foreign table on that, including all fields (exercise left to the reader). Then I try to insert into it as I normally would, allowing Pg to handle the incrementing of the id field:

INSERT INTO table1 (name, color)
VALUES ('bob', 'blue');
ERROR: null value in column "id" violates not-null constraint

I don’t want to pass in a value for id, because that negates the entire purpose of having a serial datatype.

My solution was to leave the id field out of the foreign table definition.
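Something like this, with my made-up names again (the id column simply stays behind on the source):

CREATE FOREIGN TABLE my_fdw_server.table1 (
  name text
  , color text
)
SERVER my_fdw_server
OPTIONS (schema_name 'public', table_name 'table1', updatable 'true');

-- the source database now fills in id from its sequence:
INSERT INTO my_fdw_server.table1 (name, color)
VALUES ('bob', 'blue');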

Watch your datatypes, as CREATE FOREIGN TABLE doesn’t do any verification. “Fun” things can happen when you define your foreign table with timestamp without tz (to match the source table), then finally get approval to correct that on the source table, and don’t also change it in your foreign table definition.

28 September, 2015

PgOpen recap

by gorthx

We’ve wrapped up another PostgresOpen! The official numbers aren’t in yet, but we were up a significant percentage from last year. We had so many last-minute registrants and walk-ups that we ran out of extra badges.

Even better, I got to attend most of the talks on my list.

I can’t pick a favorite, but judging by the amount of notes I took, I learned the most in Joe Celko’s talk on measurement scales. The concepts and nomenclature were vaguely familiar from my long-ago education in the sciences, and it was interesting to tie them in with what I’m doing now.

Grant gave an excellent introduction to RDS Pg. I even learned some things. If you’re new to RDS Postgres, you should take this tutorial next time it’s offered. I continue to be impressed with the RDS team members; not only are they a whip-smart group, they’re also very personable and kind.

I learned about \ef (allows you to edit functions in place) in Keith Fiske’s Don’t Forget the Elephant, and was also reminded of how far Postgres has come, even just since 9.0. So many useful features!

Stella’s SQL Guru talk gave me some good pointers on window functions.

Slides for most of the talks are available on the wiki, and videos are on the way.

Between the social events and prepping for the Tour des Fleurs, I did not have time to see much of Dallas. I’ll likely run the TDF again next year, if the conf coincides with it. LMK if you’d like to join. The course is mostly flat; there’s one small hill.

Great to see everyone and hope to see you again next year!

30 August, 2015

My “Must-see” List for PgOpen

by gorthx

PostgresOpen is coming up in just a few weeks! (Disclosure: I am on the conference committee.) We are still working out the room assignments; your conference badge/printed program will have the right info.

The committee is very excited to have snagged Lacey Williams Henschel to give our keynote, on Open Source & Higher Ed. I’m looking forward to meeting her! (The real reason I’m at the reg desk is to try to meet everyone ;) )

As in previous years, four tutorials are scheduled for the day before the conference proper. As anyone who’s talked to me in the past year could guess, I’m most excited about Grant McAlister’s RDS tutorial. There’s always more to learn about this quickly maturing platform.

My picks for the rest of the conference are as follows:
Thursday sessions:
– Kevin Kempter’s Monitoring Postgres

– Joe Conway’s R talk

– Sarah Conway’s security talk was SRO last year; this year she’s back to focus on security in RDS. (In the same time slot, Jim Mlodgenski’s FDW talk is a good intro to that subject.)

– I’ve been using SQL for a while now :koff: but I’m still not that great at some things like window functions, so I’ll be in Stella Nisenbaum’s SQL Guru talk.

– Then I’ll finish out my day with David Steele’s Audit Logging.

Friday sessions:
– I’ve been enjoying the “History of Postgres”-type talks we’ve had at a few confs lately, and am looking forward to Keith Fiske’s version, Don’t Forget the Elephant.

– Shaun Thomas’s Highly Available Postgres.

– Of course I’ll be in Peter Don’t-even-try-to-pronounce-it Geoghegan’s UPSERT use cases.

– I caught Magnus Hagander’s Tardis talk at NYC earlier this year. If you haven’t had the chance, you should check it out!

– And last, Phil Vacca’s Text Search.

We will be having a lightning talk session, as in previous years. Signups will open soon; watch the conference twitter stream for announcements.

It’s not too late to register, but our hotel room rate and reservation block expire on Sept 2, so act fast! If you’re a member of a PUG, ask your group leader for the conference discount code. (If you’re a PUG leader and haven’t received the code, please email the conference committee.)