Posts tagged ‘PostgreSQL’

4 May, 2015

SCALE 13x and “You are not the database Superuser”

by gorthx

Another followup from my talk at SCALE 13x.

This isn’t to pick on RDS in particular; you’ll likely have these or similar issues with other “database as managed service” options.

Specific problems you should be aware of:

1a – As I discussed in my post about upgrading an RDS db to Pg 9.4, you can’t pg_dumpall; access to roles & credentials is restricted.
1b – … which means you can’t extract roles & credentials. (I’m told the RDS devs are working on a way to handle this.)

2 – Lots of &lt;insufficient privilege&gt; entries in pg_stat_activity. You can’t see what the actual superuser is doing, of course; so far this is more an annoyance than a true problem, because I wouldn’t be able to smash a query the actual superuser is running, anyway.

3 – You can’t VACUUM a database, only individual tables.
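If you really do want to vacuum everything, one workaround is to generate the per-table commands from the catalog and run the output back through psql — a sketch:

```sql
-- Generate a VACUUM command for every table you own, one per line;
-- save the output to a file and feed it back to psql.
SELECT 'VACUUM ANALYZE '
       || quote_ident(schemaname) || '.' || quote_ident(relname) || ';'
FROM pg_stat_user_tables
ORDER BY schemaname, relname;
```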

4 – Oops: “FATAL: remaining connection slots are reserved for non-replication superuser connections”. This does not include you :), so you are effectively locked out of your instance. You can try calling Amazon support for this; my experience with this was that even a request for information about existing connections had to be escalated, and I was told they wouldn’t terminate connections for me. So you’re looking at killing them from the app side, or restarting the instance (with a higher max_connections, perhaps). This makes troubleshooting this issue particularly challenging.

5 – If you’re fond of using SET ROLE [rolename] to troubleshoot, you have to do the extra step of adding yourself to the role:
gabrielle=> SET ROLE db_rw;
ERROR: permission denied to set role "db_rw"
gabrielle=> GRANT db_rw TO gabrielle ;
gabrielle=> SET ROLE db_rw;

23 March, 2015

Upgrading an existing RDS database to Postgres 9.4

by gorthx

Last Thursday, I had this short and one-sided conversation with myself:
“Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 minutes, tops.”

Here is the process I came up with. It did take a bit longer than 20 minutes & I hope these notes save someone else some time. Instance info: t2.small, 10G GP SSD attached storage.

If you have a lot of instances you want to upgrade, you could script this entire thing with the API + psql.

– If you’re using a non-default parameter group, create an equivalent for 9.4. You can’t use 9.3 parameter groups with 9.4 instances.
– Launch a new 9.4 instance (e.g. gabs-db-94) with the same specs as the original, but with the new parameter group.
– Get role information from your existing 9.3 instance. (See “problem areas”, below.)
– Create roles in the default database. In postgres, roles are available to all dbs in the cluster, so when you load your dumps in, all the table permissions should get set correctly if the roles already exist.

Update: Make sure you use the 9.4 pg_dump. This may have contributed to my #3 ‘problem area’ below.
– For each database:

pg_dump -Fc -v -h [endpoint of 9.3 instance] -U [master username] [database] > [database].dump
pg_restore -C -v -h [endpoint of 9.4 instance] -U [master username] -d [master database] [database].dump

– Rename 9.3 instance to e.g. gabs-db-93 (no periods!)
– Rename gabs-db-94 to gabs-db
– Don’t destroy the 9.3 instance for at least another week after you’re sure the new one is working. And/or when you do, don’t skip the ‘create final snapshot’ step.
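The per-database dump/restore step above is easy to loop. Here’s a minimal sketch (endpoints, master user, and database names are all placeholders) that writes the commands out to a script for review rather than running them directly:

```shell
#!/bin/sh
# Placeholders -- substitute your real endpoints, master user, and databases.
OLD_HOST="old-93-endpoint.example.com"
NEW_HOST="new-94-endpoint.example.com"
MASTER="gabrielle"

: > upgrade.sh   # start with an empty script
for db in accounting hr staging; do
  echo "pg_dump -Fc -v -h $OLD_HOST -U $MASTER $db > $db.dump" >> upgrade.sh
  echo "pg_restore -C -v -h $NEW_HOST -U $MASTER -d postgres $db.dump" >> upgrade.sh
done
cat upgrade.sh   # review before running
```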

Et voila.

Three specific RDS-related problem areas:
1. New (?) password requirements:
This may not be a 9.4 thing, but sometime in the past month or so, Amazon restricted which special characters are allowed in the master password. If you are using non-alphanumerics in your master passwords, try them out ahead of time.

2. Roles:
Without pg_dumpall, you don’t get role information in your database dump. (Or tablespaces, though I believe you can’t use those on RDS anyway.) This includes things like search_path, log settings specific for that user, etc. (Is there anything else that comes with pg_dumpall and not pg_dump? I don’t know.)

Try this query:

SELECT r.rolname, d.datname, rs.setconfig
FROM pg_db_role_setting rs
LEFT JOIN pg_roles r ON r.oid = rs.setrole
LEFT JOIN pg_database d ON d.oid = rs.setdatabase
WHERE r.rolname NOT LIKE 'rds%';

That will show you anything specifically set for a role.

Roles that don’t have non-default options won’t be in that result set, so make sure you check pg_roles or \du as well. If you script this, filter rds% out of your result set.
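To get a head start on recreating the roles themselves, you can build rough CREATE ROLE statements out of pg_roles. A sketch that covers only a few attributes (check \du for anything else; passwords aren’t in pg_roles either):

```sql
SELECT 'CREATE ROLE ' || quote_ident(rolname)
       || CASE WHEN rolcanlogin   THEN ' LOGIN'      ELSE '' END
       || CASE WHEN rolcreatedb   THEN ' CREATEDB'   ELSE '' END
       || CASE WHEN rolcreaterole THEN ' CREATEROLE' ELSE '' END
       || ';'
FROM pg_roles
WHERE rolname NOT LIKE 'rds%';
```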

Passwords are not included. I don’t have a workaround for that, other than knowing all the passwords, or resetting them all. I’m sure both options are equally popular.

3. Extensions:
This is the really strange one. After the upgrade, some extensions worked, some didn’t. I had to DROP and re-CREATE tablefunc and pg_trgm, both of which worked in my 9.3 instance.
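The fix itself was nothing fancy; for each misbehaving extension, something like:

```sql
-- Add CASCADE to the DROPs only if you're sure about the dependent
-- objects (e.g. pg_trgm indexes would go with it).
DROP EXTENSION tablefunc;
CREATE EXTENSION tablefunc;

DROP EXTENSION pg_trgm;
CREATE EXTENSION pg_trgm;
```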

Additional notes:

You’ll have a (short) outage during the time you’re swapping the 9.4 instance in by renaming it. I don’t see a way around this yet.

autovacuum log messages are still missing in 9.4; log_lock_waits works now.

For your amusement, the stuff I messed up:
1. First, I decided using the web console would be a faster way to create the new instance, instead of the cli (which I’m more familiar with.) When I use the cli, I explicitly set all the options available for a new instance. I was careless on the web console and accepted some dumb defaults and had to start over…twice. Lesson: go with what you know.

2. My database has multiple schemas. I missed transferring the search_path for my main ‘worker’ role, and basically broke half of my ETL jobs. This was actually a positive thing, because I was working toward explicitly schema-qualifying all SQL, and I found every place I was not doing that. :koff:

3. After the upgrade & changes, I wanted to dump the schemas out to version control. “Aborting due to server mismatch.” D’oh! Of course I didn’t have 9.4 installed locally. That opened up a whole ‘nother can of worms with trying to upgrade pg on my mac, but I’ll save that for over a beer. Sheesh.

9 March, 2015

PDXPUG Lab Recap: postgres_fdw

by gorthx

Back in January, PDXPUG had a lab night to try out the postgres_fdw.

Our labs are even more casual than our meetings: I don’t set an agenda, I invite the attendees to choose specific questions or topics they want to investigate. Here’s what we came up with for our FDW lab:

– What is it
– Use cases; who has one?
– Best practices?
– Security concerns?
– Performance concerns? (We didn’t get to this.)
– Any diffs between 9.3 and 9.4? (… or this either.)
– Test drive!

Our main focus was the Test Drive. Setup went pretty well for most folks, though one attendee had permissions issues & fell back to file_fdw.

There are four steps (outlined in the docs):
0. Install the extension ;)
1. Create a foreign server
2. Create a user mapping
3. Create your foreign table(s)

Useful commands:
\des to describe foreign servers
\deu to describe user mappings
\det to describe foreign tables

My use case: quick & dirty ETL solution. I pull data from several postgres datasources1 into a staging database (for data conformance and other tasks), and then push it out to a data mart.

To keep it all reasonably organized, I put the foreign tables for each data source in a schema named for that datasource, e.g. the data from HR goes to schema ‘hr’, data from Accounting goes to ‘accounting’, etc. (This has tripped me up a couple of times, which I’ll cover in a later post.)

Here’s a quick example: I have a ‘users’ table in the accounting database, in the public schema. I want to put it in the ‘accounting’ schema locally.

CREATE SERVER accounting
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (
dbname 'accounting'
, host ''
, port '5432'
, updatable 'false'
);

CREATE USER MAPPING FOR gabrielle
SERVER accounting
OPTIONS (
password '[password]'
, user '[username]'
);

CREATE FOREIGN TABLE accounting.users (
id integer
, username text
)
SERVER accounting
OPTIONS (
schema_name 'public'
, table_name 'users'
, updatable 'false'
);

Note that you don’t get an error if the schema & table you specify doesn’t exist in the remote database, so check your work. I just run a quick SELECT … LIMIT 1 on the foreign table. This will also help pick up permissions problems.
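That smoke test, using the example table above:

```sql
-- Succeeds (possibly with zero rows) only if the remote schema, table,
-- and permissions all line up.
SELECT * FROM accounting.users LIMIT 1;
```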

According to the docs, the default for updatable is ‘true’, and a table setting overrides a server setting. I originally interpreted that to mean that a table’s default setting of true would override a server’s explicit setting of false, which seemed odd, and my tests don’t bear that out. (Nice to see that Pg once again does the sensible thing.) Regardless, I explicitly set this value at both the server and table level, enabling it only on tables I want to update. Of course, you should also set appropriate permissions on the remote table itself.

If you have a lot of tables you need to create fdw for, script the process!

You can ALTER FOREIGN TABLE for the field names as you would a regular table, but I haven’t figured out how/if you can update the options (like, if somebody changes the target hostname or dbname2).
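For what it’s worth, the OPTIONS clause does accept SET (and ADD/DROP) actions in both ALTER SERVER and ALTER FOREIGN TABLE, so something along these lines ought to cover a changed host or table name — untested on RDS, and the new values here are made up:

```sql
ALTER SERVER accounting
  OPTIONS (SET host 'accounting-db2.example.com');

ALTER FOREIGN TABLE accounting.users
  OPTIONS (SET table_name 'users_new');
```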

The credentials associated with user mappings are pg_dumped as cleartext. Ideally, I guess you’d use .pgpass to manage the passwords for you, but I don’t have that option on RDS. To avoid accidentally committing schema dumps that include credentials to my git repo, I added this to my pre-commit hook:

sed -i .bak "s/password '.*',/password '[pwd]',/" schema_dump.sql

(The .bak is required on OSX.)
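To convince yourself the pattern does what you want, exercise it on a scratch file first (the file and credential here are made up; note that GNU sed wants `-i.bak` with no space):

```shell
# Make a scratch "schema dump" containing a fake credential, scrub it,
# and inspect both the result and the backup.
printf "CREATE USER MAPPING FOR gabrielle SERVER accounting OPTIONS (password 'hunter2', user 'gabrielle');\n" > scratch.sql
sed -i.bak "s/password '.*',/password '[pwd]',/" scratch.sql
cat scratch.sql      # password is now [pwd]
cat scratch.sql.bak  # original is preserved in the backup
```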

As usual, I picked up some additional “off-topic” info: check out SELECT * FROM [sequence] or \d [sequence]!

Thanks to Matt S for hanging out afterwards and introducing me to the differences between Oracle’s and Pg’s query planners.

1 – and a few others that are being a pain in my ass :koff:salesforce:koff:
2 – of course, why worry about that, because that NEVER happens, right?

12 January, 2015

No more cursing* with ddlgenerator

by gorthx

Do you have to load data from funky files into postgres frequently? I do. I get a few requests per week of the “hey, can you give me a mass data extract for things that match the criteria in this spreadsheet” variety. Frequently, these are windows-formatted all-in-one-line “csv” files.

A year or so ago, I would have reformatted the file, reviewed the data, figured out appropriate data types for the various fields (and sometimes there are dozens of fields), written some SQL to create the table, tried to load the data, cursed a bit, lather, rinse, repeat.

I heard about Catherine’s ddl generator at last year’s PgOpen, and these days, all I do is this:

ddlgenerator --inserts postgres [datafile.csv] > datafile.sql

…then log into my database, and run:

\i datafile.sql

et voila.

As I said on twitter, it’s not just the time but the annoyance this tool saves me that I really value.

Try it out!

* at least, not about this.

15 December, 2014

Simple test for autovacuum log messages

by gorthx

I had reason recently to suspect that autovacuum jobs weren’t being properly logged on my RDS instance. Instead of compulsively re-running “SELECT relname, last_autovacuum FROM pg_stat_user_tables” while I waited for one to happen, I set up this quick test:

Connect to the database and do the following:

-- verify settings
-- log_autovacuum_min_duration should be 0 ("log all") for the purposes of this test
SHOW log_autovacuum_min_duration;
-- vac messages are LOG level messages, so the default value 'warning' should suffice here
SHOW log_min_messages;

-- set up a table with a really low vac threshold
CREATE TABLE vac_test
(id serial primary key
, name text)
WITH (autovacuum_vacuum_threshold = 2, autovacuum_vacuum_scale_factor = 0);

-- add some data (nine rows, matching the vacuum log output below)
INSERT INTO vac_test(name)
SELECT 'test' || x FROM generate_series(1,9) x;

-- check the stats/vac jobs
SELECT * FROM pg_stat_user_tables WHERE relname = 'vac_test';

-- remove enough data to trigger an autovac
DELETE FROM vac_test WHERE id < 4;

-- check stats/vac jobs again
SELECT * FROM pg_stat_user_tables WHERE relname = 'vac_test';
-- assumes 9.3+
-- wait until you see an autovacuum job in the table
-- it'll help if you have autovacuum_naptime set to something short

Then go check the logs. You should have a helpful message about an automatic vacuum:

2014-12-07 21:01:32 PST LOG: automatic vacuum of table "postgres.public.vac_test": index scans: 1
pages: 0 removed, 1 remain
tuples: 3 removed, 6 remain
buffer usage: 60 hits, 4 misses, 4 dirtied
avg read rate: 33.422 MB/s, avg write rate: 33.422 MB/s
system usage: CPU 0.00s/0.00u sec elapsed 0.00 sec

If you don’t, well, that’s the $64,000 question, isn’t it.

8 December, 2014

PDXPUG lab report – BDR

by gorthx

For the last PDXPUG lab of the year, we tried out BiDirectional Replication (BDR). Five of us just set up VMs on our laptops and followed the instructions on the wiki.

We only had about 90 minutes time for this lab, so the goal was to get a basic configuration up & running, understand the available configuration parameters, and then (time permitting) break it – because that’s how you learn to put it back together.

Alexander said it worked very well in his tests; Robert set about breaking it and found an interesting edge case involving updates to primary keys. (Advisable or not, we all have a customer who’s going to do it!)

Maher and I were doing pretty well with our setups until we tried configuring BDR between our two machines. After wrestling with VMWare’s network settings and getting absolutely nowhere, I realized this all felt very familiar … Oh right, CentOS’s pre-configured firewall1. Which does not allow Postgres ports, natch. Once we fixed that, our machines could at last communicate correctly with each other, but we ran out of time before we could get BDR working between them. (Which led to some jokes about “NDR”.)

Craig Ringer posted yesterday about the work that’s gone into this project thus far, and some of the side benefits. BDR is a particularly tricky problem to solve; kudos to the team for all the hard work.

The Quick Start guide is very easy to follow. I’m also very happy with the quality of the log messages available from BDR. I encourage you to check it out for yourself!

1 – Took me a bit of poking around to find it; it was moved from “System Administration” to “Sundry” in CentOS 7.

3 November, 2014

PgConf.EU recap

by gorthx

I’m safely home from PgConf.EU. Madrid at this time of year was glorious, particularly to this Portlander. (I came home to a steady 12°C and rainy for the next week or so ;))

We had over 300 attendees, making this the biggest Postgres conference to date, I hear. Of course, I couldn’t get to every talk I wanted (does that ever happen?), but here are some highlights:

Performance Archaeology was a thorough review of how Postgres performance has improved (or not) from version to version. I’m a sucker for benchmarks, and it makes me very happy that Tomas Vondra did this work :)

“Who’s the Fairest of Them All? Pg Interface Performance Comparison” was good from an informational standpoint (ODBC pretty much sucks) but also from a test design standpoint (hey, a valid use case for a Cartesian join!) Most relevant tip for me: complaints about db performance usually turn out to be caused by running queries returning one row at a time, one connection each – and usually from an ORM.

The demo of 3D rendering from Vincent Picavet’s “PostGIS Latest News” looked very promising. There’s a docker container available on his github; make sure you follow the setup instructions. I’m also excited about SP-GiST, spatial GiST indexes, which will provide faster reads and is 3X faster to build. It’s a WIP, and so far it only works on points.

XoF’s talk on “Finding and Repairing Data Corruption” covered some case histories from PgExperts. You all know I like the “war stories”; one thing I like especially about XoF’s talks is he includes “oh yeah, btw, don’t do [x] to try to fix this because you’ll make things worse”. Additional recommendation: disable autovacuum while you’re debugging corruption, because you don’t want it kicking off & changing things.

As usual, Simon and Alvaro packed a ton of info into “Locks Unpicked”. The most immediately useful tip for me was how to avoid the ACCESS EXCLUSIVE lock when adding an FK; do it in two steps: 1. ALTER TABLE [blahdeblah] NOT VALID 2. ALTER TABLE [blahdeblah] VALIDATE CONSTRAINT.
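Sketched out on a couple of hypothetical tables, that two-step FK add looks like this (the first step still takes its lock, but only briefly; the long validation scan happens in step two under a weaker lock):

```sql
-- Step 1: add the constraint without scanning existing rows.
ALTER TABLE orders
  ADD CONSTRAINT orders_customer_id_fkey
  FOREIGN KEY (customer_id) REFERENCES customers (id)
  NOT VALID;

-- Step 2: validate existing rows; new writes are already being checked.
ALTER TABLE orders
  VALIDATE CONSTRAINT orders_customer_id_fkey;
```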

I didn’t get to attend Dimitri’s “You Better Have Tested Backups”, but I was in on a rehearsal. My reaction can be summed up with “You had to do what?!” If this talk didn’t scare you, I don’t know what will.

Craig Ringer’s talk about usability started with a round of “Error Message Jeopardy”, and included a reminder that we were all new once, and have forgotten how much we know. I personally accidentally tried to run psql on a -Fc pg_dump just last week, and really appreciate the addition of the HINT message! I also hadn’t heard about the update problems on Yosemite.

Stephen Frost’s “Hacking Postgres” was one of my favorites. We got a tour of the source tree, backend components, and some background about the community coding conventions. (“Programming in Postgres may not always be standard C.”)
General advice:
– check the mailing list for people working on similar problems
– create your patch as a context diff or git diff
– read your actual patch before you submit it, just in case you did something dorky.

Don’t forget to post your slides & leave conference and speaker feedback.

It was wonderful meeting people I’ve only known from the mailing lists & IRC. Thanks very much to PgConf.EU and SPI for helping me out! I hope to see you next year in Vienna.

6 October, 2014

RDS: Three weeks in

by gorthx

I’ve spent the past few weeks learning my way around Amazon’s RDS offering (specifically Postgres, and a bit of elasticache). It’s a mixed bag so far; for every feature I think “Hey, this is neat!” I find at least one or two others that are not so thrilling.

One of the things that may annoy you if you’re used to running your own Pg server is not having “real” superuser access to your own cluster. There’s an rdsadmin database which you won’t have access to, and a couple of rds users as well. This is part of Amazon’s security implementation to protect all their users from destroying each other.

You need to exclude the rdsadmin database from any management queries you run, like so:
SELECT datname, pg_size_pretty(pg_database_size(datname))
FROM pg_database
WHERE datname NOT IN ('postgres','template0','template1', 'rdsadmin')
ORDER BY datname;

Otherwise you’ll get a permissions error.

Today, I had this bit of fun:
pg_dumpall --globals-only
ERROR: permission denied for relation pg_authid

So no dumping your roles. (No pg_dumpall, period.) I don’t have high hopes for an alternative.

I also haven’t found a place to report bugs – the “Feedback” button only allows you to contact tech support if you have a contract. So far I’m relying on the “I know someone who knows someone” method and the RDS user forums; AWS team members respond fairly quickly to posts there.

Aside from all that, I’m primarily interested in automating my instance management, so I’ve focused on the CLI tools.

My first thought was “Egads, I can’t get away from Java, can I”, but installing the toolkit turned out to be the easiest part. I set my $JAVA_HOME as outlined here to avoid that annoying “Unable to find $JAVA_HOME” error.

The CLI support is much more extensive than I expected – you can manage pretty much everything, very easily, once you find your way around and learn the appropriate incantation. Which is my biggest complaint: the docs on the web and the cli help don’t always match each other, and sometimes neither are correct. It cost me a significant amount of startup time flipping back and forth between the cli help, the web help, and just flat-out experimenting and cursing that mysterious “Malformed input” error message. (I probably have unrealistically high standards after working with the Pg docs for so many years.)

Fun stuff you can do:
RDS offers “event subscriptions” to help you keep tabs on your instance health (failover, storage, etc)1. They’re pretty easy to configure from the web console, but once you’ve done so, there’s no way to view or edit them except from the CLI. (At least, not that I can find.)

You can grab your Cloudwatch metrics, if you need to “roll your own” and integrate them into an existing monitoring system. (Yes, I briefly considered this!)

Log files are also accessible through the CLI for watching or downloading. It’s a huge improvement on the tiny green-text-on-black background on the web console. There’s no glob expression matching, so you have to request them one at a time. If you request a log file that doesn’t exist, you don’t get an error.

log_line_prefix is one of the unconfigurable GUCs on RDS, so if you are planning to use pgbadger 2, specify Amazon’s format as outlined at the bottom of this page.

1 – Be warned that restoring a snapshot will generate an “instance deleted” alert for your newly-restored instance. Your on-call person may not appreciate this.

2 – Alternatively, the pg_stat_statements extension is available on RDS, so you could get query stats this way.
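A minimal pg_stat_statements session, once the extension is set up (on RDS that also means adding it to the shared_preload_libraries setting in your parameter group; column names per the 9.x releases current at the time):

```sql
CREATE EXTENSION pg_stat_statements;  -- once per database

-- The five queries eating the most total time:
SELECT query, calls, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 5;
```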

29 September, 2014

My PgConf.EU Schedule

by gorthx

Yep, I’m headed to Madrid! I’ll be reprising my Autovacuum talk from SCALE, and am really looking forward to meeting some new folks. I’ll be helping out at the conference in some capacity, so come say hello.

For reference, the full schedule is posted on the conference site.

Other talks I plan to attend:

Performance Archaeology sounds pretty cool!

Joe Conway’s Who’s the Fairest of Them All, since I didn’t get to catch it at PgOpen.

Open Postgres Monitoring

I’m interested in hearing Devrim’s opinions about Pg filesystems. (And pgpool, but that’s a discussion for the pub. ;) )

Next, a case study from Dimitri, this one on backups. (This sounds like one of those talks that will have people muttering “oh crap!!” and running out of the room.)

Dimitri again, on pgloader, because I always have data loading needs.

I want to see all three sessions in the next time slot (Logical decoding, PostGIS, and authentication), so I’ll wait until the day of to make up my mind.

Hmm Bruce’s indexing talk, or Christophe’s on Data Corruption?

I hope I never have to join 1 million tables.

Locks unpicked, Analytical Postgres, and of course the Lightning Talks will finish out the day.

Unit testing with PgTAP

Disaster Planning and Recovery

Logical decoding for auditing

Replication of a single database? Sign me up!

Saturday I plan to do touristy things: check out the park, a museum or two, and hopefully a fabric shop, before my flight out. If anyone has any recs, I’d love to hear them.

22 September, 2014

PgOpen 2014 – quick recap

by gorthx

Many thanks to the speakers, my fellow conference committee members, and especially our chair, Kris Pennella, for organizing the best PgOpen yet.

(Speakers: please upload your slides or a link to your slides to the wiki.)

I came back with a big to-do/to-try list: check out Catherine Devlin’s DDL generator, familiarize myself with the FILTER aggregates in 9.4, make a web interface to the PDXPUG talks db (on a tiny little heroku instance), re-do the examples from the PostGIS tutorial, etc. Plus apparently I have a tiny little patch to write (flw). Many thanks to Denish Patel of OmniTI and Will Leinweber of Heroku for the personalized help sessions.

All in all, it was a wonderful conference & I’m looking forward to 2015’s version. If you’re interested in being on next year’s committee, let us know at program2014 at