3 February, 2016

RDS OS update

by gorthx

I have some more info about the AWS RDS OS update I mentioned last week.

The announcement states that this update will require an outage. In my experience, it’ll be the usual failover time for a Multi-AZ instance (a minute or so), and 8-10 minutes for non-Multi-AZ instances. According to my event logs, the entire update process takes 20-25 minutes.

Since I had n > 1 instances to handle, I preferred to review & schedule updates via the cli, because I could then write a quick bash wrapper for each piece & handle them in bulk.

Here’s the command:

aws rds describe-pending-maintenance-actions

To review upcoming maintenance for only specific instances, pass a comma-separated list of instance names to the --filters switch, like so:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1,mydb2

Here’s an example of what you’ll get back:

PENDINGMAINTENANCEACTIONS arn:aws:rds:[az]:[acct_no]:db:[dbname]
PENDINGMAINTENANCEACTIONDETAILS system-update 2016-02-26T19:41:22Z 2016-02-26T19:41:22Z Performance improvements and security updates

(The timestamps are in UTC, so you may have to do some math.)
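If you have GNU date handy, it’ll do the math for you (the -d switch is GNU-specific; BSD/macOS date wants different flags):

date -d '2016-02-26T19:41:22Z'

Output:

Fri Feb 26 11:41:22 PST 2016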

Neither of those timestamps was in my maintenance window, so I dug into the docs a bit further.

There are three possible dates available:
AutoAppliedAfterDate – AWS will apply the maintenance in the next maint window after this date
ForcedApplyDate – AWS will apply the maintenance at this time, regardless of maintenance window (eeeek!)
CurrentApplyDate – When AWS will apply it, if you’ve opted-in.

Without field headers, I can’t really figure out which two of the possible three dates are displayed. That “ForcedApplyDate” is a bit concerning, so I used the ‘output to JSON’ option to get some more info:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output json

(excerpt):

"PendingMaintenanceActionDetails": [
  {
    "Action": "system-update",
    "Description": "Performance improvements and security updates",
    "CurrentApplyDate": "2016-02-26T19:41:22Z",
    "AutoAppliedAfterDate": "2016-02-26T19:41:22Z",
    "OptInStatus": ""
  }
],

So that clears that up a little. Now to get this stuff scheduled!

aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \
--opt-in-type next-maintenance
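
Here’s the quick bash wrapper I mentioned, more or less (a sketch; the region, account number, and instance names in the ARNs are placeholders):

#!/bin/bash
# opt each instance in for the system-update at its next maintenance window
for arn in \
arn:aws:rds:us-east-1:123456789012:db:mydb1 \
arn:aws:rds:us-east-1:123456789012:db:mydb2 ; do
aws rds apply-pending-maintenance-action \
--resource-identifier "$arn" \
--apply-action system-update \
--opt-in-type next-maintenance
done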

Since I now know which fields I want, I can use ‘output as text’ to verify my scheduling action, filtered with the --query switch:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb1 \
--output text \
--query 'PendingMaintenanceActions[*].{OptIn:PendingMaintenanceActionDetails[*].OptInStatus,Date:PendingMaintenanceActionDetails[*].CurrentApplyDate}'

Output:

DATE 2016-01-22T17:04:00Z
OPTIN next-maintenance

(If you are wondering “how in Sam Hill did you create that query filter from that JSON output?”, wonder no more: here’s a link to the docs. Getting the --query filter correct usually takes some back & forth with reviewing the JSON output, so don’t be discouraged if it takes a few tries.)

27 January, 2016

Scale 14x recap

by gorthx

I went to SCALE mainly for PostgreSQL@Scale – Joe Conway’s grown this annual add-on into a two-day, two-track event.

The new venue is a huge improvement, even though the walk from the hotel was a bit longer than I expected. There are a lot more accessible-on-foot food options.

For my session, I gave an updated version of last year’s “RDS Postgres: A Journey down the Amazon”.

If you’re using RDS and aren’t keeping up with your monitoring, please be aware of the current required OS update.  You can view & schedule required updates via the web console; here are the corresponding cli commands:

aws rds describe-pending-maintenance-actions \
--filters Name=db-instance-id,Values=mydb,mydb1,mydb2
aws rds apply-pending-maintenance-action \
--resource-identifier arn:aws:rds:[region]:[account]:db:[db-name] \
--apply-action system-update \
--opt-in-type next-maintenance

I find JSON and text output more useful for these types of tasks; I’ll cover that in a later post.

Accessing RDS Pg logs is kind of a sticking point for some people, me included. (I _really_ want an easy way to get them into Splunk.)
Here’s Denish Patel’s work using the rds cli, and a quick bash script I threw together, which uses the aws cli.
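
Very roughly, the aws cli version boils down to two calls (a sketch; the instance name and log file name are examples):

# list the available log files for the instance
aws rds describe-db-log-files --db-instance-identifier mydb
# then pull one down
aws rds download-db-log-file-portion --db-instance-identifier mydb \
--log-file-name error/postgresql.log.2016-01-27-19 \
--starting-token 0 --output text > postgresql.log.2016-01-27-19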

Based on the quick headcount I took at the beginning of the session, attendees were about 2/3 devs, 1/3 DBAs. We also had a couple of people from the Amazon RDS team; it’s always nice to know I have backup in the audience! ;)

A couple of folks had questions after my talk and I didn’t catch up with them in the booth. If that is you, feel free to leave a comment here, or email me.

Other talks I attended:
Jim Mlodgenski’s Debugging PL/PgSQL contained some good tips on use of RAISE NOTICE and what not to do. I installed pl_profiler and started using it the next afternoon.

Xof’s JSON Home Improvement included advice on when to use JSON vs JSONB, or maybe even neither :) If you’ve been reading my blog for any length of time, you know that these kinds of discussions just end in me making plans for more benchmarking…

Peter Geoghegan’s UPSERT use cases covered a good range of material from beginning to “We can do WHAT now?” and made me realize I can always improve my SQL skills.

I’ve been hearing about ToroDB but hadn’t attended one of Alvaro’s talks yet. Very basically, it’s a Pg-backed Mongo interface. They take the json data and explode it out into sub-documents. Then there’s a table that stores the structure _in json_. It both blew my mind & had me smacking my forehead “of course!”

It is in alpha, so the usual caveats apply.

Slides for talks are (or will be) on the Pg wiki.  We do rely on speakers to load their own slides, so if you are after something in particular, please ask & I will remind them. :)

I’m already looking forward to next year, and hope I get to go ice skating then!

20 January, 2016

Postgres FDW, one year later

by gorthx

I’ve been using Postgres’ foreign data wrappers for about a year now.

Stuff I’ve learned/done:
– Those of you on 9.5 can use the new IMPORT FOREIGN SCHEMA; see the example after this list. If you’re not quite there yet, Leo and Regina’s function to script table creation has been saving me a ton of time.

I modified it slightly (gist here):
– I name my foreign servers after their source database, and organize the foreign tables into schemas named for the source db/foreign server, so I tweaked the function to handle that
– added a parameter to specify updateable true/false, since I explicitly set that for each table
– changed the output to my preferred formatting.
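
For the 9.5 crowd, IMPORT FOREIGN SCHEMA looks something like this (a sketch; the server & schema names follow my naming convention above):

IMPORT FOREIGN SCHEMA public
LIMIT TO (table1, table2)
FROM SERVER source_db
INTO source_db;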

Updating a server definition is much easier than I expected:

BEGIN;
ALTER SERVER my_fdw_server
OPTIONS (
SET dbname 'new_db_name'
, SET host 'new_host_name'
);
COMMIT;

You can do the same thing with the table definitions; no need to drop & re-create them.
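
For example, to point an existing foreign table at a renamed source table (the names here are made up):

ALTER FOREIGN TABLE source_db.table1
OPTIONS (
SET table_name 'table1_renamed'
);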

– Handling a serial field on the source table is interesting.
Say I have a table that looks like this:

CREATE TABLE table1 (
id serial primary key
, name text
, color text
);

I create a foreign table on that, including all fields (exercise left to the reader). Then I try to insert into it as I normally would, letting Pg handle the incrementing of the id field:

INSERT INTO table1 (name, color)
VALUES ('bob', 'blue');
ERROR: null value in column "id" violates not-null constraint

I don’t want to pass in a value for id, because that negates the entire purpose of having a serial datatype.

My solution was to leave the id field out of the foreign table definition.
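
In other words, something like this (a sketch; the server name and option values are placeholders):

CREATE FOREIGN TABLE table1 (
name text
, color text
)
SERVER my_fdw_server
OPTIONS (schema_name 'public', table_name 'table1');

-- now this works; the source db assigns the id
INSERT INTO table1 (name, color)
VALUES ('bob', 'blue');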

Watch your datatypes, as CREATE FOREIGN TABLE doesn’t do any verification. “Fun” things can happen when you define your foreign table with timestamp without tz (to match the source table), then finally get approval to correct that on the source table, and don’t also change it in your foreign table definition.

28 September, 2015

PgOpen recap

by gorthx

We’ve wrapped up another PostgresOpen! The official numbers aren’t in yet, but we were up a significant percentage from last year. We had so many last-minute registrants and walk-ups that we ran out of extra badges.

Even better, I got to attend most of the talks on my list.

I can’t pick a favorite, but judging by the amount of notes I took, I learned the most in Joe Celko’s talk on measurement scales. The concepts and nomenclature were vaguely familiar from my long-ago education in the sciences, and it was interesting to tie them in with what I’m doing now.

Grant gave an excellent introduction to RDS Pg. I even learned some things. If you’re new to RDS Postgres, you should take this tutorial next time it’s offered. I continue to be impressed with the RDS team members; not only are they a whip-smart group, they’re also very personable and kind.

I learned about \ef (allows you to edit functions in place) in Keith Fiske’s Don’t Forget the Elephant, and was also reminded of how far Postgres has come, even just since 9.0. So many useful features!

Stella’s SQL Guru talk gave me some good pointers on window functions.

Slides for most of the talks are available on the wiki, and videos are on the way.

Between the social events and prepping for the Tour des Fleurs, I did not have time to see much of Dallas. I’ll likely run the TDF again next year, if the conf coincides with it. LMK if you’d like to join. The course is mostly flat; there’s one small hill.

Great to see everyone and hope to see you again next year!

30 August, 2015

My “Must-see” List for PgOpen

by gorthx

PostgresOpen is coming up in just a few weeks! (Disclosure: I am on the conference committee.) We are still working out the room assignments; your conference badge/printed program will have the right info.

The committee is very excited to have snagged Lacey Williams Henschel to give our keynote, on Open Source & Higher Ed. I’m looking forward to meeting her! (The real reason I’m at the reg desk is to try to meet everyone ;) )

As in previous years, four tutorials are scheduled for the day before the conference proper. As anyone who’s talked to me in the past year could guess, I’m most excited about Grant McAlister’s RDS tutorial. There’s always more to learn about this quickly maturing platform.

My picks for the rest of the conference are as follows:
Thursday sessions:
– Kevin Kempter’s Monitoring Postgres

– Joe Conway’s R talk

– Sarah Conway’s security talk was SRO last year; this year she’s back to focus on security in RDS. (In the same time slot, Jim Mlodgenski’s FDW talk is a good intro to that subject.)

– I’ve been using SQL for a while now :koff: but I’m still not that great at some things like window functions, so I’ll be in Stella Nisenbaum’s SQL Guru talk.

– Then I’ll finish out my day with David Steele’s Audit Logging.

Friday sessions:
– I’ve been enjoying the “History of Postgres”-type talks we’ve had at a few confs lately, and am looking forward to Keith Fiske’s version, Don’t Forget the Elephant.

– Shaun Thomas’s Highly Available Postgres.

– Of course I’ll be in Peter Don’t-even-try-to-pronounce-it Geoghegan’s UPSERT use cases.

– I caught Magnus Hagander’s Tardis talk in NYC earlier this year. If you haven’t had the chance, you should check it out!

– And last, Phil Vacca’s Text Search.

We will be having a lightning talk session, as in previous years. Signups will open soon; watch the conference twitter stream for announcements.

It’s not too late to register, but our hotel room rate and reservation block expire on Sept 2, so act fast! If you’re a member of a PUG, ask your group leader for the conference discount code. (If you’re a PUG leader and haven’t received the code, please email the conference committee.)

6 July, 2015

More AWS + redis fun

by gorthx

Part 1.

1. I “upgraded” one of my smaller test clusters to a t2.medium, which has better specs than the m1.small it was on previously. It came up in a weird not-accessible state, and when I started troubleshooting it, I noticed backups weren’t configured. Turns out backup and restore is not supported on cache.t1.* and cache.t2.* instance types. Reference: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ManagingSnapshots.html (Update: duh, “t” stands for “testing”, as in “don’t use this in production.”)

2. We started getting OOM errors on one of our clusters, another m1.small. The dataset was only 900M, so I was a bit mystified. Apparently, when you configure redis to be persistent (ie you won’t lose your data if it restarts) (at least that’s the way it’s supposed to work, :koff: ) it can actually take up to twice the memory of the dataset.

Reference: http://redis.io/topics/admin, see bold text: “If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log Redis may use up to 2 times the memory normally used.”

Familiarize yourself with these:
http://redis.io/topics/memory-optimization
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html#BestPractices.BGSAVE

The bad news: INFO (at least on 2.6.13) doesn’t tell you the max memory configured. Nor is that available via describe-cache-clusters or describe-cache-parameters; you have to infer it from the instance class. Kind of a bummer!

Of course, another option is not to store data you like in an in-memory database, but that’s a discussion for another time.

3. Taking a final snapshot for a cluster is now supported!
aws elasticache delete-cache-cluster \
--cache-cluster-id gabrielles-redis \
--final-snapshot-identifier gabrielles-redis-hinky-2015-07-01

25 May, 2015

LFNW Conference Report

by gorthx

At the end of April, Mark Wong & I headed up to Bellingham for LinuxFest. This was my first time at the conference in several years, and though the expo hall was a bit smaller than I remembered, they had 1849 registered attendees! That is a truly excellent number for a community-run conference in a far corner of the US.

Somehow we had three Postgres booths: PgUS, 2nd Quadrant, and SEAPUG. This caused some confusion among the attendees. As in past years, booth visitors ran the gamut between experienced folks, random people who dropped in to see what was going on, and That Guy Who Just Wants to Argue^W Discuss Why MySQL Is Better [tm]. I hadn’t actually seen him in a while, it was good to catch up. :koff:

Talks I attended:
Emily Dunham’s Thinking in git, where I learned about git commit --dry-run and git pull --rebase (which makes it look like you were smart & pulled when you were supposed to.)

Robert Bernier’s Welcome To Total Security gave a fun historical overview of Postgres, and very good coverage of both beginner and more advanced information.

Frances Hocutt’s Why are these people following me. The big lesson for me from this talk was “Just because the abuse isn’t aimed at you, doesn’t mean you don’t suffer from it”, as I am currently dealing with a similar situation. Also: “Assume good faith” doesn’t mean “ignore evidence of bad faith”.

Eric Worden’s start_date, end_date: Calculate! included a pretty appalling example of a temporal table design, and how to fix it. This just convinced me even more that range types are The Right Way to handle these requirements.

On a personal note: I’ve been doing Pg-only confs for the past few years and made a rookie presenter mistake with my talk: I assumed a level of familiarity with Postgres that my audience didn’t have. I’ll need to adjust for that next time.

18 May, 2015

Simple test for lock_waits log messages

by gorthx

I do a lot of “is this !#@(* really working correctly” testing. (See also: simple test for autovacuum log messages.)

Here’s a simple test to verify that log_lock_waits is configured correctly. This is essential for troubleshooting lock contention.


-- verify settings
-- log_lock_waits should be 'on'; it is off by default.
SHOW log_lock_waits;
-- lock_wait messages are LOG level messages,
-- so the default value 'warning' should suffice here
SHOW log_min_messages;
-- if you're reading this, this is most likely still set to the default 1s. That is ok. :)
SHOW deadlock_timeout;

DROP TABLE IF EXISTS lock_wait_test;
CREATE TABLE lock_wait_test
(id serial primary key
, name text)
;

-- add some data
INSERT INTO lock_wait_test(name)
VALUES
('ankeny'),('burnside'),('couch'),('davis'),('everett'),('flanders'),('glisan'),('hoyt'),('irving')
;

-- take a lock
BEGIN;
LOCK lock_wait_test IN ACCESS EXCLUSIVE MODE;

-- open up another session and run this:
SELECT * FROM lock_wait_test;

Wait deadlock_timeout, then cancel your select query.
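
(Ctrl-C in psql does it; or, if you’d rather cancel from a third session, a sketch like this works on 9.2 and later:)

-- find & cancel the waiting SELECT
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query = 'SELECT * FROM lock_wait_test;';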

Then go check the logs. You should have a helpful message about your lock wait, and the waiting query:
2015-05-02 16:59:23 PDT LOG: process 7617 still waiting for AccessShareLock on relation 16550 of database 12066 after 1000.161 ms at character 15
2015-05-02 16:59:23 PDT STATEMENT: select * FROM lock_wait_test;

Roll your lock back, and repeat starting with taking a lock. Instead of canceling the SELECT query, roll back your lock. This time you’ll see these messages:
2015-05-02 17:02:23 PDT LOG: process 7617 still waiting for AccessShareLock on relation 16550 of database 12066 after 1000.112 ms at character 15
2015-05-02 17:02:23 PDT STATEMENT: select * FROM lock_wait_test;
2015-05-02 17:02:26 PDT LOG: process 7617 acquired AccessShareLock on relation 16550 of database 12066 after 3920.210 ms at character 15
2015-05-02 17:02:26 PDT STATEMENT: select * FROM lock_wait_test



Thanks to Magnus for the suggestion.

4 May, 2015

SCALE 13x and “You are not the database Superuser”

by gorthx

Another followup from my talk at SCALE 13x.

This isn’t to pick on RDS in particular; you’ll likely have these or similar issues with other “database as managed service” options.

Specific problems you should be aware of:

1a – As I discussed in my post about upgrading an RDS db to Pg 9.4, you can’t pg_dumpall; access to roles & credentials is restricted.
1b – … which means you can’t extract roles & credentials. (I’m told the RDS devs are working on a way to handle this.)

2 – Lots of <insufficient privilege> in pg_stat_activity. You can’t see what the actual superuser is doing, of course; so far this is an annoyance more than a true problem, because I wouldn’t be able to smash a query the actual superuser is running, anyway.

3 – You can’t VACUUM a database, only individual tables.

4 – Oops: “FATAL: remaining connection slots are reserved for non-replication superuser connections”. This does not include you :), so you are effectively locked out of your instance. You can try calling Amazon support for this; my experience with this was that even a request for information about existing connections had to be escalated, and I was told they wouldn’t terminate connections for me. So you’re looking at killing them from the app side, or restarting the instance (with a higher max_connections, perhaps). This makes troubleshooting this issue particularly challenging.

5 – If you’re fond of using SET ROLE [rolename] to troubleshoot, you have to do the extra step of adding yourself to the role:
gabrielle=> SET ROLE db_rw;
ERROR: permission denied to set role "db_rw"
gabrielle=> GRANT db_rw TO gabrielle ;
GRANT ROLE
gabrielle=> SET ROLE db_rw;

23 March, 2015

Upgrading an existing RDS database to Postgres 9.4

by gorthx

Last Thursday, I had this short and one-sided conversation with myself:
“Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 minutes, tops.”

Here is the process I came up with. It did take a bit longer than 20 minutes & I hope these notes save someone else some time. Instance info: t2.small, 10G GP SSD attached storage.

If you have a lot of instances you want to upgrade, you could script this entire thing with the API + psql.
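
For example, the dump & restore piece (detailed below) sketches out to a loop like this; the endpoints, master username, and database list are all placeholders:

#!/bin/bash
# run this with the 9.4 client tools (see the Update note below)
OLD=mydb93.example.rds.amazonaws.com   # 9.3 endpoint (placeholder)
NEW=mydb94.example.rds.amazonaws.com   # 9.4 endpoint (placeholder)
for db in mydb1 mydb2; do
pg_dump -Fc -v -h "$OLD" -U master "$db" > "$db".dump
pg_restore -C -v -h "$NEW" -U master -d postgres "$db".dump
done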

Prep:
– If you’re using a non-default parameter group, create an equivalent for 9.4. You can’t use 9.3 parameter groups with 9.4 instances.
– Launch a new 9.4 instance (e.g. gabs-db-94) with the same specs as the original, but with the new parameter group.
– Get role information from your existing 9.3 instance. (See “problem areas”, below.)
– Create roles in the default database. In postgres, roles are available to all dbs in the cluster, so when you load your dumps in, all the table permissions should get set correctly if the roles already exist.

Upgrade:
Update: Make sure you use the 9.4 pg_dump. This may have contributed to my #3 ‘problem area’ below.
– For each database:

pg_dump -Fc -v -h [endpoint of 9.3 instance] -U [master username] [database] > [database].dump
pg_restore -C -v -h [endpoint of 9.4 instance] -U [master username] -d [master database] [database].dump

– TEST
– Rename 9.3 instance to e.g. gabs-db-93 (no periods!)
– Rename gabs-db-94 to gabs-db
– Don’t destroy the 9.3 instance for at least another week after you’re sure the new one is working. And/or when you do, don’t skip the ‘create final snapshot’ step.

Et voila.

Three specific RDS-related problem areas:
1. New (?) password requirements:
This may not be a 9.4 thing, but sometime in the past month or so, Amazon restricted which special characters are allowed in the master password. If you are using non-alphanumerics in your master passwords, try them out ahead of time.

2. Roles:
Without pg_dumpall, you don’t get role information in your database dump. (Or tablespaces, though I believe you can’t use those on RDS anyway.) This includes things like search_path, log settings specific for that user, etc. (Is there anything else that comes with pg_dumpall and not pg_dump? I don’t know.)

Try this query:

SELECT r.rolname, d.datname, rs.setconfig
FROM pg_db_role_setting rs
LEFT JOIN pg_roles r ON r.oid = rs.setrole
LEFT JOIN pg_database d ON d.oid = rs.setdatabase
WHERE r.rolname NOT LIKE 'rds%'
;

That will show you anything specifically set for a role. (Credit here: http://dba.stackexchange.com/questions/56023/what-is-the-search-path-for-a-given-database-and-user)

Roles that don’t have non-default options won’t be in that result set, so make sure you check pg_roles or \du as well. If you script this, filter rds% out of your result set.
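
For example, a minimal version of that check:

SELECT rolname
FROM pg_roles
WHERE rolname NOT LIKE 'rds%';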

Passwords are not included. I don’t have a workaround for that, other than know all the passwords, or reset them all. I’m sure both options are equally popular.

3. Extensions:
This is the really strange one. After the upgrade, some extensions worked, some didn’t. I had to DROP and re-CREATE tablefunc and pg_trgm, both of which worked in my 9.3 instance.

Additional notes:

You’ll have a (short) outage during the time you’re swapping the 9.4 instance in by renaming it. I don’t see a way around this yet.

autovacuum log messages are still missing in 9.4; log_lock_waits works now.

For your amusement, the stuff I messed up:
1. First, I decided using the web console would be a faster way to create the new instance, instead of the cli (which I’m more familiar with.) When I use the cli, I explicitly set all the options available for a new instance. I was careless on the web console and accepted some dumb defaults and had to start over…twice. Lesson: go with what you know.

2. My database has multiple schemas. I missed transferring the search_path for my main ‘worker’ role, and basically broke half of my ETL jobs. This was actually a positive thing, because I was working toward explicitly schema-qualifying all SQL, and I found every place I was not doing that. :koff:

3. After the upgrade & changes, I wanted to dump the schemas out to version control. “Aborting due to server mismatch.” D’oh! Of course I didn’t have 9.4 installed locally. That opened up a whole ‘nother can of worms with trying to upgrade pg on my mac, but I’ll save that for over a beer. Sheesh.
