6 July, 2015

More AWS + redis fun

by gorthx

Part 1.

1. I “upgraded” one of my smaller test clusters to a t2.medium, which has better specs than the m1.small it was on previously. It came up in a weird not-accessible state, and when I started troubleshooting it, I noticed backups weren’t configured. Turns out backup and restore is not supported on cache.t1.* and cache.t2.* instance types. Reference: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ManagingSnapshots.html

2. We started getting OOM errors on one of our clusters, another m1.small. The dataset was only 900M, so I was a bit mystified. Apparently, when you configure redis to be persistent (i.e. you won’t lose your data if it restarts) (at least that’s the way it’s supposed to work, :koff: ) it can actually take up to twice the memory of the dataset.

Reference: http://redis.io/topics/admin, see bold text: “If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log Redis may use up to 2 times the memory normally used.”

Familiarize yourself with these:
http://redis.io/topics/memory-optimization
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html#BestPractices.BGSAVE

The bad news: INFO (at least on 2.6.13) doesn’t tell you the configured maxmemory. Nor is that available via describe-cache-clusters or describe-cache-parameters; you have to infer it from the instance class. Kind of a bummer!
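What you can do is watch current usage with redis-cli and compare it against the memory published for your node type. A sketch (the endpoint here is hypothetical):

# check current and peak usage; compare to the published memory for your node type
redis-cli -h gabrielles-redis.abc123.0001.usw2.cache.amazonaws.com INFO memory \
| grep -E 'used_memory_human|used_memory_peak_human'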

Of course, another option is not to store data you like in an in-memory database, but that’s a discussion for another time.

3. Taking a final snapshot for a cluster is now supported!
aws elasticache delete-cache-cluster \
--cache-cluster-id gabrielles-redis \
--final-snapshot-identifier gabrielles-redis-hinky-2015-07-01
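
You can confirm the final snapshot actually landed (a sketch, same hypothetical names):

aws elasticache describe-snapshots \
--snapshot-name gabrielles-redis-hinky-2015-07-01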

25 May, 2015

LFNW Conference Report

by gorthx

At the end of April, Mark Wong & I headed up to Bellingham for LinuxFest. This was my first time at the conference in several years, and though the expo hall was a bit smaller than I remembered, they had 1849 registered attendees! That is a truly excellent number for a community-run conference in a far corner of the US.

Somehow we had three Postgres booths: PgUS, 2nd Quadrant, and SEAPUG. This caused some confusion among the attendees. As in past years, booth visitors ran the gamut between experienced folks, random people who dropped in to see what was going on, and That Guy Who Just Wants to Argue^W Discuss Why MySQL Is Better [tm]. I hadn’t actually seen him in a while, it was good to catch up. :koff:

Talks I attended:
Emily Dunham’s Thinking in git, where I learned about git commit --dry-run and git pull --rebase (which makes it look like you were smart & pulled when you were supposed to.)

Robert Bernier’s Welcome To Total Security gave a fun historical overview of Postgres, and very good coverage of both beginner and more advanced information.

Frances Hocutt’s Why are these people following me. The big lesson for me from this talk was “Just because the abuse isn’t aimed at you, doesn’t mean you don’t suffer from it”, as I am currently dealing with a similar situation. Also: “Assume good faith” doesn’t mean “ignore evidence of bad faith”.

Eric Worden’s start_date, end_date: Calculate! included a pretty appalling example of a temporal table design, and how to fix it. This just convinced me even more that range types are The Right Way to handle these requirements.
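
For instance, instead of start_date/end_date column pairs, a daterange column plus an exclusion constraint keeps overlapping rows out by design. A sketch (my own, not from the talk):

CREATE EXTENSION IF NOT EXISTS btree_gist;  -- needed for the integer = operator in GiST

CREATE TABLE room_rates
(room_id integer
, rate numeric
, valid daterange
-- no two rows for the same room may have overlapping validity ranges
, EXCLUDE USING gist (room_id WITH =, valid WITH &&)
)
;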

On a personal note: I’ve been doing Pg-only confs for the past few years and made a rookie presenter mistake with my talk: I assumed a level of familiarity with Postgres that my audience didn’t have. I’ll need to adjust for that next time.

18 May, 2015

Simple test for lock_waits log messages

by gorthx

I do a lot of “is this !#@(* really working correctly” testing. (See also: simple test for autovacuum log messages.)

Here’s a simple test to verify that log_lock_waits is configured correctly. This is essential for troubleshooting lock contention.


-- verify settings
-- log_lock_waits should be 'on'; it is off by default.
SHOW log_lock_waits;
-- lock_wait messages are LOG level messages,
-- so the default value 'warning' should suffice here
SHOW log_min_messages;
-- if you're reading this, this is most likely still set to the default 1s. That is ok. :)
SHOW deadlock_timeout;
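-- if log_lock_waits is off and you're on a self-managed 9.4+ server, you can
-- enable it without a restart (a sketch; on RDS, change the parameter group instead):
-- ALTER SYSTEM SET log_lock_waits = 'on';
-- SELECT pg_reload_conf();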

DROP TABLE IF EXISTS lock_wait_test;
CREATE TABLE lock_wait_test
(id serial primary key
, name text)
;

-- add some data
INSERT INTO lock_wait_test(name)
VALUES
('ankeny'),('burnside'),('couch'),('davis'),('everett'),('flanders'),('glisan'),('hoyt'),('irving')
;

-- take a lock
BEGIN;
LOCK lock_wait_test IN ACCESS EXCLUSIVE MODE;

-- open up another session and run this:
SELECT * FROM lock_wait_test;

Wait at least deadlock_timeout (1s by default), then cancel your SELECT query.
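Ctrl+C in psql does the job; or, from a third session, you can cancel it by pid (a sketch, assuming this test query is the only thing touching the table):

SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE query ILIKE '%lock_wait_test%'
AND pid <> pg_backend_pid();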

Then go check the logs. You should have a helpful message about your lock wait, and the waiting query:
2015-05-02 16:59:23 PDT LOG: process 7617 still waiting for AccessShareLock on relation 16550 of database 12066 after 1000.161 ms at character 15
2015-05-02 16:59:23 PDT STATEMENT: select * FROM lock_wait_test;

Roll your lock back, and repeat, starting with taking the lock. This time, instead of canceling the SELECT query, roll back the locking transaction. You’ll see these messages:
2015-05-02 17:02:23 PDT LOG: process 7617 still waiting for AccessShareLock on relation 16550 of database 12066 after 1000.112 ms at character 15
2015-05-02 17:02:23 PDT STATEMENT: select * FROM lock_wait_test;
2015-05-02 17:02:26 PDT LOG: process 7617 acquired AccessShareLock on relation 16550 of database 12066 after 3920.210 ms at character 15
2015-05-02 17:02:26 PDT STATEMENT: select * FROM lock_wait_test



Thanks to Magnus for the suggestion.

4 May, 2015

SCALE 13x and “You are not the database Superuser”

by gorthx

Another followup from my talk at SCALE 13x.

This isn’t to pick on RDS in particular; you’ll likely have these or similar issues with other “database as managed service” options.

Specific problems you should be aware of:

1a – As I discussed in my post about upgrading an RDS db to Pg 9.4, you can’t pg_dumpall; access to roles & credentials is restricted.
1b – … which means you can’t extract roles & credentials. (I’m told the RDS devs are working on a way to handle this.)

2 – Lots of <insufficient privilege> in pg_stat_activity. You can’t see what the actual superuser is doing, of course; so far this is an annoyance more than a true problem, because I wouldn’t be able to smash a query the actual superuser is running, anyway.
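You can at least count how much is hidden from you:

SELECT count(*)
FROM pg_stat_activity
WHERE query = '<insufficient privilege>';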

3 – You can’t VACUUM a database, only individual tables.

4 – Oops: “FATAL: remaining connection slots are reserved for non-replication superuser connections”. This does not include you :), so you are effectively locked out of your instance. You can try calling Amazon support for this; my experience with this was that even a request for information about existing connections had to be escalated, and I was told they wouldn’t terminate connections for me. So you’re looking at killing them from the app side, or restarting the instance (with a higher max_connections, perhaps). This makes troubleshooting this issue particularly challenging.
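
If you can still grab a connection (say, right after the app releases some), something like this may clear out idle sessions; this assumes your role is allowed to terminate them, which is not a given on RDS (the app role name is hypothetical):

SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND usename = 'app_user'
AND pid <> pg_backend_pid();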

5 – If you’re fond of using SET ROLE [rolename] to troubleshoot, you have to do the extra step of adding yourself to the role:
gabrielle=> SET ROLE db_rw;
ERROR: permission denied to set role "db_rw"
gabrielle=> GRANT db_rw TO gabrielle ;
GRANT ROLE
gabrielle=> SET ROLE db_rw;

23 March, 2015

Upgrading an existing RDS database to Postgres 9.4

by gorthx

Last Thursday, I had this short and one-sided conversation with myself:
“Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 minutes, tops.”

Here is the process I came up with. It did take a bit longer than 20 minutes & I hope these notes save someone else some time. Instance info: t2.small, 10G GP SSD attached storage.

If you have a lot of instances you want to upgrade, you could script this entire thing with the API + psql (there’s a rough sketch after the upgrade steps below).

Prep:
– If you’re using a non-default parameter group, create an equivalent for 9.4 (see the CLI sketch after this list). You can’t use 9.3 parameter groups with 9.4 instances.
– Launch a new 9.4 instance (e.g. gabs-db-94) with the same specs as the original, but with the new parameter group.
– Get role information from your existing 9.3 instance. (See “problem areas”, below.)
– Create roles in the default database. In postgres, roles are available to all dbs in the cluster, so when you load your dumps in, all the table permissions should get set correctly if the roles already exist.
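
For the parameter group step, the CLI version is something like this (the group names are hypothetical):

aws rds create-db-parameter-group \
--db-parameter-group-name gabs-params-94 \
--db-parameter-group-family postgres9.4 \
--description "9.4 equivalent of gabs-params-93"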

Upgrade:
Update: Make sure you use the 9.4 pg_dump; using the older binaries may have contributed to my #3 ‘problem area’ below.
– For each database:

pg_dump -Fc -v -h [endpoint of 9.3 instance] -U [master username] [database] > [database].dump
pg_restore -C -v -h [endpoint of 9.4 instance] -U [master username] -d [master database] [database].dump

– TEST
– Rename 9.3 instance to e.g. gabs-db-93 (no periods!)
– Rename gabs-db-94 to gabs-db
– Don’t destroy the 9.3 instance for at least another week after you’re sure the new one is working. And/or when you do, don’t skip the ‘create final snapshot’ step.
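
The scripted version of the dump & restore loop might look something like this; the endpoints, master username, and database filter are all placeholders:

OLD=gabs-db.abc123.us-west-2.rds.amazonaws.com
NEW=gabs-db-94.abc123.us-west-2.rds.amazonaws.com
# loop over the non-template databases, skipping RDS's own
for db in $(psql -h $OLD -U master -d postgres -At \
-c "SELECT datname FROM pg_database WHERE NOT datistemplate AND datname NOT IN ('rdsadmin','postgres')"); do
pg_dump -Fc -v -h $OLD -U master $db > $db.dump
# -C: connect to the master db, CREATE DATABASE, then restore into it
pg_restore -C -v -h $NEW -U master -d postgres $db.dump
done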

Et voila.

Three specific RDS-related problem areas:
1. New (?) password requirements:
This may not be a 9.4 thing, but sometime in the past month or so, Amazon restricted which special characters are allowed in the master password. If you are using non-alphanumerics in your master passwords, try them out ahead of time.

2. Roles:
Without pg_dumpall, you don’t get role information in your database dump. (Or tablespaces, though I believe you can’t use those on RDS anyway.) This includes things like search_path, log settings specific for that user, etc. (Is there anything else that comes with pg_dumpall and not pg_dump? I don’t know.)

Try this query:

SELECT r.rolname, d.datname, rs.setconfig
FROM pg_db_role_setting rs
LEFT JOIN pg_roles r ON r.oid = rs.setrole
LEFT JOIN pg_database d ON d.oid = rs.setdatabase
WHERE r.rolname NOT LIKE 'rds%'
;

That will show you anything specifically set for a role. (Credit here: http://dba.stackexchange.com/questions/56023/what-is-the-search-path-for-a-given-database-and-user)

Roles that don’t have non-default options won’t be in that result set, so make sure you check pg_roles or \du as well. If you script this, filter rds% out of your result set.
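
A quick way to list the remaining roles (a sketch):

SELECT rolname, rolcanlogin, rolconnlimit, rolvaliduntil
FROM pg_roles
WHERE rolname NOT LIKE 'rds%'
ORDER BY rolname;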

Passwords are not included. I don’t have a workaround for that, other than know all the passwords, or reset them all. I’m sure both options are equally popular.

3. Extensions:
This is the really strange one. After the upgrade, some extensions worked, some didn’t. I had to DROP and re-CREATE tablefunc and pg_trgm, both of which worked in my 9.3 instance.
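
The fix was the usual drop-and-recreate dance. Note that DROP EXTENSION will refuse to run if other objects depend on the extension, so investigate before reaching for CASCADE:

DROP EXTENSION tablefunc;
CREATE EXTENSION tablefunc;

DROP EXTENSION pg_trgm;
CREATE EXTENSION pg_trgm;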

Additional notes:

You’ll have a (short) outage during the time you’re swapping the 9.4 instance in by renaming it. I don’t see a way around this yet.

autovacuum log messages are still missing in 9.4; log_lock_waits works now.

For your amusement, the stuff I messed up:
1. First, I decided using the web console would be a faster way to create the new instance, instead of the cli (which I’m more familiar with.) When I use the cli, I explicitly set all the options available for a new instance. I was careless on the web console and accepted some dumb defaults and had to start over…twice. Lesson: go with what you know.

2. My database has multiple schemas. I missed transferring the search_path for my main ‘worker’ role, and basically broke half of my ETL jobs. This was actually a positive thing, because I was working toward explicitly schema-qualifying all SQL, and I found every place I was not doing that. :koff:

3. After the upgrade & changes, I wanted to dump the schemas out to version control. “Aborting due to server mismatch.” D’oh! Of course I didn’t have 9.4 installed locally. That opened up a whole ‘nother can of worms with trying to upgrade pg on my mac, but I’ll save that for over a beer. Sheesh.

9 March, 2015

PDXPUG Lab Recap: postgres_fdw

by gorthx

Back in January, PDXPUG had a lab night to try out the postgres_fdw.

Our labs are even more casual than our meetings: I don’t set an agenda; I invite the attendees to choose specific questions or topics they want to investigate. Here’s what we came up with for our FDW lab:

– What is it
– Use cases; who has one?
– Best practices?
– Security concerns?
– Performance concerns? (We didn’t get to this.)
– Any diffs between 9.3 and 9.4? (… or this either.)
– Test drive!

Our main focus was the Test Drive. Setup went pretty well for most folks, though one attendee had permissions issues & fell back to file_fdw.

There are four steps (outlined in the docs):
0. Install the extension ;)
1. Create a foreign server
2. Create a user mapping
3. Create your foreign table(s)

Useful commands:
\des to describe foreign servers
\deu to describe user mappings
\det to describe foreign tables

My use case: quick & dirty ETL solution. I pull data from several postgres datasources[1] into a staging database (for data conformance and other tasks), and then push it out to a data mart.

To keep it all reasonably organized, I put the foreign tables for each data source in a schema named for that datasource, e.g. the data from HR goes to schema ‘hr’, data from Accounting goes to ‘accounting’, etc. (This has tripped me up a couple of times, which I’ll cover in a later post.)

Here’s a quick example: I have a ‘users’ table in the accounting database, in the public schema. I want to put it in the ‘accounting’ schema locally.

CREATE SERVER accounting FOREIGN DATA WRAPPER postgres_fdw OPTIONS (
dbname 'accounting'
, host 'server.boozemckidney.com'
, port '5432'
, updatable 'false'
)
;

CREATE USER MAPPING FOR public SERVER accounting OPTIONS (
password '[password]'
, user '[username]'
)
;

CREATE FOREIGN TABLE accounting.users (
id integer
, username text
)
SERVER accounting
OPTIONS (
schema_name 'public'
, table_name 'users'
, updatable 'false'
)
;

Note that you don’t get an error if the schema & table you specify don’t exist in the remote database, so check your work. I just run a quick SELECT … LIMIT 1 on the foreign table. This will also help pick up permissions problems.
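
For the example above, that’s just:

SELECT * FROM accounting.users LIMIT 1;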

According to the docs, the default for updatable is ‘true’, and a table setting overrides a server setting. I originally interpreted that to mean that a table’s default setting of true would override a server’s explicit setting of false, which seemed odd, and my tests don’t bear that out. (Nice to see that Pg once again does the sensible thing.) Regardless, I explicitly set this value at both the server and table level, enabling it only on tables I want to update. Of course, you should also set appropriate permissions on the remote table itself.

If you have a lot of foreign tables to create, script the process! This looks like a good option: http://www.postgresonline.com/journal/archives/322-Generating-Create-Foreign-Table-Statements-for-postgres_fdw.html

You can ALTER FOREIGN TABLE to change field names as you would with a regular table, but I haven’t figured out how/if you can update the options (like, if somebody changes the target hostname or dbname[2]).
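
For what it’s worth, the docs describe SET/ADD/DROP actions in the OPTIONS clause for both servers and foreign tables; a sketch I haven’t tested on RDS, using the example names from above:

-- repoint the server at a new host
ALTER SERVER accounting
OPTIONS (SET host 'newserver.boozemckidney.com');

-- repoint the foreign table at a renamed remote table
ALTER FOREIGN TABLE accounting.users
OPTIONS (SET table_name 'users_v2');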

The credentials associated with user mappings are pg_dumped as cleartext. Ideally, I guess you’d use .pgpass to manage the passwords for you, but I don’t have that option on RDS. To avoid accidentally committing schema dumps that include credentials to my git repo, I added this to my pre-commit hook:

sed -i .bak "s/password '.*',/password '[pwd]',/" schema_dump.sql

(The .bak is required on OSX.)

As usual, I picked up some additional “off-topic” info: check out SELECT * FROM [sequence] or \d [sequence]!

Thanks to Matt S for hanging out afterwards and introducing me to the differences between Oracle’s and Pg’s query planners.



[1] – and a few others that are being a pain in my ass :koff:salesforce:koff:
[2] – of course, why worry about that, because that NEVER happens, right?

2 March, 2015

SCALE 13x and “Yes, you still need a DBA”

by gorthx

Last weekend, I got to give a talk at SCALE’s LA PgDay for the second year in a row. PgDay was two days this year, and unfortunately I had to miss Thursday – looks like I missed some great talks, too, dangit!

I reprised my RDS talk from PDXPUG’s January meeting. It certainly sparked some interesting discussions later.

(If you are too shy to ask questions in my talk, it is totally ok to come up afterwards and ask them, or stop by the Postgres booth & talk to me. That’s what I’m there for!)

I mentioned something in my talk that I want to throw out here because it’s been coming up in conversations recently:

If you have production data that you want to protect, you need a database administrator. Even if you are using a managed database service (RDS, Heroku, what have you), you need someone to:

– Choose an appropriate instance size for your workload
– Configure Postgres appropriately (as much as is possible on a managed service, anyway)
– Secure and audit databases
– Ensure data quality
– Tune queries (e.g. figure out what in [Sam Hill] the ORM is doing)
– Mentor devs
– …add your own here…

None of these tasks go away just because you’re using a managed database solution. You still need someone who can hop in there & get her hands dirty. You may disagree, but please keep this list handy: http://www.postgresql.org/support/professional_support/.

2 February, 2015

Updating My Linux Command line Toolbox, episode 3

by gorthx

Part 2

This week’s tips:

1. ulimit -a will show you all settings, plus the units.

2. crontab -l -u [user] will read out another user’s crontab for you (assuming you have the right perms)

3. and what I call “diff-on-the-fly” – pass the output of shell commands to diff. I like this one because I don’t make a bunch of “temporary” files that I forget to clean up later.

diff <([shell commands]) <([other shell commands])

For example, I need to compare ids in two files, but they’re in different fields in each file, and not in the same order:

diff <(cut -d"," -f1 file1 | sort -u) <(cut -d"," -f3 file2 | sort -u)

12 January, 2015

No more cursing* with ddlgenerator

by gorthx

Do you have to load data from funky files into postgres frequently? I do. I get a few requests per week of the “hey, can you give me a mass data extract for things that match the criteria in this spreadsheet” variety. Frequently, these are windows-formatted all-in-one-line “csv” files.

A year or so ago, I would have reformatted the file, reviewed the data, figured out appropriate data types for the various fields (and sometimes there are dozens of fields), written some SQL to create the table, tried to load the data, cursed a bit, lather, rinse, repeat.

I heard about Catherine’s ddl generator https://github.com/catherinedevlin/ddl-generator at last year’s PgOpen, and these days, all I do is this:

ddlgenerator --inserts postgres [datafile.csv] > datafile.sql

…then log into my database, and run:

\i datafile.sql

et voila.

As I said on twitter, it’s not just the time but the annoyance this tool saves me that I really value.

Try it out!



* at least, not about this.

29 December, 2014

Two Elasticache Redis Tips

by gorthx

This is all in the Amazon docs about Elasticache Redis. In case you don’t read them, or have inherited a system, and want to keep your data, allow me to share what I learned this week:

1. Make sure you have appendonly enabled[1]. It is not enabled by default. Turning this on writes all changes to your data to an external file (AOF, for ‘append only file’). Without this, your database only exists in memory. So guess what happens when your instance reboots[2]? Say bye-bye to your data and hello to restoring from a backup.
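
Turning it on means changing a (non-default) cache parameter group; the CLI call looks something like this (the group name is hypothetical):

aws elasticache modify-cache-parameter-group \
--cache-parameter-group-name gabs-redis-params \
--parameter-name-values ParameterName=appendonly,ParameterValue=yes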

Which brings me to item #2:

2. You can’t restore to an existing instance, of course; you have to delete the instance first, then restore. However, deleting the instance also deletes all automated snapshots associated with that instance. This can be a bit surprising. (The ‘are you sure you want to delete this instance’ message does not include this information.) What I’ve done to get around this is make a manual copy of the automated snapshot I want to restore, prior to deleting the instance. Manual snapshots will stick around until you delete them, regardless of the status of the instance. IME.
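
The manual copy is a single CLI call (the snapshot names are hypothetical):

aws elasticache copy-snapshot \
--source-snapshot-name automatic.gabs-redis-2014-12-28-05-00 \
--target-snapshot-name gabs-redis-keeper-2014-12-28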

You also probably want to configure the Multi-AZ failover, now that it’s available. (https://forums.aws.amazon.com/ann.jspa?annID=2709)


[1] – http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/CacheParameterGroups.Redis.html
[2] – Keep in mind that this is a managed service; you do not have control over whether it reboots or not.
