Posts tagged ‘amazon’

20 June, 2016

RDS: log vacuum messages

by gorthx

Last year I commented that “autovacuum log messages are still missing in [RDS] 9.4“.

Amazon fixed that for versions 9.4.5 and up with the custom GUC rds.force_autovacuum_logging_level.

I discovered this week that it’s also available on earlier versions of RDS – I have it enabled on a 9.3.10 instance. The default value in the config is one of the debug levels; just change it to ‘log’. No restart required (assuming you already have logging configured… and you do, don’t you?)

Works as expected:

2016-06-16 18:36:41 UTC::@:[31403]:LOG: automatic vacuum of table "my_db.public.my_dimension": index scans: 1
 pages: 0 removed, 29457 remain
 tuples: 3454 removed, 429481 remain, 0 are dead but not yet removable
 buffer usage: 64215 hits, 8056 misses, 22588 dirtied
 avg read rate: 1.018 MB/s, avg write rate: 2.855 MB/s
 system usage: CPU 0.10s/0.88u sec elapsed 61.80 sec

This makes me very happy – no more cron job to monitor vacuums!

6 July, 2015

More AWS + redis fun

by gorthx

Part 1.

1. I “upgraded” one of my smaller test clusters to a t2.medium, which has better specs than the m1.small it was on previously. It came up in a weird not-accessible state, and when I started troubleshooting it, I noticed backups weren’t configured. Turns out backup and restore is not supported on cache.t1.* and cache.t2.* instance types. Reference: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ManagingSnapshots.html (Update: duh, “t” stands for “testing”, as in “don’t use this in production.”)

2. We started getting OOM errors on one of our clusters, another m1.small. The dataset was only 900M, so I was a bit mystified. Apparently, when you configure redis to be persistent (ie you won’t lose your data if it restarts) (at least that’s the way it’s supposed to work, :koff: ) it can actually take up to twice the memory of the dataset.

Reference: http://redis.io/topics/admin, see bold text: “If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log Redis may use up to 2 times the memory normally used.”

Familiarize yourself with these:
http://redis.io/topics/memory-optimization
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html#BestPractices.BGSAVE

The bad news: INFO (at least on 2.6.13) doesn’t tell you the max memory configured. Nor is that available via describe-cache-cluster or describe-cache-parameters; you have to infer it from the instance class. Kind of a bummer!

Of course, another option is not to store data you like in an in-memory database, but that’s a discussion for another time.

3. Taking a final snapshot for a cluster is now supported!
aws elasticache delete-cache-cluster \
–cache-cluster-id gabrielles-redis \
–final-snapshot-identifier gabrielles-redis-hinky-2015-07-01

4 May, 2015

SCALE 13x and “You are not the database Superuser”

by gorthx

Another followup from my talk at SCALE 13x.

This isn’t to pick on RDS in particular; you’ll likely have these or similar issues with other “database as managed service” options.

Specific problems you should be aware of:

1a – As I discussed in my post about upgrading an RDS db to Pg 9.4, you can’t pg_dumpall; access to roles & credentials is restricted.
1b – … which means you can’t extract roles & credentials. (I’m told the RDS devs are working on a way to handle this.)

2 – Lots of in pg_stat_activity. You can’t see what the actual superuser is doing, of course; so far this is an annoyance more than a true problem, because I wouldn’t be able to smash a query the actual superuser is running, anyway.

3 – You can’t VACUUM a database, only individual tables.

4 – Oops: “FATAL: remaining connection slots are reserved for non-replication superuser connections”. This does not include you :), so you are effectively locked out of your instance. You can try calling Amazon support for this; my experience with this was that even a request for information about existing connections had to be escalated, and I was told they wouldn’t terminate connections for me. So you’re looking at killing them from the app side, or restarting the instance (with a higher max_connections, perhaps). This makes troubleshooting this issue particularly challenging.

5 – If you’re fond of using SET ROLE [rolename] to troubleshoot, you have to do the extra step of adding yourself to the role:
gabrielle=> SET ROLE db_rw;
ERROR: permission denied to set role "db_rw"
gabrielle=> GRANT db_rw TO gabrielle ;
GRANT ROLE
gabrielle=> SET ROLE db_rw;

23 March, 2015

Upgrading an existing RDS database to Postgres 9.4

by gorthx

Last Thursday, I had this short and one-sided conversation with myself:
“Oh, cool, Pg 9.4 is out for RDS. I’ll upgrade my new database before I have to put it into production next week, because who knows when else I’ll get a chance. Even though I can’t use pg_dumpall, this will take me what, 20 minutes, tops.”

Here is the process I came up with. It did take a bit longer than 20 minutes & I hope these notes save someone else some time. Instance info: t2.small, 10G GP SSD attached storage.

If you have a lot of instances you want to upgrade, you could script this entire thing with the API + psql.

Prep:
– If you’re using a non-default parameter group, create an equivalent for 9.4. You can’t use 9.3 parameter groups with 9.4 instances.
– Launch a new 9.4 instance (e.g. gabs-db-94) with the same specs as the original, but with the new parameter group.
– Get role information from your existing 9.3 instance. (See “problem areas”, below.)
– Create roles in the default database. In postgres, roles are available to all dbs in the cluster, so when you load your dumps in, all the table permissions should get set correctly if the roles already exist.

Upgrade:
Update: Make sure you use the 9.4 pg_dump. This may have contributed to my #3 ‘problem area’ below.
– For each database:

pg_dump -Fc -v -h [endpoint of 9.3 instance] -U [master username] [database] > [database].dump
pg_restore -C -v -h [endpoint of 9.4 instance] -U [master username] -d [master database] [database].dump

– TEST
– Rename 9.3 instance to e.g. gabs-db-93 (no periods!)
– Rename gabs-db-94 to gabs-db
– Don’t destroy the 9.3 instance for at least another week after you’re sure the new one is working. And/or when you do, don’t skip the ‘create final snapshot’ step.

Et voila.

Three specific RDS-related problem areas:
1. New (?) password requirements:
This may not be a 9.4 thing, but sometime in the past month or so, Amazon restricted which special characters are allowed in the master password. If you are using non-alphanumerics in your master passwords, try them out ahead of time.

2. Roles:
Without pg_dumpall, you don’t get role information in your database dump. (Or tablespaces, though I believe you can’t use those on RDS anyway.) This includes things like search_path, log settings specific for that user, etc. (Is there anything else that comes with pg_dumpall and not pg_dump? I don’t know.)

Try this query:

SELECT r.rolname, d.datname, rs.setconfig
FROM pg_db_role_setting rs
LEFT JOIN pg_roles r ON r.oid = rs.setrole
LEFT JOIN pg_database d ON d.oid = rs.setdatabase
WHERE r.rolname NOT LIKE 'rds%'
;

That will show you anything specifically set for a role. (Credit here: http://dba.stackexchange.com/questions/56023/what-is-the-search-path-for-a-given-database-and-user)

Roles that don’t have non-default options won’t be in that result set, so make sure you check pg_roles or \du as well. If you script this, filter rds% out of your result set.

Passwords are not included. I don’t have a workaround for that, other than know all the passwords, or reset them all. I’m sure both options are equally popular.

3. Extensions:
This is the really strange one. After the upgrade, some extensions worked, some didn’t. I had to DROP and re-CREATE tablefunc and pg_trgm, both of which worked in my 9.3 instance.

Additional notes:

You’ll have a (short) outage during the time you’re swapping the 9.4 instance in by renaming it. I don’t see a way around this yet.

autovacuum log messages are still missing in 9.4; log_lock_waits works now.

For your amusement, the stuff I messed up:
1. First, I decided using the web console would be a faster way to create the new instance, instead of the cli (which I’m more familiar with.) When I use the cli, I explicitly set all the options available for a new instance. I was careless on the web console and accepted some dumb defaults and had to start over…twice. Lesson: go with what you know.

2. My database has multiple schemas. I missed transferring the search_path for my main ‘worker’ role, and basically broke half of my ETL jobs. This was actually a positive thing, because I was working toward explicitly schema-qualifying all SQL, and I found every place I was not doing that. :koff:

3. After the upgrade & changes, I wanted to dump the schemas out to version control. “Aborting due to server mismatch.” D’oh! Of course I didn’t have 9.4 installed locally. That opened up a whole ‘nother can of worms with trying to upgrade pg on my mac, but I’ll save that for over a beer. Sheesh.

29 December, 2014

Two Elasticache Redis Tips

by gorthx

This is all in the Amazon docs about Elasticache Redis. In case you don’t read them, or have inherited a system, and want to keep your data, allow me to share what I learned this week:

1. Make sure you have appendonly enabled1. It is not enabled by default. Turning this on writes all changes to your data to an external file (AOF, for ‘append only file’). Without this, your database only exists in memory. So guess what happens when your instance reboots2? Say bye-bye to your data and hello to restoring from a backup.

Which brings me to item #2:

2. You can’t restore to an existing instance, of course – you have to delete the instance first, then restore. However, deleting the instance also deletes all automated snapshots associated with that instance. This can be a bit surprising. (The ‘are you sure you want to delete this instance’ message does not includes this information.) What I’ve done to get around this is make a manual copy of the automated snapshot I want to restore, prior to deleting the instance. Manual snapshots will stick around until you delete them, regardless of the status of the instance. IME.

You also probably want to configure the Multi-AZ failover, now that it’s available. (https://forums.aws.amazon.com/ann.jspa?annID=2709)


1 – http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/CacheParameterGroups.Redis.html
2 – Keep in mind that this is a managed service; you do not have control over whether it reboots or not.