Posts tagged ‘databases’

6 July, 2015

More AWS + redis fun

by gorthx

Part 1.

1. I “upgraded” one of my smaller test clusters to a t2.medium, which has better specs than the m1.small it was on previously. It came up in a weird, not-accessible state, and when I started troubleshooting it, I noticed backups weren’t configured. It turns out backup and restore are not supported on cache.t1.* and cache.t2.* instance types. Reference: http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/ManagingSnapshots.html (Update: duh, “t” stands for “testing”, as in “don’t use this in production.”)

2. We started getting OOM errors on one of our clusters, another m1.small. The dataset was only 900M, so I was a bit mystified. Apparently, when you configure redis to be persistent (i.e. you won’t lose your data if it restarts) (at least that’s the way it’s supposed to work, :koff: ), it can actually take up to twice the memory of the dataset; our 900M could need close to 1.8G during a background save.

Reference: http://redis.io/topics/admin, see bold text: “If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log Redis may use up to 2 times the memory normally used.”

Familiarize yourself with these:
http://redis.io/topics/memory-optimization
http://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html#BestPractices.BGSAVE

The bad news: INFO (at least on 2.6.13) doesn’t tell you the configured max memory. Nor is it available via describe-cache-clusters or describe-cache-parameters; you have to infer it from the instance class. Kind of a bummer!

Of course, another option is not to store data you like in an in-memory database, but that’s a discussion for another time.

3. Taking a final snapshot for a cluster is now supported!
aws elasticache delete-cache-cluster \
    --cache-cluster-id gabrielles-redis \
    --final-snapshot-identifier gabrielles-redis-hinky-2015-07-01

13 May, 2011

PDXPUG is hosting a PgDay in Portland*

by gorthx

Yep, we’re doing it again! One day of PostgreSQL-specific talks, conveniently located at the Oregon Convention Center the day before OSCON.

The short version: Sunday, July 24, 2011, Portland, Oregon. 5 or so sessions. After-party at Gotham Tavern.

The long version: http://pugs.postgresql.org/node/1663

Sign up for the sessions: http://bit.ly/f1F2Mz

Submit your talk proposal: http://bit.ly/g93YNO – the deadline is May 23, 10 days from now. Plenty of time for you procrastinators!

*Really, where else would we do it?

6 November, 2009

Refactoring!

by gorthx

Last night at the hackathon, we refactored one of our queries from my review of Refactoring SQL Applications.*

First, we had a duplicate field name in the original select.  Not a problem if you’re just doing a select, but if you want to create a table (temp or otherwise) from the data, it won’t work.  So we replaced the first num_rows with rows_in_bytes.
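Here’s a minimal illustration of the failure mode (a sketch with throwaway values, not the actual query):

-- duplicate output column names are fine in a plain SELECT...
SELECT 1 AS num_rows, 2 AS num_rows;
-- ...but you can't materialize them into a table:
CREATE TEMP TABLE dupe_test AS SELECT 1 AS num_rows, 2 AS num_rows;
-- ERROR:  column "num_rows" specified more than once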

Also, reading over this 5 months after the original attempt, I realize it’s a lot clearer if we don’t use table aliases in the outer SELECTs.

Then we got some advice from Greg Smith: don’t join on pg_class.relname – it can screw you up if you have identically-named tables in different schemas.  You want to use oids instead.  (I’d always thought that was undesirable, but I’m assured it’s ok with the system tables – you just don’t want your application to depend on them. :) )  So instead, we match pg_namespace.oid with pg_class.relnamespace.

Selena’s illustration of how this works:
SELECT relname, relkind
FROM pg_class
JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
WHERE relkind = 'r'
  AND pg_namespace.nspname = 'public';

The new & improved version of the query can be found on the Pg wiki.

I wanted to compare the new query against the old, so I created a couple of temp tables containing the results… and discovered we had a couple of data discrepancies:  a few of our tables were listed twice in the original query results, with different values for num_rows, only one of which was correct for the current schema:

portal=# SELECT tablename, rows_in_bytes, num_rows FROM index_experiment_1
WHERE tablename IN  ('detectorid_count','stations','test_agg')
ORDER BY 1;
    tablename     | rows_in_bytes | num_rows
------------------+---------------+----------
 detectorid_count | 0 bytes       |        0
 detectorid_count | 631 bytes     |      631
 stations         | 22 bytes      |       22
 stations         | 350 bytes     |      350
 test_agg         | 0 bytes       |        0
 test_agg         | 1386 bytes    |     1386
(6 rows)

portal=# SELECT count(*) from detectorid_count;
 count
-------
     0
(1 row)

portal=# SELECT count(*) from stations;
 count
-------
   350
(1 row)

portal=# SELECT count(*) from test_agg ;
 count
-------
     0
(1 row)

It turns out we’d run into the exact problem that Greg had warned us about.  The additional rows were from identically-named tables in other namespaces.

Find your namespaces:
portal=# SELECT nspname from pg_namespace order by 1;
      nspname
--------------------
 information_schema
 pg_catalog
 pg_temp_1
 pg_temp_2
 pg_toast
 pg_toast_temp_1
 pg_toast_temp_2
 public
 selena
 wendell
(10 rows)

Find your data:
portal=# SELECT count(*) from selena.detectorid_count ;
 count
-------
   631
(1 row)

portal=# SELECT count(*) from wendell.stations ;
 count
-------
    22
(1 row)

portal=# SELECT count(*) from selena.test_agg ;
 count
-------
  1386
(1 row)

Note that these match the additional data from our original query.

Thanks, Greg!


* No, I haven’t finished reading it yet…I don’t read during the summer, I ride my bike.

19 October, 2009

PgWest: Saturday

by gorthx

This past weekend was the 3rd annual PgWest.  The conference moved up to Seattle this year, and I think it was the biggest it’s ever been.  As usual, there were more interesting talks scheduled than I had time to attend.  (This is the 21st century;  where’s my time machine?)

For my first tech conferences a few years ago, I only went to sessions that were meaningful for my job.  I’ve since had a much better time (and learned more) by choosing which sessions I’ll attend based on the following criteria, in this order:
1) topic interestingness
2) speaker interestingness
3) relevance to my job duties

(See Tips #1 and #2 in Skud’s recent Ten tips for tech conference attendees post.)

So, right out of the gate at PgWest, I’m in a python talk* – Adrian K’s (of LinuxFestNW fame) discussion of Dabo.  Dabo’s a python desktop framework;  I program primarily in Perl, and I’ve never touched a desktop app.  Adrian’s example project was a management system for a plant nursery, which I *do* understand, so I had a point of reference into the material (the methods & options used to track plants made sense to me).  I really wanted to talk to him more about this app, but never caught up with him.  (The hallway track felt kind of rushed for me this time.)  I got a good idea for form validation: if a user tries to enter a blank value where one is not allowed, they get a pop-up immediately, and the original text (if there was any) is put back in the field, forcing them to accept the original input or enter something new before they can proceed to the next field.  This is a step up from giving the user the error message after they’ve submitted the form.

Next we were on to JD’s keynote, featuring the usual heckling of and by the podium.

Then Mark’s & my talk about pg_proctab, which ended with some live demos & some audience participation, the way I like it.

A bunch of us went to lunch at Honeyhole Sandwiches, where I tried the “Texas Tease” – BBQ chicken.  The sandwich was excellent.  I *highly* recommend the fries.

Scott Bailey’s Temporal Data talk was *packed*.  He talked about the “period” datatype, featured in both his own (Chronos) and Jeff Davis’s PgTemporal project.  You can do unions & intersects on time periods.  I am thinking this would be a useful datatype for searching large tables of log entries.
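I haven’t played with it myself yet, but the gist is something like this (a sketch using the range types that later landed in core PostgreSQL as tsrange; the period type from Chronos/PgTemporal supports similar overlap, union, and intersection operations):

-- do two periods overlap?
SELECT tsrange('2009-10-17 09:00', '2009-10-17 12:00') &&
       tsrange('2009-10-17 11:00', '2009-10-17 14:00') AS overlaps;  -- true

-- union and intersection of two periods
SELECT tsrange('2009-10-17 09:00', '2009-10-17 12:00') +
       tsrange('2009-10-17 11:00', '2009-10-17 14:00') AS period_union,
       tsrange('2009-10-17 09:00', '2009-10-17 12:00') *
       tsrange('2009-10-17 11:00', '2009-10-17 14:00') AS period_intersection;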

Based on Scott’s talk, I decided to go to Jeff’s “Not Just UNIQUE” talk, because he would be discussing this in a little more detail.  This meant I missed the session on backup & recovery.  (See comment above about more material than I can fit in my schedule.)

I spent the last session partly in the hackers’ lounge, working on some pg_proctab wrapper scripts with Mark.

Then it was off to the EDB-sponsored after-party, where I caught up with Lloyd Albin, who spoke at PDXPUG about a year ago.  He brought me up-to-date on the work he’s done on the project, including a twitter feed to let clients know of updates, which I think is really cool.

*Which I was late to, because we were installing the snacks in the Hackers’ Lounge (thanks, Mark!)

8 October, 2009

Are you going to PgWest?

by gorthx

At a loss for what to do next weekend?  Grab your rain gear & head on up to Seattle for PgWest 2009: http://www.postgresqlconference.org/2009/west/

There’ll be three days of talks & tutorials (http://www.postgresqlconference.org/2009/west/schedule) plus a hackers’ lounge (http://wiki.postgresql.org/wiki/Hackers%27_Lounge).  After-party plans are nebulous at this time, but we are researching options.  (Psst–pub crawl!)

Come join the fun!

6 June, 2009

Book Review (part I): Refactoring SQL Applications, with bonus queries

by gorthx

It’s taking me quite a while to wade through Stéphane Faroult’s Refactoring SQL Applications. I just finished Chapter 2 & figured I’d better just go ahead with the review.

It’s quite humorous – I mean, there’s a section called “Queries of Death” – but this is some dense material, make no mistake. I tried to keep my copy nice so I could loan it to others, but I had to give up and get out The Pen, and it’s been highlighted and scribbled on.

Small gripe: the layout of the example queries makes them hard to read (capitalizing the conditionals would help). I’d also like to see more examples of result sets.

The section about statistics sparked a lively discussion on #pdxpug about cardinality vs selectivity*. What I thought I knew about indexes has been turned on its head – don’t base your decisions just on whether or not the column in question is searched on.

One of the recommendations for “Sanity Checks” is to take a good look at your indexes. For starters, check for tables with no indexes, or a lot of indexes. There’s a sample query to pull the number of rows, indexes, and some info about those indexes for each table. Faroult only shows sample queries for Oracle, SQL Server, and MySQL, so Selena & I put our heads together & came up with an equivalent for PostgreSQL:

(Only works on 8.3; ditch the pg_size_pretty if you’re on an earlier version)

SELECT
    t.tablename,
    pg_size_pretty(c.reltuples::bigint) AS num_rows,
    c.reltuples AS num_rows,
    count(indexname) AS number_of_indexes,
    CASE WHEN x.is_unique = 1 THEN 'Y'
         ELSE 'N'
    END AS unique,
    SUM(CASE WHEN number_of_columns = 1 THEN 1
             ELSE 0
        END) AS single_column,
    SUM(CASE WHEN number_of_columns IS NULL THEN 0
             WHEN number_of_columns = 1 THEN 0
             ELSE 1
        END) AS multi_column
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename = c.relname
LEFT OUTER JOIN
    (SELECT indrelid,
            max(CAST(indisunique AS integer)) AS is_unique
     FROM pg_index
     GROUP BY indrelid) x
    ON c.oid = x.indrelid
LEFT OUTER JOIN
    (SELECT c.relname AS ctablename,
            ipg.relname AS indexname,
            x.indnatts AS number_of_columns
     FROM pg_index x
     JOIN pg_class c ON c.oid = x.indrelid
     JOIN pg_class ipg ON ipg.oid = x.indexrelid) AS foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname = 'public'
GROUP BY t.tablename, c.reltuples, x.is_unique
ORDER BY 2;

It took quite a bit of chocolate to wrap that up…afterwards, Selena decided that it would be neat to look at table & index sizes and see which indexes were being scanned and how many tuples fetched:

SELECT
    t.tablename,
    indexname,
    c.reltuples AS num_rows,
    pg_size_pretty(pg_relation_size(t.tablename)) AS table_size,
    pg_size_pretty(pg_relation_size(indexrelname)) AS index_size,
    CASE WHEN x.is_unique = 1 THEN 'Y'
         ELSE 'N'
    END AS unique,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename = c.relname
LEFT OUTER JOIN
    (SELECT indrelid,
            max(CAST(indisunique AS integer)) AS is_unique
     FROM pg_index
     GROUP BY indrelid) x
    ON c.oid = x.indrelid
LEFT OUTER JOIN
    (SELECT c.relname AS ctablename,
            ipg.relname AS indexname,
            x.indnatts AS number_of_columns,
            idx_scan,
            idx_tup_read,
            idx_tup_fetch,
            indexrelname
     FROM pg_index x
     JOIN pg_class c ON c.oid = x.indrelid
     JOIN pg_class ipg ON ipg.oid = x.indexrelid
     JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid) AS foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname = 'public'
ORDER BY 1, 2;



* cardinality: size of the relation (“number of rows in [something]”)
selectivity: percent of the relation that’s selected
cardinality * selectivity = number of tuples in your result set.
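A quick worked example, with a hypothetical table and made-up numbers:

-- widgets has 10,000 rows (cardinality = 10,000);
-- say 5% of them satisfy the predicate (selectivity = 0.05)
EXPLAIN SELECT * FROM widgets WHERE color = 'red';
-- the planner's row estimate should land near 10,000 * 0.05 = 500 rows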

14 April, 2009

Book Review: The Manga Guide to Databases

by gorthx

Everybody’s reading it!

I admit, I was skeptical at first. But, The Manga Guide to Databases is not just a lightweight illustrated treatment of RDBMSs. Detailed text follows each chapter in the story of Princess Ruruna and her servant Cain as they attempt to bring order to the chaos of the Kingdom of Koa’s fruit export business. Don’t skip those text sections, there’s additional info in there (I learned something new: the definitions of DDL, DML, and DCL as types of SQL queries) and the study questions are relevant to the material. I like that the book covers different data models, and includes a discussion of some relational operations. The explanations of transactions and ACID principles were very concise & clear. My only complaint is the somewhat confusing explanation of joins.
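In case those acronyms are new to you too, the split is roughly this (my examples, not the book’s):

-- DDL (Data Definition Language): define or change objects
CREATE TABLE fruit (name text, price numeric);
-- DML (Data Manipulation Language): work with the rows
INSERT INTO fruit VALUES ('apple', 0.50);
SELECT name FROM fruit WHERE price < 1.00;
-- DCL (Data Control Language): manage permissions
GRANT SELECT ON fruit TO cain;  -- assumes a role named "cain" exists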

This book gets bonus points for having a thorough index and a “frequently used SQL statements” cheat sheet.

Bottom line: Entertaining yet educational.

3 April, 2009

Friday Happy Hour: Gimme some sugar, baby.

by gorthx

Time for some more fun with managing user data, of the “who was connected where and when” type. I’m going to use PostgreSQL row constructors & subqueries to filter my data.

I have a table that contains switch names & ports which are connected to other switches:
testytest=# SELECT switch_name, switch_port, connected_to
FROM switch_connections;
 switch_name | switch_port | connected_to
-------------+-------------+--------------
 switch-1    |           1 | switch-2
 switch-1    |           2 | switch-3
 switch-2    |           1 | switch-1
 switch-3    |           1 | switch-1
(4 rows)
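The full walkthrough is behind the “read more” link, but here’s the flavor of the row constructor trick (a minimal sketch against the table above):

-- a row constructor compares several columns at once:
SELECT switch_name, switch_port, connected_to
FROM switch_connections
WHERE (switch_name, switch_port) IN
      (SELECT switch_name, switch_port
       FROM switch_connections
       WHERE connected_to = 'switch-1');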

Another table contains hostnames found on each switch port at a given point in time:

read more »

6 March, 2009

Friday Happy Hour: PostgreSQL & mac addresses

by gorthx

Postgres has a datatype just for storing mac addresses. Let’s check it out!
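A quick taste before the jump (a minimal sketch; the table and values are made up):

-- macaddr accepts several common notations and normalizes them:
CREATE TEMP TABLE arp_cache (host text, mac macaddr);
INSERT INTO arp_cache VALUES
    ('host-a', '08:00:2b:01:02:03'),
    ('host-b', '08002b-010204'),    -- dashed notation
    ('host-c', '0800.2b01.0205');   -- Cisco-style dotted notation
SELECT host, mac FROM arp_cache ORDER BY mac;
-- all three display in the canonical colon-separated form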

read more »