Posts tagged ‘book reviews’

15 December, 2010

PostgreSQL 9 Admin Cookbook

by gorthx

The PostgreSQL community has been waiting for a “cookbook”, and I’m really excited that we finally have one. The PostgreSQL 9 Admin Cookbook contains many tips & techniques I’m going to put to immediate use. Like, pgloader and the ON_ERROR_STOP option to psql. This book also reminded me that there’s still a lot I don’t know about Postgres: for example, I’ve never worked with pg_controldata or the quote_ident() function. If you’re a Pg admin (or wannabe), you should give this book at least a run-through, even if you think you already know everything.

Most of the “recipes” in this cookbook will stand on their own; some require material from previous or other referenced sections to make sense. I am a sucker for conversational style, and while the book gets off to a rough start, it does even out after a bit. Big blocks of SQL are formatted consistently in a style that, while it’s not one I use myself, is easy to read. (This is very important.)

The authors give a great explanation of why they prefer their filesystem set up a specific way, and how to do it (of course, it probably helps that I agree with them on this point); and good advice about schema & relation names. Specific problem-solving tools I found useful are: the list of steps to troubleshoot failed connections; specific things to do if a backend is hung, or a query is taking too long; and generating test data and taking random samples of real-world data. They also give warnings where something you do might cause application downtime.

Sadly, this book suffers from inadequate editing. I often struggled against the grammar and organization to understand what the authors were trying to express. This frustrated me and slowed me down; a detriment in a technical manual. I also found that I couldn’t rely on the (seemingly auto-generated) index to find what I need, although the TOC and chapter headings helped out a lot there.

Bottom line: would I buy a hard copy? At the listed pricepoint (US$45 at this time), probably not. However, it’s a great first effort and I eagerly await the second edition.


Disclaimer: I received a free review copy of this book from the publisher, in pdf format.

6 June, 2009

Book Review (part I): Refactoring SQL Applications, with bonus queries

by gorthx

It’s taking me quite a while to wade through Stephan Faroult’s Refactoring SQL Applications. I just finished Chapter 2 & figured I’d better just go ahead with the review.

It’s quite humorous – I mean, there’s a section called “Queries of Death” – but this is some dense material, make no mistake. I tried to keep my copy nice so I could loan it to others, but I had to give up and get out The Pen, and it’s been highlighted and scribbled on.

Small gripe: the layout of the example queries makes them hard to read (capitalizing the conditionals would help). I’d also like to see more examples of result sets.

The section about statistics sparked a lively discussion on #pdxpug about cardinality vs selectivity*. What I thought I knew about indexes has been thrown on its head – don’t base your decisions just on whether or not the column in question is searched on.

One of the recommendations for “Sanity Checks” is to take a good look at your indexes. For starters, check for tables with no indexes, or a lot of indexes. There’s a sample query to pull the number of rows, indexes, and some info about those indexes for each table. Faroult only shows sample queries for Oracle, SQL Server, and MySQL, so Selena & I put our heads together & came up with an equivalent for PostgreSQL:

(Only works on 8.3; ditch the pg_size_pretty if you’re on an earlier version)

SELECT
    t.tablename,
    pg_size_pretty(c.reltuples::bigint) AS num_rows,
    c.reltuples AS num_rows,
    count(indexname) AS number_of_indexes,
    CASE WHEN x.is_unique = 1 THEN 'Y'
       ELSE 'N'
    END AS unique,
    SUM(case WHEN number_of_columns = 1 THEN 1
              ELSE 0
            END) AS single_column,
    SUM(case WHEN number_of_columns IS NULL THEN 0
             WHEN number_of_columns = 1 THEN 0
             ELSE 1
           END) AS multi_column
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
       (SELECT indrelid,
           max(CAST(indisunique AS integer)) AS is_unique
       FROM pg_index
       GROUP BY indrelid) x
       ON c.oid = x.indrelid
LEFT OUTER JOIN
    ( SELECT c.relname as ctablename, ipg.relname as indexname, x.indnatts as number_of_columns FROM pg_index x
           JOIN pg_class c ON c.oid = x.indrelid
           JOIN pg_class ipg on ipg.oid = x.indexrelid  )
    as foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
GROUP BY t.tablename, c.reltuples, x.is_unique
order by 2;

It took quite a bit of chocolate to wrap that up…afterwards, Selena decided that it would be neat to look at table & index sizes and see which indexes were being scanned and how many tuples fetched:

SELECT
    t.tablename,
    indexname,
    c.reltuples AS num_rows,
    pg_size_pretty(pg_relation_size(t.tablename)) AS table_size,
    pg_size_pretty(pg_relation_size(indexrelname)) AS index_size,
    CASE WHEN x.is_unique = 1  THEN 'Y'
       ELSE 'N'
    END AS unique,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
       (SELECT indrelid,
           max(CAST(indisunique AS integer)) AS is_unique
       FROM pg_index
       GROUP BY indrelid) x
       ON c.oid = x.indrelid
LEFT OUTER JOIN
    ( SELECT c.relname as ctablename, ipg.relname as indexname, x.indnatts as number_of_columns, idx_scan, idx_tup_read, idx_tup_fetch,indexrelname FROM pg_index x
           JOIN pg_class c ON c.oid = x.indrelid
           JOIN pg_class ipg ON ipg.oid = x.indexrelid
           JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
    as foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
order by 1,2;



cardinality: size of the relation (“number of rows in [something]”)
selectivity: percent of the relation that’s selected
cardinality * selectivity = number of tuples in your results set.