Book Review (part I): Refactoring SQL Applications, with bonus queries

by gorthx

It’s taking me quite a while to wade through Stephan Faroult’s Refactoring SQL Applications. I just finished Chapter 2 & figured I’d better just go ahead with the review.

It’s quite humorous – I mean, there’s a section called “Queries of Death” – but this is some dense material, make no mistake. I tried to keep my copy nice so I could loan it to others, but I had to give up and get out The Pen, and it’s been highlighted and scribbled on.

Small gripe: the layout of the example queries makes them hard to read (capitalizing the conditionals would help). I’d also like to see more examples of result sets.

The section about statistics sparked a lively discussion on #pdxpug about cardinality vs selectivity*. What I thought I knew about indexes has been thrown on its head – don’t base your decisions just on whether or not the column in question is searched on.

One of the recommendations for “Sanity Checks” is to take a good look at your indexes. For starters, check for tables with no indexes, or a lot of indexes. There’s a sample query to pull the number of rows, indexes, and some info about those indexes for each table. Faroult only shows sample queries for Oracle, SQL Server, and MySQL, so Selena & I put our heads together & came up with an equivalent for PostgreSQL:

(Only works on 8.3; ditch the pg_size_pretty if you’re on an earlier version)

SELECT
    t.tablename,
    pg_size_pretty(c.reltuples::bigint) AS num_rows,
    c.reltuples AS num_rows,
    count(indexname) AS number_of_indexes,
    CASE WHEN x.is_unique = 1 THEN 'Y'
       ELSE 'N'
    END AS unique,
    SUM(case WHEN number_of_columns = 1 THEN 1
              ELSE 0
            END) AS single_column,
    SUM(case WHEN number_of_columns IS NULL THEN 0
             WHEN number_of_columns = 1 THEN 0
             ELSE 1
           END) AS multi_column
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
       (SELECT indrelid,
           max(CAST(indisunique AS integer)) AS is_unique
       FROM pg_index
       GROUP BY indrelid) x
       ON c.oid = x.indrelid
LEFT OUTER JOIN
    ( SELECT c.relname as ctablename, ipg.relname as indexname, x.indnatts as number_of_columns FROM pg_index x
           JOIN pg_class c ON c.oid = x.indrelid
           JOIN pg_class ipg on ipg.oid = x.indexrelid  )
    as foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
GROUP BY t.tablename, c.reltuples, x.is_unique
order by 2;

It took quite a bit of chocolate to wrap that up…afterwards, Selena decided that it would be neat to look at table & index sizes and see which indexes were being scanned and how many tuples fetched:

SELECT
    t.tablename,
    indexname,
    c.reltuples AS num_rows,
    pg_size_pretty(pg_relation_size(t.tablename)) AS table_size,
    pg_size_pretty(pg_relation_size(indexrelname)) AS index_size,
    CASE WHEN x.is_unique = 1  THEN 'Y'
       ELSE 'N'
    END AS unique,
    idx_scan AS number_of_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_tables t
LEFT OUTER JOIN pg_class c ON t.tablename=c.relname
LEFT OUTER JOIN
       (SELECT indrelid,
           max(CAST(indisunique AS integer)) AS is_unique
       FROM pg_index
       GROUP BY indrelid) x
       ON c.oid = x.indrelid
LEFT OUTER JOIN
    ( SELECT c.relname as ctablename, ipg.relname as indexname, x.indnatts as number_of_columns, idx_scan, idx_tup_read, idx_tup_fetch,indexrelname FROM pg_index x
           JOIN pg_class c ON c.oid = x.indrelid
           JOIN pg_class ipg ON ipg.oid = x.indexrelid
           JOIN pg_stat_all_indexes psai ON x.indexrelid = psai.indexrelid )
    as foo
    ON t.tablename = foo.ctablename
WHERE t.schemaname='public'
order by 1,2;



cardinality: size of the relation (“number of rows in [something]”)
selectivity: percent of the relation that’s selected
cardinality * selectivity = number of tuples in your results set.

Advertisements
%d bloggers like this: