Archive for November, 2009

6 November, 2009

Refactoring!

by gorthx

Last night at the hackathon,  we refactored one of our queries from my review of Refactoring SQL Applications.*

First, we had a duplicate field name in the original select.  Not a problem if you’re just doing a select, but if you want to create a table (temp or otherwise) from the data, it won’t work.  So we replaced the first num_rows with rows_in_bytes.

Also, reading over this 5 months after the original attemp, I realize it’s a lot clearer if we don’t use table aliases in the outer SELECTs.

Then, we got some advice from Greg Smith that we shouldn’t do joins on pg_class.relname – this can screw you up if you have different schemas with identical table names.  You want to use oids (which I’d always thought was not desirable, but I’m assured it’s ok if you’re doing it with the system tables – you don’t want your application to depend on them, though. :) )  So, instead, we match pg_namespace.oid with pg_class.relnamespace.

Selena’s illustration of how this works:
SELECT relname, relkind FROM pg_class
JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace WHERE relkind = 'r' AND pg_namespace.nspname = 'public';

The new & improved version of the query can be found on the Pg wiki.

I wanted to compare the new query against the old, so I created a couple of temp tables containing the results… and discovered we had a couple of data discrepancies:  a few of our tables were listed twice in the original query results, with different values for num_rows, only one of which was correct for the current schema:

portal=# SELECT tablename, rows_in_bytes, num_rows FROM index_experiment_1
WHERE tablename IN  ('detectorid_count','stations','test_agg')
ORDER BY 1;
tablename     | rows_in_bytes | num_rows
------------------+---------------+----------
detectorid_count | 0 bytes       |        0
detectorid_count | 631 bytes     |      631
stations         | 22 bytes      |       22
stations         | 350 bytes     |      350
test_agg         | 0 bytes       |        0
test_agg         | 1386 bytes    |     1386
(6 rows)

portal=# SELECT count(*) from detectorid_count;
count
——-
0
(1 row)

portal=# SELECT count(*) from stations;
count
——-
350
(1 row)

portal=# SELECT count(*) from test_agg ;
count
——-
0
(1 row)

It turns out we’d run into the exact problem that Greg had warned us about.  The additional rows were from identically-named tables in other namespaces.

Find your namespaces:
portal=# SELECT nspname from pg_namespace order by 1;
nspname
--------------------
information_schema
pg_catalog
pg_temp_1
pg_temp_2
pg_toast
pg_toast_temp_1
pg_toast_temp_2
public
selena
wendell
(10 rows)

Find your data:
portal=# SELECT count(*) from selena.detectorid_count ;
count
-------
631
(1 row)

portal=# SELECT count(*) from wendell.stations ;
count
——-
22
(1 row)

portal=# SELECT count(*) from selena.test_agg ;
count
——-
1386
(1 row)

Note that these match the additional data from our original query.

Thanks, Greg!


* No, I haven’t finished reading it yet…I don’t read during the summer, I ride my bike.

Advertisements