Friday Happy Hour: Gimme some sugar, baby.

by gorthx

Time for some more fun with managing user data, of the “who was connected where and when” type. I’m going to use PostgreSQL row constructors & subqueries to filter my data.

I have a table that contains switch names & ports which are connected to other switches:
testytest=# SELECT switch_name, switch_port, connected_to
FROM switch_connections;
switch_name | switch_port | connected_to
-------------+-------------+--------------
switch-1 | 1 | switch-2
switch-1 | 2 | switch-3
switch-2 | 1 | switch-1
switch-3 | 1 | switch-1
(4 rows)

Another table contains hostnames found on each switch port at a given point in time:


testytest=# SELECT date_found, switch_name, switch_port, hostname
FROM host_history;
date_found | switch_name | switch_port | hostname
---------------------+-------------+-------------+-----------
2009-03-01 10:00:00 | switch-1 | 1 | yachted
2009-03-01 10:00:00 | switch-1 | 1 | yakking
...
2009-03-01 10:00:00 | switch-1 | 2 | keystroke
2009-03-01 10:00:00 | switch-1 | 2 | kibitzers
...
2009-03-01 10:00:00 | switch-1 | 3 | zooming
2009-03-01 10:00:00 | switch-1 | 4 | zincked
...
2009-03-01 10:00:00 | switch-2 | 1 | zincked
2009-03-01 10:00:00 | switch-2 | 1 | zooming
2009-03-01 10:00:00 | switch-2 | 2 | yakking
2009-03-01 10:00:00 | switch-2 | 4 | yachted
...
2009-03-01 10:00:00 | switch-3 | 1 | zincked
2009-03-01 10:00:00 | switch-3 | 1 | zooming
2009-03-01 10:00:00 | switch-3 | 2 | kibitzers
2009-03-01 10:00:00 | switch-3 | 3 | keystroke

[ Yes, I have skipped a huge part of the data-gathering here, in which I match up a mac address from the switch with an IP from the router’s arp cache, and then look up the hostname that matches that IP. It’s not really the point of this post. I would be happy to discuss it in another post, or over beer. :) ]

The key here is that the hostnames show up in two places: the port on the switch the user is actually connected to, and the port on any other switch that’s connected to that switch.
Example: hostname “yakking” appears in the table on both switch-1 port 1 and switch-2 port 2. We saw in our SELECT on the switch_connections table that switch-1 port 1 is actually connected to switch-2 port 1.

I’m only interested in the switch the user’s directly connected to, so I want to remove the entries from host_history that correspond to the entries in switch_connections. I start playing with some SELECT statements to see what I’m working with:

testytest=# SELECT DISTINCT switch_name, switch_port
FROM host_history
WHERE (
switch_name IN (SELECT switch_name FROM switch_connections)
AND switch_port IN (SELECT switch_port FROM switch_connections)
);
switch_name | switch_port
-------------+-------------
switch-1 | 1
switch-1 | 2
switch-2 | 1
switch-2 | 2
switch-3 | 1
switch-3 | 2

As you can see, that doesn’t return the desired results set. (Another reason to use transactions & maybe run a SELECT before you DELETE, to make sure you’re getting the results you want!) Section 9.19, “Subquery Expressions”, in the current Pg docs http://www.postgresql.org/docs/8.3/interactive/functions-subquery.html gives us a clue that we can pass a single subquery to our WHERE clause:

testytest=# SELECT DISTINCT switch_name, switch_port
FROM host_history
WHERE (switch_name, switch_port) IN
(SELECT switch_name, switch_port
FROM switch_connections)
;
switch_name | switch_port
-------------+-------------
switch-1 | 1
switch-1 | 2
switch-2 | 1
switch-3 | 1
(4 rows)

Now that’s more like it! (I recommend taking a look at EXPLAIN ANALYZE to see what it’s doing.)

From here it’s a short step to removing the data I don’t want:

DELETE FROM host_history
WHERE (switch_name, switch_port) IN
(SELECT switch_name, switch_port
FROM switch_connections)
;

Easy!

Advertisements
%d bloggers like this: