Debugging complex PostgreSQL queries with pgdebug

In a software project I was leading recently, I wanted to push the boundaries of what could be achieved within PostgreSQL, without resorting to application level code (quite a lot, it turns out). The project required complex reports, and I ended up writing queries like this:

with vars as (
    select *, start_tstamp at time zone timezone as start_time
        , least(clock_timestamp(),
            (end_tstamp + '23:59:59'::interval) at time zone timezone) as end_time
        , $4::bigint as user_id, $5::bigint as geometry_id, $6::bigint as role_id
    from (
        select $1::bigint as project_id, ($2::timestamp without time zone) as start_tstamp
            , ($3::timestamp without time zone) as end_tstamp
            , (select timezone from projects where id = $1) as timezone
    ) a
)
, last_user_sync as (
    select user_id, max(synced_at) as synced_at
    from users_projects
    where case when 0 = 0 then true else user_id = 0 end
    group by user_id
)
, relevant_events as (
    select row_number() over
       (partition by user_id, date_trunc('day', events.created_at at time zone (select timezone from vars))
            , geometry_id order by events.created_at) as row_num
        , user_id, events.project_id, geometry_id, geometry_name, type
        , date_trunc('day', events.created_at at time zone (select timezone from vars)) as day
        , events.created_at
        , lead(events.created_at, 1)
            over (partition by user_id,
                date_trunc('day', events.created_at at time zone (select timezone from vars)), geometry_id
                order by events.created_at rows between current row and 1 following) as pair_time
    from events
    join users on events.user_id = users.id
    join user_roles on user_roles.id = users.role_id
    where events.project_id = (select project_id from vars)
        and (type = 'entry' or type = 'exit')
        and (case when (select user_id from vars) = 0 then true else user_id = (select user_id from vars) end)
        and (case when (select geometry_id from vars) = 0 then true else geometry_id = (select geometry_id from vars) end)
        and (case when (select role_id from vars) = 0 then true else user_roles.id = (select role_id from vars) end)
        and events.created_at between (select start_time from vars) and
                                      (select end_time from vars)
)
, visit_intervals as (
    select relevant_events.user_id, relevant_events.day, geometry_id, geometry_name
        , case when type = 'exit' then relevant_events.day at time zone (select timezone from vars) at time zone 'UTC'
        else relevant_events.created_at end as start

        , case when type = 'exit' then relevant_events.created_at
        else coalesce(pair_time,
                      least(synced_at,
                            relevant_events.day at time zone (select timezone from vars) 
                            at time zone 'UTC' + '1 day'::interval - '1 second'::interval))
        end as finish
    from relevant_events
    join last_user_sync on relevant_events.user_id = last_user_sync.user_id
    where type = 'entry' or (type = 'exit' and row_num = 1)
)
, intervals_with_pos as (
    select *
    from visit_intervals vi
    join lateral (
        select (case when speed <= 0.1 then 0
                     when speed <= 1.39 then 1
                     when speed <= 4.17 then 2
                     when speed <= 11.11 then 3
                     else 4 end) as speed_band
            , lead(p.created_at, 1, vi.finish) over (order by p.created_at rows between current row and 1 following)
                - p.created_at as duration
            , ST_Distance(
                ST_SetSRID(ST_Point(lon, lat), 4326)::geography
                , ST_SetSRID(ST_Point(
                    (lead(lon, 1) over (order by p.created_at rows between current row and 1 following)),
                    (lead(lat, 1) over (order by p.created_at rows between current row and 1 following))
                    ), 4326)) as distance
            , p.created_at
            , speed
        from positions p
        where p.created_at >= vi.start and p.created_at < vi.finish
            and p.user_id = vi.user_id
    ) p1 on true
)
, speed_band_intervals as (
    select vis.user_id, geometry_id, day
        , sum(duration) as speed_band_time
        , sum(distance) as speed_band_distance
        , min(speed) as min_speed, max(speed) as max_speed, speed_band
        , first(geometry_name) as geometry_name
    from intervals_with_pos vis
    group by vis.user_id, day, geometry_id, speed_band
)
select user_id, geometry_id, speed_band_time, speed_band_distance, min_speed, max_speed
    , speed_band, geometry_name
    , to_char(day, 'DD/MM/YY HH24:MI:SS') as day
    , extract(epoch from speed_band_time) as duration
    , 100 * (extract(epoch from speed_band_time) / extract(epoch from sum(speed_band_time)
        over (partition by user_id, geometry_id, day))) as percentage
    , users.name as user_name, company as user_company
    , coalesce(nullif(user_roles.name, ''), users.description) as user_role
from speed_band_intervals
join users on users.id = user_id
join user_roles on users.role_id = user_roles.id
order by user_name, day, lower(geometry_name), speed_band

As I practised and developed these things gradually, I concluded that CTEs (those subqueries at the beginning of the query which are defined using with) and disciplined formatting allow queries up to about 200 lines long to remain readable and maintainable.

However, one problem with these is that if there was a bug, I could, for example, get an empty result sometimes:

 user_id | geometry_id | speed_band_time | ...
---------+-------------+-----------------+-...
(0 rows)

What went wrong? Which part of the query caused the problem? This output would, of course, give me no clue. So then I would have to start taking chunks of this query, CTE by CTE, and execute them in psql to find out which step caused the problem.

I wasn't aware of any tools that could help me with this, so I did it manually.

Now that I completed my latest book (PostgreSQL: Time and Temporal Data), I have a bit of time so I decided to bite the bullet and write a small tool to help with debugging complicated queries.

The idea is to break down the query into multiple queries which produce a result set for each CTE. So for this query:

with a as (
    select * from customers
)
, b as (
    select * from projects 
)
select * 
from b 
join a using (customer_id)

I need to run the following queries to see CTE results:

with a as (
    select * from customers
)
select * from a;

with a as (
    select * from customers
)
, b as (
    select * from projects
)
select * from b;

Since CTEs often rely on the preceding CTEs, I need to preserve all the CTE definitions prior to the one I'm interested in.

I made pgdebug to produce these queries automatically based on the input SQL. It can handle CTEs, recursive CTEs, set of rows defined with values, and subqueries in the from clause. I haven't seen a need to examine other subqueries (e.g. in the where clause), because they are typically simple and return a single result. It might be something to add support for in the future.

It also doesn't support data-modifying statements (insert, update, delete) even though technically they can also have CTEs.

Using pgdebug

pgdebug is based on Node.js, so you can install it from npm:

npm install -g pgdebug

Here is an example of applying pgdebug to a query with CTEs (the original query is included in the output):

~/t/pgdebug ❯❯❯ npx pgdebug -c "with a as (                                                                         ⏎
    select * from customers
)
, b as (
    select * from projects
)
select *
from b
join a using (customer_id)"
WITH a AS (SELECT * FROM "customers") SELECT * FROM "a";
WITH a AS (SELECT * FROM "customers"), b AS (SELECT * FROM "projects") SELECT * FROM "b";
with a as (
    select * from customers
)
, b as (
    select * from projects
)
select *
from b
join a using (customer_id)

Here is an example of a recursive CTE (note that the recursive qualifier is preserved in the output):

~/t/pgdebug ❯❯❯ npx pgdebug -c "with recursive t(n) as (
    values (1)
  union all
    select n+1 from t where n < 100
)
select sum(n) from t"
WITH RECURSIVE t ("n") AS ((VALUES (1)) UNION ALL (SELECT (("n") + (1)) FROM "t" WHERE (("n") < (100)))) SELECT * FROM "t";
with recursive t(n) as (
    values (1)
  union all
    select n+1 from t where n < 100
)
select sum(n) from t

And here is an example with subqueries in the from clause:

~/t/pgdebug ❯❯❯ npx pgdebug -c "select * from (select * from customers) c join (select * from projects) p using (customer_id)"
SELECT * FROM "customers";
SELECT * FROM "projects";
select * from (select * from customers) c join (select * from projects) p using (customer_id)

Running all the queries at once

Producing these queries isn't all that useful in itself as you still have to copy-paste and execute them, so I use pgdebug in combination with psql, first piping the queries into psql and then piping the output into less so I can scroll through it:

 echo "select * from (select zone_id, project_id, name from zones) z 
 join (select project_id, name from projects) p using (project_id)" 
 | npx pgdebug -i 
 | psql -q -e -d pgtime 
 | less

-q supresses psql messages, and -e makes it output the query SQL, which is handy to understand which result set I'm looking at. The result looks like this:

SELECT "zone_id",
"project_id",
"name" FROM "zones";
 zone_id | project_id |  name  
---------+------------+--------
      14 |          1 | 
       1 |          1 | Zone 1
       2 |          1 | Zone 2
       2 |          1 | 
(4 rows)

Time: 1.648 ms
SELECT "project_id",
"name" FROM "projects";
 project_id |   name    
------------+-----------
          1 | Highway 1
          2 | Highway 2
          2 | Highway 2
          1 | Highway 1
(4 rows)

Time: 0.290 ms
select * from (select zone_id, project_id, name from zones) z join (select project_id, name from projects) p using (project_id)
 project_id | zone_id |  name  |   name    
------------+---------+--------+-----------
          1 |      14 |        | Highway 1
          1 |      14 |        | Highway 1
          1 |       1 | Zone 1 | Highway 1
          1 |       1 | Zone 1 | Highway 1
          1 |       2 | Zone 2 | Highway 1
          1 |       2 | Zone 2 | Highway 1
          1 |       2 |        | Highway 1
          1 |       2 |        | Highway 1
(8 rows)

Time: 0.732 ms

With every intermediate result set available for viewing, finding where things have gone awry is much easier!

In terms of implementation, pgdebug relies on pg-query-parser, a Node.js wrapper for the PostgreSQL parser, and node-jq, a wrapper for the powerful jq tool for processing JSON data. I could have probably bludgeoned my way through with regexes, but these packages allowed me to write more robust code (and it's likely shorter too).

Finally, here is the documentation page which has the details of the command line options: pgdebug.

I hope you also find pgdebug useful in debugging queries!