Recursive Common Table Expressions with Postgres

This week, I’ve been helping a client with some performance problems with their API. We’ve been focusing on tuning some Postgres queries to reduce latency and improve response times.

One area of the app that was struggling was a query that processes a tree structure. The app allows users to create folders and upload files to them.

The database structure is designed in a way that resembles the following:

create table folders (
    id bigserial primary key,
    name text not null,
    parent_id bigint,
    foreign key (parent_id) references folders(id)
);

create table files (
    id bigserial primary key,
    name text not null,
    folder_id bigint,
    size bigint not null,
    foreign key (folder_id) references folders(id)
);

Here, we have a table called folders and a table for files in each folder.

The parent_id column in the folders table is a self-referencing foreign key that allows us to form a tree by pointing each folder to its parent folder.

The process I was optimizing involved finding all the files at a specific folder path, for example, in widgets/12345/attachments. The API used a simple while loop in application code to walk down the tree, basically doing the following:

select id from folders where name = 'widgets' and parent_id is null; -- result: 1
select id from folders where name = '12345' and parent_id = 1;       -- result: 2
select id from folders where name = 'attachments' and parent_id = 2; -- result: 3
select * from files where folder_id = 3;

This strategy takes multiple round trips to the server, and the deeper the path, the more queries are issued. The network latency can add up quickly and affect the user experience.

Recursive Common Table Expressions

I wanted to improve latency by making only a single network round trip from the application to the database. I needed a way to recursively walk the entire folder path in a single SQL query.

Postgres allows developers to define recursive common table expressions ( CTE) that union a base case with a recursive case. It will expand the table expression on demand until it has satisfied the entire query.

For this demo, I’ve loaded the folders and files of a default Create React App application into a Postgres database using the above table schema. I’ll show off a few queries using recursive CTEs that make quick work of common tree-related queries.

Listing all the folders

One thing you might want to do with a file database is list all the folders in a tree. A recursive CTE for this is straightforward:

with recursive cte as (
    select * from folders where parent_id is null and name = 'my-app'
    union all
    select f.* from folders f join cte on f.parent_id = cte.id
)
select * from cte;
id name parent_id
1 my-app
4850 .git 1
2 node_modules 1
4849 public 1
4883 src 1
4879 hooks 4850
4875 info 4850
4876 logs 4850
4851 objects 4850

I’ve cut this off short because there are 4,883 files in a default Create React App.

This CTE has a base case that looks for a root folder (its parent_id is null) with the name my-app. That row is union alled with the recursive case that finds rows in the folders table with a parent_id matching the rows already discovered. Postgres can expand this until it has discovered all possible rows.

Listing folders with their paths

Another common need for a database tree structure is visualizing each folder’s path. Since each folder has a name, we can recursively build up a path column by concatenating the names of all the parent folders with the current name:

with recursive cte as (
    select f.*, name as path from folders f where f.parent_id is null and f.name = 'my-app'
    union all
    select 
        f.*,
        -- concatenate the parent path with the current row's name to form a deeper path
        cte.path || '/' || f.name as path 
    from folders f join cte on f.parent_id = cte.id
)
select * from cte;
id path
1 my-app
4850 my-app/.git
2 my-app/node_modules
4849 my-app/public
4883 my-app/src
4879 my-app/.git/hooks
4875 my-app/.git/info
4876 my-app/.git/logs
4851 my-app/.git/objects

Computing total file size

By joining all the discovered rows from the folders table with the rows in the files table we can add up the sizes of every file in every folder to get a grand total:

with recursive cte as (
    select * from folders where parent_id is null and name = 'my-app'
    union all
    select f.* from folders f join cte on f.parent_id = cte.id
)
select sum(size) / 1024 / 1024  as total_mb from files join cte on folder_id = cte.id;
total_mb
236.7352590560913086

Getting files in a particular folder

The primary use case I was trying to improve in my client’s API was finding files in a particular folder. Given the input "my-app/node_modules/react", how would I use a recursive CTE to find the top-level files in the react folder (without recursing any deeper)?

At first, I thought I could use the path column from the previous query to find the id of the folder I wanted and then join to files from there. It would look something like this:

with recursive cte as (
    select f.*, name as path from folders f where f.parent_id is null
    union all
    select f.*, cte.path || '/' || f.name as path from folders f join cte on f.parent_id = cte.id
)
select f.* from files f join cte on f.folder_id = cte.id
           where cte.path = 'my-app/node_modules/react';

This query has the benefit of being very readable, but I didn’t believe it would scale well. Building up a string through repeated concatenation is expensive regarding CPU and memory use.

It also suffers from undirected recursion: Postgres can’t optimize the execution plan to prioritize descending into subfolders that follow the desired path; it will have to materialize all rows from the CTE and pass them through the final where filter one by one. I wanted something that could be a little smarter: once Postgres found the “my-app” row in the folders table, I wanted to direct it to see the child folder named node_modules and skip descending into src or public or anything else.

The next thing I experimented with was simply returning the folder rows along the desired path. If I split the input string into each path name, I could limit the CTE recursion to just that path. I used the first entry (my-app) in the base case and then the rest (node_modules, react) in the recursive case of the CTE:

with recursive cte as (
    select * from folders where parent_id is null and name = 'my-app'
    union all
    select f.* from folders f 
        join cte on f.parent_id = cte.id and f.name IN ('node_modules', 'react')
)
select * from cte;
id name parent_id
1 my-app null
2 node_modules 1
4553 react 2

That seemed to work. All I would then need to do is join into the folders table using the last id in the list:

with recursive cte as (
    select * from folders where parent_id is null and name = 'my-app'
    union all
    select f.* from folders f 
        join cte on f.parent_id = cte.id and f.name IN ('node_modules', 'react')
)
-- this is changed:
select * from files where folder_id = (select max(id) from cte);
id name folder_id size
35921 LICENSE 4553 1086
35922 jsx-runtime.js 4553 214
35923 index.js 4553 190
35924 README.md 4553 1162
35925 react.shared-subset.js 4553 218
35926 package.json 4553 999
35927 jsx-dev-runtime.js 4553 222

Since there is a foreign-key relationship from a folder to its parent, I knew that a child folder always has a greater id than its parent. I could use that property to find the id of the leaf folder quickly.

Using depth to enforce the path

One issue I noticed with the previous attempt is that because it used an IN clause, the order of the folders declared there doesn’t matter. If both my-app/node_modules/react and my-app/react/node_modules existed, the query would be confused and return unexpected results.

I needed to get more specific: make sure that my-app is at depth 1, node_modules is at depth 2, and react is at depth 3. I created a computed depth column in the CTE to enforce that:

with recursive cte as (
    select 
        f.*, 
        1 as depth -- start at depth 1
    from folders f where parent_id is null and name = 'my-app'
    union all
    select
        f.*, 
        cte.depth + 1 as depth -- increment depth on recursion
    from folders f join cte on f.parent_id = cte.id 
    -- ensure we're only looking at the correct folder at each depth
    where f.name = (array['node_modules', 'react'])[cte.depth]
)
select * from cte;
id name parent_id depth
1 my-app null 1
2 node_modules 1 2
4553 react 2 3

Similar to how we used concatenation to build up a path string earlier, I could use addition to track how deep the recursion was into the tree.

I also exploited the fact that Postgres arrays are 1-indexed to look up the expected child folder based on the current depth.

Performance

Minimizing the number of round trips between the app server and database removed the latency, but I also wanted to examine the query’s performance.

As designed right now, there are no indexes except the primary keys. That means that any lookups on other columns need to scan the entire table.

Working with the few thousand files in a Create React App is not too bad, but what happens with hundreds of thousands of files and folders?

To test this, I loaded many additional folders, including my entire Homebrew installation, into the database. After that load, there were 153,245 folders and 1,416,966 files in the database, enough to start seeing problems.

Let’s look at a query that tries to compute the total size of files in the homebrew directory:

with recursive cte as (
    select * from folders where parent_id is null and name = 'homebrew'
    union all
    select f.* from folders f join cte on f.parent_id = cte.id
)
select sum(size) / 1024 / 1024 || ' MB' as total_size from files join cte on folder_id = cte.id;

37832.376917839050 MB

This query took almost a second to run and return 37832.376917839050 MB. If we look at the query plan with EXPLAIN ANALYZE, we can see why:

Query Plan without Indexes
 "Aggregate  (cost=100809.04..100809.06 rows=1 width=32) (actual time=1171.187..1171.192 rows=1 loops=1)"
"  CTE cte"
"    ->  Recursive Union  (cost=0.00..34835.41 rows=471 width=26) (actual time=21.003..472.785 rows=128862 loops=1)"
"          ->  Seq Scan on folders  (cost=0.00..2991.56 rows=1 width=26) (actual time=20.991..30.660 rows=1 loops=1)"
"                Filter: ((parent_id IS NULL) AND (name = 'homebrew'::text))"
"                Rows Removed by Filter: 153244"
"          ->  Hash Join  (cost=0.33..3183.91 rows=47 width=26) (actual time=4.914..20.343 rows=6136 loops=21)"
"                Hash Cond: (f.parent_id = cte_1.id)"
"                ->  Seq Scan on folders f  (cost=0.00..2608.45 rows=153245 width=26) (actual time=0.003..7.846 rows=153245 loops=21)"
"                ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.954..0.954 rows=6136 loops=21)"
"                      Buckets: 32768 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 959kB"
"                      ->  WorkTable Scan on cte cte_1  (cost=0.00..0.20 rows=10 width=8) (actual time=0.052..0.429 rows=6136 loops=21)"
"  ->  Hash Join  (cost=51728.74..65924.02 rows=19845 width=8) (actual time=466.649..1114.565 rows=1218007 loops=1)"
"        Hash Cond: (cte.id = files.folder_id)"
"        ->  CTE Scan on cte  (cost=0.00..9.42 rows=471 width=8) (actual time=21.026..502.036 rows=128862 loops=1)"
"        ->  Hash  (cost=27097.66..27097.66 rows=1416966 width=16) (actual time=424.955..424.956 rows=1416966 loops=1)"
"              Buckets: 262144  Batches: 16  Memory Usage: 6284kB"
"              ->  Seq Scan on files  (cost=0.00..27097.66 rows=1416966 width=16) (actual time=0.138..196.440 rows=1416966 loops=1)"
"Planning Time: 0.433 ms"
"JIT:"
"  Functions: 23"
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 1.583 ms, Inlining 0.000 ms, Optimization 2.665 ms, Emission 16.294 ms, Total 20.542 ms"
"Execution Time: 1173.388 ms"

There are a few full table scans (Seq Scan) in the query plan, causing Postgres to need 1,173ms to return the result:

  • A scan of folders looking for a matching parent_id
  • A scan of files looking for a matching file_id

Similarly, the query looking for the files in my-app/node_modules/react is also slow:

Query Plan without indexes
 "Gather  (cost=33932.96..54245.19 rows=42 width=42) (actual time=55.200..90.228 rows=7 loops=1)"
"  Workers Planned: 2"
"  Params Evaluated: $2"
"  Workers Launched: 2"
"  CTE cte"
"    ->  Recursive Union  (cost=0.00..32932.70 rows=11 width=26) (actual time=0.018..51.927 rows=3 loops=1)"
"          ->  Seq Scan on folders  (cost=0.00..2991.56 rows=1 width=26) (actual time=0.017..12.531 rows=1 loops=1)"
"                Filter: ((parent_id IS NULL) AND (name = 'my-app'::text))"
"                Rows Removed by Filter: 153244"
"          ->  Hash Join  (cost=0.33..2994.10 rows=1 width=26) (actual time=3.887..13.118 rows=1 loops=3)"
"                Hash Cond: (f.parent_id = cte.id)"
"                ->  Seq Scan on folders f  (cost=0.00..2991.56 rows=588 width=26) (actual time=0.008..13.056 rows=584 loops=3)"
"                      Filter: (name = ANY ('{node_modules,react}'::text[]))"
"                      Rows Removed by Filter: 152661"
"                ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.005..0.005 rows=1 loops=3)"
"                      Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                      ->  WorkTable Scan on cte  (cost=0.00..0.20 rows=10 width=8) (actual time=0.002..0.002 rows=1 loops=3)"
"  InitPlan 2 (returns $2)"
"    ->  Aggregate  (cost=0.25..0.26 rows=1 width=8) (actual time=51.938..51.939 rows=1 loops=1)"
"          ->  CTE Scan on cte cte_1  (cost=0.00..0.22 rows=11 width=8) (actual time=0.019..51.929 rows=3 loops=1)"
"  ->  Parallel Seq Scan on files  (cost=0.00..20308.03 rows=18 width=42) (actual time=14.258..24.551 rows=2 loops=3)"
"        Filter: (folder_id = $2)"
"        Rows Removed by Filter: 472320"
"Planning Time: 0.387 ms"
"Execution Time: 90.269 ms"

Inside the recursive CTE, Postgres performs a full table scan of the folders table, searching for rows that match the needed parent_id and name. It also needs a full table scan of the files table to find rows with the matching folder_id. That files scan went parallel, using two worker processes. Postgres took 90ms to compute the result.

Combining these observations, I could intuit that the following indexes would help:

create index files_folder_id_index
    on public.files (folder_id);

create index folders_parent_id_name_index
    on public.folders (parent_id, name);

Now, these queries are much faster. Adding up the file sizes in homebrew completes in half the time at 548ms due to using Index Scans instead of full table scans.

Improved Query Plan
"Aggregate  (cost=5845.27..5845.29 rows=1 width=32) (actual time=547.493..547.495 rows=1 loops=1)"
"  CTE cte"
"    ->  Recursive Union  (cost=0.42..1552.87 rows=471 width=26) (actual time=0.188..153.911 rows=128862 loops=1)"
"          ->  Index Scan using folders_parent_id_name_index on folders  (cost=0.42..6.19 rows=1 width=26) (actual time=0.186..0.187 rows=1 loops=1)"
"                Index Cond: ((parent_id IS NULL) AND (name = 'homebrew'::text))"
"          ->  Nested Loop  (cost=0.42..154.20 rows=47 width=26) (actual time=0.020..6.507 rows=6136 loops=21)"
"                ->  WorkTable Scan on cte cte_1  (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.345 rows=6136 loops=21)"
"                ->  Index Scan using folders_parent_id_name_index on folders f  (cost=0.42..15.35 rows=5 width=26) (actual time=0.001..0.001 rows=1 loops=128862)"
"                      Index Cond: (parent_id = cte_1.id)"
"  ->  Nested Loop  (cost=0.43..4242.78 rows=19845 width=8) (actual time=0.247..503.434 rows=1218007 loops=1)"
"        ->  CTE Scan on cte  (cost=0.00..9.42 rows=471 width=8) (actual time=0.190..181.808 rows=128862 loops=1)"
"        ->  Index Scan using files_folder_id_index on files  (cost=0.43..8.57 rows=42 width=16) (actual time=0.001..0.002 rows=9 loops=128862)"
"              Index Cond: (folder_id = cte.id)"
"Planning Time: 1.206 ms"
"Execution Time: 548.144 ms"

Likewise, the query to find the files in my-app/node_modules/react now completes in less than 1ms and no longer needs parallelization.

Improved Query Plan
"Index Scan using files_folder_id_index on files  (cost=1296.99..1305.72 rows=42 width=42) (actual time=0.233..0.238 rows=7 loops=1)"
"  Index Cond: (folder_id = $3)"
"  CTE cte"
"    ->  Recursive Union  (cost=0.42..1296.30 rows=11 width=26) (actual time=0.067..0.165 rows=3 loops=1)"
"          ->  Index Scan using folders_parent_id_name_index on folders  (cost=0.42..6.19 rows=1 width=26) (actual time=0.065..0.067 rows=1 loops=1)"
"                Index Cond: ((parent_id IS NULL) AND (name = 'my-app'::text))"
"          ->  Nested Loop  (cost=0.42..129.00 rows=1 width=26) (actual time=0.025..0.029 rows=1 loops=3)"
"                ->  WorkTable Scan on cte  (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.001 rows=1 loops=3)"
"                ->  Index Scan using folders_parent_id_name_index on folders f  (cost=0.42..12.87 rows=1 width=26) (actual time=0.017..0.020 rows=1 loops=3)"
"                      Index Cond: ((parent_id = cte.id) AND (name = ANY ('{node_modules,react}'::text[])))"
"  InitPlan 2 (returns $3)"
"    ->  Aggregate  (cost=0.25..0.26 rows=1 width=8) (actual time=0.209..0.209 rows=1 loops=1)"
"          ->  CTE Scan on cte cte_1  (cost=0.00..0.22 rows=11 width=8) (actual time=0.069..0.168 rows=3 loops=1)"
"Planning Time: 0.643 ms"
"Execution Time: 0.303 ms"

Summary:

Query Before Indexes After Indexes
Total size of files in homebrew 1,173ms 548ms
Files in my-app/node_modules/react 90ms ~1ms