Recursive Common Table Expressions with Postgres
This week, I’ve been helping a client with some performance problems with their API. We’ve been focusing on tuning some Postgres queries to reduce latency and improve response times.
One area of the app that was struggling was a query that processes a tree structure. The app allows users to create folders and upload files to them.
The database structure is designed in a way that resembles the following:
create table folders (
id bigserial primary key,
name text not null,
parent_id bigint,
foreign key (parent_id) references folders(id)
);
create table files (
id bigserial primary key,
name text not null,
folder_id bigint,
size bigint not null,
foreign key (folder_id) references folders(id)
);
Here, we have a table called folders
and a table for files
in each folder.
The parent_id
column in the folders
table is a self-referencing foreign key that allows us to form a tree by pointing each folder to its parent folder.
The process I was optimizing involved finding all the files at a specific folder path, for example,
in widgets/12345/attachments
. The API used a simple while
loop in application code to walk down the tree, basically
doing the following:
select id from folders where name = 'widgets' and parent_id is null; -- result: 1
select id from folders where name = '12345' and parent_id = 1; -- result: 2
select id from folders where name = 'attachments' and parent_id = 2; -- result: 3
select * from files where folder_id = 3;
This strategy takes multiple round trips to the server, and the deeper the path, the more queries are issued. The network latency can add up quickly and affect the user experience.
Recursive Common Table Expressions
I wanted to improve latency by making only a single network round trip from the application to the database. I needed a
way to recursively walk the entire folder
path in a single SQL query.
Postgres allows developers to define recursive common table expressions ( CTE) that union a base case with a recursive case. It will expand the table expression on demand until it has satisfied the entire query.
For this demo, I’ve loaded the folders and files of a default Create React App application into a Postgres database using the above table schema. I’ll show off a few queries using recursive CTEs that make quick work of common tree-related queries.
Listing all the folders
One thing you might want to do with a file database is list all the folders in a tree. A recursive CTE for this is straightforward:
with recursive cte as (
select * from folders where parent_id is null and name = 'my-app'
union all
select f.* from folders f join cte on f.parent_id = cte.id
)
select * from cte;
id | name | parent_id |
---|---|---|
1 | my-app | |
4850 | .git | 1 |
2 | node_modules | 1 |
4849 | public | 1 |
4883 | src | 1 |
4879 | hooks | 4850 |
4875 | info | 4850 |
4876 | logs | 4850 |
4851 | objects | 4850 |
I’ve cut this off short because there are 4,883 files in a default Create React App.
This CTE has a base case that looks for a root folder (its parent_id
is null
) with the name my-app.
That row
is union all
ed with the recursive case that finds rows in the folders
table with a parent_id
matching the rows
already discovered. Postgres can expand this until it has discovered all possible rows.
Listing folders with their paths
Another common need for a database tree structure is visualizing each folder’s path. Since each folder has a name, we
can recursively build up a path
column by concatenating the names of all the parent folders with the current name:
with recursive cte as (
select f.*, name as path from folders f where f.parent_id is null and f.name = 'my-app'
union all
select
f.*,
-- concatenate the parent path with the current row's name to form a deeper path
cte.path || '/' || f.name as path
from folders f join cte on f.parent_id = cte.id
)
select * from cte;
id | path |
---|---|
1 | my-app |
4850 | my-app/.git |
2 | my-app/node_modules |
4849 | my-app/public |
4883 | my-app/src |
4879 | my-app/.git/hooks |
4875 | my-app/.git/info |
4876 | my-app/.git/logs |
4851 | my-app/.git/objects |
Computing total file size
By joining all the discovered rows from the folders
table with the rows in the files
table we can add up the sizes
of every file in every folder to get a grand total:
with recursive cte as (
select * from folders where parent_id is null and name = 'my-app'
union all
select f.* from folders f join cte on f.parent_id = cte.id
)
select sum(size) / 1024 / 1024 as total_mb from files join cte on folder_id = cte.id;
total_mb |
---|
236.7352590560913086 |
Getting files in a particular folder
The primary use case I was trying to improve in my client’s API was finding files in a particular folder. Given the
input "my-app/node_modules/react"
, how would I use a recursive CTE to find the top-level files in the react
folder
(without recursing any deeper)?
At first, I thought I could use the path
column from the previous query to find the id
of the folder I wanted
and then join to files
from there. It would look something like this:
with recursive cte as (
select f.*, name as path from folders f where f.parent_id is null
union all
select f.*, cte.path || '/' || f.name as path from folders f join cte on f.parent_id = cte.id
)
select f.* from files f join cte on f.folder_id = cte.id
where cte.path = 'my-app/node_modules/react';
This query has the benefit of being very readable, but I didn’t believe it would scale well. Building up a string through repeated concatenation is expensive regarding CPU and memory use.
It also suffers from undirected recursion: Postgres can’t optimize the execution plan to prioritize descending into
subfolders that follow the desired path; it will have to materialize all rows from the CTE and pass them through the
final where
filter one by one. I wanted something that could be a little smarter: once Postgres found the “my-app” row
in the folders
table, I wanted to direct it to see the child folder named node_modules
and skip descending
into src
or public
or anything else.
The next thing I experimented with was simply returning the folder
rows along the desired path. If I split the input string into
each path name, I could limit the CTE recursion to just that path. I used the first entry (my-app
) in the base case
and then the rest (node_modules
, react
) in the recursive case of the CTE:
with recursive cte as (
select * from folders where parent_id is null and name = 'my-app'
union all
select f.* from folders f
join cte on f.parent_id = cte.id and f.name IN ('node_modules', 'react')
)
select * from cte;
id | name | parent_id |
---|---|---|
1 | my-app | null |
2 | node_modules | 1 |
4553 | react | 2 |
That seemed to work. All I would then need to do is join into the folders
table using the last id
in the list:
with recursive cte as (
select * from folders where parent_id is null and name = 'my-app'
union all
select f.* from folders f
join cte on f.parent_id = cte.id and f.name IN ('node_modules', 'react')
)
-- this is changed:
select * from files where folder_id = (select max(id) from cte);
id | name | folder_id | size |
---|---|---|---|
35921 | LICENSE | 4553 | 1086 |
35922 | jsx-runtime.js | 4553 | 214 |
35923 | index.js | 4553 | 190 |
35924 | README.md | 4553 | 1162 |
35925 | react.shared-subset.js | 4553 | 218 |
35926 | package.json | 4553 | 999 |
35927 | jsx-dev-runtime.js | 4553 | 222 |
Since there is a foreign-key relationship from a folder
to its parent, I knew that a child folder
always has a
greater id
than its parent. I could use that property to find the id
of the leaf folder quickly.
Using depth to enforce the path
One issue I noticed with the previous attempt is that because it used an IN
clause, the order of the folders declared
there doesn’t matter. If both my-app/node_modules/react
and my-app/react/node_modules
existed, the query would be
confused and return unexpected results.
I needed to get more specific: make sure that my-app
is at depth 1, node_modules
is at depth 2, and react
is at
depth 3. I created a computed depth
column in the CTE to enforce that:
with recursive cte as (
select
f.*,
1 as depth -- start at depth 1
from folders f where parent_id is null and name = 'my-app'
union all
select
f.*,
cte.depth + 1 as depth -- increment depth on recursion
from folders f join cte on f.parent_id = cte.id
-- ensure we're only looking at the correct folder at each depth
where f.name = (array['node_modules', 'react'])[cte.depth]
)
select * from cte;
id | name | parent_id | depth |
---|---|---|---|
1 | my-app | null | 1 |
2 | node_modules | 1 | 2 |
4553 | react | 2 | 3 |
Similar to how we used concatenation to build up a path string earlier, I could use addition to track how deep the recursion was into the tree.
I also exploited the fact that Postgres arrays are 1-indexed to look up the expected child folder based on the current depth.
Performance
Minimizing the number of round trips between the app server and database removed the latency, but I also wanted to examine the query’s performance.
As designed right now, there are no indexes except the primary keys. That means that any lookups on other columns need to scan the entire table.
Working with the few thousand files in a Create React App is not too bad, but what happens with hundreds of thousands of files and folders?
To test this, I loaded many additional folders, including my entire Homebrew installation, into the database. After that load, there were 153,245 folders and 1,416,966 files in the database, enough to start seeing problems.
Let’s look at a query that tries to compute the total size of files in the homebrew
directory:
with recursive cte as (
select * from folders where parent_id is null and name = 'homebrew'
union all
select f.* from folders f join cte on f.parent_id = cte.id
)
select sum(size) / 1024 / 1024 || ' MB' as total_size from files join cte on folder_id = cte.id;
37832.376917839050 MB
This query took almost a second to run and return 37832.376917839050 MB
. If we look at the query plan
with EXPLAIN ANALYZE
, we can see why:
Query Plan without Indexes
"Aggregate (cost=100809.04..100809.06 rows=1 width=32) (actual time=1171.187..1171.192 rows=1 loops=1)"
" CTE cte"
" -> Recursive Union (cost=0.00..34835.41 rows=471 width=26) (actual time=21.003..472.785 rows=128862 loops=1)"
" -> Seq Scan on folders (cost=0.00..2991.56 rows=1 width=26) (actual time=20.991..30.660 rows=1 loops=1)"
" Filter: ((parent_id IS NULL) AND (name = 'homebrew'::text))"
" Rows Removed by Filter: 153244"
" -> Hash Join (cost=0.33..3183.91 rows=47 width=26) (actual time=4.914..20.343 rows=6136 loops=21)"
" Hash Cond: (f.parent_id = cte_1.id)"
" -> Seq Scan on folders f (cost=0.00..2608.45 rows=153245 width=26) (actual time=0.003..7.846 rows=153245 loops=21)"
" -> Hash (cost=0.20..0.20 rows=10 width=8) (actual time=0.954..0.954 rows=6136 loops=21)"
" Buckets: 32768 (originally 1024) Batches: 1 (originally 1) Memory Usage: 959kB"
" -> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=8) (actual time=0.052..0.429 rows=6136 loops=21)"
" -> Hash Join (cost=51728.74..65924.02 rows=19845 width=8) (actual time=466.649..1114.565 rows=1218007 loops=1)"
" Hash Cond: (cte.id = files.folder_id)"
" -> CTE Scan on cte (cost=0.00..9.42 rows=471 width=8) (actual time=21.026..502.036 rows=128862 loops=1)"
" -> Hash (cost=27097.66..27097.66 rows=1416966 width=16) (actual time=424.955..424.956 rows=1416966 loops=1)"
" Buckets: 262144 Batches: 16 Memory Usage: 6284kB"
" -> Seq Scan on files (cost=0.00..27097.66 rows=1416966 width=16) (actual time=0.138..196.440 rows=1416966 loops=1)"
"Planning Time: 0.433 ms"
"JIT:"
" Functions: 23"
" Options: Inlining false, Optimization false, Expressions true, Deforming true"
" Timing: Generation 1.583 ms, Inlining 0.000 ms, Optimization 2.665 ms, Emission 16.294 ms, Total 20.542 ms"
"Execution Time: 1173.388 ms"
There are a few full table scans (Seq Scan
) in the query plan, causing Postgres to need 1,173ms to return the result:
- A scan of
folders
looking for a matchingparent_id
- A scan of
files
looking for a matchingfile_id
Similarly, the query looking for the files in my-app/node_modules/react
is also slow:
Query Plan without indexes
"Gather (cost=33932.96..54245.19 rows=42 width=42) (actual time=55.200..90.228 rows=7 loops=1)"
" Workers Planned: 2"
" Params Evaluated: $2"
" Workers Launched: 2"
" CTE cte"
" -> Recursive Union (cost=0.00..32932.70 rows=11 width=26) (actual time=0.018..51.927 rows=3 loops=1)"
" -> Seq Scan on folders (cost=0.00..2991.56 rows=1 width=26) (actual time=0.017..12.531 rows=1 loops=1)"
" Filter: ((parent_id IS NULL) AND (name = 'my-app'::text))"
" Rows Removed by Filter: 153244"
" -> Hash Join (cost=0.33..2994.10 rows=1 width=26) (actual time=3.887..13.118 rows=1 loops=3)"
" Hash Cond: (f.parent_id = cte.id)"
" -> Seq Scan on folders f (cost=0.00..2991.56 rows=588 width=26) (actual time=0.008..13.056 rows=584 loops=3)"
" Filter: (name = ANY ('{node_modules,react}'::text[]))"
" Rows Removed by Filter: 152661"
" -> Hash (cost=0.20..0.20 rows=10 width=8) (actual time=0.005..0.005 rows=1 loops=3)"
" Buckets: 1024 Batches: 1 Memory Usage: 9kB"
" -> WorkTable Scan on cte (cost=0.00..0.20 rows=10 width=8) (actual time=0.002..0.002 rows=1 loops=3)"
" InitPlan 2 (returns $2)"
" -> Aggregate (cost=0.25..0.26 rows=1 width=8) (actual time=51.938..51.939 rows=1 loops=1)"
" -> CTE Scan on cte cte_1 (cost=0.00..0.22 rows=11 width=8) (actual time=0.019..51.929 rows=3 loops=1)"
" -> Parallel Seq Scan on files (cost=0.00..20308.03 rows=18 width=42) (actual time=14.258..24.551 rows=2 loops=3)"
" Filter: (folder_id = $2)"
" Rows Removed by Filter: 472320"
"Planning Time: 0.387 ms"
"Execution Time: 90.269 ms"
Inside the recursive CTE, Postgres performs a full table scan of the folders
table, searching for rows that match the needed parent_id
and name
. It also needs a full table scan of the files
table to find rows with the matching folder_id
. That files
scan went parallel, using two worker processes. Postgres took 90ms to compute the result.
Combining these observations, I could intuit that the following indexes would help:
create index files_folder_id_index
on public.files (folder_id);
create index folders_parent_id_name_index
on public.folders (parent_id, name);
Now, these queries are much faster. Adding up the file sizes in homebrew
completes in half the time at 548ms due to
using Index Scans
instead of full table scans.
Improved Query Plan
"Aggregate (cost=5845.27..5845.29 rows=1 width=32) (actual time=547.493..547.495 rows=1 loops=1)"
" CTE cte"
" -> Recursive Union (cost=0.42..1552.87 rows=471 width=26) (actual time=0.188..153.911 rows=128862 loops=1)"
" -> Index Scan using folders_parent_id_name_index on folders (cost=0.42..6.19 rows=1 width=26) (actual time=0.186..0.187 rows=1 loops=1)"
" Index Cond: ((parent_id IS NULL) AND (name = 'homebrew'::text))"
" -> Nested Loop (cost=0.42..154.20 rows=47 width=26) (actual time=0.020..6.507 rows=6136 loops=21)"
" -> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.345 rows=6136 loops=21)"
" -> Index Scan using folders_parent_id_name_index on folders f (cost=0.42..15.35 rows=5 width=26) (actual time=0.001..0.001 rows=1 loops=128862)"
" Index Cond: (parent_id = cte_1.id)"
" -> Nested Loop (cost=0.43..4242.78 rows=19845 width=8) (actual time=0.247..503.434 rows=1218007 loops=1)"
" -> CTE Scan on cte (cost=0.00..9.42 rows=471 width=8) (actual time=0.190..181.808 rows=128862 loops=1)"
" -> Index Scan using files_folder_id_index on files (cost=0.43..8.57 rows=42 width=16) (actual time=0.001..0.002 rows=9 loops=128862)"
" Index Cond: (folder_id = cte.id)"
"Planning Time: 1.206 ms"
"Execution Time: 548.144 ms"
Likewise, the query to find the files in my-app/node_modules/react
now completes in less than 1ms and no longer needs parallelization.
Improved Query Plan
"Index Scan using files_folder_id_index on files (cost=1296.99..1305.72 rows=42 width=42) (actual time=0.233..0.238 rows=7 loops=1)"
" Index Cond: (folder_id = $3)"
" CTE cte"
" -> Recursive Union (cost=0.42..1296.30 rows=11 width=26) (actual time=0.067..0.165 rows=3 loops=1)"
" -> Index Scan using folders_parent_id_name_index on folders (cost=0.42..6.19 rows=1 width=26) (actual time=0.065..0.067 rows=1 loops=1)"
" Index Cond: ((parent_id IS NULL) AND (name = 'my-app'::text))"
" -> Nested Loop (cost=0.42..129.00 rows=1 width=26) (actual time=0.025..0.029 rows=1 loops=3)"
" -> WorkTable Scan on cte (cost=0.00..0.20 rows=10 width=8) (actual time=0.000..0.001 rows=1 loops=3)"
" -> Index Scan using folders_parent_id_name_index on folders f (cost=0.42..12.87 rows=1 width=26) (actual time=0.017..0.020 rows=1 loops=3)"
" Index Cond: ((parent_id = cte.id) AND (name = ANY ('{node_modules,react}'::text[])))"
" InitPlan 2 (returns $3)"
" -> Aggregate (cost=0.25..0.26 rows=1 width=8) (actual time=0.209..0.209 rows=1 loops=1)"
" -> CTE Scan on cte cte_1 (cost=0.00..0.22 rows=11 width=8) (actual time=0.069..0.168 rows=3 loops=1)"
"Planning Time: 0.643 ms"
"Execution Time: 0.303 ms"
Summary:
Query | Before Indexes | After Indexes |
---|---|---|
Total size of files in homebrew |
1,173ms | 548ms |
Files in my-app/node_modules/react |
90ms | ~1ms |