Promise.all is too much of a good thing
Sometimes, your application architecture or constraints force you to do a bunch of database lookups in rapid succession.
Running them sequentially, one after another, can take a long time. With great new syntax in JavaScript, like async and await, running multiple operations concurrently is easier than ever. By doing so, you can complete a big operation faster, but you risk shooting yourself in the foot and bringing your entire system to its knees. Good application performance needs a holistic view: making one operation faster at the expense of all the others is not a good engineering trade-off.
How do developers sometimes hurt themselves with async and await? Do you have code like this in your codebase?
const customers = await getABigListOfCustomers();
for (const c of customers) {
  c.orders = await getOrdersForCustomer(c.id);
}
An engineer sees this and squints at it and thinks “I bet I can run all those getOrdersForCustomer calls in parallel and finish this much faster!”
So they rewrite it to this:
const customers = await getABigListOfCustomers();
const bigListOfPromises: Promise<void>[] = customers.map(async (c) => {
  c.orders = await getOrdersForCustomer(c.id);
});
await Promise.all(bigListOfPromises);
They use map to convert the Big List of Customers into a Big List of Promises that will each run the getOrdersForCustomer function in the background and resolve once the orders have been loaded. Then a call to Promise.all lets the script wait until all that work is done before moving on to the next step.
It’s a classic fork-join!
Or if you’re cynical, the dreaded N+1 problem.
Problem Number 1:
Let’s imagine that there are 50 things in the list. Can the database even handle 50 requests at once? Maybe, but in many cloud deployment scenarios there are multiple copies of your app running. What if each of them does this at roughly the same time? Can the database handle 100? 150? 200?
While the database is chunking through all these parallel requests, it’s struggling to serve other straightforward queries. Now one big Promise.all call in a reporting endpoint is ruining the experience for users trying to do simple, lightweight operations like logging in.
The Promise.all method is also frequently used when you’re forced to call a third-party API one-by-one for each item in a list. Your vendor will not be happy with you if you slam them with hundreds of concurrent requests. Ask me how I know.
Problem Number 2:
This situation can wreck app-server performance too. Database connections are an expensive resource and are usually shared across requests through a connection pool. That connection pool has a limited size. If it runs out of connections, it throttles other requests: they have to wait in line for a connection to open up.
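To make the pool math concrete, here’s a sketch using node-postgres (the pg package), whose Pool defaults to a maximum of 10 connections. The orders query is just for illustration:

import { Pool } from 'pg';

// node-postgres defaults to a maximum of 10 connections per pool.
const pool = new Pool({ max: 10 });

// Fan out 50 queries at once: the first 10 claim every connection,
// and the other 40 wait in the pool's queue, along with the queries
// from every other request hitting this app server.
await Promise.all(
  Array.from({ length: 50 }, (_, customerId) =>
    pool.query('SELECT * FROM orders WHERE customer_id = $1', [customerId])
  )
);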
This week I was consulting with a client who had this problem. API endpoints that were normally quite fast were sometimes taking multiple seconds.
The telemetry spans showed large gaps at the beginning of the request where the app appeared to be doing nothing. The first SQL query in the trace normally runs almost immediately when the request starts. What gives?
My diagnosis was that another request was using Promise.all and kicking off 50+ database queries, which quickly exhausted the connection pool. The pool was using its default size of only 10 connections, which meant the Promise.all immediately took all 10, and since it got there first, it held up the pool until all 50 queries got through, 10 at a time. Subsequent requests had to wait a long time for a chance to use the database.
Managing too much parallelism
Running things in parallel is still generally a good idea; you just have to implement some sane limits. Running two at a time will usually halve the overall response time, but it doesn’t follow that running 50 at a time will cut the response time to 2%. There are just too many other variables in play, as we’ve seen.
Your best bet is to try to load the data eagerly. Can you use some database JOINs to return all the data in one query? If not, can you do it in two queries: one to load all the Customers and then one to get all the needed Orders, before joining them together in application code?
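Here’s a minimal sketch of the two-query version. The getOrdersForCustomers bulk loader and the Order type are hypothetical stand-ins for whatever your data layer provides:

// getOrdersForCustomers is a hypothetical bulk loader, e.g. backed by
// SELECT * FROM orders WHERE customer_id = ANY($1).
const customers = await getABigListOfCustomers();
const orders = await getOrdersForCustomers(customers.map((c) => c.id));

// Join the two result sets together in application code.
const ordersByCustomer = new Map<number, Order[]>();
for (const o of orders) {
  const list = ordersByCustomer.get(o.customerId) ?? [];
  list.push(o);
  ordersByCustomer.set(o.customerId, list);
}
for (const c of customers) {
  c.orders = ordersByCustomer.get(c.id) ?? [];
}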
These approaches will usually yield the best results, but if your application architecture doesn’t allow it, consider a library like p-map, which lets you configure a max concurrency. For example, of those 50 queries, a properly configured p-map would allow only 4 to run at a time. This avoids pool exhaustion and limits the strain on the database.
Converting the Promise.all code to use p-map is pretty straightforward:
import pMap from 'p-map';

const customers = await getABigListOfCustomers();
await pMap(customers, async (c) => {
  c.orders = await getOrdersForCustomer(c.id);
}, { concurrency: 4 });
You can specify the max concurrency in the options object. This will limit the number of database queries in-flight at any one time. There’s a bit of art to choosing the right concurrency limit, and it might be specific to different areas of your application. Four is probably a good starting point. You’d probably only see marginal improvements by going higher, while running the risk of hitting the same problems we’ve discussed.
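If you’d rather not add a dependency, the same limit can be hand-rolled with a small worker-pool helper. This is a sketch of the technique, not p-map’s actual implementation:

async function mapWithConcurrency<T, R>(
  items: readonly T[],
  fn: (item: T) => Promise<R>,
  concurrency: number,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Start `concurrency` workers; each one pulls the next unclaimed index
  // until the list is drained. JavaScript is single-threaded, so the
  // read-and-increment of `next` can't race between workers.
  const workers = Array.from({ length: Math.min(concurrency, items.length) }, async () => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  });
  await Promise.all(workers);
  return results;
}

// Usage mirrors the p-map example above:
// await mapWithConcurrency(customers, (c) => getOrdersForCustomer(c.id), 4);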