Programming with futures: patterns and anti-patterns
Twitter’s Future library is a beautiful abstraction for dealing with concurrency. However, there are code patterns that seem natural or innocuous but can cause real trouble in production systems. This short article outlines a few of the easiest traps to fall into.
Below is a method from a fictional web application that registers a user by calling the Foursquare API to get the user’s profile info, their friend graph and their recent check-ins.
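The original snippet was not preserved, so here is an illustrative sketch of such a method; all names (FsqApi, User, DBUser, createDBUser) are stand-ins, not the post's actual code:

```scala
import com.twitter.util.Future

// Illustrative types and API -- stand-ins for the original post's code.
case class User(name: String)
case class DBUser(name: String, friendCount: Int, checkinCount: Int)

trait FsqApi {
  def getUserF(id: String): Future[User]            // profile info
  def getFriendsF(id: String): Future[Seq[User]]    // friend graph
  def getCheckinsF(id: String): Future[Seq[String]] // recent check-ins
}

// Stands in for a blocking database write.
def createDBUser(user: User, friends: Seq[User], checkins: Seq[String]): DBUser =
  DBUser(user.name, friends.size, checkins.size)

def registerUser(api: FsqApi, userId: String): Future[DBUser] =
  for {
    user     <- api.getUserF(userId)
    friends  <- api.getFriendsF(userId)
    checkins <- api.getCheckinsF(userId)
  } yield createDBUser(user, friends, checkins) // blocking call in the yield!
```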
There are some problems with this code.
Anti-pattern #1: Blocking in a yield or a map
The last part of the for-comprehension desugars to a map over the final Future, with the yield body as the map function. The problem here is that createDBUser makes a blocking call to the database. You should never do blocking work in a map on a Future.
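Concretely, the desugared last step looks roughly like this (a sketch; getCheckinsF and the surrounding values are illustrative):

```scala
import com.twitter.util.Future

// The yield body becomes the map function, so it runs on whatever
// thread happens to complete checkinsF.
def lastStep(checkinsF: Future[Seq[String]]): Future[Int] =
  checkinsF.map { checkins =>
    // createDBUser(user, friends, checkins) would run right here,
    // blocking the thread that completed checkinsF
    checkins.size
  }
```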
Every Future runs in a thread pool that is (hopefully) tuned for a particular purpose.
Code inside the map (generally) runs on the thread that completes the Future.
So you’re putting work in a thread pool that wasn’t designed to handle that work.
Furthermore, when you’re dealing with Futures composed from other Futures, it’s often hard to tell by inspection which Future will be the last to complete (and whose thread pool will run the map function). It’s frequently not the “outermost” Future. For example:
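Here is an illustrative sketch (the futures and names are made up): the body of the inner map runs on whichever thread completes the last-finishing underlying Future, not the one that is syntactically outermost:

```scala
import com.twitter.util.Future

// `a + b` runs on whichever thread completes the later of rpcF and dbF --
// and if both are already complete, it runs on the caller's thread.
def combined(rpcF: Future[Int], dbF: Future[Int]): Future[Int] =
  rpcF.flatMap { a =>
    dbF.map { b => a + b }
  }
```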
It’s also possible that the Future completes before you call map, in which case the work inside the map happens in the main thread. This is bad if your callers expect you to return instantly with a Future.
It’s also possible to cause a deadlock (and yes, we’ve seen this in production) if the code inside the map uses Await to block on another Future that needs a thread from the same thread pool — but again, it’s hard to know which thread pool that is.
So instead, set up your own thread pool for blocking work:
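With Twitter’s util library, one way to do this is a FuturePool wrapped around a dedicated executor (the object name and pool size here are illustrative):

```scala
import java.util.concurrent.Executors
import com.twitter.util.FuturePool

object DBPools {
  // A fixed-size pool reserved for blocking database work.
  val blockingDbPool: FuturePool = FuturePool(Executors.newFixedThreadPool(8))
}
```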
And use it like this:
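For example, the blocking createDBUser call can be wrapped in the pool inside the for-comprehension. This is a self-contained sketch; the API stub and names are illustrative:

```scala
import java.util.concurrent.Executors
import com.twitter.util.{Future, FuturePool}

val blockingDbPool: FuturePool = FuturePool(Executors.newFixedThreadPool(8))

// Stubs standing in for the real API and DB calls.
def getCheckinsF(userId: String): Future[Seq[String]] = Future.value(Seq("c1"))
def createDBUser(userId: String, checkins: Seq[String]): String =
  s"db-user:$userId" // imagine a blocking database write here

def registerUser(userId: String): Future[String] =
  for {
    checkins <- getCheckinsF(userId)
    // The blocking work now runs on the dedicated pool, not on whatever
    // thread completed getCheckinsF.
    dbUser   <- blockingDbPool { createDBUser(userId, checkins) }
  } yield dbUser
```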
This now desugars to a flatMap into the pool-backed Future, which is safe.
Only ever yield a plain value or a simple computation. If you have blocking work, wrap it in a ThreadPool-backed Future.
It’s worth noting that in the Scala-native Future library (scala.concurrent.Future), you must supply an implicit or explicit execution context when you create a Future. That way, you do have control over where your code executes, so the above warnings about map do not apply.
Anti-pattern #2: Too much parallelism
apiFriendsF creates a future for each item in a list of user IDs and collects the results into a single Future:
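The problematic shape looks roughly like this (a sketch; getUserF stands in for the real API call):

```scala
import com.twitter.util.Future

// One future per user ID, all in flight simultaneously.
def apiFriendsF(userIds: Seq[String], getUserF: String => Future[String]): Future[Seq[String]] =
  Future.collect(userIds.map(getUserF))
```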
But this is too much parallelism! You’ll flood the thread pool with a ton of simultaneous work. Some network or database drivers don’t even allow more than a certain number of concurrent connections, and you’ll get a bunch of exceptions, and you will not have a good day. A better way to do it is to limit how much you are doing in parallel.
A groupedCollect helper method can be implemented as follows:
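A sketch of one possible implementation (the exact signature is an assumption, since the original snippet isn’t preserved):

```scala
import com.twitter.util.Future

object FutureHelpers {
  // Process `par` items at a time: each batch runs in parallel, and the next
  // batch doesn't start until the previous one has fully completed.
  def groupedCollect[A, B](xs: Seq[A], par: Int)(f: A => Future[B]): Future[Seq[B]] =
    xs.grouped(par).foldLeft(Future.value(Seq.empty[B])) { (accF, batch) =>
      accF.flatMap { acc =>
        Future.collect(batch.map(f)).map(acc ++ _)
      }
    }
}
```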
The par parameter lets you specify how much work you want done in parallel. For example, if you specify 5, it will take 5 items from the list, do them all in parallel, and wait for them to complete before moving on to the next 5 items.
This can be mitigated a different way, by configuring a thread pool with a maximum number of threads, and making sure that all database or network calls go through this pool. This has the advantage of limiting parallelism application-wide, rather than just at a given call site. It still might be a good idea to limit parallelism at an individual call site to prevent it from crowding out other work.
Anti-pattern #3: Not enough parallelism
This code invokes api.getCategoriesF() and a second, independent call sequentially when they could be run in parallel:
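The shape looks roughly like this (a self-contained sketch; getVenueF is a hypothetical second call, not from the original):

```scala
import com.twitter.util.Future

// Stubs standing in for two independent API calls.
def getCategoriesF(): Future[Seq[String]] = Future.value(Seq("food", "coffee"))
def getVenueF(): Future[String] = Future.value("venue-1")

// Sequential: getVenueF() isn't even invoked until getCategoriesF() completes.
val resultF: Future[(Seq[String], String)] =
  for {
    categories <- getCategoriesF()
    venue      <- getVenueF()
  } yield (categories, venue)
```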
It desugars to a flatMap, with the second call nested inside the first. So one waits for the other even though it doesn’t need to. The fix is to invoke the methods outside of the for-comprehension:
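A sketch of the fix, using the same illustrative stubs: both futures are started before the for-comprehension, so the calls overlap:

```scala
import com.twitter.util.Future

def getCategoriesF(): Future[Seq[String]] = Future.value(Seq("food", "coffee"))
def getVenueF(): Future[String] = Future.value("venue-1")

// Both calls are kicked off here, in parallel...
val categoriesF = getCategoriesF()
val venueF      = getVenueF()

// ...and the for-comprehension only sequences the already-running futures.
val resultF: Future[(Seq[String], String)] =
  for {
    categories <- categoriesF
    venue      <- venueF
  } yield (categories, venue)
```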
Likewise, the original code invokes another pair of independent calls sequentially. These two can also be done in parallel. Write it this way instead:
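With Twitter’s library, that can be written with Future.join (the values here are illustrative):

```scala
import com.twitter.util.Future

// Future.join pairs two independent futures, run in parallel,
// and yields their results as a tuple.
val joinedF: Future[(Int, String)] =
  Future.join(Future.value(42), Future.value("ok"))
```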
The join method runs multiple Futures in parallel and collects their results in a tuple. It also explicitly documents that the two calls will happen in parallel.
Here’s what we ended up with:
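Since the final listing was not preserved, here is a sketch of what it might look like with all the fixes combined; every name is an illustrative stand-in for the original code:

```scala
import java.util.concurrent.Executors
import com.twitter.util.{Future, FuturePool}

object Registration {
  // Dedicated pool for blocking work.
  private val blockingDbPool: FuturePool = FuturePool(Executors.newFixedThreadPool(8))

  case class Profile(name: String)
  case class DBUser(name: String, friendCount: Int, checkinCount: Int)

  // Stubs standing in for real API calls.
  def getProfileF(userId: String): Future[Profile] = Future.value(Profile(userId))
  def getFriendsF(userId: String): Future[Seq[String]] = Future.value(Seq("f1", "f2"))
  def getCheckinsF(userId: String): Future[Seq[String]] = Future.value(Seq("c1"))

  // Nested method takes plain values and returns a plain value; the
  // (imagined) blocking database write happens here.
  private def createDBUser(p: Profile, friends: Seq[String], checkins: Seq[String]): DBUser =
    DBUser(p.name, friends.size, checkins.size)

  def registerUser(userId: String): Future[DBUser] = {
    // All the work is kicked off up front, so the API calls run in parallel.
    val profileF  = getProfileF(userId)
    val friendsF  = getFriendsF(userId)
    val checkinsF = getCheckinsF(userId)

    // Glued together with a for-comprehension; the blocking work is
    // explicitly wrapped in the dedicated thread pool.
    for {
      profile  <- profileF
      friends  <- friendsF
      checkins <- checkinsF
      dbUser   <- blockingDbPool { createDBUser(profile, friends, checkins) }
    } yield dbUser
  }
}
```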
Things to note here:
- Nested methods take plain values and return Futures, which makes them easy to compose.
- All the work is set up ahead of time via vals and nested methods.
- Everything is “glued” together with a for-comprehension at the end.
- Parallelism and dependencies are made explicit in the code.
- Blocking work is explicitly wrapped in a thread pool.