Parallelizing HTTP Requests in Clojure

We’ve been working our way through Clojure for the Brave and True and enjoyed solving Exercise 2 from Chapter 10 so much that we’d like to share our solution with you!

The problem

For context, the exercise asks us to consume some data from an API via concurrent, asynchronous HTTP requests. The author of Clojure for the Brave and True, Daniel Higginbotham (@nonrecursive on Twitter), was kind enough to let us reproduce it here:

Create a function that uses futures to parallelize the task of downloading random quotes from http://www.braveclojure.com/random-quote using

(slurp "http://www.braveclojure.com/random-quote")

The futures should update an atom that refers to a total word count for all quotes. The function will take the number of quotes to download as an argument and return the atom’s final value. Keep in mind that you’ll need to ensure that all futures have finished before returning the atom’s final value. Here’s how you would call it and an example result:

(quote-word-count 5)
; => {"ochre" 8, "smoothie" 2}

Our solution

You can check out our solution on GitHub. We discuss the juicy details below.

We’re going to cover how we parallelized the HTTP requests; the other bits are fairly straightforward. Here’s the public interface:

(defn quote-word-count
  "Returns a histogram of the word counts in `number-of-quotes`
  quotes retrieved from http://www.braveclojure.com/random-quote"
  [number-of-quotes]
  (let [word-count (atom {})]
    (doall
      (pmap
        (fn [_]
          ;; Fetch and split outside swap!: swap! may retry its update
          ;; function under contention, which would repeat the request.
          (let [words (get-words (get-quote))]
            (swap! word-count histogramize words)))
        (range number-of-quotes)))
    @word-count))
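
The helpers get-quote, get-words and histogramize aren't shown in this post. Here is a minimal sketch of what they might look like. The bodies are our assumptions for illustration (in particular the lower-casing and the word regex), not necessarily what the linked solution does:

```clojure
(require '[clojure.string :as str])

(defn get-quote
  "Hypothetical helper: downloads one random quote as a string."
  []
  (slurp "http://www.braveclojure.com/random-quote"))

(defn get-words
  "Hypothetical helper: splits `text` into lower-cased words."
  [text]
  (re-seq #"[a-z']+" (str/lower-case text)))

(defn histogramize
  "Hypothetical helper: adds the frequency of each word in `words`
  to the existing `histogram` map."
  [histogram words]
  (merge-with + histogram (frequencies words)))
```

Because histogramize is a pure function of its arguments, it is safe to hand to swap!.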

The key points to cover are that pmap implicitly creates futures, that an atom gives us a single thread of history, and that we need doall to force the work to finish. Finally, we will cover some of the caveats of this approach.

pmap implicitly creates futures

The exercise instructions recommend using futures. Clojure’s handy pmap uses futures under the hood. By using Clojure’s built-in parallel mapping features, we can solve the problem using futures without having to worry about them ourselves.
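
To make the connection concrete, here is a rough sketch of the same idea written with explicit futures. The name parallel-map is ours, made up for this illustration. Unlike pmap, it is eager and puts no limit on how many futures run at once:

```clojure
(defn parallel-map
  "Applies `f` to each item of `coll` on its own future,
  then waits for every result."
  [f coll]
  (let [tasks (mapv #(future (f %)) coll)] ; mapv is eager, so every future starts now
    (mapv deref tasks)))                   ; deref blocks until each future finishes

(parallel-map #(* % %) [1 2 3])
;; => [1 4 9]
```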

Atomic state: a single thread of history

Using an atom gives us a consistent history as we fold all the quotes into a word count. You can see a great explanation in Chapter 10 of Clojure for the Brave and True.

Briefly, we use an atom here to ensure that we get a consistent state no matter the order in which the spawned futures return their word counts.

[ [wc1] [wc2] [wc3] ]
--------atom-------->

[ [wc2] [wc3] [wc1] ]
--------atom-------->

We’re representing each HTTP request as [wcx], where x is the number of the spawned process. As you can see, they can complete in any order. We call the atom word-count in our implementation. The two rows represent two possible completion orders for a call to (quote-word-count 3).

Since our problem can be solved in an order-independent manner (adding up word counts gives the same total in any order), we don’t have to worry about some of the harder problems related to parallelism and concurrency. swap! applies each update atomically, retrying if another thread changed the atom in the meantime, so every update lands exactly once and we see a single line of history when we dereference the atom.
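
A small sketch of that order independence, with no network involved. The name count-concurrently and the sample batches are made up for this illustration:

```clojure
(def batches [["ochre" "smoothie"] ["ochre"] ["smoothie" "ochre"]])

(defn count-concurrently
  "Merges each batch of words into one shared atom from its own future."
  [word-batches]
  (let [word-count (atom {})
        tasks (mapv (fn [words]
                      (future
                        (swap! word-count #(merge-with + % (frequencies words)))))
                    word-batches)]
    (run! deref tasks) ; wait for every future before reading the atom
    @word-count))

;; Both calls return a map equal to {"ochre" 3, "smoothie" 2},
;; whatever order the futures happen to finish in.
(count-concurrently batches)
(count-concurrently (reverse batches))
```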

doall forces the pmap to realize

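Like its sequential namesake, map, pmap is a lazy bum. It returns a lazy sequence, and its results are only guaranteed to be computed once that sequence is consumed. Without doall, quote-word-count could dereference the atom before the requests have finished or, for larger inputs, before some of them have even started. We wrap the pmap in a doall call to walk the whole sequence, which blocks until every future has returned.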
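
Here is a small sketch of the difference doall makes. The slow-task function is a made-up stand-in for an HTTP request, and the 200 ms sleep is arbitrary:

```clojure
(def finished (atom 0))

(defn slow-task
  "Stands in for an HTTP request: sleeps, then records that it finished."
  [x]
  (Thread/sleep 200)
  (swap! finished inc)
  x)

(def pending (pmap slow-task (range 5)))
(def finished-before @finished) ; read right away: most likely still 0
(def results (doall pending))   ; blocks until every task has returned
(def finished-after @finished)  ; 5, since doall waited for all of them
```

Reading the atom right after calling pmap is a race we would most likely lose, which is exactly the bug doall protects quote-word-count from.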

Caveats

Our solution only maintains a single line of history, held in one atom. This satisfies the requirements of the exercise, but an atom manages one independent piece of state; it cannot coordinate changes across several pieces of state that processes share. Clojure’s STM machinery can tackle that kind of coordination among processes. Check out this section from Clojure for the Brave and True for a nice summary.

We’re not handling any failure states here. Our implementation of quote-word-count will throw an exception if a request fails, and it will block for as long as a request hangs. Another approach would be to use Clojure’s core.async to handle errors via timeouts and retries with exponential backoff.

The semantics of atoms are surprisingly powerful, and we’re only scratching the surface of what they can do here. Cognitect’s Datomic database is a great example of what atoms can do in practice.

Clojure for the Brave and True covers core.async in Chapter 11.
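
If core.async feels like too big a step, a lighter-weight option is to stay with plain futures and use deref's timeout arity. This is a sketch of that alternative, not the core.async approach. The name fetch-with-timeout and its fallback argument are our own inventions for illustration:

```clojure
(defn fetch-with-timeout
  "Runs `fetch-fn` on a future. Returns its result, or `fallback`
  if it throws or takes longer than `timeout-ms` milliseconds."
  [fetch-fn timeout-ms fallback]
  (let [task   (future (try (fetch-fn)
                            (catch Exception _ fallback)))
        result (deref task timeout-ms ::timed-out)] ; give up after timeout-ms
    (if (= result ::timed-out)
      (do (future-cancel task) ; interrupt the straggler
          fallback)
      result)))

;; Hypothetical usage: treat a failed or slow download as an empty quote.
;; (fetch-with-timeout #(slurp "http://www.braveclojure.com/random-quote") 5000 "")
```

An empty string as the fallback means a failed request simply contributes no words to the histogram. Whether that is acceptable depends on how exact the counts need to be.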

Hit us up if you need idiomatic solutions to your problems whipped up in artisanal Clojure.

