Transducers: Clojure's Next Big Idea

Recently Rich Hickey announced transducers for Clojure, the next big idea in Clojure after reducers.

In Clojure, we often work with collections in various types of sequences (lazy or not), and core.async channels. The workhorse functions in Clojure–map, reduce, and filter, among others–are all functions that operate on collections. When we compose multiple functions that operate over these collections, we gain performance benefits and power. However, implementing those composed functions ourselves means locking it into a particular collection input and output. For example, core.async needed to have its own implementation of map that behaved like Clojure’s map but only worked with core.async channels.

Transducers were introduced to free functions from implementation-specific knowledge of collection types. Instead, transducers operate on any type of collection and focus on the operation they wish to perform on that collection. Before we go further into transducers, however, let’s take a look at their predecessors.

Reducers introduced the idea to Clojure that functions operating over a collection could be combined into one function and then operate on the entire collection in one go. Reducers decoupled the implementation of inputs from the operation you wish to perform on the inputs. Reducers can only be evaluated eagerly, not lazily, and not over a core.async channel. As core.async becomes more and more popular with Clojure, reducers are left behind. Reducers’ eager evaluation means the work is all done at once; the output cannot be a lazy sequence anymore. Additionally, reducers use macros to perform their magic instead of using function composition, which means we have to repeat our logic to handle different abstractions.

Clojure’s collection function internals have all been rewritten with this new concept of transducers. By digging into how transducers work, we can learn to use their full power and gain their benefits. I’ve been reading about the recent changes to Clojure’s functions, and this post details what I’ve learned so far.

In our code, we want to ignore the implementation of our inputs and outputs and instead focus on the operations to perform on a collection. To get a mental model of how this looks, consider a worker in an chocolate factory. The workers in the factory have many different tasks they need to perform including decorating the candies, removing misshapen candies, and putting the candies into different bins based on color. The candies could come continuously from a conveyer belt or arrive in bulk: the worker doesn’t care how they show up, only that they do show up. Once the worker is done with their task, the candies leave the worker’s area and the worker has no further interaction with them.

How would an eager-loaded collection look to our factory worker? The candies would show up in a big bin, all ready to be processed. The worker might perform a filtering operation—throw out the bad candies—and place the good candies into a new bin. In this example, the worker isn’t so concerned about where that bin comes from or where the output bin goes after the work is finished.

For a lazily-loaded collection, we can imagine that a tube comes down next to the worker’s station on the assembly line. The tube fills up with candies coming from somewhere else. Every time the worker takes one to process, there’s another candy waiting. Again, the input and output don’t matter. The worker just does their filtering job of removing the bad candies.

In both examples, the worker doesn’t have to worry about how the candy gets to their spot on the assembly line or where the candies go afterwards. The assembly line itself is composed of many different workers doing many different jobs. Because their inputs and outputs are interchangeable with other steps in the assembly line, they can be combined to build complex candy-making operations.

Our last type of input and output is a core.async channel. We might imagine the worker has many conveyor belts with irregularly arriving candies coming from all over the factory. But the worker must still determine which candies are bad. That’s it. The implementation of the rest of the assembly line is outside of the worker’s awareness. The worker still fits into our assembly line paradigm.

The famous chocolate factory scene from I Love Lucy These three candy making scenarios describe the three types of input collections that transducers handle out of the box. (Transducers should also allow us to handle new types of collections, including types that we haven’t thought of yet.) The Clojure team implemented transducers in both Clojure and ClojureScript, and the addition of transducers across the board means that we can begin to ignore the differences between collection implementations and focus on the work we want our functions to perform. Most importantly, we can combine transducers with simple functional composition to build more powerful operations.

Let’s take a look at how at the generic layout of a transducer. The basic structure you’ll see is two anonymous functions—an inner and an outer function. It looks like:

(fn [reduction-function]
  (fn
    ([] (reduction-function))
    ([result] (reduction-function result))
    ([result input]
       (reduction-function result input))))

The outer function exists to allow transducers to be combined with functional composition. The inner function performs the actual work.

Notice how the inner function has 3-arity. The transducer’s inner function must handle three cases:

  • When no arguments are provided, we consider this the “base” case. This arity will always be called when the input and output are specified and ready. An example is if our reduction-function is the + function, with no arguments it returns 0. Clojure core’s conj has been changed to work with transducers: it now handles a 0-arity call returning an empty PersistentVector.
  • When called with the single result argument, the inner function performs the lazy case. This arity is best explained as being used when the transducer chain is stopped and the code must perform any cleanup. When done, it will call reduction-function with result. This arity is used for internal plumbing and you shouldn’t need to worry about this case unless you’re digging into something like partition-by and partition-all.
  • When called with the result and input, the operation actually performs work. The input argument in this case is a single item given to the function. We’re not concerned with where it came from, whether it’s from a lazy sequence or a core.async channel or a conveyor belt full of candies. Here are some examples of how this arity works taken from Clojure’s source code:

map applies mapping-function to its input then uses reduction-function to add it to result:

([result input]
  (reduction-function result (mapping-function input)))

filter uses the filter-predicate to determine if the reduction-function should add the input to the result:

([result input]
  (if (filter-predicate input)
    (reduction-function result input)
    result))

Finally keep transforms the input and only passes it to the reduction-function if it is not nil:

([result input]
  (let [keep-value (keep-transformation input)]
    (if (nil? keep-value)
      result
      (reduction-function result keep-value)))

All of these examples operate on the entire input. What about when you want to halt midway, say after you have found a truffle in a batch of peanut brittle? You can send a halt signal by using reduced. Here is an example from take-while‘s definition:

([result input]
  (if (take-while-predicate input)
    (reduction-function result input)
    (reduced result))) ;;Halts the transducer operation

You might wonder: Why can’t we just use partial to compose these collection functions, and skip all this new transducer hullabaloo?

For example, why not just write code like (comp (partial filter even?) (partial map inc))? Applying partial here gives you the ability to chain functions (just like -> or ->>), but the logistics of managing the inputs and outputs are tied up in the collection functions. The map function still returns a lazy sequence. You can’t take this composed function and use it with core.async channels or a reducer. To do that, you’d have to re-implement the same logic with those abstraction’s functions.

core.async is becoming increasingly popular as a way to add performance to your existing Clojure code. You might find yourself replacing some part of your code with core.async channels. If your code was implemented with transducers, it will work in the new case without having to rewrite it.

To further drive home why transducers over reducers: Reducers were a separate namespace and required pulling in functions like map and filter from clojure.core.reducers and ensuring correct namespacing, for example, because you don’t want to clobber your existing usage of map. Because transducers are built into the core functions, they don’t present the same namespace-clobbering issues.

With transducers, we don’t have to juggle both reducers’ functions and Clojure core’s functions at the same time. The distinctions of each implementation collapse into one map function, regardless of context.

Using the same logic with multiple types of collections is the real power of transducers. Clojure has transducers built in, meaning that you can begin using them immediately everywhere that you need to compose functions that operate over collections. There’s no separate namespace to juggle in your code. And transducers play well with core.async, which is increasingly finding its way into our code bases.

Have any questions about transducers? Reach out to us @bendyworks on Twitter.


Category: Development
Tags:Clojure