From Ruby to Haskell, Part 2: Similarity, Refactoring, and Patterns

It has been a while since I last wrote one of these posts, and I didn’t want to leave you sitting at your desk forever, waiting with bated breath for the next one to pop up in your Google Reader feed (…whispers from the Internet…). Okay, never mind that Google Reader thing. I had a lot to cover in my exploration of Ruby and Haskell, and I don’t feel like I’ve even made a dent in all that I wanted to talk about. But I’m going to keep at it until I’ve said everything I want to say. This time I want to talk about three subjects:

  1. How Ruby and Haskell are similar and, unavoidably, how they’re different.
  2. Refactoring code, because I don’t think that this is solely the domain of the OOP’er; refactoring functional code is way cool.
  3. Patterns, which also don’t belong to any sort of programmer in particular. By patterns I mean something like “an underlying repetition or similarity”. Functional programming has some great patterns which, once you start to model code with them, let you claim great wins in concision.

But as I was writing this, I wondered in what order I should cover these subjects, and the more I thought about it, the less sense it made to talk about any one of them without pulling in the other two. What sort of refactoring can you do that isn’t pulling out some commonality among seemingly disparate hunks of code? Hmmm? That being said, let me tell you a refactoring story.

My refactoring story

I was strolling through a codebase when I found a bit of code… Well, that’s not quite the truth. My program raised an error, and while I was debugging it, I found the following:

def to_s
  string = ""
  string += self.description if self.description
  string += self.name if self.name
  string += self.location if self.location
  string += self.user.try(:company) if self.user
  string += self.category.name
  string += self.event_type
  string
end

Seasoned Rubyists can probably already spot the problem. The show-stopper was that the category of the event (like “Birthday” or something) could be nil, and (with apologies to Murphy) if something can be nil, then at some point it will be nil. Sure enough, an event without an associated category snuck into the database (remember your database constraints, friends), so self.category was itself nil, and in Ruby you can’t send the name message (or call the name method, if you like) to nil.

1.9.3p392 :001 > nil.name
NoMethodError: undefined method `name' for nil:NilClass

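That makes sense once you remember that nil is itself an object (the lone instance of NilClass), and it only responds to a handful of methods. One of the few useful things you can do with a nil is ask whether it is, in fact, nil: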

1.9.3p392 :002 > nil.nil?
=> true

In the code above, my quick fix was to guard the method call using try (which ActiveSupport, and therefore Rails, adds to every object):

def to_s
  ...
  string += self.category.try(:name) || ''
  ...
end

The effect of try here is that when self.category comes up nil, as it did in my case, the whole chain self.category.try(:name) evaluates to nil rather than blowing up. See the difference? The || '' at the end of the line makes sure I end up with a String that can be appended to the built-up output. At that point I had something that passed my new regression test, and I could conceivably have moved on. But I figured I could express what was going on more clearly. “What is going on here?” I asked myself. I’m gluing together a bunch of attributes, and I just want to skip anything that isn’t there. In my mind that sounded like filtering:

def to_s
  [description, name, location, user.try(:company), category.try(:name), event_type]
    .compact.join ''
end

What I do here is make an array of all the attributes that I want. Next, I use the compact method, which filters out any elements that are nil. Finally, I join the rest together using an empty string as the separator; join also has the effect of converting each item to a string if it isn’t one already. I think it reads pretty well. “Computer! Bring me the following attributes, remove the bad ones, and stick them all together! Do it right now!” (For best effect, yell this in your open-plan office; it feels great.) And so my refactoring story ended, or so I thought. It seemed that there was another pattern lurking here. But to talk about it, I’ll have to drop the Monoid!
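If you want to poke at that refactoring outside of Rails, here is a small, self-contained sketch. The User, Category, and Event structs are hypothetical stand-ins for the real models. Ruby’s safe-navigation operator &. (available since Ruby 2.3) stands in for ActiveSupport’s try:

```ruby
# Hypothetical stand-ins for the ActiveRecord models in the post.
User     = Struct.new(:company)
Category = Struct.new(:name)

Event = Struct.new(:description, :name, :location, :user, :category, :event_type) do
  def to_s
    # &. returns nil instead of raising when the receiver is nil,
    # much like ActiveSupport's try(:name).
    [description, name, location, user&.company, category&.name, event_type]
      .compact   # drop the nils
      .join('')  # glue the rest together, converting each element with to_s
  end
end

party = Event.new("Fun birthday party", "My Birthday", nil, nil, nil, "birthday")
party.to_s  # => "Fun birthday partyMy Birthdaybirthday"
```

The missing location, user, and category simply drop out instead of raising NoMethodError.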

Droppin’ the Monoid

What’s a monoid? How do you even say it? I go with “Moe-noid” but I’ve heard “Monn-oid” too. This leads to the common saying: “You say Moe-noid, I say Monn-oid.” As for what it is, that’s more interesting, and easier for me to convey than pronunciation. Monoids are about combining things. But this combining comes with a few rules that keep it well behaved. Because many different ways of combining share these rules, we can draw comparisons between things that don’t seem alike at first. Take a look at these and see if you spot the similarity:

  1. Addition:
     • You can put two numbers together to get another number
     • If you put any number together with zero, you just get the first number back
     • If you have three numbers to add, it doesn’t matter if you do (a + b) + c or a + (b + c), those are equal
  2. Multiplication:
     • You can multiply two numbers together to get another number
     • If you multiply any number together with one, you just get the first number back
     • If you have three numbers to multiply, it doesn’t matter if you do (a * b) * c or a * (b * c), those are equal
  3. String concatenation:
     • You can put two strings together to get another string
     • If you put any string together with the empty string, you just get the first string back
     • If you have three strings to concatenate, it doesn’t matter if you do (s1 + s2) + s3 or s1 + (s2 + s3), those are equal

Are you seeing a pattern? All these things share the same three rules. And the rules all talk about two aspects. The first aspect is the operation we’re talking about: addition, multiplication, or concatenation. The second aspect is the “neutral” element: zero, one, or the empty string. The neutral element is special because when you combine something with it, the result is just the other operand:

  • 5 + 0 = 5
  • 5 * 1 = 5
  • “five” + “” = “five”
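Those three rules are easy to spot-check. Here is a quick Ruby sketch that tries the neutral-element rule and the grouping rule for each operation on a few sample values. It is a spot check, not a proof. monoid_laws_hold? and EXAMPLES are names made up for this example:

```ruby
# Returns true if op and neutral satisfy the identity and
# associativity rules for the three sample values a, b, c.
def monoid_laws_hold?(op, neutral, a, b, c)
  identity      = a.send(op, neutral) == a && neutral.send(op, a) == a
  associativity = a.send(op, b).send(op, c) == a.send(op, b.send(op, c))
  identity && associativity
end

# Each entry: [operation, neutral element, three sample values]
EXAMPLES = {
  addition:       [:+, 0,  [5, -2, 7]],
  multiplication: [:*, 1,  [5, -2, 7]],
  concatenation:  [:+, "", ["five", "six", "seven"]]
}

EXAMPLES.each do |label, (op, neutral, samples)|
  puts "#{label}: #{monoid_laws_hold?(op, neutral, *samples)}"
end
# addition: true
# multiplication: true
# concatenation: true
```

Swap in :- with 0 and the check comes back false, because 0 - 5 isn’t 5. Subtraction doesn’t make a monoid.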

Clearly something is up. There’s definitely a shared pattern here. What is it? What is it used for? Well, enter the Monoid! Let’s look at Monoid as it is implemented in Haskell. I’m about to lay some syntax on you, so get ready:

class Monoid a where
  mempty :: a
  mappend :: a -> a -> a
  mconcat :: [a] -> a
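    -- Defined in `Data.Monoid'
    -- Note: since GHC 8.4 this is declared as class Semigroup a => Monoid a,
    -- and mappend defaults to Semigroup's (<>) operator.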

There are a few new things that I just introduced. You may have noticed that in the first post, I didn’t really mention type signatures, but I think I’m finally going to have to break with that. It’ll be a bit to digest, but I think it’ll pay off soon.

A quick aside about multiple parameters

When you see a -> a, think of it as a function that takes some type a and returns that same type a. This is like the type of sqrt: sqrt :: a -> a takes a single number and returns a number (its full type is Floating a => a -> a, but don’t worry about that part just yet). Then there are functions with two parameters. These are written like foo :: a -> a -> a. You may have expected something like otherFoo :: (a, a) -> a, but that means something different: a function that takes a single pair. Here’s the scoop: in Haskell, functions only take one argument! Yeah. The way it works is that when you give an argument to a function in Haskell, it happily eats that first argument and gives you back a new function:

(+) :: Num a => a -> a -> a

If you’re familiar with interfaces, you could say that a implements the Num interface, letting us know we can add it, multiply it, negate it, and so on. Read it like: “a can be any type, as long as that type can do Num-things.” Let’s feed plus an argument:

(+3) :: Num a => a -> a

I have a totally new function that now expects a single number and will return a number. I’ve effectively created a “plus3” function!

(+3) 2 :: Num a => a

The last type signature is for when I’ve called “plus3” with the argument 2. The result has no skinny arrows left; it is just a value of type a. I should be able to evaluate that and see what I get:

> (+3) 2
5

Yup. The same thing is sort of awkward in something like JavaScript:

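function add(a, b) { return a + b; }

// Partially apply add to 3 by hand, then call the result with 2.
(function(f, a) {
  return function(b) { return f(a, b); };
})(add, 3)(2); // => 5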
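Ruby lands somewhere in between. Lambdas aren’t curried by default, but Proc#curry turns a two-argument lambda into a chain of one-argument ones. Here is a quick sketch; add and plus3 are just illustrative names:

```ruby
# A plain two-argument lambda.
add = ->(a, b) { a + b }

# curry lets us supply one argument at a time, like (+3) in Haskell.
plus3 = add.curry[3]   # a new lambda still waiting for b

plus3[2]               # => 5
add.curry[3][2]        # => 5, the same thing in one go
```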

Back to your regularly scheduled blog post

Starting at the top: class Monoid a where defines something called a typeclass, which you can think of as similar to an interface. Like many interfaces, it is “polymorphic”: we don’t have to say that it only applies to Strings or Integers or whatever. We can just say that it applies to some type, and let the a stand in for that type. That a is really important, too, because all the following lines make use of it! mempty gives you back the “neutral” element we talked about before, the element that doesn’t change things when it is combined with other elements. Since it is just an element, it simply has type a. If we’re talking about Strings, it has type String (and is the empty string); if we’re dealing with addition, it is a number (and is 0). mappend is the operation from above; it combines two elements to produce another element:

import Data.Monoid  -- provides the Sum and Product wrappers

Sum 0     `mappend` Sum 1         -- equals Sum 1
Product 1 `mappend` Product 5     -- equals Product 5
""        `mappend` "foo"         -- equals "foo"

Now, there is a bit of weirdness about the Sum and Product above (can you guess why? I’ll explain in just a bit, but for now just see that it works basically like I sketched out above). Lastly we come to mconcat, which is the logical extension of mappend. If you had a big list of things and just put mappends between them all, that’s mconcat (and an empty list gives you back mempty). This works because we know that the way we group things can’t affect the outcome!

("foo" `mappend` "bar") `mappend` "baz" == "foo" `mappend` ("bar" `mappend` "baz")
-- True

And, in fact, it is okay to write things like:

"foo" `mappend` "bar" `mappend` "baz"
-- "foobarbaz"
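If you squint, Ruby already has mconcat: it’s inject (a.k.a. reduce) with the neutral element as the starting value. Ruby has no Monoid typeclass to look up the neutral element and operation for us. So this sketch’s mconcat helper takes them as arguments; it is a made-up name, not a Ruby method:

```ruby
# Hypothetical helper: fold a list with a binary operator,
# starting from the neutral element.
def mconcat(list, neutral, op)
  list.inject(neutral, op)
end

mconcat(["foo", "bar", "baz"], "", :+)  # => "foobarbaz"
mconcat([1, 2, 3, 4], 0, :+)            # => 10
mconcat([1, 2, 3, 4], 1, :*)            # => 24
mconcat([], "", :+)                     # => "" (an empty list gives the neutral element)
```

inject happens to group from the left, ((a + b) + c) + …, but thanks to associativity that choice doesn’t change the answer.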

Cool. Now, about that “Sum” and “Product” that snuck in. The reason is that there are two ways to think about numbers as monoids: as sums and as products. They both have numbers as their elements, but they have different memptys and mappends! In Haskell, if you just do:

1 `mappend` 5 -- error

Ambiguous type variable `a0' in the constraints:
  (Num a0) arising from the literal `1' at <interactive>:38:1
  (Monoid a0)

Ugh, yeah, it isn’t the clearest message, but the issue is that Haskell doesn’t know which monoid to pick: should it use mempty = 0 with addition, or mempty = 1 with multiplication? Wrapping the numbers in Sum or Product is how you tell it. Okay, okay. Monoids. They’re a thing. How does this tie back into the refactoring story from above? Well, there are two other cool things that work as monoids: Maybe and lists!
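One way to picture what Sum and Product buy you is to bundle the neutral element together with the operation, so the choice is made explicitly. This is a Ruby sketch under that assumption: Monoid, SUM, and PRODUCT are hypothetical names. In Haskell the type picks the instance, but here we pass the monoid around by hand:

```ruby
# Hypothetical: a monoid is a neutral element plus a combining operation.
Monoid = Struct.new(:mempty, :mappend) do
  # Combine a whole list, starting from the neutral element.
  def mconcat(list)
    list.inject(mempty) { |acc, x| mappend.call(acc, x) }
  end
end

# Same elements (numbers), two different monoids.
SUM     = Monoid.new(0, ->(a, b) { a + b })
PRODUCT = Monoid.new(1, ->(a, b) { a * b })

SUM.mconcat([1, 5])      # => 6
PRODUCT.mconcat([1, 5])  # => 5
SUM.mconcat([])          # => 0
PRODUCT.mconcat([])      # => 1
```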

Refactoring with monoids

I’ll show you what I’m plotting and then reveal how we got here. It’ll be like one of those Star Trek episodes where some crazy thing goes down (the Enterprise explodes) and then they’re all like “48 Hours Earlier…” (shot of Data reading poetry or something). You know the ones. So, back to my example; I had left off with this:

def to_s
  [description, name, location, user.try(:company), category.try(:name), event_type]
    .compact.join ''
end

bit of Ruby code, a nice regression test, but a slightly metallic taste in my mouth; perhaps zinc with a dash of bismuth. I liked what I had come up with, but I still had a lingering feeling that I had missed something more essential. Until one morning, this:

to_s :: Event -> String
to_s e = mconcat [eventDescription e, eventName e, eventLocation e, eventType e]

popped into my head. If you’ve followed the monoid description so far, this should generally make sense. The above function is a bit like doing:

eventDescription e `mappend` eventName e `mappend` ...

which, conveniently, is just what mconcat does. Any two strings get joined together, and since the result is a string, it can in turn be joined again. Empty strings are the neutral element, so they don’t change anything, and the result is just the concatenation of all the non-empty strings! But I hear some muttering… what I showed there only works if we’re guaranteed to get at least an empty string from each of our method/function calls. In the Ruby version, there was a very real possibility that certain calls could come up nil! What then?

Two ways to get the bugs out

Yeah, that’s the issue, right? We live in an unreliable world. Those method calls represent unreliable real-world data! We pulled it out of a database, or we got it from who-knows-where. What then? I can think of two things right away.

Types

Here’s what creating an event datatype might look like in Haskell:

module Event (Event, mkEvent, eventDescription,
              eventName, eventLocation, eventType) where

data Event = Event { eventDescription :: String
                   , eventName        :: String
                   , eventLocation    :: String
                   , eventType        :: String
                   } deriving Show

-- The smart constructor: every field starts out as the empty string
mkEvent :: Event
mkEvent = Event "" "" "" ""

There’s a lot of Haskell syntax going on here, but the idea is simple. We’re forcing everyone to use the mkEvent “smart constructor” in order to make events. Here’s an example:

mkEvent {eventDescription = "Fun birthday party", eventName = "My Birthday"}
-- Event {eventDescription = "Fun birthday party", eventName = "My Birthday", eventLocation = "", eventType = ""}

mkEvent is a pattern called a smart constructor. Inside mkEvent we’re providing default values for each field, so any we omit get automatically filled with empty strings. Lastly, in our module declaration, we export the Event type but not its Event data constructor, along with mkEvent and the “getters”. This means that outside this module you can’t create an event except by starting from mkEvent. We know that it is always safe to use an event the way we did above, because the only way an event can exist “in the wild” is if it has been properly constructed! Effectively, we’re putting all the work on whoever is providing an event. If we go and fetch a so-called “event” from the database, we’ll have to have something to fill in each field; otherwise it isn’t going to be an event:

SELECT description, name, location, type
FROM events
WHERE type = 'birthday';

 description       | name               | location | type
-------------------+--------------------+----------+------------
 'Super fun b-day' | 'My 32nd Birthday' | NULL     | 'birthday'
(1 row)

-- Uh, oh! My db code has to figure out what to do here...

But it is all sunshine and roses back up in application-land: we can merrily go on our way, because if we see an Event, we know it is an event. But it leaves us in the weird situation where our DB code has to know about an app-level thing… an Event. We may just want the DB layer to preserve NULLs and let something else figure out what is going on.
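Ruby can’t promise this at compile time, but you can approximate a smart constructor. Make new private and expose a single factory method that fills in the defaults. This is a rough sketch; build is a hypothetical name. Since send can still reach a private new, treat this as a convention rather than a guarantee:

```ruby
class Event
  attr_reader :description, :name, :location, :event_type

  # The only public way to make an Event: every field defaults to "",
  # and an explicit nil is coerced to "" with to_s.
  def self.build(description: "", name: "", location: "", event_type: "")
    new(description.to_s, name.to_s, location.to_s, event_type.to_s)
  end

  def initialize(description, name, location, event_type)
    @description = description
    @name        = name
    @location    = location
    @event_type  = event_type
  end

  private_class_method :new
end

party = Event.build(description: "Fun birthday party", name: "My Birthday")
party.location                      # => ""
Event.build(location: nil).location # => ""
# Event.new("a", "b", "c", "d") raises NoMethodError, because new is private
```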

More monoids!

Really. Yup! Other things besides Sums, Products, and Strings make monoids. You may have been starting to get a sixth sense for this stuff, but here it is: the Maybe!

data Maybe a = Nothing | Just a

Maybe goes by many names, but what it represents is pretty intuitive. It is something that’s either there or not. It is like nil, but a bit more civilized. It is your flaky friend who sometimes doesn’t show up, but does so consistently enough that you’ve learned to just write a “?” by their name whenever you invite them somewhere. It turns out that Maybe is also a monoid! I bet you can guess what the “zero-y” element is: Nothing! And what would it look like to put two Maybes together?

Just "foo" `mappend` Nothing = Just "foo"
Nothing `mappend` Just "foo" = Just "foo"
Just "foo" `mappend` Just "bar" = Just "foobar"

(Note: There’s some behavior here that I haven’t talked about yet. Did you notice that Just "foo" combined with Just "bar" also combined their contents? I’ll get to that.) The monoid behavior fits really nicely because there’s a built-in concept of “nothing to see here” in Maybe. It makes a lot of sense then to model our DB-returned data as:

eventDescription :: Maybe String

Instead of just being strings, these are only potential strings. This makes sense and eliminates ambiguity. In the previous solution, if we saw an empty string, it could be for one of two reasons: it was found and is empty, or it wasn’t found at all. Was it NULL in the database? Was it there but just happened to be empty? Is the location of the birthday a guarded secret? Maybe is important not just for avoiding null-pointer blowups (though it is great for that), but because it makes the meaning clearer: it moves null-ness out of band. So let’s go with that. All of our event functions are a bit unreliable, but they are predictably unreliable, which is okay. Here’s how the definitions of Event and mkEvent change:

data Event = Event { eventDescription :: Maybe String
                   , eventName        :: Maybe String
                   , eventLocation    :: Maybe String
                   , eventType        :: Maybe String
                   } deriving Show

mkEvent :: Event
mkEvent = Event Nothing Nothing Nothing Nothing

What does that make of our original to_s function?

to_s :: Event -> Maybe String
to_s e = mconcat [eventDescription e, eventName e,
                  eventLocation e, eventType e]

It looks basically the same! All I changed was the return type. If every field is Nothing, the whole result is Nothing too. Now that we’re up in our application code, we’re in a position to make a choice about that. In the case where everything is Nothing (woah, like deep man), it probably makes sense to just say our event description should be an empty string:

to_s :: Event -> String
to_s e = maybe "" id (mconcat [eventDescription e, eventName e, eventLocation e, eventType e])

I’m using maybe (the function) to give default values for the two cases:

  1. If the whole thing evaluates to Nothing, then use the empty string
  2. Otherwise, just use the value you got (it is id here because maybe applies a function to the value inside the Just: for Just "fred" you get id "fred" = "fred", since id just returns its argument)

So how does this work? As I hinted at earlier, the stuff inside the Maybe is getting joined together. There are multiple levels of monoid going on:

  1. Maybe is a monoid
  2. Strings are monoids

Maybe is a monoid in an interesting way: it says that its contents must also be a monoid!

instance Monoid a => Monoid (Maybe a) where
  mempty = Nothing
  ...
  Just m1 `mappend` Just m2 = Just (m1 `mappend` m2)
  -- (Since GHC 8.4 the constraint is Semigroup a, and this case lives in (<>).)

In the words of Ted Theodore Logan, “woah”. The first interesting bit is what happens when we have two Just values: we combine those inner values. And how do we do it? With what else? mappend! We know we can do that because of the other interesting bit of the instance definition. In instance Monoid a => Monoid (Maybe a) where, there’s a “fat arrow” that constrains the a to also be a monoid! You can read it like “Given that a is a monoid, Maybe a is a monoid too, and here’s how…” Knowing what we do about the String and Maybe monoids gives us just the behavior we saw:

Just "foo" `mappend` Just "bar" = Just "foobar"

Maybe could be sure that it was allowed to join its contents (the Strings), because the instance only applies when the contents are themselves a monoid, and String is one!
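Ruby has no Maybe, but nil can stand in for Nothing well enough to sketch the instance above. The hypothetical maybe_mappend below treats nil as the neutral element. When both sides are present, it combines them with their own + (String concatenation here), mirroring the Just m1 `mappend` Just m2 case:

```ruby
# Hypothetical sketch of the Maybe monoid, with nil playing Nothing.
def maybe_mappend(x, y)
  return y if x.nil?   # Nothing `mappend` y = y
  return x if y.nil?   # x `mappend` Nothing = x
  x + y                # Just m1 `mappend` Just m2 = Just (m1 `mappend` m2)
end

# mconcat for "maybe strings": start from nil (mempty = Nothing).
def maybe_mconcat(list)
  list.inject(nil) { |acc, x| maybe_mappend(acc, x) }
end

maybe_mconcat(["Super fun b-day", "My 32nd Birthday", nil, "birthday"])
# => "Super fun b-dayMy 32nd Birthdaybirthday"
maybe_mconcat([nil, nil, nil, nil])        # => nil (everything was Nothing)
maybe_mconcat([nil, nil, nil, nil]) || ""  # => "" (like maybe "" id)
```

Apart from the all-nil case, which gives nil instead of "", this is the same answer the compact.join('') version of to_s produces.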

Wrapping up

So where did we get? I started out with this code:

def to_s
  string = ""
  string += self.description if self.description
  string += self.name if self.name
  string += self.location if self.location
  string += self.user.try(:company) if self.user
  string += self.category.name
  string += self.event_type
  string
end

and ended up with this Ruby:

def to_s
  [description, name, location, user.try(:company), category.try(:name), event_type]
    .compact.join ''
end

or this in Haskell:

to_s :: Event -> String
to_s e = maybe "" id (mconcat [eventDescription e, eventName e, eventLocation e, eventType e])

But we learned that there are some cool patterns behind the idea of gluing things together, shared by addition, multiplication, and string concatenation (and other stuff too). One notably cool one that I skipped is Ordering. This is the type that comparisons return: compare, from the Ord typeclass that lets you sort things (put them in an order), gives you one. It just says whether something is LT, GT, or EQ to something else. So you could have a list of things that can be sorted many different ways:

type Name = String; type Age = Int; type Height = Double
data User = User Name Age Height deriving Show
userList = [User "Jimbo" 32 5.9, User "Molly" 30 5.5, User "Gary" 32 6.0]

and you could have several different sorting functions:

age, name, height :: User -> User -> Ordering

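all of them take two users and then return either LT, GT, or EQ. What’s cool is that Ordering is a monoid: combining two Orderings keeps the first one that isn’t EQ (and EQ is the neutral element). Better still, functions that return a monoid are monoids too, so whole comparison functions can be combined. This means we can combine our sorting functions in a really nice way: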

import Data.List (sortBy)  -- sortBy :: (a -> a -> Ordering) -> [a] -> [a]

sortBy (age `mappend` height `mappend` name) userList
[User "Molly" 30 5.5, User "Jimbo" 32 5.9, User "Gary" 32 6.0]
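Ruby has a close cousin of this trick. Integer#nonzero? returns nil for 0, so chaining comparisons with || keeps the first one that isn’t a tie, which is how Orderings combine. Here is a sketch, with a User struct standing in for the Haskell data type:

```ruby
# A stand-in for the Haskell User type.
User = Struct.new(:name, :age, :height)

users = [User.new("Jimbo", 32, 5.9), User.new("Molly", 30, 5.5), User.new("Gary", 32, 6.0)]

# Like combining Orderings: use the first comparison that isn't a tie.
# (x <=> y).nonzero? is nil when the comparison is 0 (EQ), so || moves on.
sorted = users.sort do |a, b|
  (a.age <=> b.age).nonzero? ||
    (a.height <=> b.height).nonzero? ||
    (a.name <=> b.name)
end
sorted.map(&:name)  # => ["Molly", "Jimbo", "Gary"]

# The same ordering via sort_by: arrays compare element by element.
by_key = users.sort_by { |u| [u.age, u.height, u.name] }
by_key.map(&:name)  # => ["Molly", "Jimbo", "Gary"]
```

The sort_by version works because Array#<=> stops at the first element that differs, which is the Ordering monoid in disguise.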

The users got sorted first by age, then by height, and lastly by name. We were able to combine the sorting functions using just the monoid machinery. There’s lots more out there to find. What I like about learning other languages is that they give me a larger palette of ideas to work with. Haskell seems to be an almost inexhaustible store of very general patterns that I’ve been happily mining for a while. If it stopped there, that alone would be cool. But it couples these really interesting patterns and interfaces with a language that has some tricks up its sleeve. In my next article I’ll develop and compare some apps in my field: web applications. If you’re familiar with Rails, I’ll introduce you to Yesod. Also, here’s part 1 of this series.