Showing posts with label functional programming.

Thursday, October 22, 2009

Binomial queues as a nested type

Warning: this is probably only of interest to functional programmers.

Maciej Kotowicz recently asked on the Haskell Cafe mailing list about implementing binomial queues (also known as binomial heaps) using fancy type machinery so that the type-checker can enforce the shape invariants of the data structure. This reminded me of a discussion I had with some colleagues in the summer of 1998 about using nested types to enforce these kinds of invariants. A little digging around in mail archives yielded this email, presented here with some light formatting and a few fixed typos.

I've been playing with binomial queues as a nested type, and I think you'll find the end result interesting.

REVIEW

Let me first quickly review the usual implementation of binomial queues. Recall that a binomial tree has the shape

data Tree a = Node a [Tree a]
A binomial tree of rank k has k children, of ranks k-1 .. 0 (stored in the list in decreasing order of rank). Note that we can combine two heap-ordered binomial trees of rank k to get a heap-ordered binomial tree of rank (k+1) as follows:
combine a@(Node x xs) b@(Node y ys)
   | x <= y    = Node x (b : xs)
   | otherwise = Node y (a : ys)
Now a binomial queue is a list of heap-ordered binomial trees of increasing rank (but not necessarily of consecutive ranks). We'll represent the rank of each tree that is present by its position in the list. Thus, some ranks will be empty. This could be implemented as
type Binom a = [Maybe (Tree a)]
or somewhat more efficiently as
data Binom a = Nil
             | Zero (Binom a)
             | One (Tree a) (Binom a)
or better still by unboxing the Tree in the One constructor
data Binom a = Nil
             | Zero (Binom a)
             | One a [Tree a] (Binom a)
I won't go over all the operations -- they are amply covered elsewhere. I'll just describe two functions. The first, add, takes a tree (split into its root and its list of children) and a queue, where the tree has the same rank as the first position in the queue, and returns a new queue. It works basically like the increment function on binary numbers.
add :: Ord a => a -> [Tree a] -> Binom a -> Binom a
add x xs Nil = One x xs Nil
add x xs (Zero h) = One x xs h
add x xs (One y ys h)
  | x <= y    = Zero (add x (Node y ys : xs) h)
  | otherwise = Zero (add y (Node x xs : ys) h)
The merge function is similar and works basically like the addition function on binary numbers.
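The email doesn't spell merge out, but it follows the same digit-by-digit pattern. Here is a sketch (my reconstruction, consistent with the definitions above, not code from the email):

```haskell
data Tree a = Node a [Tree a]

data Binom a = Nil
             | Zero (Binom a)
             | One a [Tree a] (Binom a)

add :: Ord a => a -> [Tree a] -> Binom a -> Binom a
add x xs Nil = One x xs Nil
add x xs (Zero h) = One x xs h
add x xs (One y ys h)
  | x <= y    = Zero (add x (Node y ys : xs) h)
  | otherwise = Zero (add y (Node x xs : ys) h)

-- merge proceeds digit by digit; when both queues have a tree at the
-- same rank, the two trees are combined into one of the next rank and
-- propagated as a carry by add
merge :: Ord a => Binom a -> Binom a -> Binom a
merge h Nil = h
merge Nil h = h
merge (Zero h1) (Zero h2) = Zero (merge h1 h2)
merge (Zero h1) (One y ys h2) = One y ys (merge h1 h2)
merge (One x xs h1) (Zero h2) = One x xs (merge h1 h2)
merge (One x xs h1) (One y ys h2)
  | x <= y    = Zero (add x (Node y ys : xs) (merge h1 h2))
  | otherwise = Zero (add y (Node x xs : ys) (merge h1 h2))
```

The One/One case is exactly the binary-addition carry: combining two equal-rank trees gives a tree of the next rank, which add threads into the rest of the result.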

Finally, the getMin function returns the minimum element of a queue paired with the queue without that element. The helper function getMin_ returns a triple of the minimum element in some suffix of the queue, the list of children associated with that minimum element, and the suffix without that element.

getMin_ :: Ord a => Binom a -> (a, [Tree a], Binom a)
getMin_ (Zero h) = case getMin_ h of
                     (y,ys,h') -> (y,ys,Zero h')
getMin_ (One x xs Nil) = (x,xs,Nil)
getMin_ (One x xs h) = case getMin_ h of
                         (y,ys,h') | x <= y    -> (x,xs,Zero h)
                                   | otherwise -> (y,ys,One x xs h')

getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
           in (x,merge (list2binom xs Nil) h')

list2binom :: [Tree a] -> Binom a -> Binom a
list2binom [] h = h
list2binom (Node x xs : ys) h = list2binom ys (One x xs h)
Note that when getMin receives a list of children, it converts them to a valid binomial queue by reversing the list (and changing each pair of a Node constructor and a (:) constructor to a One constructor).

NESTED REPRESENTATION -- FIRST ATTEMPT

By following the same approach we have been using to design nested datatypes, we get the following representation of binomial queues.

data Binom_ a b = Nil
                | Zero (Binom_ a (Trees a b))
                | One a b (Binom_ a (Trees a b))

type Trees a b = (a, b, b)

type Binom a = Binom_ a ()
Here the list of children
  (Node x3 xs3 : Node x2 xs2 : Node x1 xs1 : Node x0 xs0 : [])
is being represented by the nested triples
  (x3,xs3', (x2,xs2', (x1,xs1', (x0,xs0', ()))))
(where xsN' is the appropriate conversion of xsN).

All the functions are perfectly straightforward, except for getMin and getMin_ which must incrementally build up the reversing function to convert

  (x3,xs3', (x2,xs2', (x1,xs1', (x0,xs0', ()))))
to
  One x0 xs0' (One x1 xs1' (One x2 xs2' (One x3 xs3' Nil)))
As a result of building up this reverse function incrementally, this implementation ends up being roughly 10% slower than the original.

INTERLUDE ON REVERSING LISTS

Suppose we want to support the following ADT. We want sequences with three operations:
empty :: ReversableSeq a
cons  :: a -> ReversableSeq a -> ReversableSeq a
rev   :: ReversableSeq a -> [a]
with the obvious semantics. cons must take O(1) time, but rev may take O(n) time. A list is a reasonable implementation with
type ReversableSeq a = [a]
empty = []
cons = (:)
rev = reverse
but, if you think about it, there's a fair amount of interpretive overhead in the pattern matching that reverse performs at every step. Way back in 1985, John Hughes came up with a representation that avoids this overhead:
type ReversableSeq a = [a] -> [a]
empty = id
cons x xs = xs . (x:)
rev xs = xs []
The result of cons 1 (cons 2 (cons 3 empty)) is
id . (3:) . (2:) . (1:)
which, when applied to [] by the rev function, yields [3,2,1]. One way to think of this representation is as lists with "holes" at the ends. The cons operation fills in this hole with an element and another hole, and the rev operation fills in the hole with []. (This is quite similar to difference lists in logic programming...)

Applying this trick to the regular representation of binomial queues yields

data Binom a = Nil
             | Zero (Binom a)
             | One a (Binom a -> Binom a) (Binom a)
which turns out to be about 10% faster than the original representation.
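The email doesn't show the code for this version, but the Hughes trick translates mechanically. Here is a sketch (my reconstruction): consing a child onto the front of the list becomes post-composition with a One constructor.

```haskell
data Binom a = Nil
             | Zero (Binom a)
             | One a (Binom a -> Binom a) (Binom a)

-- the children of a node are stored as a function that prepends them,
-- already in queue order, onto whatever queue it is applied to
add :: Ord a => a -> (Binom a -> Binom a) -> Binom a -> Binom a
add x xs Nil = One x xs Nil
add x xs (Zero h) = One x xs h
add x xs (One y ys h)
  | x <= y    = Zero (add x (xs . One y ys) h)
  | otherwise = Zero (add y (ys . One x xs) h)

insert :: Ord a => a -> Binom a -> Binom a
insert x = add x id   -- a rank-0 tree has no children

merge :: Ord a => Binom a -> Binom a -> Binom a
merge h Nil = h
merge Nil h = h
merge (Zero h1) (Zero h2) = Zero (merge h1 h2)
merge (Zero h1) (One y ys h2) = One y ys (merge h1 h2)
merge (One x xs h1) (Zero h2) = One x xs (merge h1 h2)
merge (One x xs h1) (One y ys h2)
  | x <= y    = Zero (add x (xs . One y ys) (merge h1 h2))
  | otherwise = Zero (add y (ys . One x xs) (merge h1 h2))
```

Note that getMin no longer needs list2binom: applying a children function to Nil yields a ready-made queue.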

NESTED REPRESENTATION -- SECOND ATTEMPT

Ok, let's try to make a nested datatype out of this. Conceptually, the nesting should keep track of our position in the queue so that we know what rank the current tree has and can't possibly mix up trees of different ranks. We hypothesize some Base type for the beginning of the list, and some type transformer Succ that modifies the type as we move down the queue.

type Base = ???
type Succ b = ???  -- this might need to be Succ a b

type Binom a = Binom_ a Base

data Binom_ a b = Nil
                | Zero (Binom_ a (Succ b))
                | One a (Binom_ a ??? -> Binom_ a ???) (Binom_ a (Succ b))
Now, what should the missing types in the One constructor be? Well, the function (Binom_ a ??? -> Binom_ a ???) represents the children of the current node. If this node is of rank k, then these children are of ranks 0 .. k-1. The function (Binom_ a ??? -> Binom_ a ???) takes a queue starting at rank k and adds the children to the front of it, yielding a queue starting at rank 0. So the ???'s can be filled in as
                | One a (Binom_ a b -> Binom_ a Base) (Binom_ a (Succ b))
or just
                | One a (Binom_ a b -> Binom a) (Binom_ a (Succ b))
Now, what should the types Base and Succ be? It doesn't matter because we never construct data of those types, we simply use the types to discriminate between positions in the queue. So we can define Base and Succ as
type Base = Void
newtype Succ b = S Void
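Putting the pieces together (using Data.Void for the Void type, a modern convenience), the full declaration is below. I've also included add, whose equations are my reconstruction following the type-erasure remark later in the post: the code is identical to the erased version, and only the signature changes -- but that signature is now mandatory, because the recursive call is at a different type (polymorphic recursion).

```haskell
import Data.Void (Void)

type Base = Void
newtype Succ b = S Void   -- b is a phantom parameter

data Binom_ a b = Nil
                | Zero (Binom_ a (Succ b))
                | One a (Binom_ a b -> Binom a) (Binom_ a (Succ b))

type Binom a = Binom_ a Base

-- same equations as the optimized regular representation; the
-- signature is required because the recursion is polymorphic
add :: Ord a => a -> (Binom_ a b -> Binom a) -> Binom_ a b -> Binom_ a b
add x xs Nil = One x xs Nil
add x xs (Zero h) = One x xs h
add x xs (One y ys h)
  | x <= y    = Zero (add x (xs . One y ys) h)
  | otherwise = Zero (add y (ys . One x xs) h)

insert :: Ord a => a -> Binom a -> Binom a
insert x = add x id   -- at rank 0, the children function is the identity
```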
I think this is a very interesting type for several reasons:
  • It includes an arrow type, which we haven't seen much.
  • The RHS of the datatype definition (in fact, just the One constructor) has occurrences of Binom_ at no fewer than three different types
    Binom_ a b
    Binom_ a Base
    Binom_ a (Succ b)
    
    I've seen two before, but not three.
  • It includes types which are never used, but are merely there to capture certain invariants (in this case, that trees of different ranks should never be confused). [2009 Comment: Today, these are called phantom types.] In some ways this might make this type less interesting, but an interesting consequence is that the code satisfies a certain "type erasure" property: if you erase the types from all the functions, you get code for the optimized regular representation.
Because of this "type erasure" property, you might expect it to be just as fast as the optimized regular representation, but in fact it is roughly 10% slower -- or roughly as fast as the original regular representation. This is because the extra types hinder a certain optimization that the programmer might make (and which I did make in the optimized regular representation).

Recall the type of the getMin_ function from the original regular representation, and the way it was used by getMin.

getMin_ :: Ord a => Binom a -> (a, [Tree a], Binom a)

getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
           in (x,merge (list2binom xs Nil) h')
For the optimized regular representation, this becomes
getMin_ :: Ord a => Binom a -> (a, Binom a -> Binom a, Binom a)

getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
           in (x,merge (xs Nil) h')
Now, consider the optimized nested representation. We can't just write
getMin_ :: Ord a => Binom_ a b -> (a, Binom_ a b -> Binom a, Binom_ a b)
because we don't know the rank of the tree that the second component of the triple came from. (Note that if we had existential types, we could write
getMin_ :: Ord a => Binom_ a b -> 
                    (a, exists c. Binom_ a c -> Binom a, Binom_ a b)

getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
           in (x,merge (xs Nil) h')
Some implementations do in fact support existential types, but in limited ways that would carry efficiency costs of their own...)

Instead of returning the children and then applying them to Nil at the end, we apply the children to Nil first, and then return the result, yielding the type

getMin_ :: Ord a => Binom_ a b -> (a, Binom a, Binom_ a b)

getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
           in (x,merge xs h')
This depends on laziness to evaluate only the children of the final minimum, and not all the children of the temporary minimums along the way. However, this does mean that we build a lot more thunks of the form (xs Nil) along the way. And building these thunks is apparently quite expensive.

2009 Addendum: Ralf Hinze considers a similar representation in Section 6 of Numerical Representations as Higher-Order Nested Datatypes. Today, you would probably use fancier type machinery like dependent types or maybe GADTs, but it's still amazing how far you can get with only nested types.

Tuesday, July 8, 2008

Breadth-First Numbering: An Algorithm in Pictures

I've always enjoyed the mathematical tradition of “proofs without words”, so I wanted to see if something similar could be done with an algorithm. Of course, an algorithm is typically more complicated than a mathematical identity, so it's not surprising that an algorithm typically can't be captured in a single picture. Instead, I've found the idea of a commutative diagram to be a useful pictorial shorthand, where the short way around the diagram gives a high-level view of the operation, and the long way around gives more detail. Here is an attempt to describe my breadth-first numbering algorithm from ICFP 2000 using this idea.

[The diagrams for "Wanted", "Rules", and "Example" appeared here as images in the original post.]

Wednesday, July 2, 2008

Functional Programming in...Ada?

Ada is not the first language that comes to mind when you want to do functional programming. (You in the back, stop laughing!) Why, with excellent functional programming languages like Haskell or ML easily available, would I choose to do a functional programming project in Ada? (I CAN STILL HEAR YOU LAUGHING!) Partly because we use Ada in some courses at our institution, and I wanted a library that could help in teaching recursion to novices. But, honestly, mostly I just wanted to see if it could be done.

“It's like this fellow I knew in El Paso. One day, he just took all his clothes off and jumped in a mess of cactus. I asked him that same question, ‘Why?’...He said, ‘It seemed like a good idea at the time.’”
– Steve McQueen, The Magnificent Seven

Somewhat to my surprise, functional programming actually works reasonably well in Ada. You wouldn't want to program a large system this way, but small-scale projects are quite possible, albeit much more verbose than functional programmers are used to.

Instrumenting recursive functions

My application is going to be a library for instrumenting recursive functions. I want to be able to take a recursive function and run it with different combinations of monitors. Each monitor executes a little bit of extra code every time the recursive function is called, including recursive calls. Simple examples of useful monitors include counting the number of calls, or tracking the maximum call depth, or printing trace information on every call or return. More complicated examples include adding memoization to a recursive function or automatically detecting when a function is stuck in an infinite recursion.

Ideally, the library should be easy enough for students to use, but I'll settle for it being easy enough for instructors to use.

Of course, there are many ways to design such a library. I'm going to use a design based on ideas from functional programming. The basic ideas have been folklore in the functional programming community for a long time. For example, I remember doing something similar in SML in the early '90s. Bruce McAdam wrote a nice tech report describing these techniques in great detail. The trick here is going to be figuring out how to implement this design in Ada.
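For comparison, here is roughly what that folklore looks like in a functional language (a hypothetical Haskell sketch of my own, not taken from McAdam's report): the recursive function is abstracted over its own self-call, a run function ties the knot, and a monitor wraps the self-call.

```haskell
import Data.IORef

-- a recursive function abstracted over its own self-call
type Fun = (Int -> IO Int) -> Int -> IO Int

fib :: Fun
fib rec x = if x <= 1 then return 1
            else (+) <$> rec (x - 1) <*> rec (x - 2)

-- tie the knot: every self-call goes back through f
run :: Fun -> Int -> IO Int
run f = rec where rec x = f rec x

-- a monitor is an alternative run function; this one counts calls
count :: Fun -> Int -> IO Int
count f x = do
  numCalls <- newIORef (0 :: Int)
  let myFun rec y = do modifyIORef' numCalls (+ 1)
                       f rec y
  result <- run myFun x
  n <- readIORef numCalls
  putStrLn ("Number of calls = " ++ show n)
  return result
```

For example, count fib 10 reports 177 calls, matching the Ada Count monitor later in the post.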

(Incidentally, if anybody has a truly object-oriented approach to solving the same problem, I'd be interested in seeing it.)

Preparing the recursive function

Without some kind of support for reflection, it seems too much to hope that I'll be able to observe the recursive calls inside a recursive function. Therefore, I'm willing to require some modifications to the recursive function, as long as those modifications are fairly straightforward. However, I should only need to modify the function once, rather than every time I run it with a different monitor.

Stealing an idea from functional programming called the Y combinator, I'll modify each recursive function to take an extra argument that is itself a function (or, more specifically, a pointer to a function). The recursive function will call this passed-in function wherever it would normally call itself. The passed-in function will eventually call the recursive function, but may run some other code first.

Here's an example using the ever popular Fibonacci function. The original recursive function might be written

 function Fib(X : Integer) return Integer is
 begin
   if X <= 1 then
     return 1;
   else
     return Fib(X-1) + Fib(X-2);
   end if;
 end Fib;
Adding the extra argument and replacing the recursive calls with calls to the passed-in function yields
 function Fib(Rec : access function(X : Integer)
                    return Integer;
              X : Integer) return Integer is
 begin
   if X <= 1 then
     return 1;
   else
     return Rec(X-1) + Rec(X-2);
   end if;
 end Fib;
where the changes are the new Rec parameter and the replacement of the recursive calls with calls to Rec. Note that access function(X : Integer) return Integer is the type of a pointer to a function that takes an integer and returns an integer. In the rest of this post, I'll assume that the functions being instrumented always take an integer and return an integer. If I can get this to work at all, then generalizing this to arbitrary argument and return types will be trivial using Ada's notion of generics.

Running the recursive function

Now, to simply run this function recursively without attaching a monitor, I'll call it with Run, which is responsible for “tying the knot” by somehow making the function call itself. The Run function is written

 function Run(Fun : ...type to be filled in 
                       later...
              X : Integer) return Integer is
   function Rec(X : Integer) return Integer is
   begin
      return Fun(Rec'Access, X);
   end Rec;
 begin
   return Rec(X);
 end Run;
Here, Fun is a function like the modified Fib above. Run defines a local helper function, Rec, that merely passes itself to Fun.

I would call this function by writing something like

   ...Run(Fib'Access, 10)...
where Fib'Access is Ada-speak for a pointer to the Fib function. This calls Rec(10), which calls Fib(Rec'Access,10), which in turn calls Rec(9), which in turn calls Fib(Rec'Access,9), and so on. Eventually, the whole thing returns 89, as expected.

I left out the type of Fun above, because it starts to get unwieldy. I can figure out what this type should be by looking at the type of Fib. The full signature of Run is

 function Run(Fun : access function
                (Rec : access function(X:Integer)
                       return Integer;
                 X : Integer) return Integer;
              X : Integer) return Integer;

A small lie

What I'd really like to be able to do is define a type abbreviation

 type Fun_Type is access function
   (Rec : access function(X:Integer)
          return Integer;
    X : Integer) return Integer;
and then write the signature of Run as
 function Run(Fun : Fun_Type; 
              X : Integer) return Integer;
Unfortunately, I can't. Or rather, I can, but it turns out to be useless. The problem is that Ada is really paranoid about what you do with pointers to functions. In particular, it won't let a pointer to a function match a named type if that function is defined in a deeper lexical scope than the type definition. This restriction does not apply to so-called anonymous access types, so I'm forced to rewrite the entire type everywhere it's used rather than giving it a name and referring to the name.

However, to simplify the presentation, I'm going to pretend that I can use Fun_Type as defined above. Just remember that, in the real implementation, every occurrence of Fun_Type needs to be replaced with the more verbose anonymous type. I'll write Fun_Type in italics as a reminder that it's not the real code.

A simple monitor

Now, here is a simple monitor that counts the number of calls to the recursive function, and prints out that number at the end.

 function Count(Fun : Fun_Type;
                X : Integer) return Integer is

   Num_Calls : Integer := 0;

   function My_Fun
       (Rec : access function(X:Integer) 
              return Integer;
        X : Integer) return Integer is
   begin
     Num_Calls := Num_Calls + 1;
     return Fun(Rec,X);
   end My_Fun;

   Result : Integer := Run(My_Fun'Access,X);
 begin
   Put("Number of calls = ");
   Put(Num_Calls,0);
   New_Line;
   return Result;
 end Count;

Now, when I call

   Put( Count(Fib'Access,10) );
I get
Number of calls = 177
         89
where the first line is printed by the monitor and the second line is printed by the external Put.

The Count monitor creates a new function My_Fun to give to Run. My_Fun simply increments Num_Calls and calls Fun. Now, every time Fun makes a recursive call to Rec, Rec will call My_Fun, which calls Fun.

So basically, Count “wraps” Fun with a little bit of code to increment the Num_Calls variable, and this code gets executed every time the Rec function is called.

In addition, Count does a little bit of work around the top-level recursive call, namely initializing the Num_Calls variable to 0, and printing the number of calls at the end.

A memoization monitor

Here's a more complicated monitor. This one modifies the recursive function to do memoization (first cousin to dynamic programming). The idea is that, when a function is called multiple times on the same arguments, you can save time by remembering those arguments and the corresponding results. Then when the function is called with arguments you have seen before, you can simply return the stored result, instead of recomputing it.

 function Memoize(Fun : Fun_Type; 
                  X : Integer) return Integer is

   Results : array (0..X) of Integer;
   Ready   : array (0..X) of Boolean 
               := (others => False);

   function My_Fun
       (Rec : access function(X:Integer) 
              return Integer;
        X : Integer) return Integer is
   begin
     if not Ready(X) then
       Results(X) := Fun(Rec,X);
       Ready(X) := True;
     end if;
     return Results(X);
   end My_Fun;

 begin
   return Run(My_Fun'Access,X);
 end Memoize;
The difference can be dramatic.
   ...Memoize(Fib'Access, 30)...
executes the body of Fib a total of 31 times, whereas
   ...Run(Fib'Access, 30)...
executes the body of Fib almost 2.7 million times (2692537 times, to be exact).

I made an assumption above that, if the original argument was X, then the range of possible arguments in all the calls is 0..X. This is good enough to illustrate the idea, but a more general implementation might be parameterized by the bounds to use, or by a function that calculates those bounds from the original X.

Multiple monitors

How did I know above that Run executes the body of Fib 2692537 times? By running Fib with the Count monitor. How did I know that Memoize executes the body of Fib 31 times? By running Fib with both the Memoize and Count monitors together.

Except that, as currently written, the Memoize and Count monitors can't be used together. I can run Fib with one or the other, but not both simultaneously.

What I want is a way to layer one monitor on top of another, building arbitrary stacks of monitors. The clues to how to do this are already in the current implementations of Memoize and Count.

Notice that both Memoize and Count have the same interface as Run. In other words, a monitor is an alternative run function that adds some extra machinery to the basic Run. Further, notice that both Memoize and Count build a My_Fun function by wrapping some extra code around Fun and then run My_Fun using the Run function. The key insight is that we can parameterize Memoize and Count by which run function to use for running My_Fun. That might be the basic Run function, but it might also be a run function corresponding to a different monitor.

To achieve this parameterization, I'll use generics instead of passing function pointers. The existing definitions of Memoize and Count will still work, but they need to be preceded by additional code to take a run function as a generic parameter. For example,

 generic
   with function Run(Fun : Fun_Type; 
                     X : Integer) return Integer;
 function Count(Fun : Fun_Type; 
                X : Integer) return Integer;

 function Count(Fun : Fun_Type; 
                X : Integer) return Integer is
   ...
     ... Run(My_Fun'Access, X);
   ...
The new parts are the generic clause and its formal Run parameter. I didn't change the existing code of Count at all, but now the call to Run in the body of Count refers to the run function passed in as a generic parameter.

Using a monitor is now a two-stage process, first instantiating the generic function to create the desired run function, and then calling that run function. For example,

 function C_Run is new Count(Run);
 function MC_Run is new Memoize(C_Run);

 ...MC_Run(Fib'Access, 30)...
Notice that the basic Run function is always used in the first instantiation, where it serves as the base of the stack of monitors.

But wait! Something strange is happening here. This program displays 59 calls, not 31 calls as expected. However, if we layer the monitors in the opposite order

 function M_Run is new Memoize(Run);
 function CM_Run is new Count(M_Run);

  ...CM_Run(Fib'Access, 30)...
then we get the 31 calls.

It's really a question of when the memoization code checks for a repeated argument relative to when the counting code increments the counter. If the memoization check happens before the increment, we get 31 calls. If the memoization check happens after the increment, we get 59 calls. Either way, the body of the original Fib function is only executed 31 times.

And that's it, really. There's still a lot of grunt work involved in fleshing out a complete, robust library, but all the important design decisions have already been made. Implementing a wide variety of monitors is fairly easy using Count and Memoize as templates.

Generics vs function pointers

One aspect of this design deserves a little more discussion. I'm mixing two different styles of passing functions as arguments, using generics in some places and function pointers in other places. Why not just use one style consistently? Because the two styles of passing around functions have different strengths and weaknesses.

One major difference is that generics work much better than function pointers when you want to return a function. For example, the generic Count monitor takes a run function and returns a run function. It's easy to imagine writing Count as

 function Count(Run : Run_Type) 
                return Run_Type is ...
where Run_Type is the appropriate access function type. But, in practice, this doesn't work because Ada lacks closures. The natural way to write this would be
 function Count(Run : Run_Type) return Run_Type is

   function My_Run(Fun : Fun_Type; 
                   X : Integer) return Integer is
     ...

 begin
   return My_Run'Access;
 end Count;
but you can't return My_Run'Access because My_Run goes away as soon as Count returns.

Note that it is possible to get around this restriction against returning function pointers by re-writing strategic code fragments in continuation-passing style, so that the function pointers in question can be passed to a continuation instead of returned. This works fine on a small scale, but quickly blows the runtime stack when attempted on a large scale.

A second major difference is that it is easier to write third-order (or even higher-order) functions using function pointers than using generics. A first-order function takes an ordinary value as an argument. A second-order function takes a first-order function as an argument. A third-order function takes a second-order function as an argument, and so on. Now, consider a monitor like Count. Count takes a Run function, which takes a Fun function, which takes a Rec function, so Count is fourth-order!

Currently, I pass the Run functions using generics, but the Fun functions using function pointers. (Ignore the Rec functions for now.) Suppose I wanted to pass both Run functions and Fun functions using generics. This would mean that each Run function would take its Fun function as a generic parameter. But that means that, when we pass a Run function to a monitor like Count, we need to pass it as a generic function. In other words, Count would need to be a generic function that takes a generic function as an argument. That's not allowed in Ada. Or, more truthfully, it is sometimes allowed, but it's extremely painful.

In Ada, a generic function can neither take a generic function as an argument nor return a generic function as a result. However, a package can contain generic components, so you can sometimes fake it by wrapping the argument or result in a package. For example, you can fake a generic function that returns a generic function by writing a generic package that contains a generic function as a component.

In a limited way, you can do the same thing in the other direction. If you want a generic function to take a generic function as an argument, you can fake it by wrapping the argument in a package, so that the outer generic function takes a package (which contains a generic function as a component). The catch is that you can't instantiate the outer generic function with just any old package meeting a certain specification, but only with packages that are themselves instantiations of a single generic package. Making all that work is an exercise in pain that I leave to the most masochistic of readers.

Friday, February 8, 2008

Ten Years of Purely Functional Data Structures

In 1998, I published a book called Purely Functional Data Structures. Ten years later, the book is still selling well. (Every time I get a royalty check, my wife the skeptic says “People are still buying that?!”) The ten-year anniversary seems like a good time to reflect back on my experience with this book.

Origins

My first introduction to functional programming was at Carnegie Mellon University, where I worked for Peter Lee and Phil Koopman on an implementation of a subset of Haskell (called “Eddie”), written in Standard ML. So right from the very beginning, I was working with a mix of strict and lazy evaluation.

I've always been fascinated by data structures, and I soon realized that something was strange about data structures in these two languages. On the one hand, when they worked, data structures were more pleasant to work with than in any other languages I had ever used before. On the other hand, sometimes they just didn't work.

The issue was immutability. In pure functional programming, all data structures are immutable, meaning that they cannot be changed once created. Of course, data structures frequently need to be changed, so what happens is that you create a new copy of the data structure that incorporates the change, without actually modifying the old copy. This can often be done efficiently by having the new structure share large chunks of the old one. For example, stacks are trivial to implement this way. Tree-like structures such as binary search trees and many kinds of priority queues are almost as easy.
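As a minimal illustration (a hypothetical Haskell sketch, not from the book): pushing onto an immutable stack copies nothing, because the new stack shares its entire tail with the old one.

```haskell
data Stack a = Empty | Push a (Stack a)

s0, s1, s2 :: Stack Int
s0 = Push 3 (Push 2 (Push 1 Empty))
s1 = Push 4 s0   -- shares all of s0
s2 = Push 5 s0   -- also shares all of s0, which itself is unchanged
```

All three versions coexist; no update destroys an older one.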

However, some common data structures, such as FIFO queues and arrays, can actually be quite difficult to implement this way. Some people take this as a sign that functional languages are impractical. However, it's not the fault of functional programming per se, but rather of immutability. Immutable data structures are just as awkward to implement in conventional languages such as Java. (Of course, immutable data structures also offer huge advantages, which is why, for example, strings are immutable in Java.)

In 1993, I ran across an interesting paper by Tyng-Ruey Chuang and Benjamin Goldberg, showing how to implement efficient purely functional queues and double-ended queues. Their result was quite nice, but rather complicated, so I started looking for a simpler approach. Eventually, I came up with a much simpler approach based on lazy evaluation, leading to my first paper in this area. (When I say simpler here, I mean simpler to implement—the new structure was actually harder to analyze than Chuang and Goldberg's.)

Almost immediately after my paper on queues, I worked on a paper about functional arrays, here called random-access lists because they combined the extensibility of lists with the random access of arrays. (Because the lead time on journal papers is longer than the lead time for conferences, this paper actually appeared before the paper on queues, even though it was written later.) The approach I used, which turned out to be very similar to one used earlier by Gene Myers, was based on properties of an obscure number system called skew binary numbers. This marked the beginning of a second theme of my work, that of designing implementations of data structures in analogy to number systems. For example, inserting an element into a data structure is similar to incrementing a number, and merging two data structures is similar to adding two numbers. I called this approach numerical representations. The idea was to choose or create a number system with the properties you want (such as constant-time addition), and then build the data structure on top of that. The advantage of this approach is that you can often do the hard work in a simplified setting, where you don't have to worry about carrying around the actual data in your data structure.
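To make the number-system side of this concrete, here is a small illustrative sketch (mine, not from the post) of the skew binary increment. Digits have weights 2^(k+1)-1 (1, 3, 7, 15, ...); representing a number by the list of weights of its nonzero digits, smallest first, with a doubled leading weight standing for a 2 digit, increment takes constant time:

```haskell
-- skew binary increment: if the two lowest weights are equal, combine
-- them into the next weight (w + w + 1 = the next 2^(k+1)-1);
-- otherwise just prepend a weight-1 digit
inc :: [Int] -> [Int]
inc (w1 : w2 : ws) | w1 == w2 = (1 + w1 + w2) : ws
inc ws = 1 : ws
```

In the corresponding numerical representation, each weight-w digit becomes a complete tree of w elements, and inc becomes cons.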

Amusingly, this paper marked the second round of a circle dance between Rob Hoogerwoord, Tyng-Ruey Chuang, and me. First, Hoogerwoord published a paper on functional queues, then Chuang, then me. And again with functional arrays, first Hoogerwoord published a paper, then Chuang, then me.

Around this time, I was looking for a thesis topic (that period that John Reynolds describes as “making even the best student look bad”). Fortunately, my advisor Peter Lee had patience with me during this search. I considered a bunch of topics related to functional programming, but none of them seemed right. I was enjoying the work I was doing with functional data structures, but there didn't really seem to be enough meat there for a dissertation. That changed one day during a walk in the snow.

I had become fascinated by the idea of concatenation (appending two lists). This is trivial to do in conventional languages using, say, doubly-linked lists. But it's much harder to do efficiently with immutable lists. I thought that I had come up with a new approach, so I made an appointment to talk to Danny Sleator about it. He had been involved in some of the early work in this topic (along with Neil Sarnak and Bob Tarjan), and his office was just a few floors below mine. The day before the meeting, I realized that my approach was completely broken. Afraid of looking like an idiot, I went into a frenzy, playing with different variations, and a few hours later I came up with an approach again based on lazy evaluation. I was sure that this approach worked, but I had no idea how to prove it. (Again, this was a case where the implementation was simple, but the analysis was a bear.) My wife had our car somewhere else that day, so I ended up walking home. The walk usually took about 45 minutes, but it was snowing pretty hard, so it took about twice that long. The whole way home I thought about nothing but how to analyze my data structure. I knew it had something to do with amortization, but the usual techniques of amortization weren't working. About halfway home, I came up with the idea of using debits instead of credits, and by the time I got home the entire framework of how to analyze data structures involving lazy evaluation had crystallized in my head.

With this framework in mind, I now believed that there was enough meat there for a dissertation, and I sailed through my thesis proposal and, about a year later, my thesis defense.

The Book

After my defense, Peter Lee suggested that I try to publish my dissertation as a book. He contacted two publishers he had worked with before, and one of them, Lauren Cowles of Cambridge University Press, bit.

The process of turning my dissertation into a book was pure joy. I thought that the basic organization of my dissertation was pretty solid, so mostly I was able to focus on adding and adjusting things to make it work better as a book. For example, I no longer had the constraint from my dissertation of having to focus on original work, so I was free to add data structures that had been developed by other people.

The main additions were expanded introductory material (such as my simplification of red-black trees, which was developed a few weeks after my thesis defense in a series of emails with Richard Bird), exercises, and an appendix including all the source code in Haskell (the main text used source code in Standard ML).

After several months of hacking, I was able to ship off a giant PostScript file to Lauren Cowles, and that was that.

Well, not quite. Lauren asked me what I wanted for cover art. My wife was an avid quilter, and I liked the idea of having a suitably geeky quilt pattern on the cover. I adapted a quilt pattern called Flying Geese, and sent a sketch to Lauren, who got a graphic artist to draw up the final version. I was quite happy with it. Later, I found out that Flying Geese was actually the name of a different pattern, and to this day, I don't know the correct name of the pattern I used. It looks a little different on the book than in quilts because quilters usually use only two levels of the recursion, but I used three.

Reactions

Initial reviews were positive and the hardcover edition sold out pretty quickly, so Cambridge University Press released a paperback edition the following year. I don't know of any courses that used it as the main textbook (it would have to be a pretty specialized course to do that), but several mainstream courses used it as a supplementary textbook.

Sales were steady but trending downward, until 2004, when a book review by Andrew Cooke appeared on Slashdot. After that, sales doubled and have stayed at that level for several years. Of course, I can't be sure this increase was because of Slashdot, but it seems likely. Thanks, Andrew!

The most interesting place the book has shown up was in the data of a TopCoder programming contest problem involving sorting books on a bookshelf. It was part of a list of CS textbooks. (And, no, I wasn't the author of the problem, although I have written some problems for TopCoder.)

The funniest moment about the book was when I worked at Columbia University. A potential grad student from an unrelated field in CS was visiting from Germany, and I was talking to him about the department. When I told him about my research, he said, “My roommate has been reading a book about something like that. I don't remember the name, but do you know which book I mean?” I reached over and pulled my book off the shelf and said, “Do you mean this one?” His mouth fell open.

The most flattering review of the book appeared a few years ago in a blog by José Antonio Ortega Ruiz, who said

“The exposition is extremely elegant and synthetic, and i’ve spent many, many hours immersed in its 220 pages (i can’t think of any other programming book with a better signal to noise ratio, really).”
Thanks, jao!

Complaints

Of course, you can't please everybody. The most common complaints I've heard are

  • No hash tables. This was mentioned in the first review of the book on Amazon, which called the lack of hash tables a “major omission, likely because they don't fit well into a functional environment.” More than a year later, a second reviewer mentioned that hash tables are in fact in the book, but buried in an exercise. Both reviewers are correct. I only gave a very short description of hash tables in an exercise, because there just wasn't that much to say. A hash table is basically the composition of two finite maps, one approximate and one exact. The choice of what structures to use for those finite maps is up to you. Typically, an array is used for the approximate map, but, as the first reviewer noted, arrays do not fit well into a functional environment.
  • No delete for red-black trees. I present my simplified form of red-black trees very early in the book as an introduction to functional data structures, but I only describe insertions, not deletions. There were two main reasons for that omission. First, deletions are much messier than insertions, and the introduction did not seem like the right place for a lot of messy details. Second, and more importantly, probably 95+% of uses of functional red-black trees don't need deletions anyway. Because the trees are immutable, when you want to delete something, you can simply revert to a previous version of the tree from before the unwanted item was inserted. This is almost always both the easiest and the right thing to do. The exception is when you inserted something else that you want to keep after the item to be deleted, but this is extremely rare.
  • Why Standard ML? In the last 10 years, Haskell has become steadily more popular, while Standard ML has largely been replaced by OCaml. A frequent complaint has been that people wished Haskell were in the main text, with Standard ML (or OCaml) in the appendix, instead of the reverse. Even at the time, I debated with myself about the choice of Standard ML, particularly because in some chapters I had to fake extensions to the language, such as lazy evaluation and polymorphic recursion. I went with Standard ML for two main reasons. First, I wanted to be able to distinguish between lazy evaluation and strict evaluation, which was much more difficult in Haskell. Second, I really wanted to be able to give module signatures for all the common structures (stacks, queues, etc.) that I was going to be revisiting over and over. You can sort of do this in Haskell, but not cleanly.
  • Too esoteric/impractical. Guilty as charged! I did try to point out a few data structures along the way that were especially practical, but that was not my main focus. If it were, I certainly would not have included umpteen different implementations of queues! Instead, I was trying to focus on design techniques, so that readers could use those techniques to design their own data structures. Many of the example data structures were included as illustrations of those design techniques, not because the examples were particularly practical or valuable by themselves.
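The "composition of two finite maps" view of hash tables mentioned above can be sketched in a few lines of Haskell. This is a toy of my own devising, not the exercise from the book: it uses `Data.Map` for both layers (rather than the usual array for the outer, approximate map), and the names and the toy hash function are hypothetical:

```haskell
import qualified Data.Map as M

-- A hash table as two composed finite maps: an "approximate" outer map
-- keyed by hash values, and an "exact" inner map resolving collisions.
newtype HashMap k v = HashMap (M.Map Int (M.Map k v))

emptyHM :: HashMap k v
emptyHM = HashMap M.empty

insertHM :: Ord k => (k -> Int) -> k -> v -> HashMap k v -> HashMap k v
insertHM hash k v (HashMap outer) =
  -- M.union is left-biased, so a new binding for k shadows an old one.
  HashMap (M.insertWith M.union (hash k) (M.singleton k v) outer)

lookupHM :: Ord k => (k -> Int) -> k -> HashMap k v -> Maybe v
lookupHM hash k (HashMap outer) = M.lookup (hash k) outer >>= M.lookup k
```

The point of the decomposition is that the choice of structure for each layer is independent; swapping in a purely functional array for the outer map changes nothing else.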

The Future?

People sometimes ask me if I'm planning a second edition. The answer is maybe, but not anytime soon. I'm still pretty happy with what I said and how I said it, so I don't feel a burning need to go back and correct things. If I wrote a second edition, it would be to modernize the presentation and to add some new data structures and techniques. For example, I might include the finger trees of Ralf Hinze and Ross Paterson.

In terms of the presentation, however, the biggest question would be what language to use. Standard ML probably doesn't make sense any more, but I'm still not sure Haskell is ready to take over, for the same reasons mentioned above. I suppose I could switch to OCaml, but I'm not sure that's enough of a change to warrant a new edition. I have faith that the vast majority of my readers are perfectly capable of making the translation from Standard ML to OCaml themselves.