Warning: this is probably only of interest to functional programmers.
Maciej Kotowicz recently asked on the Haskell Cafe mailing list about implementing binomial queues (also known as binomial heaps) using fancy type machinery so that the type-checker can enforce the shape invariants of the data structure. This reminded me of a discussion I had with some colleagues in the summer of 1998 about using nested types to enforce these kinds of invariants. A little digging around in mail archives yielded this email, presented here with some light formatting and a few fixed typos.
I've been playing with binomial queues as a nested type, and I think
you'll find the end result interesting.
REVIEW
Let me first quickly review the usual implementation of binomial queues.
Recall that a binomial tree has the shape
data Tree a = Node a [Tree a]
A binomial tree of rank k has k children, of ranks k-1 .. 0 (stored in the
list in decreasing order of rank). Note that we can combine two
heap-ordered binomial trees of rank k to get a heap-ordered binomial
tree of rank (k+1) as follows:
combine a@(Node x xs) b@(Node y ys)
| x <= y = Node x (b : xs)
| otherwise = Node y (a : ys)
Now a binomial queue is a list of heap-ordered binomial trees of increasing
height (but not necessarily of consecutive ranks). We'll represent the ranks
of those trees that are present by their position in the tree. Thus,
some ranks will be empty. This could be implemented as
type Binom a = [Maybe (Tree a)]
or somewhat more efficiently as
data Binom a = Nil
| Zero (Binom a)
| One (Tree a) (Binom a)
or better still by unboxing the Tree in the One constructor
data Binom a = Nil
| Zero (Binom a)
| One a [Tree a] (Binom a)
I won't go over all the operations -- they are amply covered elsewhere.
I'll just describe two functions. The first, add, takes a tree and a
list (where the tree has the same rank as the first position in the
list), and returns a new list. It works basically like the increment
function on binary numbers.
add :: Ord a => a -> [Tree a] -> Binom a -> Binom a
add x xs Nil = One x xs Nil
add x xs (Zero h) = One x xs h
add x xs (One y ys h)
| x <= y = Zero (add x (Node y ys : xs) h)
| otherwise = Zero (add y (Node x xs : ys) h)
The merge function is similar and works basically like the addition
function on binary numbers.
Finally, the getMin function returns the minimum element of a queue
paired with the queue without that element. The helper function
getMin_ returns a triple of the minimum element in some suffix of the
queue, the list of children associated with that minimum element, and
the suffix without that element.
getMin_ :: Ord a => Binom a -> (a, [Tree a], Binom a)
getMin_ (Zero h) = case getMin_ h of
(y,ys,h') -> (y,ys,Zero h')
getMin_ (One x xs Nil) = (x,xs,Nil)
getMin_ (One x xs h) = case getMin_ h of
(y,ys,h') | x <= y -> (x,xs,Zero h)
| otherwise -> (y,ys,One x xs h')
getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
in (x,merge (list2binom xs Nil) h')
list2binom [] h = h
list2binom (Node x xs : ys) h = list2binom ys (One x xs h)
Note that when getMin receives a list of children, it converts them
to a valid binomial queue by reversing the list (and changing each
pair of a Node constructor and a (:) constructor to a One constructor).
NESTED REPRESENTATION -- FIRST ATTEMPT
By following the same approach we have been using to design nested
datatypes, we get the following representation of binomial queues.
data Binom_ a b = Nil
| Zero (Binom_ a (Trees a b))
| One a b (Binom_ a (Trees a b))
type Trees a b = (a, b, b)
type Binom a = Binom_ a ()
Here the list of children
(Node x3 xs3 : Node x2 xs2 : Node x1 xs1 : Node x0 xs0 : [])
is being represented by the nested triples
(xs3,xs3', (x2,xs2', (x1,xs1', (x0,xs0', ()))))
(where xsN' is the appropriate conversion of xsN).
All the functions are perfectly straightforward, except for getMin and
getMin_ which must incrementally build up the reversing function to
convert
(xs3,xs3', (x2,xs2', (x1,xs1', (x0,xs0', ()))))
to
One x0 xs0' (One x1 xs1' (One x2 xs2' (One x3 xs3' Nil)))
As a result of building up this reverse function incrementally, this
implementation ends up being roughly 10% slower than the original.
INTERLUDE ON REVERSING LISTS
Suppose we want to support the following ADT. We want sequences with
three operations:
empty :: ReversableSeq a
cons :: a -> ReversableSeq a -> ReversableSeq a
rev :: ReversableSeq a -> [a]
with the obvious semantics. cons must take O(1) time, but rev may take
O(n) time. A list is a reasonable implementation with
type ReversableSeq a = [a]
empty = []
cons = (:)
rev = reverse
but, if you think about it, there's a fair amount of interpretive
overhead in the pattern matching that reverse performs at every step.
Way back in 1985, John Hughes came up with a representation that
avoids this overhead:
type ReversableSeq a = [a] -> [a]
empty = id
cons x xs = xs . (x:)
rev xs = xs []
The result of cons 1 (cons 2 (cons 3 empty)) is
id . (3:) . (2:) . (1:)
which, when applied to [] by the rev function, yields [3,2,1]. One way to
think of this representation is as lists with "holes" at the ends. The
cons operation fills in this hole with an element and another hole, and the
rev operation fills in the hole with []. (This is quite similar to
difference lists in logic programming...)
Applying this trick to the regular representation of binomial queues
yields
data Binom a = Nil
| Zero (Binom a)
| One a (Binom a -> Binom a) (Binom a)
which turns out to be about 10% faster than the original representation.
NESTED REPRESENTATION -- SECOND ATTEMPT
Ok, let's try to make a nested datatype out of this. Conceptually, the
nesting should keep track of our position in the queue so that we know
what rank the current tree has and can't possibly mix up trees of different
ranks. We hypothesize some Base type for the beginning of the list,
and some type transformer Succ that modifies the type as we
move down the queue.
type Base = ???
type Succ b = ??? -- this might need to be Succ a b
type Binom a = Binom_ a Base
data Binom_ a b = Nil
| Zero (Binom_ a (Succ b))
| One a (Binom_ a ??? -> Binom_ a ???) (Binom_ a (Succ b))
Now, what should the missing types in the One constructor be? Well,
the function (Binom_ a ??? -> Binom_ a ???) represents the children
of the current node. If this node is of rank k, then these children
are of ranks 0 .. k-1. The function (Binom_ a ??? -> Binom_ a ???)
takes a queue starting at rank k and adds the children to the front of
it, yielding a queue starting at rank 0. So the ???'s can be filled in as
| One a (Binom_ a b -> Binom_ a Base) (Binom_ a (Succ b))
or just
| One a (Binom_ a b -> Binom a) (Binom_ a (Succ b))
Now, what should the types Base and Succ be? It doesn't matter because
we never construct data of those types, we simply use the types to
discriminate between positions in the queue. So we can define Base
and Succ as
type Base = Void
newtype Succ b = S Void
I think this is a very interesting type for several reasons:
Because of this "type erasure" property, you might expect it to be just
as fast as the optimized regular representation, but in fact it is roughly
10% slower -- or roughly as fast as the original regular representation.
This is because the extra types hinder a certain optimization that the
programmer might make (and which I did make in the optimized regular
representation).
Recall the type of the getMin_ function from the original regular
representation, and the way it was used by getMin.
getMin_ :: Ord a => Binom a -> (a, [Tree a], Binom a)
getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
in (x,merge (list2binom xs Nil) h')
For the optimized regular representation, this becomes
getMin_ :: Ord a => Binom a -> (a, Binom a -> Binom a, Binom a)
getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
in (x,merge (xs Nil) h')
Now, consider the optimized nested representation. We can't just write
getMin_ :: Ord a => Binom_ a b -> (a, Binom_ a b -> Binom a, Binom_ a b)
because we don't know the rank of the tree that the second component of
the triple came from. (Note that if we had existential types, we could
write
getMin_ :: Ord a => Binom_ a b ->
(a, exists c. Binom_ a c -> Binom a, Binom_ a b)
getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
in (x,merge (xs Nil) h')
Some implementations do in fact support existential types, but in limited
ways that would carry efficiency costs of their own...)
Instead of returning the children and then applying them to Nil at the
end, we apply the children to Nil first, and then return the result,
yielding the type
getMin_ :: Ord a => Binom_ a b -> (a, Binom a, Binom_ a b)
getMin :: Ord a => Binom a -> (a, Binom a)
getMin h = let (x,xs,h') = getMin_ h
in (x,merge xs h')
This depends on laziness to evaluate only the children of the final
minimum, and not all the children of the temporary minimums along the
way. However, this does mean that we build a lot more thunks of the
form (xs Nil) along the way. And building these thunks is apparently
quite expensive.
2009 Addendum Ralf Hinze considers a similar representation in Section 6 of Numerical Representations as Higher-Order Nested Datatypes. Today, you would probably use fancier type machinery like dependent types or maybe GADTs, but it's still amazing how far you can get with only nested types.