Thursday, July 31, 2008

In search of epiphany

I watched the movie Proof last night. I highly recommend it. It's not often that mathematicians get so much screen time.

The nature of mathematical creativity plays a large role in the movie. One comment was especially intriguing. (Minor spoiler ahead.) In describing the process of solving a particularly difficult problem, one character says, “It was like...connecting dots. Some nights I...I could connect three or four of them. And some nights they'd be really far apart. I'd have no idea how to get to the next one, if there was the next one.”

This description rings both true and false for me. The true part is “I'd have no idea how to get to the next one, if there was the next one.” This nicely summarizes what it's like to work on a hard problem. You don't know what to do next, and you're not sure that a solution even exists.

But the connecting dots part doesn't feel right. It makes the process sound linear, when the reality is anything but. In my experience, the really hard problems—the ones that require the most creativity—are most often solved by epiphany, by having the whole solution burst into your head, seemingly in an instant. That's when the connecting dots part happens, but it all happens at once, not a few at a time.

Of course, the catch is that epiphany can't be forced. I'm reminded of the old joke about the lottery. A man prays fervently every night, “Please, Lord, let me win the lottery tomorrow.” After twenty years, a heavenly voice responds, “Would you buy a ticket already!”

No, you can't just sit around waiting for an epiphany to happen. Epiphanies are hard work! TANSTAAFL. So, how do you go about searching for epiphany? Every epiphany I've ever had has been the result of a two-pronged effort.

The first prong is building up sweat equity. I thoroughly explore the space around the problem, playing with the problem from many different angles. Of course, I'm actively trying to solve the problem during this time, because I might get lucky and stumble across a solution. But I don't worry too much if I don't seem to be getting anywhere. This part of the process is not so much about connecting the dots as about building lots and lots (and lots) of potential connections.

The second prong is staying alert. That's easy to say, but really hard to do over long periods of time. Eventually, I hope, some stray thought or some environmental trigger will start a cascade, where some subset of those potential connections will become actual connections, creating a path to the solution. Sometimes this happens when you're relaxed and your mind is drifting, such as in the shower or while driving. I remember once in grad school trying all afternoon to track down a bug. That night my wife and I went over to a friend's house for dinner. Everybody was shocked when, in the middle of beef stroganoff, I suddenly shouted out, “That's it!”

Other times the epiphany can happen while you're actively working on something else. For example, the key insight behind TradeMaximizer came while I was trying to put together a homework for my Algorithms class. I had been working off and on for weeks trying to find cycles in the trade graph. But I had set that aside and was coming up with homework problems, when I stumbled across a stray comment that talked about converting a directed graph into a bipartite graph. That sentence was enough to set off the chain reaction of connections and, within seconds, TradeMaximizer was born.

Probably the most famous story about scientific epiphany is that of Archimedes' bath. Funny story: Shortly after announcing TradeMaximizer to the math trade community on BoardGameGeek, I received a peevish email from another BoardGameGeek user who thought he also had a solution. He described how he had come to BoardGameGeek, eager to share his discovery, only to find my announcement first. He complained “Argh! Way to shout ‘Eureka!’ while Archimedes is drying himself off!” (The other solution turned out to be bogus.)

Wednesday, July 16, 2008

Games for Programmers: Black Vienna

The next in an irregular series of board game recommendations for programmers. Of course, you don't have to be a programmer to enjoy these games, but if you are a programmer, then I think there's a pretty good chance that you'll like them.

O frabjous day!

My favorite deduction game, Black Vienna, has long been out of print, but now you can give it a try online, thanks to the efforts of Greg Aleknevicus. Go check out Black Vienna Online. It's free and easy to use. The user interface isn't going to win any awards, but it gets the job done. I'm thrilled that this game can now reach a wider audience.

Most deduction games are for two players, but Black Vienna handles three to six players. Your goal is to figure out the members of the infamous “Black Vienna” gang. There are 27 suspect cards, labeled A-Z and Ö. At the beginning of the game, three of these cards are chosen randomly and set aside. Those are the members of the gang.

The remaining 24 suspect cards are dealt out evenly to the players. (With 5 players, one of the players gets only 4 cards.) These are the cards of suspects for whom you can provide alibis.

On your turn you conduct an investigation. There is a pool of available investigation cards, each showing three suspects. You choose one of the available investigation cards and place it in front of another player. That player marks the card with plastic chips, according to how many of the suspects are in her hand. For example, if I have the cards BGMTXY and somebody plays the BQT card on me, I would mark it with two chips (for the B and the T). Other players would then know that I had two of the three suspects, but not necessarily which ones.

The turn then passes to the player who was just investigated.

One nice thing about Black Vienna as a game is that every player is involved on every turn. Even if you are not actively part of the investigation, you still need to update your information with the results of the investigation, and follow any chains of deductions to their conclusions.

Example Deductions

Some deductions are obvious. For example, if I played two chips on the BQT card, and you have the Q card in your hand, then you know that I have the B and the T. Other deductions are less obvious. Here's an actual example from a recent game. Call the other players Caitlin, Monica, Drew, and Dom.

Monica played one chip on GPX, but I had the X, so I knew that she had either G or P (but not both). Similarly, she also played one chip on DHR, but I knew Dom had the R, so Monica had either D or H. Finally, she played two chips on LNV.

Drew played one chip on DVY and zero chips on FÖY, so he had either D or V. Between Drew and Monica, they had at least 4 of the 5 cards DHLNV and at least 5 of the 7 cards DGHLNPV (possibly more if Drew had any of GHLNP).

Meanwhile, I knew all but two of Caitlin's cards. The only possibilities for those last two cards were DGHLNPV. Monica and Drew together had at least 5 of those 7 cards, so Monica, Drew, and Caitlin together must have all 7.

This tells me that none of these 7 cards are in the gang, and none of these 7 cards are in Dom's hand. It also tells me that Drew does not have any of GHLNP.

Finally, I can be more specific about Caitlin's cards. One of her two cards is G or P, and the other is one of DHLNV.

Online Play

The online implementation is asynchronous. You do not need to be logged in at the same time as other players. Instead, the system emails you whenever a turn is taken with the results of the investigation. When it is your turn, you can log on and take your turn at your leisure. This means that a 45-minute face-to-face game can take days or even weeks when played online, but it also means that you can play the game with odd minutes here and there, instead of having to carve out a solid chunk of time. An additional benefit is that it is easy to play with friends in different time zones. For example, I frequently play with friends on the other side of the country.

However, the greatest benefit of the online implementation is that it does away with the greatest weakness of face-to-face play—player errors. Not errors in deductions, which would be your own fault and would only affect you, but errors in answering investigations. If a player carelessly puts out, say, one chip when he should have put out two chips, it ruins the game for everyone. Even if the player notices the error a few turns later and tries to correct it, it's far too late. By then, the other players have already updated their notes with deductions made from the faulty answer, and typically have no way of backing out of those deductions.

If you want to give Black Vienna a try, see the detailed rules of the game and the instructions for the online interface.

Addendum: I've played in several three-player games recently. It's ok with that number, but I think it's significantly better with four or five players. (I haven't tried six yet.) With three players, the deductions are less interesting, and there is a greater chance for one player to get lucky.

Tuesday, July 8, 2008

Breadth-First Numbering: An Algorithm in Pictures

I've always enjoyed the mathematical tradition of “proofs without words”, so I wanted to see if something similar could be done with an algorithm. Of course, an algorithm is typically more complicated than a mathematical identity, so it's not surprising that an algorithm typically can't be captured in a single picture. Instead, I've found the idea of a commutative diagram to be a useful pictorial shorthand, where the short way around the diagram gives a high-level view of the operation, and the long way around gives more detail. Here is an attempt to describe my breadth-first numbering algorithm from ICFP 2000 using this idea.




Wednesday, July 2, 2008

Functional Programming in...Ada?

Ada is not the first language that comes to mind when you want to do functional programming. (You in the back, stop laughing!) Why, with excellent functional programming languages like Haskell or ML easily available, would I choose to do a functional programming project in Ada? (I CAN STILL HEAR YOU LAUGHING!) Partly because we use Ada in some courses at our institution, and I wanted a library that could help in teaching recursion to novices. But, honestly, mostly I just wanted to see if it could be done.

“It's like this fellow I knew in El Paso. One day, he just took all his clothes off and jumped in a mess of cactus. I asked him that same question, ‘Why?’...He said, ‘It seemed like a good idea at the time.’”
– Steve McQueen, The Magnificent Seven

Somewhat to my surprise, functional programming actually works reasonably well in Ada. You wouldn't want to program a large system this way, but small-scale projects are quite possible, albeit much more verbose than functional programmers are used to.

Instrumenting recursive functions

My application is going to be a library for instrumenting recursive functions. I want to be able to take a recursive function and run it with different combinations of monitors. Each monitor executes a little bit of extra code every time the recursive function is called, including recursive calls. Simple examples of useful monitors include counting the number of calls, or tracking the maximum call depth, or printing trace information on every call or return. More complicated examples include adding memoization to a recursive function or automatically detecting when a function is stuck in an infinite recursion.

Ideally, the library should be easy enough for students to use, but I'll settle for it being easy enough for instructors to use.

Of course, there are many ways to design such a library. I'm going to use a design based on ideas from functional programming. The basic ideas have been folklore in the functional programming community for a long time. For example, I remember doing something similar in SML in the early '90s. Bruce McAdam wrote a nice tech report describing these techniques in great detail. The trick here is going to be figuring out how to implement this design in Ada.

(Incidentally, if anybody has a truly object-oriented approach to solving the same problem, I'd be interested in seeing it.)

Preparing the recursive function

Without some kind of support for reflection, it seems too much to hope that I'll be able to observe the recursive calls inside a recursive function. Therefore, I'm willing to require some modifications to the recursive function, as long as those modifications are fairly straightforward. However, I should only need to modify the function once, rather than every time I run it with a different monitor.

Stealing an idea from functional programming called the Y combinator, I'll modify each recursive function to take an extra argument that is itself a function (or, more specifically, a pointer to a function). The recursive function will call this passed-in function wherever it would normally call itself. The passed-in function will eventually call the recursive function, but may run some other code first.

Here's an example using the ever popular Fibonacci function. The original recursive function might be written

 function Fib(X : Integer) return Integer is
 begin
   if X <= 1 then
     return 1;
   else
     return Fib(X-1) + Fib(X-2);
   end if;
 end Fib;
Adding the extra argument and replacing the recursive calls with calls to the passed-in function yields
 function Fib(Rec : access function(X : Integer)
                    return Integer;
              X : Integer) return Integer is
 begin
   if X <= 1 then
     return 1;
   else
     return Rec(X-1) + Rec(X-2);
   end if;
 end Fib;
where the changes are the new Rec parameter and the recursive calls to Fib replaced by calls to Rec. Note that access function(X : Integer) return Integer is the type of a pointer to a function that takes an integer and returns an integer. In the rest of this post, I'll assume that the functions being instrumented always take an integer and return an integer. If I can get this to work at all, then generalizing this to arbitrary argument and return types will be trivial using Ada's notion of generics.

Running the recursive function

Now, to simply run this function recursively without attaching a monitor, I'll call it with Run, which is responsible for “tying the knot” by somehow making the function call itself. The Run function is written

 function Run(Fun : ...type to be filled in below...;
              X : Integer) return Integer is
   function Rec(X : Integer) return Integer is
   begin
     return Fun(Rec'Access, X);
   end Rec;
 begin
   return Rec(X);
 end Run;
Here, Fun is a function like the modified Fib above. Run defines a local helper function, Rec, that merely passes itself to Fun.

I would call this function by writing something like

   ...Run(Fib'Access, 10)...
where Fib'Access is Ada-speak for a pointer to the Fib function. This calls Rec(10), which calls Fib(Rec'Access,10), which in turn calls Rec(9), which in turn calls Fib(Rec'Access,9), and so on. Eventually, the whole thing returns 89, as expected.

I left out the type of Fun above, because it starts to get unwieldy. I can figure out what this type should be by looking at the type of Fib. The full signature of Run is

 function Run(Fun : access function
                (Rec : access function(X:Integer)
                       return Integer;
                 X : Integer) return Integer;
              X : Integer) return Integer;
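As an aside, the earlier remark that generalizing to arbitrary argument and result types would be easy with generics might be sketched roughly like this (the Instrument package and its name are my own invention, not part of the library described here):

```ada
--  Hypothetical sketch: a generic package whose formal types replace
--  the hard-coded Integers in Run's signature.
generic
   type Argument is private;
   type Result is private;
package Instrument is
   function Run(Fun : access function
                  (Rec : access function(X : Argument)
                         return Result;
                   X : Argument) return Result;
                X : Argument) return Result;
end Instrument;
```

Instantiating Instrument with Integer for both formals would recover the signature above.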

A small lie

What I'd really like to be able to do is define a type abbreviation

 type Fun_Type is access function
   (Rec : access function(X:Integer)
          return Integer;
    X : Integer) return Integer;
and then write the signature of Run as
 function Run(Fun : Fun_Type; 
              X : Integer) return Integer;
Unfortunately, I can't. Or rather, I can, but it turns out to be useless. The problem is that Ada is really paranoid about what you do with pointers to functions. In particular, it won't let a pointer to a function match a named type if that function is defined in a deeper lexical scope than the type definition. This restriction does not apply to so-called anonymous access types, so I'm forced to rewrite the entire type everywhere it's used rather than giving it a name and referring to the name.

However, to simplify the presentation, I'm going to pretend that I can use Fun_Type as defined above. Just remember that, in the real implementation, every occurrence of Fun_Type needs to be replaced with the more verbose anonymous type. I'll write Fun_Type in italics as a reminder that it's not the real code.

A simple monitor

Now, here is a simple monitor that counts the number of calls to the recursive function, and prints out that number at the end.

 function Count(Fun : Fun_Type;
                X : Integer) return Integer is

   Num_Calls : Integer := 0;

   function My_Fun
       (Rec : access function(X:Integer)
              return Integer;
        X : Integer) return Integer is
   begin
     Num_Calls := Num_Calls + 1;
     return Fun(Rec,X);
   end My_Fun;

   Result : Integer := Run(My_Fun'Access,X);
 begin
   Put("Number of calls =");
   Put_Line(Integer'Image(Num_Calls));
   return Result;
 end Count;

Now, when I call

   Put( Count(Fib'Access,10) );
I get
Number of calls = 177
89
where the first line is printed by the monitor and the second line is printed by the external Put.

The Count monitor creates a new function My_Fun to give to Run. My_Fun simply increments Num_Calls and calls Fun. Now, every time Fun makes a recursive call to Rec, Rec will call My_Fun, which calls Fun.

So basically, Count “wraps” Fun with a little bit of code to increment the Num_Calls variable, and this code gets executed every time the Rec function is called.

In addition, Count does a little bit of work around the top-level recursive call, namely initializing the Num_Calls variable to 0, and printing the number of calls at the end.

A memoization monitor

Here's a more complicated monitor. This one modifies the recursive function to do memoization (first cousin to dynamic programming). The idea is that, when a function is called multiple times on the same arguments, you can save time by remembering those arguments and the corresponding results. Then when the function is called with arguments you have seen before, you can simply return the stored result, instead of recomputing it.

 function Memoize(Fun : Fun_Type;
                  X : Integer) return Integer is

   Results : array (0..X) of Integer;
   Ready   : array (0..X) of Boolean
               := (others => False);

   function My_Fun
       (Rec : access function(X:Integer)
              return Integer;
        X : Integer) return Integer is
   begin
     if not Ready(X) then
       Results(X) := Fun(Rec,X);
       Ready(X) := True;
     end if;
     return Results(X);
   end My_Fun;

 begin
   return Run(My_Fun'Access,X);
 end Memoize;
The difference can be dramatic.
   ...Memoize(Fib'Access, 30)...
executes the body of Fib a total of 31 times, whereas
   ...Run(Fib'Access, 30)...
executes the body of Fib almost 2.7 million times (2692537 times, to be exact).

I made an assumption above that, if the original argument was X, then the range of possible arguments in all the calls is 0..X. This is good enough to illustrate the idea, but a more general implementation might be parameterized by the bounds to use, or by a function that calculates those bounds from the original X.
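For concreteness, the first of those options might look like this sketch (my own variation, again using the Fun_Type shorthand), with the bounds passed in explicitly:

```ada
--  Sketch only: Memoize parameterized by the bounds of the memo
--  table, instead of assuming the range 0..X.
function Memoize(Fun : Fun_Type;
                 Low, High : Integer;
                 X : Integer) return Integer is

   Results : array (Low .. High) of Integer;
   Ready   : array (Low .. High) of Boolean := (others => False);

   function My_Fun
       (Rec : access function(X : Integer) return Integer;
        X : Integer) return Integer is
   begin
      if not Ready(X) then
         Results(X) := Fun(Rec, X);
         Ready(X) := True;
      end if;
      return Results(X);
   end My_Fun;

begin
   return Run(My_Fun'Access, X);
end Memoize;
```

A call would then look like Memoize(Fib'Access, 0, 30, 30).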

Multiple monitors

How did I know above that Run executes the body of Fib 2692537 times? By running Fib with the Count monitor. How did I know that Memoize executes the body of Fib 31 times? By running Fib with both the Memoize and Count monitors together.

Except that, as currently written, the Memoize and Count monitors can't be used together. I can run Fib with one or the other, but not both simultaneously.

What I want is a way to layer one monitor on top of another, building arbitrary stacks of monitors. The clues to how to do this are already in the current implementations of Memoize and Count.

Notice that both Memoize and Count have the same interface as Run. In other words, a monitor is an alternative run function that adds some extra machinery to the basic Run. Further, notice that both Memoize and Count build a My_Fun function by wrapping some extra code around Fun and then run My_Fun using the Run function. The key insight is that we can parameterize Memoize and Count by which run function to use for running My_Fun. That might be the basic Run function, but it might also be a run function corresponding to a different monitor.

To achieve this parameterization, I'll use generics instead of passing function pointers. The existing definitions of Memoize and Count will still work, but they need to be preceded by additional code to take a run function as a generic parameter. For example,

   with function Run(Fun : Fun_Type; 
                     X : Integer) return Integer;
 function Count(Fun : Fun_Type; 
                X : Integer) return Integer;

 function Count(Fun : Fun_Type; 
                X : Integer) return Integer is
     ... Run(My_Fun'Access, X);
The new part is the generic formal parameter declaration preceding the specification. I didn't change the existing code of Count at all, but now the call to Run in the body of Count refers to the run function passed in as a generic parameter.

Using a monitor is now a two-stage process, first instantiating the generic function to create the desired run function, and then calling that run function. For example,

 function C_Run is new Count(Run);
 function MC_Run is new Memoize(C_Run);

 ...MC_Run(Fib'Access, 30)...
Notice that the basic Run function is always used in the first instantiation, where it serves as the base of the stack of monitors.

But wait! Something strange is happening here. This program displays 59 calls, not 31 calls as expected. However, if we layer the monitors in the opposite order

 function M_Run is new Memoize(Run);
 function CM_Run is new Count(M_Run);

 ...CM_Run(Fib'Access, 30)...
then we get the 31 calls.

It's really a question of when the memoization code checks for a repeated argument relative to when the counting code increments the counter. If the memoization check happens before the increment, the counter sees only actual executions of the body, one per argument in 0..30, so we get 31 calls. If the increment happens first, every recursive call is counted: the 29 memoized bodies for arguments 2..30 each make two calls, plus the initial call, for 1 + 2×29 = 59 calls. Either way, the body of the original Fib function is only executed 31 times.

And that's it, really. There's still a lot of grunt work involved in fleshing out a complete, robust library, but all the important design decisions have already been made. Implementing a wide variety of monitors is fairly easy using Count and Memoize as templates.

Generics vs function pointers

One aspect of this design deserves a little more discussion. I'm mixing two different styles of passing functions as arguments, using generics in some places and function pointers in other places. Why not just use one style consistently? Because the two styles of passing around functions have different strengths and weaknesses.

One major difference is that generics work much better than function pointers when you want to return a function. For example, the generic Count monitor takes a run function and returns a run function. It's easy to imagine writing Count as

 function Count(Run : Run_Type) 
                return Run_Type is ...
where Run_Type is the appropriate access function type. But, in practice, this doesn't work because Ada lacks closures. The natural way to write this would be
 function Count(Run : Run_Type) return Run_Type is

   function My_Run(Fun : Fun_Type;
                   X : Integer) return Integer is
     ...
   end My_Run;

 begin
   return My_Run'Access;
 end Count;
but you can't return My_Run'Access because My_Run goes away as soon as Count returns.

Note that it is possible to get around this restriction against returning function pointers by re-writing strategic code fragments in continuation-passing style, so that the function pointers in question can be passed to a continuation instead of returned. This works fine on a small scale, but quickly blows the runtime stack when attempted on a large scale.
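To make that concrete, here is a rough sketch of a continuation-passing version of Count (hypothetical names, Fun_Type shorthand as before). Because My_Run'Access is passed down into Cont rather than returned, the accessibility rules are satisfied:

```ada
--  Sketch only: instead of returning the new run function, Count
--  hands it to a continuation while My_Run is still in scope.
procedure Count
  (Run  : access function(Fun : Fun_Type;
                          X : Integer) return Integer;
   Cont : access procedure
            (New_Run : access function(Fun : Fun_Type;
                                       X : Integer) return Integer)) is

   function My_Run(Fun : Fun_Type;
                   X : Integer) return Integer is
     ...  --  wrap Fun and call Run, as in the original Count
   end My_Run;

begin
   Cont(My_Run'Access);  --  legal: passed down, never returned
end Count;
```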

A second major difference is that it is easier to write third-order (or even higher-order) functions using function pointers than using generics. A first-order function takes an ordinary value as an argument. A second-order function takes a first-order function as an argument. A third-order function takes a second-order function as an argument, and so on. Now, consider a monitor like Count. Count takes a Run function, which takes a Fun function, which takes a Rec function, so Count is fourth-order!

Currently, I pass the Run functions using generics, but the Fun functions using function pointers. (Ignore the Rec functions for now.) Suppose I wanted to pass both Run functions and Fun functions using generics. This would mean that each Run function would take its Fun function as a generic parameter. But that means that, when we pass a Run function to a monitor like Count, we need to pass it as a generic function. In other words, Count would need to be a generic function that takes a generic function as an argument. That's not allowed in Ada. Or, more truthfully, it is sometimes allowed, but it's extremely painful.

In Ada, a generic function can neither take a generic function as an argument nor return a generic function as a result. However, a package can contain generic components, so you can sometimes fake it by wrapping the argument or result in a package. For example, you can fake a generic function that returns a generic function by writing a generic package that contains a generic function as a component.
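Faking “a generic function that returns a generic function” might look roughly like this (the names Counting and Counted_Run are my own, for illustration):

```ada
--  Sketch only: a generic package standing in for the outer generic
--  function; its component Counted_Run is the generic "result".
generic
   --  formal parameters of the "outer function" would go here
package Counting is
   generic
      with function Run(Fun : Fun_Type;
                        X : Integer) return Integer;
   function Counted_Run(Fun : Fun_Type;
                        X : Integer) return Integer;
end Counting;
```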

In a limited way, you can do the same thing in the other direction. If you want a generic function to take a generic function as an argument, you can fake it by wrapping the argument in a package, so that the outer generic function takes a package (which contains a generic function as a component). The catch is that you can't instantiate the outer generic function with just any old package meeting a certain specification, but only with packages that are themselves instantiations of a single generic package. Making all that work is an exercise in pain that I leave to the most masochistic of readers.