Friday, March 28, 2008

Wrong Number

Is it time to overhaul the user interface of the phone system?

A few months ago we started getting a lot of calls that were wrong numbers. Before that we used to get maybe one a week, but suddenly we were getting 5 or 6 a day. After an initial period of frustration, we eventually started to apply our debugging skills.

Our first guess was that somebody was distributing our phone number on a flyer or something, either as a prank or by accident. But this didn't make sense because every caller was asking for somebody different. “Hello, is Juan there?” “Hi, Elizabeth/Gary/Tanya/Carlos?” Or the ever popular “<pause> Who is this?!!” No, dude, you called me, so you tell first.

Ok, scratch that idea. Hmm, I suppose we could always ask. “Sorry, wrong number, what number are you trying to reach?” At last, some progress. Our phone number is 987-654-3210 (not our real number). The people who were calling us were uniformly dialing 654-321-0xxx. The 654 area code is for an area maybe 20 miles away, so lots of people in our area code were dialing numbers there. Even worse, there are up to 1000 numbers in that area code that match this pattern. No wonder we were getting a lot of calls!

The reason they were getting us was that they were forgetting to dial a 1 first. Without the 1, the phone system was interpreting this as a local call within the 987 area code to 654-3210, with the extra three digits on the end simply being ignored. On many (most?) cell phones the 1 is not necessary, so people lose the habit of dialing it. Then when they use a land line, where the 1 is required, they get us.

With this insight, we contacted the phone company. It took three tries and many patient explanations to even get somebody to understand what the problem was. I asked what had changed in the system recently, because clearly something had changed to cause the sudden increase in wrong numbers.

“Uh, I don't know. I'll ask our tech guy.” A few minutes later. “Our tech guy says the 321 exchange in the 654 area code is a Cingular exchange. You should call them.” “But how is that supposed to help? The phone calls are originating in your system and never getting out of your system, so even if they made a change over there, it wouldn't change what's happening here.” “Well, you should call Cingular.”

Predictably, Cingular couldn't help. They couldn't even tell us if the 321 exchange was new because “our computers don't tell us that”.

The really frustrating thing about the problem is that it's so easy to fix. It should be easy for the system to detect when somebody has dialed a 10 digits but is being connected to a 7-digit number. Give the caller a warning and ask them to redial! It's possible this might break some systems with extensions, where the leftover 3 digits indicate the extension. So perhaps disable the warning if there is a pause between the 7th and 8th digits.

Has anyone else had this problem?

Wednesday, March 19, 2008

Program Testing For The Sake Of Learning

A number of years ago, I used to compete in the TopCoder on-line programming contests. As an educator, I was fascinated by the TopCoder environment, less by the contests themselves than by the so-called Practice Rooms. These were archives of problems from old contests, where you could submit solutions and run them against the test data used in the actual contest.

I was amazed by how hard many people would work in the Practice Rooms. I wished I could get my students to work half that hard! At first, I thought these people were motivated by the idea of doing better in the contest. And, indeed, that was probably what got most of them to come to the Practice Rooms in the first place. But then I noticed that many of the people working hard in the Practice Rooms competed only rarely, if at all. Something else was going on.

Eventually, it dawned on me that the secret lay in the ease of running system tests. As teachers, we know that feedback is crucial to learning, and that quick feedback is better than slow feedback. These programmers were getting feedback within a few seconds, while they were still “in the moment”. The experience of this tight, nearly instantaneous feedback loop was so powerful that it kept them coming back for more. The burning question then was could I re-create this experience in the classroom?

Why test?

Of course, program testing has been part of CS education forever. However, there are several different reasons for testing, and which reason or reasons the teacher has in mind will strongly affect how testing plays out in any particular class.

Here are several common reasons for testing:

Improved software quality. Ironically, the number one reason for using testing in the real world is the least important reason to use it in the classroom. Most student programs, especially in early courses, will never be used for real. What matters in these programs is not that students complete the program correctly, but what they learn in the process.

Easier grading. Grading programs can be hard, especially if you have a lot of students. Naturally, many CS educators seek to automate this process as much as possible. Many systems exist that will run student programs against a set of instructor tests and record the results. These results may used as a guide for human grading, or the results may entirely determine the student grade. When testing is performed as part of the grading process, it is common for the actual test cases to be hidden from the students.

To learn about testing. Of course, testing is a vital part of real-world software development. Therefore, it is important to expose students to the testing process so they have some idea of how to do it. However, care must be taken here, especially in early courses. Hand-crafting good test cases is hard work. If students are forced to fight through a heavy-weight testing process on toy problems, including making up their own tests, the lesson they frequently take away is that testing is much more trouble than it's worth.

Testing as design. In recent years, test-first and test-driven development have begun to spread from agile programming into the classroom. Advocates view such testing as part of the design process. By writing tests before writing code, students are forced to consider what the behavior of the program should be, rather than diving in without a clear idea of what the program is supposed to do.

Another way

All of these reasons are important, but none of them quite capture what I was seeing in the TopCoder practice rooms. There, people were becoming addicted to quick, easy-to-run tests as a powerful learning experience. This suggests viewing program testing as a way to improve learning. Not learning about testing, but learning about the content being tested.

In an attempt to re-create the relevant parts of the TopCoder experience in the classroom, I adopted the following process. With most programming assignments, I distribute some testing code and test cases. Students then run these tests while they are developing their programs, and they turn in their test results along with their code.

Note that this is an extremely lightweight process. All I'm doing is adding maybe 20 lines of test-driver code to the code I distribute for each assignment, and I can usually cut-and-paste that code from the previous assignment, with only a few tweaks. I then add maybe 50-100 test cases, either hardwired into the code or in a separate text file. I usually generate perhaps 5 of the tests by hand, and the rest randomly. Usually this process adds about 20-30 minutes to the time it takes me to develop an assignment.

I think CS teachers are often scared away by ideas like this because they imagine a fancy system with a server where students submit their programs, and the server runs the tests and records the results into a database. That all sounds like a lot of work, so teachers put it off until “next year”. In contrast, what I'm suggesting could probably be added without much hassle to an assignment that is going out tomorrow.

Similarly, teachers are sometimes scared away by the need to come up with the test suite, thinking it will take too much time to come up with a set of high-quality tests. But that's not what I'm doing. Experience has shown that large numbers of random tests work essentially as well as smaller numbers of hand-crafted tests, but at a fraction of the cost. (See, for example, the wonderful QuickCheck testing tool.) Besides, I'm not trying to guarantee that my test suite will catch all possible bugs, which is a hopeless task anyway. Instead, I'm providing a low-cost test suite that is good enough to gain most of the benefits I'm looking for.

Benefits

And just what are those benefits? Well, since I started using this kind of testing in my homeworks, the general quality of submissions has gone up substantially. Partly, this is because of a drastic decrease in the number of submissions that are complete nonsense. When I did not require tests, students would write code that seemed correct to them at the time, and turn it in without testing it. Now, they can still turn in complete nonsense, but at least they'll know that they are doing so. Usually, however, when they see that they're failing every test, they'll go back and at least try to fix their code. In addition, many students who would have gotten the problem almost right without testing now get it completely right with testing.

Which leads me to the second benefit—there has been a noticeable improvement in the students' debugging skills. As one student put it, “[The testers] also taught us how to troubleshoot our problems. Without those test cases, many of us would have turned in products that were not even close to complete. This would have meant for worse grades and harder grading for the instructor. We also would not have been able to use and develop our troubleshooting skills. When we don't know that something is wrong, it is very hard to try to test for the failures.

However, the main benefit is that the students are learning more. They are getting needed feedback while they are still “in the moment”, and can immediately apply that feedback, which is all great for learning. Contrast this, for example, to feedback that is received after an assignment has been turned in. Often such feedback comes days or weeks later. Students often don't even look at such feedback, but even if they do, they have often lost the context that would allow them to incorporate the feedback into their learning. Even with an automated system that gives feedback instantly, if students do not have an opportunity to resubmit, then they will usually not bother to apply the feedback, and so will miss out on a substantial portion of the benefit to their learning.

As another student put it, “The automated testers are much more than just a means of checking the block to ensure the program works. They function almost as an instructor themselves, correcting the student when he or she makes a mistake, and reaffirming the student's success when he or she succeeds. Late at night, several hours before the assignment is due, this pseudo-instructor is a welcome substitute.

Grading

I have the students turn in the results of their testing, which makes grading significantly easier. Failed tests give me clues about where to look in their programs, while passed tests tell me that I can focus more on issues such as style or efficiency than on correctness.

But this is merely a fringe benefit. Note that I only use the test results to help guide my grading, not to determine grades automatically. I worry that when grading becomes the primary goal of testing, it can interfere with the learning that is my primary goal. For example, testing when used for grading often breaks the tight feedback loop that is where my students learn the most.

Also, when testing is used for grading, the instructor's test suite is often kept hidden from the students. In contrast, I can make my test suites public, which saves me all kinds of headaches. I'm not worried that a student might hardcode all the test cases into their code just to pass all the tests, because I'll see that when I look at their code. (In fact, this happens about once a year.) Students like having the test cases public because, if they don't understand something in the problem statement, they can often answer their own questions by looking at the test cases.

Admittedly, sometimes students take this too far. I occasionally catch them poring over the test cases like trying to read tea leaves, instead of coming to ask me a 30-second question.

Crawl-Walk-Run

I am not advocating that this approach to testing should be used everywhere in the curriculum. Among other things, I agree that students should have the experience of creating their own test cases. The question is when.

I see my approach as being most useful in beginning programming courses, and in courses that are not nominally about programming. For example, it works particularly well in Algorithms.

In other programming courses, however, I believe that a crawl-walk-run approach is appropriate. The “crawl” step is about attitude rather than skills; it is to convince students that testing is valuable. I believe my approach does this, especially if you occasionally leave out the tests and explicitly ask students to compare the experiences of developing with or without tests.

The “walk” step might be to have students generate their own tests, but using instructor-supplied scaffolding. The “run” step might be to have students generate both the tests and the scaffolding. I admit, however, that I have not taught the courses in which those steps would be most appopriate.

Tuesday, March 11, 2008

Games for Programmers: Schotten Totten/Battle Line

The next in an irregular series of board game (or, in this case, card game) recommendations for programmers. Of course, you don't have to be a programmer to enjoy these games, but if you are a programmer, then I think there's a pretty good chance that you'll like them.

In programming, the specifications you thought you were implementing are practically guaranteed to change by the time you're done. Unless you have the power to “just say no” to the changes, the best approach is often to stay flexible by delaying commitments for as long as possible. You try to keep your options open by putting off irreversible decisions to the last possble moment. Schotten Totten (by noted game designer Reiner Knizia) elegantly captures this feeling in a short, two-player card game.

box cover

Schotten Totten is nominally about two Scottish clans squabbling over territory. Nine boundary stones are placed between you and your opponent, in a line from left to right. You win by being the first to claim either three adjacent stones or five stones altogether. To claim a stone, you must play a better three-card group on your side of the stone than your opponent does.

game in progress
(photo by Scott Alden)

There are a total of 54 cards, numbered 1-9 in six different colors. Three-card groups are ranked as follows, from best to worst: straight flush (three consecutive numbers of the same color), three-of-a-kind (three of the same number), flush (three of the same color), straight (three consecutive numbers), “wild rabble” (anything else). If both players play the same kind of group, then each player adds their cards together and the player with the higher sum wins. If there is still a tie, the player who completed their group first wins.

You start the game with six cards. Each turn you play a single card on your side of one of the stones, and then draw a replacement card. (For beginners, the play-then-draw order can be a little confusing because most players are more accustomed to draw-then-play.) Towards the end of the game, if there are no more cards to draw, then you simply skip that step.

Once you have played a card on a stone, you're stuck with it. There is no way to take it back or replace it or cover it up. Most of the time you will not have all the cards that you want when you start playing a particular three-card group. The tension in the game comes from trying to avoid commiting to a particular kind of group for as long as possible. For example, suppose you've played the blue 6 on a stone, and now you have the blue 5 and the red 6 in your hand. If you play the blue 5 next, then you are commiting to trying for a straight flush. If you play the red 6 next, then you are commiting to trying for three-of-a-kind. The straight flush is stronger, but there are only two other cards in the deck that will complete the straight flush (the blue 4 or the blue 7). On the other hand, there are four other 6's in the deck, so you have a better chance of completing the three-of-a-kind. What you will probably do is delay playing on this stone by playing somewhere else, but that just means making a commitment there instead of here. If you do play on this stone, be prepared for the frustration of drawing exactly the card you needed for the group that you just ruled out!

(Technically, if you play the blue 5 on the blue 6, you are not commiting to trying for a straight flush. Any other blue will complete a flush, and any 4 or 7 will complete a straight. However, you almost never want a plain flush or a plain straight until close to the end of the game, because they are just too easy to beat.)

Each stone is awarded when both three-card groups are complete. However, there is a twist here that may appeal to programmers, who tend to excel at logical thinking. You can also claim a stone early if you can prove that your opponent can't beat your hand. For example, suppose you have three 7's on your side of a stone, and your opponent has the green 1 and the green 2 on his side. If the green 3 has already been played somewhere else, then there is no way your opponent can complete the straight flush. The best he could do would be a plain flush, which would lose to your 7's. Therefore, you can claim the stone without waiting for him to complete his group.

Note that the proof must be accomplished with cards that have already been played. In the above example, if you were holding the green 3, you could not claim the stone until you played it. Sometimes this means that you will play a card in a place where you would prefer not to, just to be able to claim a stone elsewhere.

Claiming stones early is important for two reasons. First, there are situations where both players can fulfill the victory conditions, so it is just a matter of who claims the stones first. For example, both players might be able to get three adjacent stones, or one might be able to get three adjacent stones while the other can get five stones altogether.

Second, and getting back to the theme of this recommendation, claiming a stone early helps you keep the pressure on your opponent. Once a stone has been claimed, neither player can play cards on that stone anymore, even if they have room to do so. You never want to give your opponent an easy place to dump a card. Not only does that let them rid themselves of a useless card—when you can only hold six cards at a time, freeing up an otherwise useless slot is huge—but more importantly it lets them avoid making a commitment someplace else.

How to get it

Schotten Totten can be hard to find in stores. Fortunately, its sister game, Battle Line is much more widely available. Battle Line is almost the same game, but with different artwork, with cards numbered 1-10 instead of 1-9, and with 7 cards per hand instead of 6. Also, Battle Line adds a separate mini-deck of so-called tactics cards, which let you change the rules in various ways. For example, one tactics card forces you to play 4 cards on each side of a particular stone instead of 3. (Actually, newer editions of Schotten Totten include the tactics cards as well.)

Personally, I like Schotten Totten better. I prefer the smaller, more claustrophobic scale. Also, I prefer the game without the tactices cards, which make the game too chaotic. Fortunately, if you get Battle Line you can experiment with or without the tactics cards, with our without the 10's, and with 7-card or 6-card hands, to see which way you like it better.

A good place to look for both games is BoardGamePrices.com, which summarizes prices and availability at a host of on-line game retailers.

If you want to try the game out before buying it, there are excellent implementations of Schotten Totten at both flexgames and Yucata.de. (The latter site defaults to German but can be set to English by clicking on the flag.)

Thursday, March 6, 2008

Get the job done, but what is the job?

One of my most memorable conversations with a student happened a few years ago. I was grading a programming project, and something in one of the project documents struck me. At our school, students are required to document help from other students in some detail. One pair described getting help from another student, call him K. Apparently, K was telling them how to do something, but they weren't understanding him fast enough. He got impatient, grabbed the keyboard, and starting typing in the code for them.

I was flabbergasted by this description. How could K have possibly believed that this was appropriate? I knew the class that he was just about to get out of, so I met him there and took him into an empty classroom.

For at least five minutes, we talked completely past each other. He didn't understand why I was upset, and I didn't understand why he seemed to believe he had done a genuinely good thing. Obviously, there were some unspoken assumptions on both sides that were preventing us from communicating.

Eventually, I realized what was going on. He had been taught that the bottom line was getting the job done. That was the most important thing, and he believed that what we teachers wanted from students was for them to get the job done. Helping his buddies get the job done was, in his eyes, no less than the responsible thing to have done, a positive act of virtue.

Aha! With this realization, and with his phrase of “getting the job done”, I finally had the wedge I needed to get through to him.

“What was the job on this project?”, I asked him.

“To complete a working program”, he replied.

“No, it wasn't”, I said. Utter confusion covered his face. “Think about it. I already have a solution to this project. Why would I need a dozen student implementations? I'm pretty sure that mine is going to be less buggy, faster, and better documented than any of yours.”

“But...” The first signs of doubt.

“So, if I don't really care about your implementations, then why did I have you all do this project? What do I care about? What was the job?

K frowned in concentration. It's never an easy thing when somebody tries to dismantle your basic assumptions. To his credit, he didn't just shut down, but actively struggled to understand what I was saying. He started to speak and stopped a few times. He was almost there, but couldn't quite articulate it.

“Learning”, I finally said. “The job was learning. The implementation was only a way to trigger that learning. But the implementation doesn't really matter, it's the learning that I care about.”

He got it. He understood why, from my point of view, typing in code for the other students was not helping them get the job done, but instead was actively hindering the learning that was the real job.

As we wound down, he mentioned a different teacher he had had the previous two semesters. The other teacher had a grading policy where you get credit for an assignment only when it is completely correct. If your assignment has flaws, then you redo it until it's correct, losing points with each resubmission. For somebody like K, who strongly believes in getting the job done, this policy just reinforces the idea that finishing the program is the job. I know this wasn't the other teacher's intent, but I can certainly see how it could be taken that way.

Saturday, March 1, 2008

What the heck is a math trade?

Many sites exist for trading objects or services of various kinds—my favorite example is a site for trading right or left shoes. Over on BoardGameGeek, there is a very active community of users who trade board games. The site lets you list games that you are offering and games that you want, and then automatically finds possible matches. Even with this support, however, finding and negotiating individual trades can be tedious. Enter Math Trades, which have been running regularly on BoardGameGeek for several years.

One frequent difficulty with individual trades is that, although you have something I want, I might not have what you want. What if we were able to get a third person involved? Suppose you have what I want, Tina has what you want, and I have what Tina wants. Then we can arrange a three-person swap. SwapTree is a site that will find such three-way and even four-way trades for you automatically.

But why stop there? Why not look for five-way loops, or even hundred-way loops? That's what happens in a Math Trade.

How a math trade works

First, a moderator announces the opening of the trade and lays down trade-specific ground rules. Everybody who is interested in participating lists the items they choose to offer in the trade. At a pre-determined time, the list closes to new entries.

Next, the participants review all the offers and decide which ones they want. For each item you are offering, you create a wantlist that says, in essence, “I would be willing to trade item 57 (my item), for item 22 or item 34 or item 119 or item 576.” Everybody sends their wantlists to the moderator by the announced deadline.

After a brief period for finding and correcting errors, the moderator runs special math trade software to find the most trades possible. The moderator then publishes the results, the participants exchange addresses, and drop their items in the mail.

A lot of trades

Note that the “most trades possible” can be quite a lot. The largest math trade to date involved 2320 items, of which 994 traded. These 994 items were broken into 7 trade loops, including one monstrous loop of 920 items. Just imagine standing in a circle with 919 of your closest friends and everyone handing their game to the person on their right!

(Actually, this trade involved 316 users, many of whom were trading multiple items. So that loop of 920 items wasn't really 920 people.)

Although math trades have been run regularly on BoardGameGeek for almost three years, it is only recently that trades of this size have become possible. About six months ago, I wrote TradeMaximizer, which has become the de facto standard software for math trades. TradeMaximizer improves on previous software by guaranteeing to find the maximum number of trades possible, and by doing so quickly, in under a minute for the large trade above and in just a few seconds for most trades. Previous software ran in hours or days, and even then might not find all the trades possible. I'll describe TradeMaximizer in detail in an upcoming post.

Beyond games

There's nothing specific to games in the idea of math trades or in TradeMaximizer. I know of math trades run for metal CDs (using earlier software) and for geocoins (using TradeMaximizer). If you have run a math trade in another community, or think you might like to, please post a comment here and let us know. Don't be afraid! They're easy to run if you keep it relatively small, and a lot of fun.

Probably the coolest application of math trades is for trading...kidneys!

No, this isn't some urban legend about waking up in an ice-filled bathtub. This is serious, life-or-death stuff.

When somebody needs a kidney transplant, they can often find a family member or friend who is willing to donate one. But what happens if the donor's kidney is biochemically incompatible with the intended recipient? Well, sometimes you can find two such donor-recipient pairs where each donor is compatible with the other recipient. By donating to the other recipient, each donor ensures that their intended recipient gets a compatible kidney. Three-way trades and four-way trades are also possible. Unlike ordinary math trades, however, larger loops are not supported. Oh, you can find the loops, but doctors won't perform the surgeries because the risks of something going wrong (such as somebody backing out at the last minute) are just too great.

A trio at CMU implemented a system that is now being used to find matches in a national list of donors every two weeks by the Alliance for Paired Donation. Their system differs from an ordinary BoardGameGeek-style math trade in two fundamental ways. One is the deliberate limit on the sizes of trade loops (which actually makes the problem harder, not easier). The other is the presence of altruistic donors—donors who give their kidneys to strangers without asking for a kidney for a friend or loved one in return.