A number of years ago, I competed in the TopCoder online programming contests. As an educator, I was fascinated by the TopCoder environment, less by the contests themselves than by the so-called Practice Rooms. These were archives of problems from old contests, where you could submit solutions and run them against the test data used in the actual contest.
I was amazed by how hard many people would work in the Practice Rooms. I wished I could get my students to work half that hard! At first, I thought these people were motivated by the idea of doing better in the contest. And, indeed, that was probably what got most of them to come to the Practice Rooms in the first place. But then I noticed that many of the people working hard in the Practice Rooms competed only rarely, if at all. Something else was going on.
Eventually, it dawned on me that the secret lay in the ease of running system tests. As teachers, we know that feedback is crucial to learning, and that quick feedback is better than slow feedback. These programmers were getting feedback within a few seconds, while they were still “in the moment”. The experience of this tight, nearly instantaneous feedback loop was so powerful that it kept them coming back for more. The burning question, then, was: could I re-create this experience in the classroom?
Why test?
Of course, program testing has been part of CS education forever. However, there are several different reasons for testing, and which reason or reasons the teacher has in mind will strongly affect how testing plays out in any particular class.
Here are several common reasons for testing:
Improved software quality. Ironically, the number one reason for using testing in the real world is the least important reason to use it in the classroom. Most student programs, especially in early courses, will never be used for real. What matters is not whether students complete the program correctly, but what they learn in the process.
Easier grading. Grading programs can be hard, especially if you have a lot of students. Naturally, many CS educators seek to automate this process as much as possible. Many systems exist that will run student programs against a set of instructor tests and record the results. These results may be used as a guide for human grading, or they may entirely determine the student's grade. When testing is performed as part of the grading process, it is common for the actual test cases to be hidden from the students.
To learn about testing. Of course, testing is a vital part of real-world software development. Therefore, it is important to expose students to the testing process so they have some idea of how to do it. However, care must be taken here, especially in early courses. Hand-crafting good test cases is hard work. If students are forced to fight through a heavyweight testing process on toy problems, including making up their own tests, the lesson they frequently take away is that testing is much more trouble than it's worth.
Testing as design. In recent years, test-first and test-driven development have begun to spread from agile programming into the classroom. Advocates view such testing as part of the design process. By writing tests before writing code, students are forced to consider what the behavior of the program should be, rather than diving in without a clear idea of what the program is supposed to do.
Another way
All of these reasons are important, but none of them quite captures what I was seeing in the TopCoder Practice Rooms. There, people were becoming addicted to quick, easy-to-run tests as a powerful learning experience. This suggests viewing program testing as a way to improve learning. Not learning about testing, but learning about the content being tested.
In an attempt to re-create the relevant parts of the TopCoder experience in the classroom, I adopted the following process. With most programming assignments, I distribute some testing code and test cases. Students then run these tests while they are developing their programs, and they turn in their test results along with their code.
Note that this is an extremely lightweight process. All I'm doing is adding maybe 20 lines of test-driver code to the code I distribute for each assignment, and I can usually cut-and-paste that code from the previous assignment, with only a few tweaks. I then add maybe 50-100 test cases, either hardwired into the code or in a separate text file. I usually generate perhaps 5 of the tests by hand, and the rest randomly. Usually this process adds about 20-30 minutes to the time it takes me to develop an assignment.
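To make this concrete, here is a minimal sketch of what such a distributed test driver might look like, written in Python. Everything in it is a hypothetical stand-in of my own: the function name solve, the toy problem (sum of the decimal digits of n), and the handful of hardwired test cases would all change from assignment to assignment.

    # Minimal test driver distributed with a (hypothetical) assignment.
    # Students fill in solve(); the driver checks it against each case.

    def solve(n):
        # Students replace this stub with their solution.
        # Toy spec for this sketch: return the sum of the decimal digits of n.
        raise NotImplementedError

    # (input, expected output) pairs: a few written by hand,
    # the rest typically generated randomly.
    TESTS = [
        (0, 0),
        (7, 7),
        (19, 10),
        (1234, 10),
        (999, 27),
    ]

    def run_tests():
        passed = 0
        for i, (arg, expected) in enumerate(TESTS):
            try:
                actual = solve(arg)
            except Exception as e:
                print(f"Test {i}: ERROR {e!r}")
                continue
            if actual == expected:
                passed += 1
            else:
                print(f"Test {i}: FAILED solve({arg}) = {actual!r}, expected {expected!r}")
        print(f"{passed} of {len(TESTS)} tests passed")

    if __name__ == "__main__":
        run_tests()

Students run this file as they work, and turn in the final tally along with their code.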
I think CS teachers are often scared away by ideas like this because they imagine a fancy system with a server where students submit their programs, and the server runs the tests and records the results into a database. That all sounds like a lot of work, so teachers put it off until “next year”. In contrast, what I'm suggesting could probably be added without much hassle to an assignment that is going out tomorrow.
Similarly, teachers are sometimes scared away by the need to create the test suite, thinking it will take too much time to come up with a set of high-quality tests. But that's not what I'm doing. Experience has shown that large numbers of random tests work essentially as well as smaller numbers of hand-crafted tests, but at a fraction of the cost. (See, for example, the wonderful QuickCheck testing tool.) Besides, I'm not trying to guarantee that my test suite will catch all possible bugs, which is a hopeless task anyway. Instead, I'm providing a low-cost test suite that is good enough to gain most of the benefits I'm looking for.
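Here is a sketch, again in Python, of how those random cases might be generated: run a trusted reference solution on random inputs and record the input/output pairs in a text file. (QuickCheck itself checks logical properties of Haskell functions on randomly generated inputs; what I'm describing is an even simpler cousin of that idea.) The problem, the file format, and the names reference_solve and tests.txt are placeholders, and the sketch assumes the instructor already has a working solution, which is usually the case.

    # Sketch: generating random test cases from a reference solution.
    # Reuses the hypothetical sum-of-digits problem from the sketch above.
    import random

    def reference_solve(n):
        # Instructor's own trusted solution, used only to
        # compute the expected output for each random input.
        return sum(int(d) for d in str(n))

    random.seed(2024)  # fixed seed, so the suite is reproducible

    with open("tests.txt", "w") as f:
        for _ in range(95):  # plus ~5 hand-written cases makes ~100
            n = random.randrange(10**9)
            f.write(f"{n} {reference_solve(n)}\n")

A driver like the one above would then read its cases from tests.txt instead of a hardwired list.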
Benefits
And just what are those benefits? Well, since I started using this kind of testing in my homework assignments, the general quality of submissions has gone up substantially. Partly, this is because of a drastic decrease in the number of submissions that are complete nonsense. When I did not require tests, students would write code that seemed correct to them at the time, and turn it in without testing it. Now, they can still turn in complete nonsense, but at least they'll know that they are doing so. Usually, however, when they see that they're failing every test, they'll go back and at least try to fix their code. In addition, many students who would have gotten the problem almost right without testing now get it completely right with testing.
Which leads me to the second benefit—there has been a noticeable improvement in the students' debugging skills. As one student put it, “[The testers] also taught us how to troubleshoot our problems. Without those test cases, many of us would have turned in products that were not even close to complete. This would have meant for worse grades and harder grading for the instructor. We also would not have been able to use and develop our troubleshooting skills. When we don't know that something is wrong, it is very hard to try to test for the failures.”
However, the main benefit is that the students are learning more. They are getting needed feedback while they are still “in the moment”, and can immediately apply that feedback, which is all great for learning. Contrast this with, for example, feedback received after an assignment has been turned in. Often such feedback comes days or weeks later. Students often don't even look at such feedback, but even if they do, they have often lost the context that would allow them to incorporate it into their learning. Even with an automated system that gives feedback instantly, if students do not have an opportunity to resubmit, they will usually not bother to apply the feedback, and so will miss out on much of its benefit to their learning.
As another student put it, “The automated testers are much more than just a means of checking the block to ensure the program works. They function almost as an instructor themselves, correcting the student when he or she makes a mistake, and reaffirming the student's success when he or she succeeds. Late at night, several hours before the assignment is due, this pseudo-instructor is a welcome substitute.”
Grading
I have the students turn in the results of their testing, which makes grading significantly easier. Failed tests give me clues about where to look in their programs, while passed tests tell me that I can focus more on issues such as style or efficiency than on correctness.
But this is merely a fringe benefit. Note that I only use the test results to help guide my grading, not to determine grades automatically. I worry that when grading becomes the primary goal of testing, it can interfere with the learning that is my primary goal. For example, testing used for grading often breaks the tight feedback loop in which my students learn the most.
Also, when testing is used for grading, the instructor's test suite is often kept hidden from the students. In contrast, I can make my test suites public, which saves me all kinds of headaches. I'm not worried that a student might hardcode all the test cases into their code just to pass all the tests, because I'll see that when I look at their code. (In fact, this happens about once a year.) Students like having the test cases public because, if they don't understand something in the problem statement, they can often answer their own questions by looking at the test cases.
Admittedly, sometimes students take this too far. I occasionally catch them poring over the test cases as if reading tea leaves, instead of coming to ask me a 30-second question.
Crawl-Walk-Run
I am not advocating that this approach to testing should be used everywhere in the curriculum. Among other things, I agree that students should have the experience of creating their own test cases. The question is when.
I see my approach as being most useful in beginning programming courses, and in courses that are not nominally about programming. For example, it works particularly well in Algorithms.
In other programming courses, however, I believe that a crawl-walk-run approach is appropriate. The “crawl” step is about attitude rather than skills; its goal is to convince students that testing is valuable. I believe my approach does this, especially if you occasionally leave out the tests and explicitly ask students to compare the experiences of developing with and without tests.
The “walk” step might be to have students generate their own tests, but using instructor-supplied scaffolding. The “run” step might be to have students generate both the tests and the scaffolding. I admit, however, that I have not taught the courses in which those steps would be most appropriate.