About four years ago, I created my own programming language for teaching. I'll probably write more about this language at some other time, but for now I want to focus on one feature of the language: the use of mandatory indentation. My experience with this aspect of the language has been so overwhelmingly positive that I will never again voluntarily use a language without mandatory indentation for teaching novice programmers.
Of course, sometimes the choice of language is not under my control. Even when it is, there are always many different factors that go into that choice. But no other single factor I've run across has greater significance. For example, programming language aficionados spend endless hours arguing about static vs dynamic typing, or functional vs object-oriented languages, or strict vs lazy evaluation, or...you get the idea. Those differences can indeed be important, but more so for experienced programmers working on large projects than for novice programmers working on classroom projects. None of these differences individually comes close to the issue of indentation.
I say this with some pain, because I'm a programming languages guy myself. I've taken part in some of those arguments, and spent many hours contemplating the relative merits of many much deeper programming language properties. It hurts me to say that something so shallow as requiring a few extra spaces can have a bigger effect than, say, Hindley-Milner type inference. I wish it weren't so, but that is what my classroom experience tells me, loudly and unambiguously.
Why not mandatory indentation?
The vast majority of languages don't make indentation mandatory. Instead, they usually use explicit syntax to indicate block structure, such as { and }, or BEGIN and END. Yet, if you look at well-written programs in those languages, they are almost always indented sensibly. Furthermore, there's remarkably little disagreement as to what “sensible” indentation looks like. So why not make that sensible indentation mandatory? There are several reasons that are often put forth:
- It's weird. Because the vast majority of languages don't use it, most programmers aren't used to the idea. Therefore, there's an initial sense of unease.
- It messes up the scanner/parser. True, mandatory indentation is harder to deal with using traditional scanners and parsers based strictly on regular expressions and context-free grammars, respectively. But it's usually trivial to modify the scanner to keep track of indentation and issue an INDENT token when indentation increases, and one or more OUTDENT tokens when indentation decreases. The parser can then treat these tokens just like normal BEGIN/END keywords. In this approach the scanner is no longer based strictly on regular expressions, but most scanners aren't anyway (for example, when dealing with nested comments). Using the INDENT/OUTDENT tokens, the parser can still be based strictly on context-free grammars.
- Don't try to take away my freedom! Programmers are a pretty libertarian bunch. Anytime somebody tries to impose rules that they follow 99% of the time anyway, they always focus on the 1% exceptions. For indentation, these exceptions often involve what to do with lines that are too long. So yeah, a language with mandatory indentation should deal gracefully with that issue. Or sometimes the exceptions involve code that is nested 20 levels deep. But these cases are almost always easy to rewrite into an equivalent but shallower structure. One place where I tend to deliberately break indentation rules is with temporary debugging output. I often leave such print statements unindented, so that they're easier to find when it's time to take them out. This is convenient, but I can certainly live without it.
- I don't want people to be able to read my code! Maybe some people view obfuscated code as job security. As a different example, the former champion in the TopCoder programming contest, John Dethridge, was famous for never indenting. Why? Because in TopCoder, there is a “challenge” phase, where other competitors look at your code and try to find bugs. So there's an incentive to make your code hard for other competitors to understand. I remember teasing him about this once, and he said laughingly “Beware my left-justified fury!” I replied that I'd be more afraid if his fury was right justified.
- It doesn't scale. As programs get bigger, both in lines of code and in number of programmers, you run into more mismatches in indentation. For example, you might want to move or copy a loop that was nested 5 levels deep to another location nested 3 levels deep. Or you might need to integrate code written by programmers that used different numbers of spaces per indentation level. Refactoring tools can certainly help here. But, you know, if you're the sort of programmer who would leave the indentation messed up when you moved that loop, just because your language didn't require you to fix it, then I probably don't want to work with you anyway.
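To make the scanner point above concrete, here is a minimal sketch in Python of a scanner that tracks indentation with a stack and emits INDENT and OUTDENT tokens. It is not the scanner from my language. The function name `tokenize_indentation`, the `LINE` token kind, and the spaces-only rule are all assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

Token = Tuple[str, str]  # (kind, text); kind is "INDENT", "OUTDENT", or "LINE"


def tokenize_indentation(source: str) -> Iterator[Token]:
    """Yield INDENT/OUTDENT tokens around the lines of an indented source.

    Assumption for this sketch: indentation uses spaces only.
    """
    stack: List[int] = [0]  # indentation widths of the enclosing blocks
    for raw in source.splitlines():
        if not raw.strip():
            continue  # blank lines do not affect block structure
        text = raw.lstrip(" ")
        width = len(raw) - len(text)
        if width > stack[-1]:
            # Deeper than the current block: open exactly one new block.
            stack.append(width)
            yield ("INDENT", "")
        else:
            # Shallower: close one block per level we leave.
            while width < stack[-1]:
                stack.pop()
                yield ("OUTDENT", "")
            if width != stack[-1]:
                raise SyntaxError(f"inconsistent indentation: {raw!r}")
        yield ("LINE", text)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("OUTDENT", "")


if __name__ == "__main__":
    program = "while x:\n    if y:\n        f()\n    g()\nh()\n"
    for kind, text in tokenize_indentation(program):
        print(kind, text)
```

A parser fed this token stream can treat INDENT and OUTDENT exactly as it would BEGIN and END, so its grammar stays context-free. Only the scanner needs the extra state, here a single stack of widths.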
What about novices?
Most of the objections above don't really apply to novices. Programming is new to them so it's all weird anyway. They have no idea what scanners and parsers are. As teachers, we already take away a lot of their freedoms anyway, and we certainly want them to care if somebody (namely us!) can read their code. And novices are usually not going to be writing large enough programs for the scaling issues to be a big problem.
Ok, but what are the benefits for novices?
- They are already used to the idea of indentation. Both from writing outlines in English class and from nested bullet lists in the (almost) ubiquitous PowerPoint, novices already have experience with the idea of indicating grouping using indentation. This makes such languages much easier for novices to learn. In contrast, explicit markers such as curly braces or BEGIN/END keywords are something novices have much less experience with. However natural such markers might seem to us, they are not natural for novices, and are a constant source of mistakes. (Worse, a typical novice strategy for dealing with those mistakes is to randomly insert or delete braces until it compiles—a strategy Peter Lee used to call “programming by random perturbation”.)
- Less is more. Or, put another way, smaller is better. To the novice, a fifteen-line program is less intimidating than a twenty-line program, and a program that fits on one page is much easier to understand than a program that spans multiple pages. Those extra lines taken up by explicit braces or BEGIN/END keywords really add up. Even if you use a style that puts a { at the end of the previous line, the } still usually goes on a line by itself. I shudder now every time I look at a Java program and see a code fragment like
            ...
          }
        }
      }
    }
  }
}
Note that I am not advocating compressing everything into as few lines as possible (a la Perl Golf). Nor am I saying that all redundancy is bad. But in this case, the redundancy of explicit markers was hurting more than it was helping.
- Mandatory indentation promotes good habits. I've taught plenty of novices in languages that did not require indentation. If the language doesn't require it, they won't do it, or at least not consistently. If they are using an IDE that indents for them, fine, but sometimes they need to write code in a primitive editor like Notepad, and then they just won't bother. Even if I require the final submission to be properly indented, all too often they will do all their development without indentation, and then indent the code just before turning it in (kind of like the typical novice approach to commenting). Of course, indenting after the fact means that they don't get any of the benefits from indenting their code, such as making debugging easier.
On the other hand, if the language makes indentation mandatory, then the novice needs to keep their indentation up to date during the entire development cycle, so they will reap those benefits. Since I started using this language, I've also noticed improved indentation habits even when students switch to other languages without mandatory indentation. I can at least hope that this habit is permanent, although I have no evidence to back that up.
A surprise
I was shocked by how much the mandatory indentation seemed to help my students. I did not come into this expecting much of a change at all. I had experience with mandatory indentation in a couple of languages (most notably Haskell), and I had found it to be a pleasant way to code. Also, I had heard good things about people using Python in the classroom. However, I was by no means a convert at the time that I was designing my language.
I had two motivations for making indentation mandatory in the language. First, this language was designed to be the second language most of the students saw, and I wanted to expose them to a range of language ideas that they had not seen before. For example, their first language used static typing so I made my language use dynamic typing. Similarly, their first language did not make indentation mandatory, so I took the opposite route in my language. My second motivation was simply that I was annoyed. I was tired of students coming to me with code that was either completely unindented or, worse, randomly indented. I figured that making the compiler enforce indentation was the surest way to stop this.
Imagine my surprise when I started teaching this language and found the students picking it up faster than any language I had ever taught before. As fond as I am of the language, I'm certainly under no illusions that it's the ultimate teaching language. After carefully watching the kinds of mistakes the students were and were not making, I gradually realized that the mandatory indentation was the key to why they were doing better. This seemed to manifest itself in two ways, one obvious and one more subtle. The obvious way was that they were simply spending much less time fighting the syntax.
The more subtle way was that they appeared to be finding it easier to hold short code fragments in their head and figure out exactly what the fragment was doing. I conjecture that there may be some kind of seven-plus-or-minus-two phenomenon going on here, where adding extra lines to a code fragment in the form of explicit braces or BEGIN/END keywords pushes the code fragment above some size limit of what novices can hold in their heads. This wouldn't affect expert programmers as much, because they see beneath the braces to the chunks underneath, but novices live at the level of syntax.
Whatever the explanation, I'm now a convert to the power of mandatory indentation for novices. I've never taught Python, but I suspect those who have may have had similar experiences. If so, I'd love to hear from you.