Friday, May 2, 2008

Beware Pseudo-Arrays

I'm always fascinated by the kinds of programming mistakes that novices make, and what they say about that student's state of mind. I particularly enjoy mistakes that aren't really mistakes, in the sense that the student has actually gotten the code to work, but where the code clearly demostrates some flaw in that student's mental model of programming. For example, see my previous post on Boolean confusion.

Here's another example involving arrays, or rather the lack of them. I was reminded of this example just yesterday when multiple teams tried to use this idea in a local programming contest.

Suppose you are rolling some standard six-sided dice, and keeping track of how many times each number has appeared. The most natural way to do this would be to use an array indexed from 1-6. In a language with zero-based arrays, you might use indices 0-5 and subtract 1 from each die value, or you might use indices 0-6 and simply ignore index 0.

However, many novices faced with this decision will reject arrays altogether and use six separate variables, perhaps named count1, count2, count3, count4, count5, and count6. (Or, if they are really gluttons for punishment, ones, twos, threes, fours, fives, and sixes.) I call these kinds of variables pseudo-arrays.

Why use pseudo-arrays? Because real arrays are scary. Plain variables and if-statements are much friendlier! After all, why write

   count[dieValue]++;
when you could write
   if (dieValue == 1) {
      count1++;
   }
   else if (dieValue == 2) {
      count2++;
   }
   else if (dieValue == 3) {
      count3++;
   }
   else if (dieValue == 4) {
      count4++;
   }
   else if (dieValue == 5) {
      count5++;
   }
   else {
      count6++;
   }
Oh, and don't bother suggesting a switch-statement — if-statements are good enough, thank you very much.

Why do so many novices do this? In many cases, I think it is because they were first exposed to if-statements, and found them easy, and then later were exposed to loops, and found them difficult. Arrays are usually processed using loops, and are usually introduced while the beginner is still reeling from the trauma of loops. Thus, arrays become tainted and suspect. In such a novice's mind, the approach with variables and if-statements truly is easier, especially because so many of the if-statements can be programmed using cut-and-paste — the favorite editing commands of most novice programmers!

Every once in a while, I see code by somebody who has gotten over their fear of loops, but has not yet become reconciled with arrays. This can lead to gems where the student tries to loop through a pseudo-array, as in

   for (int d = 1; d <= 6; d++) {
      if (d == 1) {
         System.out.println(count1);
      }
      else if (d == 2) {
         System.out.println(count2);
      }
      else if (d == 3) {
         System.out.println(count3);
      }
      else if (d == 4) {
         System.out.println(count4);
      }
      else if (d == 5) {
         System.out.println(count5);
      }
      else {
         System.out.println(count6);
      }
   }
As Dave Barry would say, I am not making this up.

17 comments:

Greg D said...

Yep, you aren't making it up. :) It's out there (here?) in production code. In fact, that last snippet is a degenerate instance of the For-Case paradigm. Wiki also lists it as the Loop-switch sequence.

Actually, since graduating and joining the industry, I'm shocked at how much professionals similarly abuse copy/paste and fear advanced programming constructs like "containers" and "functions".

I suppose I'm glad that the average code I graded as a TA was better than the average code I maintain as a professional. It tells me that the industry should slowly be getting better. :)

Mike Kucera said...

the average code I graded as a TA was better than the average code I maintain as a professional

This is so true. Sometimes I just want to get out my red marker. I'd have to give most of the code I encounter something like a C, the code works (most of the time) but its usually a mess.

I think many programmers fear deadlines, so the hack something that just works and quickly move on. This is a losing game as bugs and maintenance tasks soon pile up and there never seems like enough time to get it all done. I find that if you take the time to always think things through, to put some love into the code, the result is quality software and somehow deadlines are always met.

Carlos da Silva said...
This comment has been removed by the author.
Chris Okasaki said...

Carlos: No, I was unaware of that example from Dietel. I don't think these students ever had any Dietel textbooks in earlier courses, but who knows if they've picked some up elsewhere.

Chris Okasaki said...

Oh, that comment has been deleted by its author. It pointed to
http://www.amazon.com/review/R2C7L5KHUVHOR2/ref=cm_cr_rdp_perm
which contains a snippet of code from a textbook that is similar to what I posted.

One Time Login said...

How many eyes have passed over this code and no one has spotted this glaring error:

if (dieValue == 1) {
count1++;
}

references a scalar variable. I believe you're looking for count[1]++

James said...

I remember well being afraid of arrays ... over 12 years ago. I think we were using Pascal (or BASIC) and the thing that got me was dimensioning an array. I couldn't make sense of it.

I don't know if I did the exact hacks you described but they would have looked like a logical alternative.

Weird that I recall that fear of arrays. And databases. I remember moving to C++ in college and being amazed it didn't have associated arrays built into the language. Strange memories.

I wonder what technologies I fear today without realizing?

JonMR said...

One Time, reading the whole blog post helps.

grant rettke said...

How is it that students internalize a conceptual difference between the notion of bindings between:

6count -> 10
and
counts[6count] -> 10

?

Chris League said...

Once in a while, I have tried to motivate arrays using an example with variables like count1, count2, etc. An array is then just a more convenient way to write a sequence of variables of the same type. The ‘a-ha’ moment is when students learn that any integer expression can go in the brackets, and they use “print(count[d])” over “if(d==2) print(count[2])...”

Similarly, I have tried to motivate loops by asking students to write out extremely repetitive code, and then showing how it can become two lines by abstracting out a loop counter.

But if beginners see no value in true abstraction over copy/paste, maybe this backfires..? Still unclear to me.

Stick said...
This comment has been removed by the author.
Stick said...

I (as one of those students who has had an unnatural fear of arrays) think that much of the confusion stems from two factors:

1) Array syntax. My experience is that it's highly inconsistent across languages, and most students would rather use variables (which are much more consistent) than look up array syntax when they can't remember it. I have limited experience in perhaps 4 languages, and I usually don't remember which syntax goes with which language.

2) Poor design. The average student, in my classes at least, goes about solving a problem by programming one thing as fast as they can, and then testing to find the next small thing to try, programming that... rinse, repeat. No design necessary. There may be some basic design ideas when the student starts, but students rarely have the full grasp of a problem necessary to make even the simple design decisions to use arrays.

rgrig said...

@Chris League: Arrays are not quite a sequence of variables of the same type: The size of the array can be chosen at run-time. I think this distinction is quite important.

Chris Okasaki said...

rgrig: Good point. The situations where students are most likely to go down this path are exactly those where the size is static, such as 6 for the numbers on dice, or 4/6 for directions in a square/hex grid.

Greg D said...

Now that you mention it, I remember the exact assignment where I was taught not to fear arrays. It was in an early programming class at the university and we were writing some cute little card game that played itself. I created my "Player" class, but instead of making an array of Players or a container of Players, I created 4 instances and (predictably) called them Player1 through Player4.

The instructor only docked me minor points for the mistake (if any), but I remember the large, red circle around those declarations with the words: "When you use names like this, you should probably use an array instead."

Thanks, Prof! I don't believe I've made that mistake since. :)

Epic Fail Guy said...

you can balance the two approaches by creating an array and then indexing it with a sequence of constants that have some meaning to you, e.g. array[ONES], array[TWOS], etc. However, I would rarely use that.

Raoul Duke said...

(a) the fear of collapsing the if/else into a single array update might be similar to people being freaked out by higher-order stuff in other languages. since it is all about the (a1) power and (a2) syntax of being able to collapse a long-winded something down to something concise. it takes a while for the brain to grok that. i mean, i audited a general relativity course and damned if i can comprehend einstein summation notation yet. nor am i able to really "do" haskell.

(b) perhaps anybody trying to teach could take advantage of what the commenters have said here, and motivate something like arrays by these horrible alternatives: show that one aspect of using arrays is to be able to be working at an ever-so-slightly higher level of abstraction which translates into rather large ASCII reduction, which has the potential to make code better?