Thursday, September 25, 2008

Less than vs Greater than

From math, we know that X < Y is the same as Y > X. But in programming, they're not the same.

Oh, I'm not talking about situations where X and Y have side effects that interact badly. Instead, I'm talking about situations where one may be more error prone than the other.

Although the two expressions are mathematically equivalent, they're not linguistically equivalent. One focuses attention on something being smaller, and the other focuses attention on something being bigger. For example, it is a relatively common mistake to go the wrong way when searching a binary search tree. However, almost every time I run into this error, the student has also written the comparisons backward from the direction I think of as “natural”.

Consider the following function for determining whether a given key x appears in a binary search tree t:

  function member(x,t) is
     if t = null then return false
     if t.key = x then return true
     if t.key < x then return member(x,t.left)
     if t.key > x then return member(x,t.right)
The fact that I've written this using recursion instead of iteration is irrelevant. What matters is that the third case goes left when it should go right, and vice versa for the fourth case. But looking more carefully at that third case
     if t.key < x then return member(x,t.left)
I'm not particularly surprised by the error. The comparison t.key < x focuses attention on something being smaller, and the left side of the tree is where the smaller keys go. By asking whether x is smaller than t.key, rather than whether t.key is smaller than x, I think we're much less likely to fall into this trap.

11 comments:

Lawrence Kesteloot said...

I have a similar observation about people who write "if (5 == i)" to avoid the possibility of writing a single equal sign. Linguistically that's wrong. You're not testing 5 to see if it's equal to i, you're testing i. Let the compiler warn you about the assignment bug.

Adam said...

Could you explain why this is wrong?

What situations are X or Y more error prone that the other?

Rearranging the two last lines in your member function will not change the behavior of your program, x is either bigger or smaller than t.key at that point. So why does this matter?

Rohan Sharma said...

Jon Bentley's Programming Pearls covers this quite well.

For instance, if you're writing a binary search, loop like this:

if x[mid] < t then low = mid + 1
if x[mid] = t then return mid
if x[mid] > t then high = mid - 1

It's pretty helpful to keep the invariant in mind. In the first case, x[mid] < t implies x[low] < x[low + 1] < x[low + 2] < ... < x[mid] < t, eliminating everything from x[low] to x[mid]. The third case just follows by symmetry.

Chris Okasaki said...

Adam: In this example, it's wrong because if t.key < x (ie, x > t.key), then the recursive call should be on t.right, not on t.left.

rgrig said...

@lawrence: Equals is symmetric while less-than is not. That makes it a different issue in this case. Also writing 5==x avoids bugs but doesn't lead to bugs as is the case for less-than. This also makes it a different issue.

Lawrence Kesteloot said...

@rgrig: I agree with you. My point is that (i == 5) and (5 == i) are not linguistically the same (Chris's point), and that the former is closer to the intent of the programmer.

John Clements said...
This comment has been removed by the author.
John Clements said...

I believe this is related to the way we parse english sentences; in particular, we group the verb & direct object together into the "verb phrase." In a similar way, I claim that

3 < 5

is (linguistically) parsed as

3 ( < 5)

... that is, 3 has the property of being less-than-five.

This also comes up when making the shift to prefix notation; many people will convert (3 < 5) into
(< 5 3), because they're reading the prefix form as "lessthanfive" applied to 3. In fact, this continues to be a problem even for experienced scheme & lisp programmers.

Unknown said...

I try to keep the things I'm comparing in "number line order": left to right from smallest to largest. I think this is something I picked up from the first edition of Code Complete ages ago.

As a result I hardly ever use > and >=.

Witek Baryluk said...

I always tend to use searched value to be on appropriate side, to know in which direction i need to traverse tree:

so:

function member(x,t) is
if t = null then false
if t.key = x then true
if x < t.key then member(x,t.left)
if t.key < x then member(x,t.right)

Witek Baryluk said...

There is also few possibilities of testing about equality, in loops.

for (i = 0; i < N; i++) ...

or

for (i = 0; i != N-1; i++)

the first one is in some sense safer. Eventually if by any incident in loop body we will jump beyond N it will terminate, but in second it will enter "infnite" loop.