Students are rarely given an opportunity to design an algorithmically non-trivial data structure. We might give them a scenario and ask them to design how to represent the data, but that design decision is usually something like choosing between a hash table or an array or a linked-structure. Once that high level decision is made, there is very little left to design.

One reason for this is that many data structures depend on non-obvious invariants. Most students are pretty shaky on the whole idea of invariants to begin with; they're never going to spontaneously come up with something like the rules describing red-black trees or leftist heaps. With these kinds of data structures, we basically present the finished product and possibly ask them to implement and/or analyze the data structure, but we never ask them to design something like that.

For several years, I've been trying to give students a taste of real data structure design, using the sample priority queue below. I usually do this in class, where I can help steer them with gentle questioning, but I try very hard to make sure that *they* come up with the important parts.

I always do this at some point after the students have studied binary heaps (the kind typically used in heapsort). I briefly review the priority queue operations binary heaps support (`insert`, `findMin`, and `deleteMin`) and the basic idea of heap-ordered trees. Then, I introduce the idea of *merging* two heaps. With binary heaps, this takes O(N) time, or even O(N log N) time, if done naively. I ask them to design a new data structure to support merging in O(log n) time, while the other priority queue operations run in the same bounds as for binary heaps, O(1) for `findMin` and O(log N) for `insert` and `deleteMin`. I suggest that they use heap-ordered binary trees (real trees, rather than simulating trees using arrays like in binary heaps), but that they can modify these trees in any way that is helpful.

With heap-ordered trees, `findMin` is trivial—it just returns the root. To get their creative juices flowing, I then ask them to implement `insert` and `deleteMin` in terms of `merge`. This is also pretty easy: `insert` creates a new singleton tree and calls `merge`; `deleteMin` discards the root and calls `merge` on the two children of the root. So the whole problem boils down to merging two trees efficiently.

At this point, they usually flounder for a while, which is both ok and expected. Some floundering is good. Naturally, they'll start to get a little frustrated. That, too, is ok and expected. I just don't want them to get frustrated to the point where they shut down. If the floundering continues too long without any progress, I'll ask them what they can tell immediately about the result tree by only looking at the roots of the input trees. They'll quickly reply that the smaller of the two input roots becomes the root of the result tree. So I have them draw that much of the result. Depending on how things are going, I may further suggest that they draw two question marks for the children of the root in the result tree.

Then the students will play for a while with how to fill in those question marks. There are two slots for subtrees, but we have three subtrees that need to go in those slots: the two subtrees of the “winning” input root plus the “losing” input root. To fit three subtrees into two slots, naturally we need to merge two of them and leave one unchanged. But which two should be merged?

Usually, the students will begin with some fixed strategy, such as always merge the two subtrees of the winning root. But it's easy to come up with counter-examples where such a strategy will take O(N) time.

Once it becomes clear that some smarter strategy is required, the students focus on the question of which two of the three subtrees to merge. Usually the next thing they try is merging the two subtrees with the smallest roots, which fails on the same kinds of counterexamples as before.

Eventually, based on the intuition that `merge` should run faster on small trees than on big trees, they try merging the two smallest trees. This is always a good learning point about ambiguity, as we realize that there are three different possible meanings of “smallest”: the smallest root according to the ordering on keys, the smallest subtree according to number of nodes, or the smallest subtree according to the height of the subtrees. We've already shot down the first one, but either of the other two will work, so I just go with whichever one the class prefers (usually smallest in number of nodes).

One catch is that to efficiently compare sizes of subtrees (either number of nodes or height), we need to store the sizes at every node. But that's easy to do by adding an extra field, and maintaining that size information is also easy, albeit slightly easier for number of nodes than for height.

Typically, we don't have a lot of time left in class at this point, so I help them with the analysis, but it's easy to show that this design supports `merge` in O(log N) time. And there was much rejoicing!

I call this data structure *maxiphobic heaps* because of the way `merge` avoids the biggest subtree at every step. But I don't tell the students that name until after they've already come up with the design.