39B. Height-Balanced Binary Search Trees


An issue of time

Binary search trees are a nice idea, but they fail to accomplish our goal of doing lookup, insertion and deletion each in time O(log2(n)), when there are n items in the tree. Imagine starting with an empty tree and inserting 1, 2, 3 and 4, in that order.

You do not get a branching tree, but a linear tree. All of the left subtrees are empty. Because of this behavior, in the worst case each of the operations (lookup, insertion and deletion) takes time Θ(n). From the perspective of the worst case, we might as well be using a linked list and linear search.
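The degenerate shape described above can be reproduced with a minimal sketch of plain, unbalanced insertion. The names Node, insert, build, right_spine and any_left_child are hypothetical, chosen only for this illustration, and are not taken from the preceding pages.

```python
class Node:
    """One node of a binary search tree (hypothetical name for this sketch)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain BST insertion with no balancing; returns the root of the updated tree."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root  # a duplicate key leaves the tree unchanged

def build(keys):
    """Insert the keys, in the given order, into an initially empty tree."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def right_spine(root):
    """Keys met by starting at the root and repeatedly stepping to the right child."""
    keys = []
    while root is not None:
        keys.append(root.key)
        root = root.right
    return keys

def any_left_child(root):
    """True if some node on the right spine has a nonempty left subtree."""
    while root is not None:
        if root.left is not None:
            return True
        root = root.right
    return False

chain = build([1, 2, 3, 4])
print(right_spine(chain))     # [1, 2, 3, 4]
print(any_left_child(chain))  # False
```

Every key sits on the right spine, so a lookup of 4 visits all four nodes, just as a linear search of a linked list would.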

That bad worst-case behavior can be avoided by using an idea called height balancing. Binary search trees that are kept balanced this way are called AVL trees.


Height of nodes and trees

The height of a node in a tree is the length of the longest path from that node downward to a leaf, counting both the start and end vertices of the path. The height of a leaf is 1. The height of a nonempty tree is the height of its root. For example, tree

has height 3. (There are 3 equally long paths from the root to a leaf. One of them is (30 18 24).) The height of an empty tree is defined to be 0.
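The definition of height translates directly into a short recursive function. The following is a sketch under the conventions stated above (an empty tree has height 0 and a leaf has height 1). The Node class and the name height are assumptions made for this illustration, and only the path (30 18 24) named in the text is rebuilt, not the whole example tree.

```python
class Node:
    """One node of a binary tree (hypothetical name for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def height(t):
    """Number of nodes on the longest downward path from t to a leaf."""
    if t is None:
        return 0  # empty tree
    return 1 + max(height(t.left), height(t.right))

# Only the path (30 18 24) from the example; the rest of that tree is omitted.
path = Node(30, left=Node(18, right=Node(24)))

print(height(None))      # 0
print(height(Node(24)))  # 1
print(height(path))      # 3
```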


Height-balanced trees

Height-balancing requirement.

A node in a tree is height-balanced if the heights of its subtrees differ by no more than 1. (That is, if the subtrees have heights h1 and h2, then |h1 − h2| ≤ 1.)

A tree is height-balanced if all of its nodes are height-balanced. (An empty tree is height-balanced by definition.)

For example, the preceding tree is height-balanced. Check each node, and notice that the height of its left subtree differs from the height of its right subtree by no more than 1. The node holding 18 has a left subtree of height 0 and a right subtree of height 1. The root has two subtrees of height 2.
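The height-balancing requirement can be checked mechanically. The sketch below computes, in one pass, both the height of a tree and whether every node in it is height-balanced. The names Node, check and is_height_balanced are hypothetical, and the two small test trees are made up for illustration rather than taken from the text.

```python
class Node:
    """One node of a binary tree (hypothetical name for this sketch)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def check(t):
    """Return (balanced, height) for the tree rooted at t."""
    if t is None:
        return True, 0  # an empty tree is height-balanced by definition
    left_ok, h1 = check(t.left)
    right_ok, h2 = check(t.right)
    node_ok = abs(h1 - h2) <= 1  # the requirement at this node
    return left_ok and right_ok and node_ok, 1 + max(h1, h2)

def is_height_balanced(t):
    """True if every node of t satisfies the height-balancing requirement."""
    return check(t)[0]

bushy = Node(2, Node(1), Node(3))                  # both subtrees have height 1
chain = Node(1, right=Node(2, right=Node(3)))      # root: heights 0 and 2

print(is_height_balanced(bushy))  # True
print(is_height_balanced(chain))  # False
```

Returning the height together with the verdict lets each node be visited once, rather than recomputing subtree heights at every level.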

Our goal is to keep our binary search trees height-balanced. The basic algorithms defined on the preceding pages can yield an unbalanced tree. Rather than casting them aside, though, we simply patch them by adding balancing steps to restore the balance. It can be shown that:

A height-balanced tree with n nodes has height Θ(log2(n)).

Since the cost of our algorithms is proportional to the height of the tree, each operation (lookup, insertion or deletion) will take time Θ(log2(n)) in the worst case, as long as we keep the trees height-balanced and don't spend more than O(log2(n)) time per operation keeping them height-balanced.
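The logarithmic height bound is not proved on this page, but one standard argument can be sketched numerically. Let N(h) be the fewest nodes a height-balanced tree of height h can have. Such a tree has a root, one subtree of height h − 1 and, to be as sparse as possible, another of height h − 2, so N(h) = 1 + N(h − 1) + N(h − 2), with N(0) = 0 and N(1) = 1. The function name min_nodes is an assumption of this sketch, and the constant 1.4405 is the usual constant in the Fibonacci-based bound, not something stated in the text above.

```python
import math

def min_nodes(h):
    """Fewest nodes in a height-balanced tree of height h (leaf height is 1)."""
    prev, cur = 0, 1  # N(0), N(1)
    if h == 0:
        return prev
    for _ in range(h - 1):
        prev, cur = cur, 1 + cur + prev  # N(k+1) = 1 + N(k) + N(k-1)
    return cur

print([min_nodes(h) for h in range(1, 6)])  # [1, 2, 4, 7, 12]

# Compare each height with a multiple of log2 of the minimum node count.
for h in (5, 10, 20, 30):
    n = min_nodes(h)
    print(h, n, round(1.4405 * math.log2(n + 2), 2))
```

Because N(h) grows exponentially in h, a height-balanced tree with n nodes cannot be taller than a constant multiple of log2(n). In the other direction, any binary tree of height h has at most 2^h − 1 nodes, so its height is at least log2(n + 1). Together these give height Θ(log2(n)).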