10.2.6. Balanced Binary Search Trees


An issue of time

Binary search trees are a nice idea, but they fail to accomplish our goal of doing lookup, insertion and deletion each in time O(log2(n)), when there are n items in the tree. Imagine starting with an empty tree and inserting 1, 2, 3 and 4, in that order.

     1      1        1           1
             \        \           \
              2        2           2
                        \           \
                         3           3
                                      \
                                       4
You do not get a branching tree, but a linear tree. All of the left subtrees are empty. Because of this behavior, in the worst case each of the operations (lookup, insertion and deletion) takes time Θ(n). From the perspective of the worst case, we might as well be using a linked list and linear search.

That bad worst case behavior can be avoided by using an idea called height balancing, sometimes called AVL trees.


Height-balanced trees

The height of a node in a tree is the length of the longest path from that node downward to a leaf, counting both the start and end vertices of the path. The height of a leaf is 1. The height of a nonempty tree is the height of its root. For example, tree

             30
            /  \
           /    \
         18      50
           \    /  \
           24  36   51
has height 3. (There are 3 equally long paths from the root to a leaf. One of them is (30 18 24).) The height of an empty tree is defined to be 0.

Height-balancing requirement. A node in a tree is height-balanced if the heights of its subtrees differ by no more than 1. (That is, if the subtrees have heights h1 and h2, then |h1h2| ≤ 1.) A tree is height-balanced if all of its nodes are height-balanced. (An empty tree is height-balanced by definition.)

For example, tree

             30
            /  \
           /    \
         18      50
           \    /  \
           24  36   51
is height-balanced. Check each node. Notice that the height of that node's left subtree differs from the height of its right subtree by no more than 1. The node holding 18 has a left subtree of height 0 and a right subtree of height 1. The root has two subtrees of height 2.

Our goal is to keep our binary search trees height-balanced. The basic algorithms defined on the preceding page can yield an unbalanced tree. Rather than casting them aside, however, we simply patch them by adding balancing steps to restore the balance. It can be shown that a height-balanced tree with n nodes has height Θ(log2(n)). Since the cost of our algorithms is proportional to the height of the tree, each operation (lookup, insertion or deletion) will take time Θ(log2(n)) in the worst case.


Rotations


Single rotations

Suppose that tree T has a left subtree of height h and a right subtree of height h+2. T is not height balanced, but imagine that both of its subtree are height-balanced. Then an idea for restoring balance is to perform a single rotation, as follows. Here, x and y are integers representing nodes and A, B and C are subtrees.

           x                     y
          / \                   / \
         /   \                 /   \
        A     y       ==>     x     C
             / \             / \
            /   \           /   \
           B     C         A     B
It is easy to check that a single rotation preserves the ordering requirement for a binary search tree. For example, based on the position of subtree B in the left-hand tree, all values in B must be >x and <y. That is just what is required of B in the right-hand tree.

There is a similar single rotation to use when the left subtree has height h+2 and the right subtree has height h. It just replaces the right-hand tree above by the left-hand tree.

The big question is whether performing a single rotation does any good. Is the tree surely height-balanced after doing it? We start with the following tree.

                  x
                 / \
                /   \
    height h~> A     y <~ height h+2
                    / \
                   /   \
                  B     C
At least one of trees B and C has height h+1, since otherwise node y could not have height h+2. Since the the tree rooted at y is height-balanced, the other of B and C must have height h or h+1. So we know that each of trees B and C has height h or h+1. After performing the single rotation we get the following.
                    y
                   / \
                  /   \
                 x     C <~ height h or h+1
                / \
               /   \
  height h ~> A     B <~ height h or h+1
Suppose that tree B has height h. Then the tree rooted at node x has height h+1, and the entire tree is height-balanced.

But suppose that tree B has height h+1. Then the subtree rooted as x has height h+2. If tree C has height h, the tree is not height-balanced. We need to do something else.


Double rotations

Look again at rebalancing a tree, and assume that a single rotation does not fix it. Then the heights must be as follows.

                  x
                 / \
                /   \
    height h~> A     y <~ height h+2
                    / \
                   /   \
    height h+1 ~> B     C <~ height h
Since B has height h+1, it cannot be empty. Let's expand it one more level.
                  x
                 / \
                /   \
    height h~> A     y <~ height h+2
                    / \
                   /   \
    height h+1 ~> z     C <~ height h
                 / \
                /   \
               U     V
By similar reasoning as before, each of subtrees U and V has height either h or h−1. A double rotation lifts z up between x and y, yielding the following tree.
                  z
                 / \
                /   \
	       x     y
              / \   / \
             A   U V   C
The subtrees rooted at x and y each have one subtree of height h and another of height either h or h−1, so they are height-balanced. Nodes x and y each have height h+1, so the tree rooted at z is height-balanced.

There is a similar double-rotation for the case where the left subtree is 2 higher than the right subtree. See if you can see what it should be.


Summary: zig-zig and zig-zag

If a node is already height-balanced, you should not do any kind of rotation on it. Leave it alone. But if it is not height-balanced, decide whether a single rotation or a double rotation is appropriate as follows. Starting at the root, take two steps downward, each time moving into the higher subtree. For example, looking at

              20
             /  \
            /    \
          10      30
                 /  \
                /    \
              25      40
                     /
                    /
                   35
The left subtree of the root has height 1, but the right subtree has height 3. So a rotation is called for. Imagine taking two steps toward the higher subtree. So, starting at 20, step to 30 and then to 40.

If the two steps are in the same direction (two right steps or two left steps), as they are in this example, we call it a zig-zig, and a single rotation is called for. Performing the single rotation yields the following tree.

               30
              /  \
             /    \
            /      \
          20        40
         /  \      /
        /    \    /
      10     25  35
If the two steps are in opposite directions (a left step and a right step), we call that a zig-zag. In that case, a double-rotation is called for. For example,
              20
             /  \
            /    \
          10      30
                 /  \
                /    \
              25      40
                \    
                 \
                  28
is out of balance, and two steps toward the higher subtree take you from 20 to 30 (a right step) then from 30 to 25 (a left step). A double rotation is called for here. Performing the double rotation yields the following.
               25
              /  \
             /    \
            /      \
          20        30
         /         /  \
        /         /    \
       10        28     40


A double rotation is two single rotations.

Each kind of rotation has two versions, one that moves things from right to left and one that moves things from left to right. Let's look at a left-to-right double rotation.

             y                 z
            / \               / \
           /   \             /   \
	  x     C           x     y
         / \         ==>   / \   / \ 
        /   \             A   U V   C  
       A     z
            / \
           /   \
          U     V
That can be accomplished in two steps, first a single rotation from right-to-left (the opposite direction) at x, then a single rotation from left to right at the root.
             y                   y                   z
            / \                 / \                 / \
           /   \               /   \               /   \
	  x     C             z     C             x     y
         / \         ==>     / \        ==>      / \   / \ 
        /   \               /   \               A   U V   C  
       A     z             x     V
            / \           / \
           /   \         /   \
          U     V       A     U
That makes the implementation of a double rotation simple: just call a single-rotation function twice.


Perform rotations from bottom to top

When performing a rotation at a given node, it is important for the subtrees of that node to be height-balanced already. That is ensured by performing rotations from bottom to top. If you perform an insertion, for example, you start by following a path downward to the insertion point. Then walk back up that path. At each node, check whether a rotation is needed. If so, then perform it. It is possible that more than one rotation is needed. Just do each one that is required, in bottom-to-top order. Here is an example. The first step is to use the basic insertion algorithm, walking down the tree to the insertion point.

          100                                   100
         /   \                                 /   \
        /     \         insert 54             /     \
      45       150      ---------->         45       150
     /  \         \     (basic alg.)       /  \         \
    /    \         \                      /    \         \
  16     58         160                 16      58        160
        /                                      / 
       /                                      /
     50                                     50
                                              \
                                               \
                                                54
Now walk back up the same path that you just walked down. The node holding 50 is already height-balanced. But the node holding 58 is not. If you take two steps downward from 58, each toward the higher subtree, you must take one step to the left and then one step to the right. So this is a zig-zag, and it calls for a double rotation. The subtree rotation is
            58              54
           /               /  \
          /       --->    /    \
        50              50      58
          \
           54
so the full tree is now
          100
         /   \
        /     \
      45       150
     /  \         \
    /    \         \
  16     54         160
        /  \
       /    \
     50      58
Continuing to walk back up the insertion path takes you to 45 and 100. Since those nodes are already height-balanced, you make no changes to them.

Here is another example.

           60                                  60
          /  \                                /  \
         /    \          insert 95           /    \
       40      80        ------------>     40      80
      /       /  \       (basic alg.)     /       /  \
     /       /    \                      /       /    \
    30     70      85                  30      70      85
          /       /  \                        /       /  \
         /       /    \                      /       /    \
       65      82      90                  65      82      90
                                                             \
                                                              \
                                                               95
Now walk back up the insertion path. The nodes containing 90, 85 and 80 are already height-balanced. But the root is not height-balanced, since its left subtree has height 2 and its right subtree has height 4. So in this case, a rotation is done at the root of the tree. Two steps downward from the root toward the higher subtree are both to the right, so this is a zig-zig operation, calling for a single rotation. The rotation is as follows.
                                                  80
            60                                   /  \
           /  \                                 /    \
          /    \                               /      \
        40      80                           60        85
       /       /  \                         /  \      /  \
      /       /    \                       /    \    /    \
    30      70      85        ---->      40     70  82    90
           /       / \                  /      /            \
          /       /   \                /      /              \
        65      82     90            30     65                95
                         \
                          \
                           95


Exercises

  1. What tree do you get if you insert 4 into the following binary search tree using the algorithm that performs rotations to keep the tree height-balanced?

              10
             /  \
            /    \
           5     15
          /     /  \
         /     /    \
        2     12     20
    
    Answer

  2. What tree do you get if you insert 25 into the following binary search tree using the algorithm that performs rotations to keep the tree height-balanced?

              10
             /  \
            /    \
           5     15
          /     /  \
         /     /    \
        2     12     20
             /      /  \
            /      /    \
           11    18      21
    
    Answer