Computer Science 3510
Summer 2003
Programming Assignment 2

First version due: July 10
Second version due: July 16

An abstract data type

Programs often need to use tools that are provided as abstract data types. This assignment has you implement an abstract data type that you will use in the next assignment.

This abstract data type manages sets of integers, with some somewhat unusual operations on them. It is the operations on the sets that influence how the abstract data type is implemented. Given a positive integer n, you start with a collection of sets {1}, {2}, ..., {n}, where each integer from 1 to n is in a separate set. The functions that this abstract data type supports are as follows.

merge(m,n)

Merge the sets that contain m and n into a single set.

together(m,n)

This returns true (1) if m and n are in the same set, and false (0) if they are in different sets.

The following shows an example. The collection of sets is shown after each merge operation is performed. In the case of a together operation, the result (0 or 1) is shown.

Operation Result
  {1} {2} {3} {4} {5} {6} {7}
merge(3,6) {1} {2} {3,6} {4} {5} {7}
merge(4,5) {1} {2} {3,6} {4,5} {7}
together(3,6) 1
together(4,6) 0
merge(3,5) {1} {2} {3,4,5,6} {7}
merge(3,1) {2} {1,3,4,5,6} {7}
together(3,4) 1
together(1,6) 1
together(2,4) 0

We will call this abstract data type the Merge type.


Implementation of the abstract data type

There is a nice way to implement the Merge type. The idea is to have a leader of each set. For example, if numbers 3, 5 and 8 are in the same set, then they might decide that 5 is their leader. You write a function leader(k) that returns the leader of the set that contains k. Then, to find out whether two numbers are in the same set, just check whether they have the same leader.

To implement the leader function, keep an array of integers called boss. If L is a leader, then boss[L] = L. That is, every leader is its own boss.

If number k is not a leader, then ideally boss[k] is the leader of the set containing k. Sometimes, however, the boss of a number is not its leader. But the leader of k is the same as the leader of the boss of k. By going to boss[k], at least you get closer to the leader. To find the leader of k, it suffices to perform the following loop.

  r = k;
  while(boss[r] != r) r = boss[r];
Now r is the leader of k.

When you are told to merge(m,n) what you need to do is

  1. Find the leader r of m and the leader s of n.

  2. Force r to be the boss (and leader) of s, by making boss[s] = r.

Now s is no longer its own boss. All numbers that have s as their boss now automatically have r as their leader, since the loop will not stop at s any more. It is very important that you only change the boss of a number that is currently its own boss, or the method will not work.


The assignment

Implement the Merge type as a class. Provide not only the merge and together operations, but also a constructor that takes a parameter n and builds a boss array for numbers 1,...,n. (Be careful. You might want to have n+1 spots in the array, since you will not use index 0.) The constructor should set boss[k] = k for each k from 1 to n, so that every number is a leader. That puts each number in a separate set initially.

Keep as much private in the class as possible. There should be no public variables. Do not make a function public unless you need to, because it is one of the operations provided by the abstract data type. The leader function, for example, should be private.

Use the implementation method discussed here. Do not invent your own. In particular, be sure that neither the merge nor the together function performs a scan of every member of the boss array. That is not called for here, and it is much less efficient than what is described here.


Testing your implementation

You will need to produce a tester in order to check whether your implementation is working. Write a main program that creates some Merge objects, and performs some merge and together operations on them. Check the results. For debugging purposes, you can provide another public operation in the class that prints out the leader of each value. That way, you can check that things are working correctly.


Improving the efficiency

There are two ways to improve the efficiency of your implementation of the merge type. One involves a change to the leader function, the other a change to the merge function and to the representation of the abstract data type.

IMPORTANT. These improvements are part of the assignment, and you will lose points for not doing them. But only make these improvements after everything is working correctly without them. You will receive no credit for implementing these improvements if your basic implementation of the merge type does not work.

You should not need to modify your tester at all. All of the public things will look the same to other modules.

The first improvement

The first change involves a change to the leader function. After the leader function scans through a chain of boss links to find the leader of a number, it goes back through the chain and puts the genuine leader in the boss array, for every number that was looked at in the chain. That way, subsequent leader computations will go much more quickly. For example, if the boss array contains

  boss[1] = 2
  boss[2] = 4
  boss[3] = 3
  boss[4] = 6
  boss[5] = 3
  boss[6] = 6
  boss[7] = 4
  boss[8] = 5
then computing leader(1) requires chaining through 1, 2, 4, 6, stopping at leader 6. The improvement adds another loop that changes the contents of the boss array to the following, by installing the correct leader (6) of each of the numbers that was looked at.
  boss[1] = 6
  boss[2] = 6
  boss[3] = 3
  boss[4] = 6
  boss[5] = 3
  boss[6] = 6
  boss[7] = 4
  boss[8] = 5
It is just a matter of rescanning through the chain, the same way the chain was scanned the first time, but now putting the leader into the array as you go.

Notice that we have not scanned the entire boss array from beginning to end! Boss[8] is still 5, even though the leader of 8 is 3. Only the numbers that were looked at in the original scan have their boss values changed. If you try to change everything in the array, you make the implementation slower, not faster.

Also notice that it was not just the boss of 1 that was changed. All of the numbers that were examined in the chain have their bosss set to their leaders. For example boss[2] was changed too.

The second improvement

Each number that is a leader has a collection of constituents who have it as their leader. For example, in the above array, number 3 is a leader, and the constituents of 3 are 3, 5 and 8. So number 3 has three constituents, counting itself. The constituent count of a leader tells how many numbers are in the set that contains that leader.

When doing a merge, you find two values s and t that are leaders. You can then either change the boss of s to t (so s is no longer a leader) or change the boss of t to s (so t is no longer a leader). The choice of which to do influences the efficiency of the implementation. The best choice is to change the boss of the number that has the fewest constituents. That tends to keep the lengths of the boss chains up to the leader short.

Modify the data structure so that each number has not only a boss, but also a count of its constituents. A number that is not a leader has no constituents. So now the information is an array of structures, where each structure contains a boss and a constituent count.

A picture of the initial array, before any merges have been done, might look like this. Notice that each number has one constituent, itself.

  index   boss  numConstituents
    1        1           1
    2        2           1
    3        3           1
    4        4           1
    5        5           1

When doing a merge, compare the constituent counts. Change the boss of the value with the fewer constituents. (If they have the same number of constituents, then the choice is arbitrary. For example, change the first one.) If you change things so that the boss of s becomes t, then remember that all of the constituents of s become new constituents of t. For example, if you do merge(3,5) in the above array, you might arbitrarily decide to make the boss of 3 be 5. Then the array looks like this.

  index   boss  numConstituents
    1        1           1
    2        2           1
    3        5           0
    4        4           1
    5        5           2
If you now do merge(5,1), you must change the boss of 1, since it has fewer constituents than 5. The array ends up looking like this.
  index   boss  numConstituents
    1        5           0
    2        2           1
    3        5           0
    4        4           1
    5        5           3
As before, only change the boss of a number that is currently a leader. If you now do merge(2,1), you must realize that you are really being asked to merge 2 and 5, since 5 is the leader of 1. Since 5 has more constituents, you change 2, yielding
  index   boss  numConstituents
    1        5           0
    2        5           0
    3        5           0
    4        4           1
    5        5           4
As you can see, this improvement tends to lead to shorter chains of bosses before the leader is found. It is not guaranteed to ensure that the boss is always the leader, though. You still need to do the leader calculation using a loop.


Turning in the assignment

I will give instructions for turning in this assignment later.