Computer Science 3300
Section 001
Fall 2005
Programming Assignment 3

First version due: Friday, October 28, 11:59pm.
Second version due: Monday, November 7, 11:59pm.

An abstract data type

Programs often need to use tools that are provided as abstract data types. An abstract data type is a type of data, along with some operations. An object that belongs to the data type remembers some information, and the operations either get or modify that information. The stacks that we implemented in class are an example of an abstract data type; an object of type Stack remembers a stack of integers, and has operations such as Push, Pop and top for working on that stack.

This assignment has you implement an abstract data type that you will use in the next assignment. The information that is stored is a collection of sets of integers. For example, at a given point in time, an object might be remembering collection ({1,4}, {5}, {2,3,6}). The sets are always disjoint, meaning that no number is in more than one of the sets.

We need somewhere to start. With a stack, you normally start with nothing in the stack. For our data type here, we start with sets ({1}, {2}, {3}, ..., {n}) for some integer n. That is, each number from 1 to n is in a set by itself.

An abstract data type is characterized not only by the information that its objects remember, but also by the operations that you can perform. For our data type, the functions are as follows. For both functions, k and m are integers from 1 to n.

merge(m,k): Merge the sets that contain m and k into a single set. For example, if you are remembering collection of sets ({1,4}, {2,5}, {3}, {6,7}) and do operation merge(4,2), then, afterwards, you are remembering ({1,2,4,5}, {3}, {6,7}). The sets that contain 4 and 2 have been combined into a single set. Order does not matter in a set, so you can write the numbers in any order.
together(m,k): This returns true (1) if m and k are in the same set, and false (0) if they are in different sets. For example, if you are currently remembering collection of sets ({1,4}, {2,5}, {3}, {6,7}), then together(1,4) returns 1, but together(2,6) returns 0. The together function does not change the collection of sets that you are remembering.

The following shows an example. The collection of sets is shown after each merge operation is performed. In the case of a together operation, the result (0 or 1) is shown.

Operation	Result
	{1} {2} {3} {4} {5} {6} {7}
merge(3,6)	{1} {2} {3,6} {4} {5} {7}
merge(4,5)	{1} {2} {3,6} {4,5} {7}
together(3,6)	1
together(4,6)	0
merge(3,5)	{1} {2} {3,4,5,6} {7}
merge(3,1)	{2} {1,3,4,5,6} {7}
together(3,4)	1
together(1,6)	1
together(2,4)	0

We will call this abstract data type the Merge type.

Implementation of the abstract data type

There is a nice way to implement the Merge type. The idea is to have a leader of each set. For example, if numbers 3, 5 and 8 are in the same set, then they might decide that 5 is their leader. You write a function leader(k) that returns the leader of the set that contains k. Then, to find out whether two numbers are in the same set, just check whether they have the same leader.

To implement the leader function, keep an array of integers called boss. If L is a leader, then boss[L] = 0, indicating that L does not have a boss..

If number k is not a leader, then ideally boss[k] is the leader of the set containing k. Sometimes, however, the boss of a number is not its leader. Your leader might be your boss's boss, or your boss's boss's boss, etc. The leader of k is the same as the leader of the boss of k. By going to boss[k], at least you get closer to the leader. To find the leader of k, it suffices to perform the following loop.

  r = k;
  while(boss[r] != 0) r = boss[r];

Now r is the leader of k.

When you are told to merge(m,k) what you need to do is

Find the leader r of m and the leader s of k.
Force r to be the boss (and leader) of s, by making boss[s] = r.

All numbers that used to have s as their leader now automatically have r as their leader, since the loop will not stop at s any more.

It is very important that you only change the boss of a number that currently does not have a boss, or the method will not work. Only a leader can be given a new boss.

The assignment, part 1

Implement the Merge type as a structure. Provide a constructor that takes a parameter n and sets up the boss array so that all numbers from 1 to n are leaders. (Each is in a set by itself, so it is the leader of its set.) Write the merge and together operations. Be careful. C++ indexes arrays starting at 0. You need to store a boss for each number from 1 to n.

Create a header file that contains the Merge type and a prototype for merge and together, as follows.

  void merge(int m, int k);
  bool together(int m, int k);

Do not provide a prototype for any helper functions, such as leader, since other functions are not being exported.

Use the implementation method discussed here. Do not invent your own. In particular, be sure that neither the merge nor the together function performs a scan of every member of the boss array. That is not called for here, and it is much less efficient than what is described here. If you need to look at every member of the boss array to implement merge or together, then you are doing something wrong.

Testing your implementation

You will need to produce a tester in order to check whether your implementation is working. Write a main program that creates some Merge objects, and performs some merge and together operations on them. Check the results. For debugging purposes, you can provide another public operation that prints out the boss of each value. That way, you can check that things are working correctly.

The assignment, part 2: improving the efficiency

There are two ways to improve the efficiency of your implementation of the Merge type. One involves a change to the leader function, the other a change to the merge function and to the representation of the abstract data type.

IMPORTANT. These improvements are part of the assignment, and you will lose points for not doing them. But only make these improvements after everything is working correctly without them. You will receive no credit for implementing these improvements if your basic implementation of the merge type does not work.

The first improvement: improving leader

The first change involves a change to the leader function. After the leader function scans through a chain of boss links to find the leader of a number, it should go back through the chain and put the genuine leader in the boss array for every number that was looked at in the chain. That way, subsequent leader computations will go much more quickly. For example, if the boss array contains

  boss[1] = 2
  boss[2] = 4
  boss[3] = 0
  boss[4] = 6
  boss[5] = 3
  boss[6] = 0
  boss[7] = 4
  boss[8] = 5

then computing leader(1) requires chaining through 1, 2, 4, 6, stopping at leader 6. The improvement adds another loop that changes the contents of the boss array to the following, by installing the correct leader (6) of each of the numbers that was looked at.

  boss[1] = 6
  boss[2] = 6
  boss[3] = 0
  boss[4] = 6
  boss[5] = 3
  boss[6] = 0
  boss[7] = 4
  boss[8] = 5

It is just a matter of rescanning through the chain, the same way the chain was scanned the first time, but now putting the leader into the array as you go.

Notice that we have not scanned the entire boss array from beginning to end! Boss[8] is still 5, even though the leader of 8 is 3. Only the numbers that were looked at in the original scan have their boss values changed. If you try to change everything in the array, you make the implementation slower, not faster.

Also notice that it was not just the boss of 1 that was changed. All of the numbers that were examined in the chain have their bosss set to their leaders. For example boss[2] was changed too.

Do not make the mistake of presuming that your implementation is correct without checking it carefully. There are subtle mistakes that you can make. Print the boss array before and after doing a leader computation. Is it correct? You might want to put a prototype for the leader function in your header file temporarily, and to do direct tests of leader.

The second improvement: improving merge

Each number that is a leader has a collection of constituents who have it as their leader. For example, in the above array, number 3 is a leader, and the constituents of 3 are 3, 5 and 8. So number 3 has three constituents, counting itself. The constituent count of a leader tells how many numbers are in the set that contains that leader.

When doing a merge, you find two values s and t that are leaders. You can then either change the boss of s to t (so s is no longer a leader) or change the boss of t to s (so t is no longer a leader). Either one will accomplish the goal of merging the two sets, but the choice of which to do influences the efficiency of the implementation. The best choice is to change the boss of the number that has the fewest constituents. That tends to keep the lengths of the boss chains up to the leader short.

Modify the data structure so that each number has not only a boss, but also a count of its constituents. A number that is not a leader has no constituents. So now the information is an array of structures, where each structure contains a boss and a constituent count.

A picture of the initial array, before any merges have been done, might look like this. Notice that each number has one constituent, itself.

  index    boss  numConstituents
    1        0           1
    2        0           1
    3        0           1
    4        0           1
    5        0           1
    6        0           1
    7        0           1

When doing a merge, compare the constituent counts of the two leaders. Change the boss of the leader with the fewer constituents. (If they have the same number of constituents, then the choice is arbitrary.) If you change things so that the boss of s becomes t, then remember that all of the constituents of s become new constituents of t.

If you do merge(3,5) in the above array, you might arbitrarily decide to make the boss of 3 be 5. Then the array looks like this.

  index   boss  numConstituents
    1        0           1
    2        0           1
    3        5           0
    4        0           1
    5        0           2
    6        0           1
    7        0           1

If you now do merge(5,1), you must change the boss of 1, since it has fewer constituents than 5. The array ends up looking like this.

  index   boss  numConstituents
    1        5           0
    2        0           1
    3        5           0
    4        0           1
    5        0           3
    6        0           1
    7        0           1

As before, only change the boss of a number that is currently a leader. If you now do merge(2,1), you must realize that you are really being asked to merge 2 and 5, since 5 is the leader of 1. Since 5 has more constituents, you change 2, yielding

  index   boss  numConstituents
    1        5           0
    2        5           0
    3        5           0
    4        0           1
    5        0           4
    6        0           1
    7        0           1

As you can see, this improvement tends to lead to shorter chains of bosses before the leader is found.

Suppose you continue by merging 6 and 7. You might get the following.

  index   boss  numConstituents
    1        5           0
    2        5           0
    3        5           0
    4        0           1
    5        0           4
    6        7           0
    7        0           2

Now merge 1 and 6. Their leaders are 5 and 7. Since 5 has more constituents, change the leader of 7 to be 5. The new information is as follows.

  index   boss  numConstituents
    1        5           0
    2        5           0
    3        5           0
    4        0           1
    5        0           6
    6        7           0
    7        5           0

Notice that 5 now has six constituents. Also notice that, although 6 is one of 5's constituents, its boss is 7. The boss chains are shorter, but you still need to do the leader calculation using a loop.

Turning in the assignment

Do all parts of the assignment for each version. If, for the first version, you only do part 1, then you will be graded accordingly, and will receive partial credit.

Use the submit program to turn in this assignment. If your files are merge.h and merge.cpp, use command

  ~karl/3300/bin/submit 3a merge.cpp merge.h

for the first version, and

  ~karl/3300/bin/submit 3b merge.cpp merge.h

for the second version. You can check what you have submitted. For example,

  ~karl/3300/bin/submit 3a

will tell you what you have submitted for assignment 3a.

Computer Science 3300 Section 001 Fall 2005 Programming Assignment 3