Computer Science 3510
Fall 1999

Programming Assignment 2

First solution due: Mon, Oct, 18, 5:00pm.
Second solution due: Fri. Oct. 29, 5:00pm

A word to the wise

Begin working on this program right away. Do not wait. Create a refinement plan based on the plan presented below. Create a schedule, stating clearly the date and time at which you will have each step completed. Leave extra time for unforseen difficulties.

Get something done for the first pass. Make it as good as you can.

Preliminaries

A graph is a collection of vertices connected by a collection of edges. An edge connects exactly two vertices to one another. You can draw a picture of a graph by showing the vertices and the edges connecting them. Here is an example. The vertices are shown as circles with numbers in them and the edges are lines connecting the vertices.

A weighted graph is a graph in which each edge has a number attached to it, called the weight of the edge. Here is a picture of a weighted graph.

A spanning tree of a graph is obtained by deleting as many edges as possible, without making it impossible to get from any vertex to any other by following a sequence of edges. For example, the following is a spanning tree of the above weighted graph.

The weight of a spanning tree is the sum of the weights of its edges. For example, the weight of the above spanning tree is 59. Here is another spanning tree for the same graph. Its weight is 48.

Obviously, some spanning trees have smaller weight than others. A minimal spanning tree is a spanning tree with the smallest possible weight.

Motivation for computing minimal spanning trees

Suppose that we have a collection of towns, and we must build railroad tracks so that it is possible for a train to get from any town to any other town. Our budget for building tracks is small, so we choose to build as little track as possible. One approach to deciding which tracks to build is to construct a weighted graph, where the vertices are the towns, and the weight of the edge from town A to town B is the length of the railroad track that would need to be built to connect towns A and B directly. Then a minimal spanning tree of that graph would be a good choice for connecting the towns, since it is the cheapest way to connect all of the towns using railroad tracks between towns.

A similar problem occurs when a house is being wired for electricity. All of the outlets on a given circuit need to be connected to the circuit panel, and to each other. To connect them with a minimum amount of wire, you might build a graph having a vertex for each outlet and for the circuit panel, with the weight between two vertices being the wiring distance between those vertices. Then get a minimal spanning tree for the graph.

(A minimal spanning tree is not always the best solution to either of these problems. You can often do better by introducing new railroad track junctions or wiring junctions, called Steiner points. For example, if you have three towns in a triangle, you can connect them together by sending tracks to a point in the middle of the triangle. The middle point is a Steiner point. But good placement of Steiner points is difficult to find, so a minimal spanning tree is a reasonable compromise.)

The assignment

For this assignment, you will write a program that reads a description of a weighted graph, and prints the edges that are part of a minimal spanning tree of that graph, and also prints the total weight of the minimal spanning tree.

The input starts with a line that tells how many vertices the graph has. If there are five vertices, then those vertices have numbers 1, 2, 3, 4 and 5. In general, if there are n vertices, then they are numbered 1...n. Following the first line are the edges, one per line. Each edge line has three integers on it. Line

2 4 50

indicates that there is an edge between vertices 2 and 4, and that its weight is 50. The end of the input is signaled by a line with three 0's on it. An input describing graph

might look like this.

The output of your program for this input might look like this.


The input graph has 5 vertices, and its edges are as follows.

  vertices    weight
  1   2            9
  1   3           12
  2   4           18
  2   3            6
  2   5           20
  3   5           15
  
A minimal spanning tree uses the following edges.

  vertices    weight
  2   3            6
  1   2            9
  3   5           15
  2   4           18

The total weight of the spanning tree is 48.

Data structures

When designing a program, you need to decide how to represent your data. Typically, you draw diagrams showing arrays, structures, pointers, etc. to understand the data representation.

Once you have decided on a representation, you need to decide whether you want the data structure to be open or closed in your program.

Choosing an open data structure allows all parts of the program that use the data structure to see the exact representation, and to manipulate that representation directly. This has the advantage that you do not need to introduce any special functions to manipulate the data structure, and you have a lot of flexibility in how to use it. This approach has the disadvantage of making the form of data structure nearly impossible to change. The code is also typically less readable than if you use a closed approach. Also, you often end up writing the same code to manipulate the data in more than one place in your program.

When you choose a closed representation, you are creating an abstract data type. Choosing a closed data structure involves hiding the representation inside a class, and only allowing the data to be accessed through public functions of the class. This typically involves more initial work, but makes your program easier to modify and often makes it easier to write in the long run.

Representing a weighted graph

For this program I will have mercy on you and allow you to use an open representation for graphs. You will find this a little simpler.

Create a type Graph. You can store a graph as an array of edges, where each member of the array is a triple giving two vertex numbers and a weight. Of course, you must also store the number of edges and the number of vertices.

You may presume that there will be a maximum of 100 edges in the graph, but you must design your program so that it is very easy to change the maximum number of edges to some other number.

An abstract data type

An algorithm for computing a minimal spanning tree is described below. It builds up a spanning tree gradually. During the process, it needs to be able to ask whether two vertices are already connected by the edges that have been selected so far. Here, we describe a tool for keeping track of which vertices are connected.

The tool is an abstract data type for managing connectedness. Initially, a connection manager CM starts with no connections at all. That is, nothing is connected to anything else.

The connection manager allows you to place connections between certain vertices, and to ask if two vertices are connected. If you connect A to B, and then connect B to C, then A and C become connected automatically. It is not necessary to tell the connection manager that A and C are connected.

The tool must be implemented as a class. The operations are as follows.

A constructor with a single integer parameter n.: The constructor builds a connection manager that can handle up to n vertices, numbered 1,...,n.
CM.connect(m,n): This tells connection manager CM to make a connection between vertices m and n. Here, m and n are positive integers, up to the size that this connection manager can handle.
CM.connected(m,n): This returns true if m and n are connected, and false if they are not.

Implementation of the abstract data type

Here is how to implement these operations. The idea is to have a representative of each group of connected vertices. For example, if vertices 3, 5 and 8 are connected together, then they might decide that 5 represents all of them. Write a function representative(k) that returns the representative of vertex k. In the example, representative(3) would be 5, and representative(5) would also be 5. (Vertex 5 represents itself.) To find out whether two vertices are connected, just check whether they have the same representative.

To implement function representative, keep an array PseudoRep of integers. PseudoRep[k] is called the pseudo-representative of k. Ideally, the pseudo-representative of k is the representative of k. Sometimes, however, the pseudo-representative of a vertex is not its representative. Rather, the representative of k is the same as the representative of the pseudo-representative of k.

Suppose that r = Pseudo-Rep[k]. It might turn out that r is not a representative at all --- it does not represent itself. Instead, you must take its pseudo-representative. You continue doing this until you find a vertex that represents itself. That vertex is the true representative of the group. So to find the representative of k, you would do the following.

  r = k;
  while(PseudoRep[r] != r) r = PseudoRep[r];

Now r is the reprentative of k.

When you are told to make a connection between two vertices m and n, what you need to do is

Find the representative r of m and the representative s of n.
Force r to be the representative of s, by making PseudoRep[s] = r.

Now s no longer represents itself. All vertices that used to have s as their pseudo-representative now automatically have r as their representative, since the loop will not stop at s any more. It is very important that you only change the pseudo-representative of a number that is currently its own representative, or the method will not work.

Keep as much private in the class as possible. There should be no public data. Do not make a function public unless you need to, because it is one of the operations provided by the abstract data type.

Use the implementation discussed here. Do not invent your own. In particular, be sure than neither the connect nor the connected function performs a scan of every member of the PseudoRep array. That is not called for here, and it is less efficient than what is described here.

Computing minimal spanning trees

Here is a simple algorithm for computing a minimal spanning tree of a weighted graph.

If the array of edges of the graph is called G.edges, then sort the edges in G.edges by their weights, with smaller weights earlier in the array. (You can use function qsort, described below, to do this.)
Create a new connection manager. Initially, there are no connections, so every vertex represents itself.
Create an array that will end up holding the edges of the spanning tree. Initially, it has no edges in it, but you gradually add more edges. Also create a count, initially 0, of the total weight of the edges in the spanning tree array.
Scan through the array G.edges in increasing order of weight. (That is, do the following in a loop.) When you see edge (i,j), with weight w, ask the connection manager whether vertices i and j are already connected. If they are connected, then just skip this edge, and go to the next edge. If they are not connected then add edge (i,j) to the spanning tree, add w to the total weight of the tree, and tell the connection manager that i and j are connected. Then do the next edge.

When you are finished, you will have created an array holding the edges of a minimal spanning tree of the original graph, and you will have computed the total weight. Print them all.

Using qsort

There is a standard library function called qsort that implements a variant of the quicksort algorithm for sorting arrays. You should include header file stdlib.h to use qsort. This function is designed to be able to sort any kind of array into any desired order. In order to achieve this degree of generality, qsort needs information about the array and how to sort it.

The prototype for qsort is as follows.

     void qsort(void *base, 
                size_t nel, 
                size_t width,
                int (*compar) (const void *, const void *));

Parameter base is a pointer to the array that you want to sort. Notice that its type is void*. That just means that qsort knows that it is a pointer, but does not know what type of thing it points to.

Parameter nel is the number of elements in the array, and parameter width is the number of bytes occupied by each element. (Type size_t is typically the same as int, and is used for values that describe the size of an array.)

Parameter compar is a function. It is responsible for telling qsort the order in which you want the array sorted. qsort will sort into ascending order, according to what function compar says is ascending order. Function compar takes two parameters, which will be pointers to particular members of the array. compar(A,B) should return

-1 if A < B
0 if A = B
1 if A > B
according to the order in which you want your array to be sorted.

For example, suppose that you want to sort an array of long integers into descending order. You write a comparison function.

int compareLongInts(const long* A, const long* B)
{
  if(*A < *B) return 1;
  else if(*A > *B) return -1;
  else return 0;
}

Notice that compareLongInts returns 1 when A < B. This is because you want to sort into descending order, but qsort wants to sort into what it thinks is ascending order. So you tell qsort that 30 < 10. Also notice that the parameters to compareLongInts are pointers. qsort will call this function, and will pass it pointers to the members that qsort wants to compare. qsort will call the comparison function many times.

There is an obvious problem, however. The prototype for function compareLongInts says that its parameters have type const long*. But qsort wants to be passed a function whose parameters have type const void*. To fix this, you can us a cast. A cast is a way of telling the compiler to treat something as if it has a different type. Create a type QSORT_COMPARE_TYPE that you want to cast to, as follows.

   typedef int (*QSORT_COMPARE_TYPE)(const void*, const void*);

To sort an array Arr of n long integers, you use

   qsort((void*)Arr, n, sizeof(long), (QSORT_COMPARE_TYPE)compareLongInts);

We have passed qsort the array to sort, the number of elements, the number of bytes occupied by each element (given by sizeof(long), since each element has type long) and the comparison function, cast to the appropriate type.

Casts are, in general, very dangerous. They are a way of pulling the wool over the eyes of the compiler. This particular kind of cast is safe, and you can use it, but become sloppy with casts at your peril.

You will need to write a comparison function that is appropriate for sorting an array of edges into ascending order according to weight. Function compareLongInts is only an example, and cannot be used to compare edges.

Improving the efficiency

There is an easy way to greatly speed up your program for large numbers of edges. The improvement is made in the implementation of the connection manager abstract data type.

There are two modifications to do. One involves a change to the representative function, the other a change to the connect function and to the representation of the abstract data type.

IMPORTANT. These improvements are part of the assignment, and you will lose points for not doing them. But only make these improvements after everything is working correctly without them. Changing the abstract data type implementation should not involve changing anything else at all. Not a single character outside of the abstract data type should be changed. If you cannot do that, then you have done something wrong.

The first improvement

The first change involves a change to the representative function. After the representative function scans through a chain to find the representative of a vertex, it goes back through the chain and puts the genuine representatives in the PseudoRep array, for every vertex that was looked at in the chain. That way, subsequent representative computations will go much more quickly. For example, if the PseudoRep array contains

  PseudoRep[1] = 2
  PseudoRep[2] = 4
  PseudoRep[3] = 3
  PseudoRep[4] = 6
  PseudoRep[5] = 3
  PseudoRep[6] = 6
  PseudoRep[7] = 4
  PseudoRep[8] = 5

then computing representative(1) requires chaining through 1, 2, 4, 6. The improvement does a loop that changes the contents of the array to the following, by installing the correct representative (6) of each of those vertices. (It is just a matter of rescanning through the chain, the same way the chain was scanned the first time.)

  PseudoRep[1] = 6
  PseudoRep[2] = 6
  PseudoRep[3] = 3
  PseudoRep[4] = 6
  PseudoRep[5] = 3
  PseudoRep[6] = 6
  PseudoRep[7] = 4
  PseudoRep[8] = 5

Notice that we have not scanned the entire array from beginning to end! PseudoRep[8] is still 5, even though the representative of 8 is 3. Only the vertices that were looked at in the original scan have their PseudoRep values changed. If you try to change everything in the array, you make the program slower, not faster.

Also notice that it was not just the pseudo-representative of 1 that was changes. All of the vertices that were examined in the chain have their pseudo-representatives set to their actual representives. For example PseudoRep[2] was changed too.

The second improvement

Each vertex that represents itself has a collection of constituents who have it as a representative. For example, in the above array, vertex 3 represents itself, and the constituents of vertex 3 are 3, 5 and 8. So vertex 3 has three constituents, counting itself.

When making a connection, you find two values s and t that represent themselves. You can then either change the representative of s to t (so s no longer represents itself) or change the representative of t to s (so t no longer represents itself). The choice of which to do influences the efficiency of the implementation. The best choice is to change the representative of the vertex that has the fewest constituents.

Modify the data structure so that each vertex has not only a pseudo-representative, but also a count of its constituents. A vertex that represents itself has no constituents. So now the information is an array of structures, where each structure contains a pseudo-representative and a constituent count.

A picture of the initial array, before any connections have been made, might look like this.

  index   pseudoRep numConstituents
    1        1          1
    2        2          1
    3        3          1
    4        4          1
    5        5          1

When making a connection, compare the constituent counts. Change the pseudoRep of the value with the fewest constituents. (If they have the same number of constituents, then the choice is arbitrary. For example, change the first one.) If you change things so that the pseudoRep of s becomes t, then remember that all of the constituents of s become new constituents of t. For example, if you do connect(3,5) in the above array, you might arbitrarily decide to make the representative of 3 be 5. Then the array looks like this.

  index   pseudoRep numConstituents
    1        1          1
    2        2          1
    3        5          0
    4        4          1
    5        5          2

If you now do connect(5,1), you must change the representative of 1, since it has fewer constituents than 5. The array ends up looking like this.

  index   pseudoRep numConstituents
    1        5          0
    2        2          1
    3        5          0
    4        4          1
    5        5          3

As before, only change the representative of a vertex that currently represents itself. If you now do connect(2,1), you must realize that you are really being asked to connect 2 and 5, since 5 is the representative of 1. Since 5 has more constituents, you change 2, yielding

  index   pseudoRep numConstituents
    1        5          0
    2        5          0
    3        5          0
    4        4          1
    5        5          4

As you can see, this improvement tends to lead to shorter chains of pseudo-representatives before the true representative is found. It is not guaranteed to ensure that the pseudo-representative is always the representative, though. You still need to do the representative calculation using a loop.

Implementing the program

Here is a suggestion for how to implement this program. This is a refinement plan. It tells how to produce more and more of the program, gradually refining it until eventually it does the whole job. It is critical that a refinement plan have tests along the way, so that the part written so far is known to work. That way, if you get an error, it is almost always in the part of the program that was most recently written, and finding bugs is relatively easy.

Implement the connection manager ADT. Test it. Testing involves writing a test program that is not part of the finished product of the program. The extra work to write the tester is well worth it. Proceeding beyond this step with a faulty connection manager will cause huge problems later.
Implement the part of the program that reads in and prints out the input graph. Test it.
Implement the beginnings of the minimal spanning tree algorithm by sorting the edges and printing them after sorting. (It looks like printing a graph is something you might want to do in more than one place. What does that tell you?)
Implement the rest of the minimal spanning tree algorithm. Test it.
Make the program's output look nice, if necessary.
Implement the improvement to the connection manager. Test it.

Keep the program well commented throughout. Make sure every function has a clear contract. Do not plan to add in the comments after the program is finished.

When you do a test, if things don't work, then use the debugger or add some prints to see what is going on. Do not try to work in the dark. Before you try to fix the problem, understand what is wrong. After you have diagnosed the problem, fix it. Do not move on until everything that you have written so far is working.

What to turn in

Email me your solution, including the improvement to the connection manager. If you have not finished all of the steps in the refinement, turn in what you have done.

Attach all files that are part of your program. The connection manager must be in a separate file from the minimal spanning tree finder. Your program should be well documented, well indented and easy to read. Keep it that way during development. Do not try to work on a sloppy, poorly documented program, unless you have time to burn.

When you mail me your solution, clearly label the subject line

   3510 assn2 pass 1 your name

(or pass 2 for the second pass). Do not use such a subject for inquiries, since I might just file your message for grading later.

Asking questions

If you have questions, be sure to ask, and do so as soon as possible, so that you have time to deal with the response.

Computer Science 3510 Fall 1999 Programming Assignment 2