Computer Science 3300
Spring 2006
Section 001
Programming Assignment 3

Assigned: February 14
Design due: February 20, 11:59pm
First version due: March 1, 11:59pm
Second version due: March 21, 11:59pm


Read the entire assignment before starting on it.

Table of contents

  1. Background
  2. Requirements
  3. Algorithmic issues
  4. Design issues
  5. Implementation issues
  6. An abstract data type
  7. A plan
  8. Submitting your work
  9. Extra credit
  10. Why should I believe that Kruskal's algorithm works?
  11. Motivation for computing minimal spanning trees (you can skip this)


Background

A graph is a collection of vertices connected by a collection of edges. An edge connects exactly two vertices to one another. You can draw a picture of a graph by showing the vertices and the edges connecting them. Here is an example. The vertices are shown as circles with numbers in them and the edges are lines connecting the vertices.

We will only be concerned with connected graphs. A graph is connected if there is a path from every vertex to every other vertex, following edges. For example, in the above graph, there is a path from 3 to 4 that goes (3, 1, 2, 4). There are other paths from 3 to 4 as well.

A weighted graph is a graph in which each edge has a number attached to it, called the weight of the edge. Here is a picture of a weighted graph.

A spanning tree of a graph is obtained by deleting as many edges as possible, without making the graph so that it is no longer connected. That is, we still need to have a path from each vertex to each other vertex. For example, the following is a spanning tree of the above weighted graph.

The weight of a spanning tree is the sum of the weights of its edges. For example, the weight of the above spanning tree is 59. Here is another spanning tree for the same graph. Its weight is 48.

Obviously, some spanning trees have smaller weight than others. A minimal spanning tree is a spanning tree with the smallest possible weight.


Requirements

Write a program that reads a description of a weighted graph, and prints the edges that are part of a minimal spanning tree of that graph, and also prints the total weight of the minimal spanning tree. The input should be read from the standard input (cin), and the output should be written to the standard output (cout).

Input format

The input starts with a line that tells how many vertices the graph has. If there are five vertices, then those vertices have numbers 1, 2, 3, 4 and 5. In general, if there are n vertices, then they are numbered 1, ..., n.

Following the first line are the edges, one per line. Each edge line has three integers on it. Line

2 4 50
indicates that there is an edge between vertices 2 and 4, and that its weight is 50. The end of the input is signaled by a line that contains just a 0. An input describing graph

might look like this.

5
1 2  9
1 3 12
2 4 18
2 3  6
2 5 20
3 5 15
0

You can assume that there are no more than 100 edges, but that number must be easy to change. To increase this to 200, for example, should only require changing one line of your program.

Output format

The output of your program for this input might look like this.


The input graph has 5 vertices, and its edges are as follows.

  vertices    weight
  1   2            9
  1   3           12
  2   4           18
  2   3            6
  2   5           20
  3   5           15
  
A minimal spanning tree uses the following edges.

  vertices    weight
  2   3            6
  1   2            9
  3   5           15
  2   4           18

The total weight of the spanning tree is 48.

Remarks on input format

Your program is not required to insist that the three numbers that describe an edge are on one line, but it can if it wants to. For example, if your program allows an input like the following, that is acceptable. But your program is not required to allow such inputs. (You will find it easier to allow these additional input forms.)

5
1 2  
9
1 
3 12
2 4 18  2 3  6
2 5 20
3 5 15
0
The only difference between this and the previous input is the placement of line breaks.

Important note. The first number in the input is the number of vertices, not the number of edges. The only way you can determine the number of edges is by reading them and counting them.

Important note. The last line has only one number on it, not three.


Algorithmic Issues

For this problem, the algorithm is sufficiently difficult that it is a good idea to think about the algorithm before proceeding with the design.

Kruskal's algorithm

There is a well-known algorithm, called Kruskal's algorithm, for computing a minimal spanning tree of a weighted graph. It goes as follows.

  1. Sort the edges from small weight to larger weight. Call the sorted list of edges e1, ..., em. So, for example, e1 is the edge with the smallest weight.

  2. Start with an empty graph K. It has all of the vertices, but no edges yet. Edges will be added to it until, at the end, it is the minimal spanning tree.

  3. For i = 1, ..., m do

    1. Look at edge ei. Suppose that it connects vertices u and v.
    2. If there is not already a path in K between vertices u and v, then add edge ei to K. Otherwise, do not add it to K.

See below for an argument that Kruskals algorithm works.

Determining connections

A tricky issue with Kruskal's algorithm is how to determine whether two vertices are already connected by a path. Suppose we write u ~ v is there is a path between u and v. You have probably seen the notion of an equivalence relation (in CSCI 2427). Notice that ~ is an equivalence relation. That is,

  1. (~ is reflexive) u ~ u. That is, you can always get from u to itself (by taking no edges).

  2. (~ is symmetric) If u ~ v then v ~ u. That is, if there is a path from u to v, then there is also a path from v to u.

  3. (~ is transitive) If u ~ v and v ~ w then u ~ w. If there is a path from u to v and also a path from v to w, then there is a path from u to w.

An equivalence relation is always characterized by its equivalence classes. That is, at any point in time, you can characterize which vertices are connected to which others by writing a collection of sets of connected vertices. For example, suppose that there are 6 vertices. Sets

{1,3,4}, {2,5}, {6}
indicate that vertices 1, 3 and 4 are connected to one another, vertices 2 and 5 are connected to one another, and vertex 6 is not connected to any other vertex. Only vertices that are in the same set are connected to one another.

Look at what Kruskal's algorithm needs to do with this collection of sets.

  1. It needs to be able to ask whether two vertices are in the same set. That is, are they already connected by a path?

  2. It can add an edge, which causes two of the sets to be combined. For example, if it starts with

    {1,3,4}, {2,5}, {6}
    and adds an edge from 3 to 5, then sets {1,3,4} and {2,5} get combined, yielding a new collection of sets
    {1,2,3,4,5}, {6}

If we can figure out a way to handle a collection of sets, with an operation to test whether two numbers are in the same set, and another operation that combines two of the sets in to one, then we will be able to implement Kruskal's algorithm.

An algorithm for of this part of the program is discussed below.


Design Issues

Now, with the basics of the algorithms outlined, we can design the program. This program should have three modules.

  1. Module 1 implements the equivalence handling. It should consist of two files, equiv.h and equiv.cpp. In equiv.h, define a structure type Equiv, holding just one variable, a pointer to an array of integers. See the description of the equivalence manager algorithm for the intent of the array. Write contracts and prototypes for two functions, together(e,x,y) and combine(e,x,y). The idea is that together(e,x,y) returns true if x and y are in the same set, as described by equivalence relation e, and combine(e,x,y) modifies e so that the sets containing x and y are combined in to a single set. The headings are as follows.

       bool together(Equiv& e, int x, int y);
       void combine(Equiv& e, int x, int y);
    
    File equiv.cpp contains implementations of functions, including helper functions. Every function must have a contract.

  2. Module 2 implements Kruskal's algorithm. Keep it simple. In file graph.h, create a structure type Graph, where a graph stores

    You will also want a structure type that represents one edge. Create the following functions, and add prototypes and contracts for them to graph.h. Put the implementations of these functions into graph.cpp.

    1. A function that reads information about a graph, in the format shown in the requirements section, and that returns a graph holding the information that it read.

    2. A function to print a graph, by telling how many vertices it has, how many edges it has, and then showing the edges. It should not print any specific heading. For example, it should not say that it is printing the input graph. Just provide information about the graph.

    3. A function to compute the total weight of a graph, by adding up the weights of the edges.

    4. A function that takes a graph G as a parameter, and returns another graph that is a minimal spanning tree of G. This function contains an implementation of Kruskal's algorithm.

    Note that graph.cpp will need to include equiv.h, since it needs the equivalence manager as a tool.

  3. Module 3 contains only a main program. Call it main.cpp. It should include graph.h, but not equiv.h. The main program should be short. It just does the following steps.

    1. Read a graph. This should be one line of code, since there is already a function that does this job.
    2. Print the graph (two lines of code, one for a heading saying that this is the original graph, and another line to call the graph printer).
    3. Compute the minimal spanning tree of the graph (one line of code).
    4. Print the minimal spanning tree of the graph (two lines of code, since you need to say what is being printed).
    5. Print the weight of the minimal spanning tree (one line of code).


Implementation Issues

Edges and vertices

Do not confuse edges with vertices. Keep track of the number of vertices in a graph separately from the number of edges.

The vertices are numbered 1, ..., n. You will need to have an array of edges. It is a good idea to start numbering the edges from 0, not from 1. So put the first edge that you read at index 0 in the array.

Sorting the array of edges

You will need to sort the array of edges according to the weights of the edges. There is a function in the library, called qsort, that will sort an array. It implements a variant of the Quicksort algorithm. You should include header file <cstdlib> to use qsort.

Qsort is a general sorting function, designed to be able to sort any type of array into any desired order. In order to achieve this degree of generality, qsort needs information about the array and how to sort it. It also needs for you to perform some conversions that, in general, would be questionable, but that work correctly here.

Qsort takes four parameters. In general, you write

   qsort(base, numEl, elSize, compar);
where