CSCI 3300
Spring 2007
Section 001
Programming Assignment 4

Assigned:	February 20
First version due:	March 20, 11:59pm
Second version due:	April 2, 11:59pm

Read the entire assignment before you start working on it.

Background
Requirements
Algorithmic issues 1: Dijkstra's algorithm
Design issues
The status type
Algorithmic issues 2: Priority queues
A plan
Submitting your work

Background

This assignment involves weighted graphs, as described in assignment 3. For example, the following is a weighted graph.

Two vertices are said to be adjacent if there is an edge that connects them directly to one another. In the graph above, vertices 1 and 2 are adjacent, but 1 and 4 are not adjacent.

Think of the vertices of a graph as towns and the edges as roads. The weight of an edge is the length of the road. One thing that you might like to know is how to get from one town to another by the shortest possible route. For example, in the weighted graph above, the shortest route from vertex 1 to vertex 5 is to go from 1 to 3 and then from 3 to 5, and the length of that route is 27. You add the weights of the edges that you use.

Requirements

Write a program that reads a weighted graph from the standard input, in the same format as for programming assignment 3. For this assignment, however, the edge weights are real numbers. After the description of the graph will be two vertex numbers, s and t. For example, the input might look like this.

The sample input says that there are five vertices. There is an edge from vertex 1 to vertex 2, with weight 9.0, etc. The start vertex s is 1, and the end vertex t is 5.

Your program should print the graph, followed by the shortest path from s to t and the distance from s to t via that path, on the standard output. The example input might lead to the following output.

There are 5 vertices.
The edges are as follows.

 (1,3) weight 12.000
 (1,2) weight 9.000
 (2,5) weight 20.000
 (2,3) weight 6.000
 (2,4) weight 18.000
 (3,5) weight 15.000

The shortest path from 1 to 5 has length 27.000 and is
1 -> 3 -> 5

Please make the last two lines of your output have the form shown above, where the last line shows the path from s to t, with -> between vertices, and the line before that has the form shown. My tester will look for that.

Algorithmic Issues 1: Dijkstra's Algorithm

There is an algorithm called Dijkstra's algorithm that you can use to find shortest paths. This section describes how that algorithm works.

Information that the algorithm keeps as it runs

You will need to store some information for each vertex in the graph.

Each vertex has a status value (one of Done, Pending or Peripheral). A vertex with status Done is called a done vertex, a vertex with status Pending is called a pending vertex, and a vertex with status Peripheral is called a peripheral vertex.
Each vertex has a current distance from the start vertex. For convenience I will write dist(v) in this section for the current distance value stored for vertex v. (Do not presume that this is C++ notation.. You might write g.info[v].distance in your program, or something else, depending on how you define the Graph type. You will rarely find that an algorithm is described in exactly the terms that you will use for a particular implementation of that algorithm.)
The algorithm builds up the shortest path backwards, indicating, for each vertex v, how to get from v back to the start. Each vertex has a current next-vertex, which is the next vertex to go to on the way back to the start. For convenience I will write next(v) for the current next-vertex stored with vertex v. Again, this is not intended to be the way you will write this in your C++ program.
Each vertex has a linked list of edge descriptions, telling the vertices that are adjacent to this vertex, and the weight of the edge that connects them. So each cell in the linked list holds another vertex number and a weight. (This list is called the adjacency list of this vertex; it is described in more detail in the design section below.)

A done vertex v has been finished, and dist(v) is the correct shortest distance from v back to the start, s. The shortest path from v to s begins by going to next(v). It continues to vertex next(next(v)), then to next(next(next(v))), etc. until it reaches s.

A pending vertex v is adjacent to at least one done vertex, but is not done. When v is pending, dist(v) is the shortest distance from v to s, provided there is a restriction that the path taken from v to s must begin by going from v to a done vertex. (There might be shorter paths that begin with an edge from v to a vertex that is not yet done.) When v is pending, the currently best known way to get from v to s is to start by going to next(v), and continuing by the shortest path to s from there.

A peripheral vertex is one that is neither done nor pending. Its current distance and next-vertex values are meaningless. There is no edge between any done vertex and any peripheral vertex.

Starting up

The algorithm begins by marking the start vertex s as done and setting dist(s) to 0. Each vertex v that is adjacent to s is marked pending, with dist(v) set to the weight of the edge between v and s, and next(v) set to s. All other vertices are marked peripheral.

The main computation

After initializing, the algorithm repeatedly performs phases, until vertex t (the target vertex) is done, or there are no more pending vertices. Each phase works as follows.

One phase

(Find step) Find a pending vertex u such that dist(u) is as small as possible. Write a function that performs this step by scanning through all of the vertices, finding the pending vertex that has the smallest distance of all pending vertices.

Set the status of vertex u to Done.

(Update step) Update the status, distance and next vertex of every vertex v that is adjacent to u. Those vertices are found by scanning the adjacency list of u. The following cases tell how to update the information for a vertex v that is adjacent to u, according to the status of v.

If v is done, then do not update its information. It is finished, and all of the information is already correct.

If v is pending, then it remains pending, but it might be necessary to update dist(v) and next(v). An update is done if the path that goes from v to u and then from u to s is shorter than dist(v). For example, suppose dist(v) is currently 12. Suppose that the new done vertex u has dist(u) = 7, and that the edge between u and v has weight 3. Then it is better to go from v to s via u (total distance 3 + 7 = 10) than to take the previous path of distance 12. So you update dist(v) = 10 and next(v) =u. Here is pseudo-code for this. Suppose that w is the weight of the edge between u and v.
newdist = dist(u) + w if newdist < dist(v) then next(v) = u dist(v) = newdist end if

If v is peripheral, then do the following to v. Suppose that the edge between u and v has weight w.
status(v) = Pending dist(v) = dist(u) + w next(v) = u
Write a function that performs this update step.

One phase
(Find step) Find a pending vertex u such that dist(u) is as small as possible. Write a function that performs this step by scanning through all of the vertices, finding the pending vertex that has the smallest distance of all pending vertices. Set the status of vertex u to *Done. (Update step) Update the status, distance* and next vertex of every vertex v that is adjacent to u. Those vertices are found by scanning the adjacency list of u. The following cases tell how to update the information for a vertex v that is adjacent to u, according to the status of v. If v is done, then do not update its information. It is finished, and all of the information is already correct. If v is pending, then it remains pending, but it might be necessary to update dist(v) and next(v). An update is done if the path that goes from v to u and then from u to s is shorter than dist(v). For example, suppose dist(v) is currently 12. Suppose that the new done vertex u has dist(u) = 7, and that the edge between u and v has weight 3. Then it is better to go from v to s via u (total distance 3 + 7 = 10) than to take the previous path of distance 12. So you update dist(v) = 10 and next(v) =u. Here is pseudo-code for this. Suppose that w is the weight of the edge between u and v. newdist = dist(u) + w if newdist < dist(v) then next(v) = u dist(v) = newdist end if If v is peripheral, then do the following to v. Suppose that the edge between u and v has weight w. status(v) = Pending dist(v) = dist(u) + w next(v) = u Write a function that performs this update step.

Getting the final answer

When the algorithm is finished, you will be able to follow a path from t back to s by following the chain of next vertices.

     t -----> next(t) -------> next(next(t)) ----> ... -----> s

Print that chain out backwards, so that it goes from s to t instead of from t to s. The easiest way to do that is to use recursion. To print a path backwards, starting at vertex u, print the path starting at next(u) backwards, then print u. Of course, in the special case where u is the start vertex s, just print s. Remember to put -> between vertex numbers.

Design Issues

Graphs

Create a module consisting of two files, graph.h and graph.cpp, that is responsible for graphs and Dijkstra's algorithm.

You will need a method of storing a graph. Store the graph as a structure, containing a pointer to an array that has an index for each vertex. Also store the size of the array in the structure.

With each vertex, store all of the information required by Dijkstra's algorithm, including a linked list of the vertices that are adjacent to this vertex, each with the weight of the edge. For example, the graph shown above would be represented as follows. The upper number in each list cell is a vertex number, and the lower number is the edge weight. (Remember that the weight is, in general, a real number.) The ... parts of the array are the other information (status, distance, next).

Provide functions to do the following tasks.

Read a graph, in the format described under the requirements. Build up the adjacency lists as you read the graph. An edge from u to v needs to be put into two lists, the one for u and the one for v.
Print a graph. You only want to show each edge once. Scan each adjacency list, but only print an edge when it goes to a higher numbered vertex, to avoid printing it twice.
Run Dijkstra's algorithm, filling in the extra information. Break this down into smaller, well-thought-out functions, with a clear contract for each one. Choose functions that make the contracts easy to explain.
Print the path from s to t that was found by Dijkstra's algorithm.

Main program

Create a separate module, main.cpp, that holds the main program. The main program should (by calling functions) read the graph, print the graph, read s and t, run Dijkstra's algorithm, and print the results from Dijkstra's algorithm. It should be short and simple.

The Status Type

You can use an enumerated type to create the type of status values. The following line defines a new type StatusType, and creates constants Done, Pending and Peripheral, all of type StatusType.

  enum StatusType {Done, Pending, Peripheral};

In general, an enumerated type is a type with a fixed, finite set of members that are listed in the type definition. Line

  enum T {a,b,c,d};

creates a new type T that has four members, called a, b, c and d.

Algorithmic Issues 2: Priority Queues

The find step of Dijkstra's algorithm needs to find a pending vertex that has the smallest possible distance. One way to do that is to look at every vertex. But that can be expensive when there are many vertices, and, for large graphs, that is the most expensive part of the algorithm. An alternative is to put the pending vertices into a specialized data structure that is good for locating smallest things. Such a data structure is called a priority queue.

A priority queue holds a collection of items, each having a priority attached to it. (So the members of a priority queue are structures.) For our purposes, each item is an integer, and each priority is a real number. When a priority queue is created, it is empty. That is, it has no items in it.

The following operations are available for a priority queue q.

insert(q, v, p): Insert a item v with priority p into priority queue q.
remove(q, v, p): Remove the item from the queue that has the lowest priority, and set variable v to that item, and p to its priority. The parameters of remove must be reference parameters. Logically, v and p are out-parameters, since they represent information being sent from the remove function to its caller, not the other way around. If the queue is empty, remove should set v = 0 and p = 0.
update(q, v, p): Replace the priority of value v in the queue by p. This function assumes that v is already in the priority queue.
isEmpty(q): Return true if q is empty.

You can use a priority queue to keep track of the pending vertices. When a vertex is marked pending for the first time, insert it into the queue, with its current distance being its priority. When the distance of a pending vertex is updated, also update it in the queue. To find the pending vertex with the smallest distance, use the remove function to get it from the priority queue. The result is a much faster program, for large graphs.

Implementing priority queues

There are some clever and very efficient ways of implementing priority queues, and that is part of the reason for using them. However, we will not try to use any of them here. Instead, we will adopt a simple approach that is less efficient. But later on you could replace this simple implementation with a more efficient one. It is critical that your implementation of graph.cpp does not use any details of the priority queue implementation, so that pqueue.cpp and pqueue.h can be replaced later by a better implementation without making any changes to graph.cpp or graph.h.

Represent a priority queue as a linked list, storing an item and its priority in each list cell. Keep the list in ascending (or, really, nondescending) order of priority. So the item with the smallest priority is at the front of the list.

The remove function is very easy, since the item to remove is at the front of the linked list.

You will find the insert function very easy to implement if you use recursion. The new item either belongs at the beginning of the list or it doesn't. If it belongs at the beginning of the list, just create a new list cell and install it at the beginning. If the new item does not belong at the beginning, a recursive call to insert will finish the job.

You will want an auxiliary function to remove a given item from the list. This function is not to be used by any other module, since it is not an official part of the priority queue implementation. (A more efficient implementation of priority queues might not be able to handle removal of an arbitrary member, so you should other modules should not presume that such a function is available.) Again, recursion makes this very easy.

To do an update, just remove the item and re-insert it with a new priority. Do not try to be excessively efficient. Remember that, if you find that you need efficiency in this module, you will want to throw away this entire way of implementing it, and use a more efficient data structure, so don't waste your time trying to make this one really efficient. (But don't be extravagant either, and do things in wildly inefficient ways. If a priority queue in your implementation has n things in it, then the cost of doing an operation on it be no worse than proportional to n, not to n². More efficient algorithms drop the time to log₂(n).)

Create files pqueue.h and pqueue.cpp that together define the priority queue module. Only describe in pqueue.h what you need to for other modules to be able to use priority queues. Treat your priority queue module as an abstract data type.

A Plan

Here is a suggested plan for implementing this program.

Create all of the files. Define the types in the appriate header files. Add contracts and prototypes for the functions. Make sure that each contract is clear and concise, and tells what the function does, not how it works. Check that the purpose of each parameter is clear. Check that each heading is consistent with the contracts, and that things make sense. For example, if you have a function that reads a graph, but it has no way of telling its caller what the graph is, then something is wrong. If you have a function that prints a graph, but it has no way of knowing which graph to print, something is wrong. Be sure that each function gets the information that it needs. But do not send it any information that it does not need.
Implement the function that reads a graph, and the function that prints a graph. Test them, by reading and printing a graph. Store the edge information in the adjacency lists, and not by any other means. If you store redundant information about edges, then you will have to be sure that it is stored consistently, and you will have to test things very carefully to ensure consistency. It is much easier to store the information just once.
Implement Dijsktra's algorithm, using the simple implementation of the find step. Test it. Just print the information in the array in a raw form to see that it is right rather than trying to print the path in the final form. Your main function should be growing in the direction of the final version.
Implement the function that prints the path. Test it. Your main program should have reached its final form now.
Implement the insert and remove functions in the priority queue module. Add a function that prints the contents of the list. Create a new main function, in a different (test) module such as testq.cpp, that is only used to test the priority queue module. Do not modify main.cpp, since it is already correct.

You will probably find it useful to add debug prints. That can be awkward for a recursive function. The way to do that is to use a wrapper function that does the debug prints, and that uses another function to do the work. Here is an example wrapper function for insert. It presumes that the real insert function (that performs the insertion) is called doInsert, and that it takes, as a parameter, a pointer to a linked list. This is intended to be a sketch.
```
  void insert(PQueue& q, int v, double p)
  {
#   ifdef DEBUG
      cout << "Inserting item " << v << " with priority " << p << "\n";
      cout << "Current queue: ";
      printQueueList(q.ptr);
      cout << "\n";
#   endif

    doInsert(q.ptr, v, p);

#   ifdef DEBUG
      cout << "Done inserting: ";
      printQueueList(q.ptr);
      cout << "\n";
#   endif
  }
```
Implement the remaining priority queue functions. Test them.
Modify your implementation of Dijkstra's algorithm to use a priority queue. Only put pending vertices in the priority queue. You will need to modify the functions that perform the find and update steps. The find step removes a vertex from the priority queue. The update step performs insertions and updates in the priority queue, to keep the information about pending vertices stored there correct.
After you are confident that your program works, submit it.

Submitting Your Work

To turn in this program, log into one of the Unix computers in the lab. (You can log in remotely.) Ensure that your files are there. Change to the directory that contains those files. Then issue one of the following commands.

First version: ~karl/3300/bin/submit 4a graph.cpp graph.h pqueue.cpp pqueue.h main.cpp
Second version: ~karl/3300/bin/submit 4b graph.cpp graph.h pqueue.cpp pqueue.h main.cpp

CSCI 3300 Spring 2007 Section 001 Programming Assignment 4

Table of contents