Computer Science 2530
Fall 2019
Programming Assignment 4

Assigned: Thursday, October 3
Due: Wednesday, October 16, 11:59pm
Points: 350

Table of Contents

  1. Purpose of this assignment
  2. What you will submit
  3. Background
  4. The assignment
  5. An algorithm for implementing an equivalence manager
  6. Additional requirements
  7. A refinement plan
  8. Compiling and testing your program on xlogin
  9. Issues to be aware of
  10. Submitting your work
  11. Late submissions
  12. Asking questions
  13. Extra credit: improving the equivalence manager.

Purpose of This Assignment

This assignment exercises your knowledge of arrays and functions. It also gives you experience implementing a module that is not an application but is intended to be part of a larger application.

Initially, this will look difficult. In reality, it is very short and simple. You need to write six short functions, plus documentation. Strive to keep it simple and follow the instructions.


What You Will Submit

You will need to submit two files, equiv.cpp and equiv.h.

Read the entire assignment except for the extra credit part before you start working on it. Be sure to follow the instructions.


Background

An equivalence relation ≡ is a relation with the following properties for all x, y and z.

  1. ≡ is reflexive: xx.

  2. ≡ is symmetric: If xy then yx.

  3. ≡ is transitive: If xy and yz then xz.

An equivalence relation on a set always partitions the set into equivalence classes. Each value belongs to exactly one equivalence class. Two values x and y are equivalent if and only if they belong to the same equivalence class.

For example, suppose that ≡ is a relation on set {1, 2, 3, 4, 5, 6} with equivalence classes {1, 3, 4}, {2, 6}, {5}. Then 1 ≡ 4 and 2 ≡ 6 but 2 is not equivalent to 3.


The Assignment

Your goal for this assignment is to create a module that manages an equivalence relation that can be changed in a specific way by the program while the program runs. The assignment will involve creating two files, equiv.h and equiv.cpp.

For this assignment, the equivalence relation is always over a set of integers {1, 2, 3, …, n} for some positive integer n.

This module is not a complete program. It is intended to be part of a larger program. File equiv.cpp must not contain a main function.

Equiv.h will contain function prototypes, but it must not contain any full function definitions. (There must be no function bodies.) Equiv.cpp must contain line

  #include "equiv.h"
before any function definitions.

Interface

The interface tells exactly what this module provides for other modules to use. Other modules must not use anything that is not described here. Briefly, the interface includes a type, ER, which is the type of an equivalence relation, and the following functions.

  ER   newER(const int n);
  void destroyER(ER R);
  bool equivalent(ER R, const int x, const int y);
  void combine(ER R, const int x, const int y);

Additionally, there is one function that is solely for debugging.

  void showER(const ER R, const int n);

There is one more function that is part of the implementation but not part of the interface. You can use it for debugging, though.

  int  leader(ER R, const int x);

A module that uses this tool can create an equivalence relation called R by saying

  ER R = newER(n);
where n is a positive integer. Initially, each number is in its own equivalence class; the initial equivalence classes are {1}, {2}, …, {n}. There are two operations that can be performed on an equivalence relation.

  1. equivalent(R, x, y) returns a boolean value: true if x and y are currently in the same equivalence class in equivalence relation R, and false otherwise.

  2. combine(R, x, y) modifies equivalence relation R by making x and y equivalent. It combines the equivalence class that contains x with the equivalence class that contains y. The combine function does not return an answer.

Example

For example, suppose that n = 7. The following shows a sequence of operations and shows the equivalence classes after each combine operation.

Step Result
ER R = newER(7) R = {1} {2} {3} {4} {5} {6} {7}
combine(R, 1, 5) R = {1, 5} {2} {3} {4} {6} {7}
combine(R, 2, 7) R = {1, 5} {2, 7} {3} {4} {6}
equivalent(R, 1, 5) yields true
equivalent(R, 1, 7) yields false
combine(R, 5, 7) R = {1, 2, 5, 7} {3} {4} {6}
equivalent(R, 2, 5) yields true
equivalent(R, 2, 3) yields false yields false
combine(R, 2, 3) R = {1, 2, 3, 5, 7} {4} {6}
equivalent(R, 3, 7) yields true
equivalent(R, 4, 7) yields false
combine(R, 4, 6) R = {1, 2, 3, 5, 7} {4, 6}
combine(R, 2, 3) R = {1, 2, 3, 5, 7} {4, 6}

As you can see from the last step, it is allowed to combine two values that are already equivalent. That should cause no change.


An Algorithm for Managing an Equivalence Relation

You will not store the equivalence classes directly. Instead, you will store them implicitly, using the following ideas. You are required to implement an equivalence relation manager in this way. You will receive no credit for a module that does not follow this algorithm.

  1. Each equivalence class has a leader, which is one of the members of that equivalence class. You will create a function leader(R, x) that returns the current leader of the equivalence class that contains x in equivalence relation R.

    Two values are equivalent if they have the same leader.

  2. There is another idea that is similar to a leader, but not exactly the same. Each value has a boss, which is a value in its equivalence class. For the purposes of describing the idea, let's write boss[x] for x's boss.

    1. If x is the leader of its equivalence class then boss[x] = 0, indicating that x has no boss.

    2. If x is not the leader of its equivalence class then boss[x] ≠ 0 and boss[x] is closer to the leader, in the following sense. If you look at the values x, boss[x], boss[boss[x]], boss[boss[boss[x]]], … then you will eventually encounter x's leader (just before you encounter 0).

Details on the algorithm

Use an array to store the bosses. Declaration

  typedef int* ER;
defines type ER to be the same as int*. Put that line in equiv.h and not in equiv.cpp.

Additional details are given in the refinement plan.


Additional Requirements

It is important for you to follow the algorithms and design described here. Do not make up your own algorithm. Implement exactly the functions that are indicated. Keep the parameter order as shown here. If you change the parameter order, your module will not compile correctly with my tester. Do not add extra responsibilities to functions.

The definition of ER must only be in equiv.h. Do not duplicate that definition in equiv.cpp.


A Refinement Plan

Create a directory to hold assignment 4. Download

into that directory.

Development plan
1. Create a file called equiv.cpp.

Copy and paste the module-template into it. Add your name and the assignment number. The file is equiv.cpp. If you will use tabs, say how far apart the tab stops are. Add line

#include "equiv.h"


2. Write a comment telling what this module will provide when it is finished.

Equiv.cpp is not an application. It just provides a tool. Say that it is an equivalence relation manager and give an outline of the interface.


3. Create a file called equiv.h.

Copy the following into equiv.h, then edit it to add your name.

// CSCI 2530
// Assignment: 3
// Author:     ***
// File:       equiv.h
// Tab stops:  none

// These #ifndef and #define lines make it
// so that, if this file is read more than 
// once by the compiler, its body is skipped
// on all but the first time it is read.

#ifndef EQUIV_H
#define EQUIV_H

// An equivalence relation is an array of integers.
// So ER abbreviates int*.  

typedef int* ER;

// Public function prototypes

ER   newER      (const int n);
void destroyER  (ER R);
bool equivalent (ER R, const int x, const int y);
void combine    (ER R, const int x, const int y);

// The following is advertised here solely
// for debugging.  These must only be used
// for debugging.

void showER(const ER R, const int n);
int  leader(ER R, const int x);

#endif

Note. In the types of equivalent and leader, parameter R is not marked const, even though it seems to make sense to do that. The reason is that improvements that can be done for extra credit need to make changes to R, even in equivalent and leader.


4. In equiv.cpp, write a heading and contract, then fill in the body, of the newER function.

In physical terms, newER(n) returns an equivalence relation as an array of n+1 integers. (It needs n+1 integers because it includes index 0, which will not be used.) Allocate the array in the heap and initialize all of its variables to hold 0. The array will be used to store the bosses. If R has type ER then R[x] is x's boss.

In logical terms, newER(n) returns an equivalence relation that can handle set {1, 2, …, n}. Say that in the contract. Don't say that it returns an array. Where possible, express things in conceptual or logical terms rather than in physical terms.


5. In equiv.cpp, write a contract, then an implementation, of the showER function.

ShowER(R, n) prints the entire contents indices 1, …, n of array R on the standard output in a readable form for debugging. Be sure that showER shows both k and k's boss, for each k.

Don't try to be too fancy here. Do not try to show the equivalence classes. ShowER is a debugging tool. Make it show a table of x and boss[x] for x from 1 to n.


6. Create file test1.cpp for partial testing of equiv.cpp.

Add a main function to test1.cpp and make main create a new ER object (using newER) and use showER to show what it looks like. Test1.cpp should contain

#include "equiv.h"
to allow it to use what is described in equiv.h.

Compile test1.cpp and equiv.cpp together as follows.

  g++ -Wall -o test1 test1.cpp equiv.cpp
Then run test1 by
  ./test1

7. In equiv.cpp, write a heading and contract, then an implementation, of the leader function.

Leader(R, x) returns the leader of x in equivalence relation R. To compute x's leader, follow the bosses up to the leader. Here is a sketch of a loop that finds the leader of x.

  y = x
  while(boss[y] != 0)
    y = boss[y]
  return y
You can use a loop or recursion for the leader function. Any function that wants to compute a leader must use leader to do it.

Modify test1.cpp so that it tests leader by showing showing the leader of each value in the ER object that it creates. Note that, at this point, each number will be its own leader. Run test1.cpp. Does it work?


8. In equiv.cpp, write a heading and contract, then an implementation, of the combine function.

Combine(R, x, y) combines the equivalence classes of x and y in R as follows. First, it finds the leaders x′ and y′ of x and y. If x′ and y′ are different (so x and y are not already in the same equivalence class), then y′ becomes the new boss of x′ and y′ becomes the leader of the combined equivalence class.

Modify test1.cpp by making it combine just a few values, then show what the equivalence relation looks like using showER. Does it look right?

Important Note.

It is crucial for your combine function never to change the boss of a nonleader. If you are not sure that x is a leader, do not change R[x].

Pay attention to this! In the past, many students have ignored this requirement. Needless to say, their modules did not work and their scores were low.


9. In equiv.cpp, write a contract, then an implementation, of the equivalent function.

Equivalent(R, x, y) returns true if x and y have the same leader in R. That is not the same as saying that they have the same boss.

Now you have enough to use the automated tester. Run it. If there are errors, fix them. You can read testequiv.cpp to see what it is doing, but only change equiv.cpp to fix errors; changing the tester will not help since I will not use your modified tester when I grade your submission.


10. In equiv.cpp, write a contract, then an implementation, of the destroyER function.

DestroyER(R) deallocates array R.


11. Check your program.

Proofread your contracts. Look for spelling and grammatical errors. Ensure that you have paid attention to the issues to be aware of.


12. Submit your work.


Compiling and Testing Your Program on Xlogin

Use the following commands to compile and run your program.

make equiv.o
Compile equiv.cpp if it needs compiling.

make testequiv.o
Compile testequiv.cpp if it needs compiling.

make testequiv
Compile equiv.cpp (if necessary) and testequiv.cpp (if necessary) and link them, yielding executable file testequiv.

make test
Do make testequiv and then run an automated tester.

Note. To write the output from the tester into file testout.txt, use command

  make test > testout.txt
Then you can look at the test results using a text editor.

make debug
Do make testequiv then run the tester via the gdb debugger.

make clean
Remove all machine-generated files.

Issues to Be Aware of

As always, your module must follow the coding standards. Pay attention to the following issues.

  1. An equivalence relation has type ER. Any time you need to write the type of an equivalence relation, write ER, not int*.

  2. Be cautious not to use a nonexistent index of an array. If array A has size n, then A[n] does not exist. Pay attention to this. In the past, it was a very common source of serious mistakes.

  3. Be cautious not to change the boss of a nonleader. This has also been a source of mistakes in prior terms.

  4. Every function is required to have a clear, concise and precise contract. Pay attention to this. In the past, students have lost a lot of points because they did not put enough care into writing clear, understandable contracts.

  5. Put a blank line before and after each contract. Use correct spelling and punctuation. Do not omit the subject of a sentence. That can't be difficult.

  6. Indent well. Avoid long lines. Limit lines to about 80 characters.

  7. A function body must not change the value of a call-by-value parameter. Pay attention to this. In the past, many students have lost points for violating this requirement.

  8. Avoid code duplication.

  9. Do not use redundant tests in if-statements.


Submitting Your Work

You must submit your program using the following method. Email submissions will not be accepted. An excuse that you do not know how to use Linux will not be accepted.

To turn in your work, log into xlogin, change your directory for the one for assignment 4, and use the following command.

  ~abrahamsonk/2530/bin/submit 4 equiv.cpp equiv.h
After submitting, you should receive confirmation that the submission was successful. If you do not receive confirmation, assume that the submission did not work.

Command

  ~abrahamsonk/2530/bin/submit 4
will show you what you have submitted for assignment 4.

You can do repeated submissions. New submissions will replace old ones.


Late Submissions

Late submissions will be accepted for 24 hours after the due date. If you miss a late submission deadline by a microsecond, your work will not be accepted.


Asking Questions

To ask a question about your program, first submit it, but use assignment name q4. For example, use command

  ~abrahamsonk/2530/bin/submit q4 equiv.cpp equiv.h
Then send me an email with your question. Do not expect me to read your mind. Tell me what your questions are. I will look at the files that you have submitted as q4. If you have another question later, resubmit your new file as assignment q4.


Extra Credit: Improving the Equivalence Relation Manager.

For extra credit (up to 30 points out of 350), implement two improvements on the equivalence manager algorithm.

But be careful. Make sure that they work and that they do not ruin the equivalence relation manager. You will get no extra credit for an improvement that does not work. If your "improvements" ruin the equivalence relation manager, you will received less credit than you would for a working equivalence relation manager without the improvements.

You can only receive extra credit if your program passes the tests that I run on it.

The First Improvement: Improving the Leader Function (15 pts)

The first improvement involves a change to the leader function.

Suppose array R contains the following.

  R[1] = 2
  R[2] = 4
  R[3] = 3
  R[4] = 6
  R[5] = 3
  R[6] = 6
  R[7] = 4
  R[8] = 5
  R[9] = 1
Then computing the leader of 1 requires chaining through 1, 2, 4 and 6, stopping at 6. If the program computes the leader of 1 several times, it must redo that scan each time.

After the leader function scans through a chain of boss links to find the leader of a value, it should go back through the chain and put the current leader in the boss array for every number that was looked at in the chain. That way, subsequent leader computations will reach the leader much more quickly. The improvement changes the contents of the array to the following, by installing the correct leader (6) of each of the numbers that was looked at.
  R[1] = 6
  R[2] = 6
  R[3] = 3
  R[4] = 6
  R[5] = 3
  R[6] = 6
  R[7] = 4
  R[8] = 5
  R[9] = 1
You can do this with another loop that rescans through the chain, the same way the chain was scanned the first time, but now putting the leader into the array as you go. Alternatively, make the leader function be recursive, and just change the boss after each recursive call.

Notice that we have not scanned the entire array from beginning to end! R[8] is still 5, even though the leader of 8 is 3. R[9] is still 1, since 9 was not encounted in the scan from 1 to 6. Only the numbers that were looked at in the original scan have their boss values changed. If you try to change everything in the array, you make the implementation slower, not faster.

Also notice that it was not just the boss of 1 that was changed. All of the numbers that were examined in the chain have their bosses set to their leaders. For example R[2] was changed too.

Be sure to test your improved leader function. It is easy to write it incorrectly. Try it by hand to see whether it seems to do the right thing.

(In the past, most students who did this improvement using a loop got it wrong, so be careful. Most students who did it with recursion got it right.)

The Second Improvement: Improving Combine (35 pts)

To do this improvement you will need to use structures and arrays of structures which we will cover a little later.

Each number that is a leader has a collection of constituents, namely the members of its equivalence class. For example, in the above array, number 3 is a leader, and the constituents of 3 are 3, 5 and 8, so 3 has three constituents, counting itself.

When doing a combine operation, you find two values s and t that are leaders. You can then either change the boss of s to t (so s is no longer a leader) or change the boss of t to s (so t is no longer a leader). Either one will accomplish the goal of combining the two sets, but the choice of which to do influences the efficiency of the implementation. The best choice is to change the boss of the value that has the fewest constituents. That tends to keep the lengths of the boss chains up to the leader short.

Modify your ER type so that it is an array of structures. Each structure holds a boss and a constituent count. So, if R has type ER then R[k].boss is the boss of k in equivalence relation R. If k is a leader then R[k].numConstituents is the size of the equivalence class that k leads. (If k is not a leader then R[k].numConstituents should be 0, since a nonleader has no constituents.)

A picture of the initial array, before any combines have been done, might look like this. Notice that each number has one constituent, itself.

  index    boss  numConstituents
    1        1           1
    2        2           1
    3        3           1
    4        4           1
    5        5           1
    6        6           1
    7        7           1

If you do combine(R, 3, 5) with the above array, you might arbitrarily decide to make the boss of 3 be 5, since they have the same constituent count. Then the array looks like this.

  index   boss  numConstituents
    1        1           1
    2        2           1
    3        5           0
    4        4           1
    5        5           2
    6        6           1
    7        7           1
If you now do combine(R, 5, 1), you must change the boss of 1, since it has fewer constituents than 5. The array ends up looking like this.
  index   boss  numConstituents
    1        5           0
    2        2           1
    3        5           0
    4        4           1
    5        5           3
    6        6           1
    7        7           1
As before, only change the boss of a number that is currently a leader. If you now do combine(R, 2, 1), you must realize that you are really being asked to combine 2 and 5, since 5 is the leader of 1. Since 5 has more constituents, you change 2's boss, yielding
  index   boss  numConstituents
    1        5           0
    2        5           0
    3        5           0
    4        4           1
    5        5           4
    6        6           1
    7        7           1
As you can see, this improvement tends to lead to shorter chains of bosses before the leader is found.

Suppose you continue by combining 6 and 7. You might get the following.

  index   boss  numConstituents
    1        5           0
    2        5           0
    3        5           0
    4        4           1
    5        5           4
    6        7           0
    7        7           2
Now combine 1 and 6. Their leaders are 5 and 7. Since 5 has more constituents, change the boss of 7 to be 5. The new information is as follows.
  index   boss  numConstituents
    1        5           0
    2        5           0
    3        5           0
    4        4           1
    5        5           6
    6        7           0
    7        5           0
Now 5 has six constituents. Although 6 is one of 5's constituents, its boss is still 7. The boss chains are shorter, but you still need to do the leader calculation using a loop (or recursion). A value's boss is still not necessarily its leader.

Important note. This improvement requires you to change the definition of type ER. The types of functions newER, equivalent and combine must not change because of this improvement. For example, the type of equivalent should be

  bool equivalent(ER R, const int x, const int y);
Do not change that to
  bool equivalent(ER* R, const int x, const int y);

Efficiency Before and After Improvements

If you do neither improvement, then the equivalence relation manager can take time proportional to n2 to process n combine and equivalent requests. With both improvements, the time is no worse than proportional to n α(n) where α(n), the inverse of Ackerman's function, is a very slowly growing function of n, so slow that, for all remotely practical values of n, α(n) ≤ 6.