Computer Science 3510
Summer 2003
Programming Assignment 1

Due date:
First solution: Thu. Jul 3
Second solution: Wed. Jul 9

START EARLY ON THIS ASSIGNMENT. DO NOT WAIT

If you experience difficulty, ask for help early. Do not wait until the last minute to ask for help.

Try to turn in a high quality solution for your first version, so that the second version is just touch up. Do not skip the first version.

This is an exercise in the use of dynamic memory allocation.


Hash Tables

Accompanying this assignment is an implementation of hash tables. This is not an application, but is a library module that might be used in many applications. For this assignment, imagine that this hash table module is already part of a much larger program, and that it is used in several other modules. You are not creating a new tool, but modifying an existing one.

Hash tables are described in file table.cc. Read that description.


Hash Table Modification 1.

Hash tables use arrays to hold data. In table.cc, the array is called the cell array, and is pointed to by a pointer called cells.

The implementation in table.cc has the difficulty that it has a fixed size array, and the table can fill up. Your assignment is to modify the table implementation so that the table does not fill up.

It is not acceptable merely to increase the fixed size to some very large value. That would be wasteful of memory when few things are actually put into the table. Instead, your new implementation must reallocate the table when it runs out of memory. Do this by adding another component, or variable, to the table type, indicating the physical size of the array --- that is, how many slots the cell array currently has. When the cell array becomes nearly full, allocate a new, larger array, and copy everything that was in the old array into the new array. Be sure to update the component that indicates the physical size of the array of cells.

When you allocate a new array, don't get just one more cell. If you do, then a growing hash table becomes very expensive, since each insertion needs to make a copy of the entire table. One idea is to double the size of the table when it is reallocated. At any rate, increase the size by some factor that is greater than one.

You will need to modify function hashFull. Do not remove it, since it has been advertised in the header file. It is possible, as far as you know, that some other module uses hashFull, and you do not want your modification of this table modules to require modifications of other modules. Your hash tables never fill up. Think carefully about the correct implementation of hashFull. Read the contract for hashFull to see what its advertised behavior is. Think about how hashFull might some other module. You want that other module to continue to work correctly, without modification.

Hash Table Modification 2.

A second difficulty is that this implementation allows the table to become almost completely full. A hash table of the kind implemented here that is almost completely full is very inefficient to search. It is better to keep at least 1/3 of the cells empty. Modify the implementation to reallocate the table when it becomes 2/3 full. Be careful: if you use integer arithmetic, you might get incorrect results. For example, in C++, expression 2/3 yields result 0. How can you test whether x > (2/3)y using only integer arithmetic? Can you rearrange this inequality so that it does not require division?


Hints and warnings

Warning. In a hash table, the position of an item is computed based on a hash function, and the position computation depends on the physical size of the hash table. So when the physical size changes, the positions of entries also change. To copy the old values into the new array, you will need to do something like hashInsert calls, so that the positions are set correctly. Think about this.

Warning. The key and value strings are not stored in the cell array. Instead, the cell array holds pointers to other arrays that hold the sting contents. This allows the strings to have different lengths. It is very important that you draw pictures to see this. Pay attention to ownership issues. Arrays that you have allocated are yours, and nobody but you will change them. Arrays that other modules have allocated might be changed by those other modules. Do not make the brash assumption that another module will not change what is in one of its arrays.

Hint. There is a constant called HASH_TABLE_SIZE in table.h. It is defined by a #define directive as the (fixed) physical size of a hash table. After your modification, HASH_TABLE_SIZE is no longer meaningful, since hash tables do not have a fixed size. So get rid of HASH_TABLE_SIZE. Do not use this name for a different purpose. Instead, just get rid of it.

Hint. You will find it convenient to add at least one new function. (Do not try to make the existing functions a lot larger. Instead, call a new function.) Do not advertise your new functions. No new functions are being provided to other modules. You will lose points for advertising functions that should not be advertised.

Hint. Be sure to keep documentation up to date when you make modifications.


Testing

Write a program to test the hash table implementation. Test things so that you think all aspects of the implementation have been tested. Do not be lulled into a false sense of security just because your program passes a few very simple tests. In the past, I have had assignments turned in whose tests did not even put enough things in the hash table to trigger a single reallocation. As a result, the newly added code was never even run during the tests. Do not make that mistake. How can you be absolutely sure that you are running your new code? A good idea is to force more than one reallocation, and also to use more than one table, since some mistakes only show up when you do something two or more times.


Turning in your solution

This information will be provided shortly, as soon as I can determine it. The lab systems have been undergoing substantial modifications recently.