Computer Science 3510
Spring 2001
Programming Assignment 1

Due date:
First solution: Jan 31
Second solution: Feb 16

START EARLY ON THIS ASSIGNMENT. DO NOT WAIT

If you experience difficulty, ask for help early. Do not wait until the last minute to ask for help.

Try to turn in a high quality solution for your first version, so that the second version is just touch up. Do not skip the first version.


This is mainly an exercise in the use of dynamic memory allocation. It also uses a cgi program to process a web page query.

Hash Tables

Accompanying this assignment is an implementation of hash tables. This is not an application, but is a library module that might be used in many applications. The implementation is in the following two files. Copy those files so that you can work on them.
  • table.cc
  • table.h
  • Hash tables

    A table is an association. You look up a particular key in the table, and get back the associated value. An example of a table is a telephone book, where the key is a person's name, and the associated value is the person's telephone number. (Of course, sometimes, instead of getting the associated value, you get a notification that the key is not in the table.)

    A hash table is a particular way to implement tables. The idea is to keep the table in an array of structures, where each structure contains a key and an associated value. But the really clever idea of hash tables is not use of an array, but in deciding where to put the key/value pairs in the array.

    A hash table implementation starts with a hash function that takes a key and produces an integer. For example, a hash function for strings might be designed so that

    hash("John") = 20
    hash("Mindy") = 42

    Typically, the integers produced by the hash function are fairly large. If the array that holds the key/value pairs has N slots in it (numbered 0 to N-1) then the pair with key k is put at index hash(k) mod N. For example, suppose that an array of size 5 is to be used. Since hash("John") = 10 and 20 mod 5 = 0, the table entry for "John" is put at index 0. Similarly, since hash("Mindy" = 42 and 42 mod 5 = 2, the entry for "Mindy" is put at index 2. The array with entries for "John" and "Mindy" might look like this.

    indexkeyvalue
    0 "John" "555-3545"
    1 - -
    2 "Mindy" "555-6921"
    3 - -
    4 - -

    Some of the entries are empty. Some special key value needs to be set aside to stand for an empty cell.

    To look up a key k in the table, you compute i = (hash(k) mod N) and look at index i.

    There is a difficulty, however. What happens if two different keys want to be stored at the same place? This is called a collision. Some rule needs to be used to deal with collisions. One approach is to store several key/value pairs (for example, in a linked list) at each spot in the array. That is called chaining. A different approach is called open addressing. The idea is that, if the location where a given key wants to be stored is already occupied, you simply try the next location after that. You keep looking ahead in the array until you find an unoccupied location, and put the key/value pair there. If you reach the end of the array, you wrap around to the beginning.

    To look up a key using open addressing, you compute the hash value of the key, and start by looking at the index given by that hash value. If the key stored there is the one you are looking for, then you are done. If not, you must try the next slot, and keep looking ahead (wrapping around to the beginning at the end of the array) until either you find the desired key or you encounter an empty cell.

    File table.cc implements hashing by open addressing.

    Program modification 1.

    The implementation in table.cc has the difficulty that it has a fixed size table, and the table can fill up. Your assignment is to modify the table implementation so that the table does not fill up.

    It is not acceptable merely to increase the fixed size to some very large value. That would be wasteful of memory when few things are actually put into the table. Instead, your new implementation must reallocate the table when it runs out of memory. Do this by adding another field to the table type, indicating the physical size of the array --- that is, how many slots the cell array currently has. When the cell array becomes nearly full, allocate a new, larger array, and copy everything that was in the old array into the new array. Be sure to update the field that indicates the physical size of the array of cells.

    Watch out. In a hash table, the position of an item is computed based on a hash function, and the position computation depends on the physical size of the hash table. So when the physical size changes, the positions of entries also change. To copy the old values into the new array, you will need to do hashInsert calls, so that the positions are set correctly. Think about this.

    When you allocate a new array, don't get just one more cell. If you do, then a growing hash table becomes very expensive, since each insertion needs to make a copy of the entire table. One idea is to double the size of the table when it is reallocated. At any rate, increase the size by some factor that is greater than one.

    You will need to modify function hashFull. Do not delete it, since it is part of the data type, even if now a useless one. (It is possible, as far as you know, that someone has already written a program that uses these hash tables, and that employs function hashFull to ask whether the table is full. You need for such a program to continue to work with the new table implementation.) Your hash tables never fill up. Think carefully about the correct implementation of hashFull. Read the contract for hashFull to see what its advertised behavior is. How might some other program be using hashFull?

    Program Modification 2.

    A second difficulty is that this implementation allows the table to become almost completely full. A hash table using open addressing that is almost completely full is very inefficient to search. It is better to keep at least 1/4 of the cells empty. Modify the implementation to reallocate the table when it becomes 3/4 full. Be careful: if you use integer arithmetic, you might get incorrect results. For example, in C++, 3/4 is 0. How can you test whether x > (3/4)y using only integer arithmetic? Can you rearrange this inequality so that it does not require division?


    A CGI Program

    World wide web pages are not necessarily fixed. They can be computed on the fly, either on the client computer or on the server computer. CGI is a method of allowing the server to build dynamic web pages.

    File form.html is a form that can be filled out. When the submit button is pushed, the values that were put into the form are passed to a cgi program called formquery.cgi. There is nothing really special about a cgi program; it is an ordinary program. The information on the form is sent to the cgi program as its standard input. For example, if you enter a name of Hilbert and a password of Topology, then the standard input to formquery.cgi will contain

        LASTNAME=Hilbert&PASSWORD=Topology
    
    Program formquery.cgi is supposed to look up information in a file called grades. The lines are of the following form.
        name password info
        name password info
        ...
    
    For example, file grades might contain the following.
        Hilbert     Topology A
        Grothendiek xxxxx    B
        Dahl        r:z$$    I am not finished grading your work.  Sorry.
    
    The cgi program works by writing, to the standard output, a preamble describing what the form of the output is, followed by the page to show. To cause the web browser to display a new Html page, the preamble can be
    Content-type = text/html
    
    
    The Html page to be displayed follows the preamble in the standard output. (There is additional information provided to the cgi program. Some environment variables are set. For example, environment variable REQUEST_METHOD is set to POST and variable CONTENT_LENGTH is set to the number of characters passed to the cgi program in its standard input. The values of environment variables can be obtained from function getenv. We do not use these variables in this simple program.)

    What to Do

    Get file formquery.cc and the html files given here.
  • formquery.cc
  • resultform.html
  • form.html
  • badinfo.html
  • badname.html
  • badpassword.html
  • Most of formquery.cc is written for you. You need to write the main program. Write it. You will need to read the contracts of the prewritten functions. The main program should not be very long. Roughly 25 to 35 noncomment lines should suffice. That does not mean that you should not provide any comments!


    Testing

    First test the hash tables. Do not just make the modifications to the table.cc and table.h file and then start on the cgi program without testing the table implementation!!!!!!!!!!!!!!!!

    Write a program to do the testing of the hash table implementation. Test things so that you think all aspects of the implementation have been tested. Do not be lulled into a false sense of security just because your program passes a few very simple tests. In the past, I have had assignments turned in whose tests did not even put enough things in the hash table to trigger a reallocation! As a result, the newly added code was never even run during the tests. Do not make that mistake. How can you be absolutely sure that you are running your new code?

    After you are satisfied that the table implementation is working correctly, work on the cgi program. Compile it so that the executable code is put in file formquery.cgi. For example, a short compile line would be

      g++ -g -Wall -o formquery.cgi formquery.cc table.cc
    
    (Of course, using a makefile is easier than typing this over and over.)

    You can test your program without using a web browser. Create your own grade file. Write some data files that are what the web browser might provide to your program. Run the program, redirecting the standard input to those files. For example, if you create a data file called testdata containing

    LASTNAME=Hilbert&PASSWORD=Topology
    
    then run your program using command
    formquery.cgi <testdata
    
    The program should print the Html page that would be sent to the browser for display.

    When your cgi program works to your satisfaction, try it with a browser. Do the following.

    1. Compile your program on one of the Sparc machines. This is necessary, because you will be running it on darla, which is a Sparc machine.

    2. Create a directory called public_html in your home directory. Create a subdirectory of that directory called prog1, or whatever else you want to call it. Put, in that directory, all of the html files that are part of this program, as well as formquery.cgi and the grades file. Make sure the permissions are correct. (formquery.cgi needs to be world executable, and the html files need to be world readable. Use ls -l to see the permissions, and chmod to change them. See the Unix notes on the course web page.)

    3. Start a web browser, and visit page
        http://darla.csci.ecu.edu/~myid/prog1/form.html
      
      where you should replace myid by your login id.

    4. Try the form.
    (Note: By placing the grades file in the public_html directory, you lose control over it, since it will be readable by everybody. Anybody can get it directly using a web browser, making all grades available to everybody. That is a security issue that can be solved, but we do not attempt to deal with here.)

    At the time of this writing Darla was not connected, due to temporary networking problems. It should be fixed soon.


    Turning in your solution

    Write your first version. It should, ideally, be the full program, and it will be graded on the assumption that you should be turning in the full, completed program. If you have not completed the whole thing, turn in what you have done. After receiving feedback, write the second version, also implementing the full assignment.

    Turn in your solution using the handin program. To turn in version 1 of program 1, give command

      /export/stu/classes/csci3510/bin/handin csci3510 1 formquery.cc table.cc table.h
    
    and to turn in version 2 use command
      /export/stu/classes/csci3510/bin/handin csci3510 2 formquery.cc table.cc table.h
    

    I will compile and test your solution. I will also test your hash table implementation separately with another tester. Rest assured that I will insert many things into the table, enough to cause more than one reallocation. I will create more than one hash table as well, so your implementation must handle each hash table independently of any others.

    Be sure that your solution is prepared to link with my tester. It must, therefore, have the same function prototypes as the original implementation. If you change the function prototypes, it will be inconsistent with what my test program expects. If you delete function hashFull, then your program will not link with my tester.

    Note: Your solution is expected to compile. Solutions that do not compile will receive no credit, but will receive some feedback for the first version.