←	• HOME • UP •	→
	Top/Data structures/Trees/Tables

10.2.4. Tables and Sets

Tables

Imagine that you want to create a lookup table, such as a telephone book, in your program. You might want to look up a name and find the associated telephone number, or look up a word and find how many times you have seen that word. The value being looked up is called a key.

Typical operations on tables are:

Insert a (key, value) pair into the table;
Remove a key from the table (along with its associated value);
Look up a key and find the associated value, or say that the key does not occur in the table;
Ask whether the table is empty.

For a given key there can only be one (key, value) pair in the table with that key. After all, in a telephone book, there should only be one entry for a particular name. The insert operation can replace an entry, but it cannot add a second entry for a key. There is no such requirement for the values. Several keys can have the same associated value.

Implementation of tables as arrays

An obvious way to implement a table is to store it in an array. For example, a table whose keys and associated values are both strings can be represented as an array of TableEntry's, where type TableEntry is defined as follows.

  struct TableEntry
  {
    char* key;
    char* value;
  };

At any given point, only part of the array is used, with a vacant part after that for insertions. So the array's logical size can be smaller than its physical size.

But using an array to represent a table has some serious efficiency issues.

Suppose that the entries are stored in an arbitrary order. Then adding a (key, value) pair is easy: add it to the end of the used part. But to look up a key requires searching the array. If there are n entries currently in the table then the time required in the worst case is Θ(n). (Because the time is a linear function of n, this algorithm is called linear search.) Removing a key is also difficult, since it requires finding the key first.
Suppose that the entries are stored in order by key. For example, when the keys are strings, the entries can be stored in alphabetical order. That makes lookup much more efficient since you can use binary search, which only takes time Θ(log₂(n)) in the worst case. But now adding an entry is slow because you need to move other entries in order to make room for the new entry. The time to insert or remove a value is Θ(n) in the worst case.

Using a linked list would appear to help to do an insertion without moving things around. But binary search requires an array, so that rules out using a linked list.

In the next few pages we explore a way to combine the best features of linked lists and ordered arrays to achieve an implementation of tables that takes time Θ(log₂(n)) to look up a key, insert an entry or remove an entry, in the worst case.

Sets

The main issue in a table is keeping track of the keys. The associated values are just attached to them by being in the same structure. If we can find an efficient way to store just the keys, then it is easy to modify that to store (key, value) pairs.

Since storing just the keys is simpler, we look at the problem of storing a set of values. The operations are:

Ask whether a particular key is a member of a given set;
Insert a key into a set;
Remove a key from a set;
Ask whether the set is empty.

Also, it makes the presentation a little simpler to look at sets of integers instead of sets of strings. Handling strings is just a matter of changing types and using a function such as strcmp to compare strings. You also have to be careful about memory management of the strings in the table, which is not an issue with integers.

←	• HOME • UP •	→
	Top/Data structures/Trees/Tables