Computer Science 2611
Summer 1999
Laboratory Assignment 11

Edit distance

The edit distance between two strings A and B is the smallest number of single character insertions, deletions and replacements needed to change string A into string B. For example, the edit distance between "robot" and "rabbit" is 3, since robot can be changed into rabbit by (1) replacing the first 'o' by 'a', (2) replacing the second 'o' by 'i', and (3) inserting a 'b', and that is the fewest possible operations that can change robot into rabbit.

Let A be an array that holds a string of length m, and let B be an array that holds a string of length n. Number the members of arrays A and B starting at 0, in the C++ convention. Define D[i][j] to be the edit distance between the length i prefix of A and the length j prefix of B. Then the following facts can be shown.

  1. D[0][j] = j

  2. D[i][0] = i

  3. D[i][j] = D[i-1][j-1] when A[i-1] = B[j-1].

  4. D[i][j] = 1 + min(D[i][j-1], D[i-1][j], D[i-1][j-1]) when A[i-1] != B[j-1].

Assignment

Write a program that reads two strings and prints the edit distance between those strings. Your program MUST contain a function that takes two strings as parameters and returns the edit distance between those two strings. That function should neither read anything from cin nor write anything to cout.

The facts about D, if used naively, might suggest a recursive algorithm. If you write this program using naive recursion, it will be ASTOUNDINGLY inefficient. The same thing will be computed over and over. Instead, store the values in a two-dimensional array, so that you can look them up. (If you think about the algorithm, you will see that the same value gets looked up more than once. Using recursion would require recomputing them. Recursion amplifies bad as well as good things, and makes the recursive approach very bad. The recursive approach can end up computing some things thousands of times.)

Test your program, and check whether the answers are correct. Turn in a printout of your program.