Computer Science 3200
Fall 2015
Section 001
Programming Assignment 6

Assigned: Monday, November 16
Due: Wednesd, December 2, 11:59pm

Table of contents

  1. Edit distance
  2. Assignment requirements
  3. An algorithm
  4. Extra credit
  5. Submitting your work


Edit distance

There are three basic edit operations on a string.

  1. Remove any single character.
  2. Insert any single character anywhere in the string.
  3. Replace any single character by another character (of your choosing).
The edit distance between strings s and t is the fewest number of basic edit operations that will change s into t. For example, to change "cat" into "cute", you can do the following steps. So the edit distance betweeen "cat" and "cute" is 2.

  cat
  cut    (replace a by u)
  cute   (insert e at the end)
You can do the steps in a different order, but you cannot use fewer than two basic edit operations.


Assignment requirements

Your assignment is to write a Java program that reads two lines of text from the standard input, holding two strings. Trim any white space from the front and back of each string. (s.trim() yields the result of trimming s.) Then your program should show s and t and the edit distance between s and t. The output must be readable. Here is a sample of a readable output.

The edit distance between "cat" and "cute" is 2.


An algorithm

To derive an algorithm for the edit distance problem, we turn again to dynamic programming. Here are two key ideas.

  1. It is not enough to compute the edit distance between strings s and t. We must compute the edit distance between every prefix of s and every prefix of t. Suppose s = s1sm and t = t1tn. Define d(i, j) to be the edit distance between s1si and t1tj. It is convenient to define s(i) to be s1si and t(j) to be t1tj. Then d(i, j) is defined to be the edit distance between s(i) and t(j).

  2. It is important to store the answers for the prefixes so that they can be used to compute the values for longer prefixes. Eventually, we compute d(m, n), which is the edit distance between s and t, since s = s(m) and t = t(n).


Notice that


If i > 0 and j > 0, there are two cases.

Putting it into practice

The algorithm to find the edit distance between s = s1sm and t = t1tm starts by creating a two-dimensional array D of integers with m+1 rows and n+1 columns. Then it fills it in, with D[i][j] = d(i, j).

First fill in row 0 and column 0 using facts 1 and 2. Then fill in the rest of the values be rows, from top to bottom. (Top is index 1 and bottom is index m.) Fill in each row from left to right. For each entry, use either fact 3 or fact 4, whichever is appropriate.

Finally, the edit distance between s and t is found in D[m][n].


Extra credit

For extra credit, show a sequence of modified versions of s as you convert s into t. For example, if s = "forest" and t = "frost", then the program would show the following.

  forest
  forst
  foost
  frost

Do this after you have filled in the D array. Keep track of the current version of s, after zero or more modifications (say, in variable current). Start at the lower right corner (D[m][n]) of the D array. So set i = m and j = m.

You can tell which operation was used to compute D[j][j]. If sj = tj then no operation was done, so move up and to the left (by decrementing i and j).

If sitj, then some basic edit operation must have been performed. Recompute which one was performed, just as you did when using fact 4 to fill in D[i][j]. But pay attention to which operation was selected in the min operation. (It suffices to try each possibility and to select any one that yields the answer found in D[i][j].) Alter variable current by performing the appropriate operation at index i. Change i and j to move to the appropriate location (the one that occurs in the min). For example, if a replace operation was selected, then replace current[i] with tj and subtract 1 for both i and j, since in that case the min operation selected D[i−1][j−1].

If you reach i = 0, then there are j remaining steps. If you reach j = 0, then there are i remaining steps. Try an example to see what they are. Stop when both i and j are 0.


Submitting Your Work

To turn in your work, log into the Linux system, change your directory for the one for assignment 6, use the following command.

  ~abrahamsonk/3200/bin/submit 6 EditDistance.java
After submitting, you should receive confirmation that the submission was successful. If you do not receive confirmation, assume that the submission did not work.

Command

  ~abrahamsonk/3200/bin/submit 6
will show you what you have submitted for assignment 6.

You can do repeated submissions. New submissions will replace old ones.

Late submissions

Late submissions will be accepted for one day after the due date. If you miss a late submission deadline by a microsecond, your work will not be accepted.