Computer Science 2311
Fall 2007
Laboratory Assignment 9

Handed out: 11/5/07

Note. Read this entire assignment before beginning to work on it.


Edit distance

We studied the edit distance problem in class. I will repeat the problem definition and the basic facts that we derived.

The problem

A basic edit operation on a string is either to insert a character anywhere, to delete any character, or to replace one character by another.

You are given two strings x and y. You want to know the smallest number of basic edit operations that can be used to change x into y. Define that to be distance(x, y). For example, distance("cat", "cute") = 2, since you can change "cat" into "cute" by performing two basic edit operations. (Replace the a by u, then insert e at the end.)

Prefixes of strings

Suppose that x and y are two strings. Define x(i) to be the length i prefix of x, and y(j) to be the length j prefix of y. For example, if x is "kangaroo", then x(5) is "kanga", x(3) is "kan" and x(0) = "" (an empty string).

Facts

Suppose that the characters in strings x and y are numbered starting at 1. So the first character of x is x1, the second character is x2, etc.

We derived the following facts in class.

  1. distance(x(0), y(j)) = j

  2. distance(x(i), y(0)) = i

  3. distance(x(i), y(j)) = distance(x(i-1), y(j-1))  provided xi = yj

  4. distance(x(i), y(j)) = 1 + min(distance(x(i-1), y(j-1)), distance(x(i-1), y(j)), distance(x(i), y(j-1)))  provided xiyj


Algorithm 1

A straightforward algorithm is to use the above facts in a recursive function definition. Here is a definition of the distance function written in Cinnameg. The idea is that dist(x, y, i, j) produces distance(x(i), y(j)).

Define
  case dist(x,y,i,j) = j                    when i == 0

                     = i                    when j == 0

                     = dist(x, y, i-1, j-1) when x#i == y#j

                     = 1 + min[dist(x, y, i-1, j-1),
                               dist(x, y, i-1, j),
                               dist(x, y, i,   j-1)]
%Define

Define distance(x,y) = dist(x, y, length(x), length(y)).

Algorithm 2

Algorithm 1 has the problem that it recomputes the same thing many times. That makes it very slow. Recomputation can be avoided by writing things down, and using the results that you wrote down earlier rather than recomputing them. You can create a two-dimensional array (a grid) d where di, j will be set to hold distance(x(i), y(j)). If you store values in the d array in the right order, (such as by rows, from row 0 to row n, left-to-right within a row) you will find that the values you need have always been computed earlier, and you just get them out of the array. Just

  1. Create the two-dimensional array d.
  2. Use rules 1 and 2 to fill in row 0 and column 0.
  3. Fill in the remaining entries of d, by rows, using rules 3 and 4. Remember to start at row 1 and column 1. Be sure to do a test, for each position, whether rule 3 or rule 4 is appropriate.
  4. The answer is in the last column of the last row.


The assignment

  1. Define a class, DistanceSolver, that has two private variables (x and y) of type String. Create a constructor that takes two strings as parameters, and installs them as the values of x and y. The idea is that an object of class DistanceSolver knows how to compute the distance between strings x and y.

    Note. Class DistanceSolver will not be public. Just start it with

      class DistanceSolver
    

  2. Implement Algorithm 1 in Java, as a method in class DistanceSolver. Call your method distance1. Notice that it will not take any parameters, since it will get strings x and y from the variables. You will need a helper method that takes two integer parameters i and j. Just translate the Cinnameg program to Java, but omit parameters x and y, since they are obtained from the variables in the class. Make the helper method private.

    You can use method Math.min(a,b) to get the smaller of a and b. For example,

      int r = Math.min(4,2);
    
    makes r = 2.

  3. Create another (public) class that holds the main method, and make that program test your distance1 method. For example, a start is as follows.

      public static void main(String[] args)
      {
        DistanceSolver solver = new DistanceSolver("cat", "cute");
        System.out.println("The distance between \"cat\" and \"cute\" is "
                           + solver.distance1());
      }
    
    (Writing \" in a string constant causes the quote character to be put in the string.) Try at least two examples. Test this. Do not proceed until you believe that distance1 is working correctly.

  4. Implement Algorithm 2 in Java, as a method called distance2 in class DistanceSolver. Do not remove Algorithm 1. Leave your distance1 function in your program. Just add this new one.

    Change your main method to call distance2. Test it. Do not proceed until it works.


  5. Comment out your main method that performs tests. Do not remove it. (Just put // in front of each line to make it a comment.)


  6. Write a new main method that gets two strings x and y from the user using message boxes. There are notes below on message boxes. Using another message box, your program should ask the user for a number, either 1 or 2, telling which algorithm to use. Your main program should then pop up a box telling the edit distance between x and y, using algorithm 1 if the number is 1, and algorithm 2 if the number is 2.

    You can use Java method Integer.parseInt(s) to convert from a string to an integer. For example,

      int r = Integer.parseInt("35");
    
    makes r = 35. Alternatively, just read the method as a string and compare strings. Important. Do not test whether two strings are equal using operator ==. Expression u.equals(v) is true if strings u and v are equal.

    Test your program before continuing.

  7. Try your algorithms on some long strings. Your algorithms will take the most time when the two strings do not contain any characters in common. For example, compute the edit distance between abcdefg and hijklmn. Next try abcdefghijklm and nopqrstuvwxyz. Which algorithm appears to be more efficient? Is the difference significant? What do you think will happen if you use two strings of length 30?


Submit a printed version of the program. On the printed version, write (by hand) about how long algorithm 1 and algorithm 2 took to find the distance between abcdefghijklm and nopqrstuvwxyz. (Just time it by hand. If the time is too fast to notice, say "too short to tell".


Strings in Java

A string has type String. Characters are numbered from 0 in Java. You can get the i-th character of a string x (numbering from 0) using Java expression x.charAt(i). So Cinnameg expression x#i is equivalent to Java expression x.charAt(i-1) (since Cinnameg numbers from 1).

You can get the length of string x using x.length( ). For example, "abc".length( ) = 3.


Two-dimensional arrays

For algorithm 2, you need to use a two-dimensional array. A two dimensional array of integers has type int[][]. So

  int[][] d;
creates a variable d that can refer to a two-dimensional array of integers. But it does not actually create the array. To build an actual array, and make d refer to it you can write
  d = new int[m][n];
where m is the desired number of rows and n is the desired number of columns. For example,
  d = new int[2][3];
creates an array that looks as follows, where Each slot can hold one integer.

              
              

To use the value at row i and column j in array d, write d[i][j]. But remember that numbering is from 0, so the first row of the array is row 0, and the first column is column 0.

What a two-dimensional array really is

A two-dimensional array is really just an array of arrays. It is an array of its rows, and each row is an array of columns. If d is a two-dimensional array, then d.length is the number of rows, and d[0].length is the number of columns in row 0. Typically, all of the rows have the same number of columns, but it is possible to create a two-dimensional array where different rows have different sizes. (Can you see how to do it?)


Message boxes using the Swing library

The Java library has been built up over time, and has different sections that were contributed by different groups, or at different times. One part is the Swing library, which helps you do graphics, and is popular and easy to use. To use Swing, you should put

  import javax.swing.*;
in your program. Java statement
  String xstr = JOptionPane.showInputDialog("What is the first string?");
causes a message box to pop up that contains the text "What is the first string?", and also has a box where the user can type a string. It will set xstr to the string that the user types.

You can also just show a message without asking for information. If msg is a string that you want to show, then statement

  JOptionPane.showMessageDialog(null, msg);
will pop up a box showing message msg, with an ok button to allow the user to say when he or she is done reading the message.

Important. If you use graphics, then you need to shut down the graphics support when the program is done. At the end of main, add statement

  System.exit(0);
to shut everything down. If this statement is performed anywhere in a program, the entire program is stopped. You cannot do anything after this statement.