Computer Science 3650
Solutions to exam 2

These are the non-multiple-choice questions.
  1. Show the Knuth/Morris/Pratt failure function for pattern aabbaaabaab.

       a  a  b  b  a  a  a  b  a  a  b
       0  1  0  0  1  2  2  3  1  2  3
      

  2. You are given a string, and must find the longest proper prefix of that string that is also a suffix of that string. For example, if the input is abacaba, the longest proper prefix that is also a suffix is aba. (A proper prefix is a prefix that is not the entire string.) Describe an efficient algorithm to solve this problem.

    This is exactly what the failure function tells you. On input x1...xn, compute the failure function, and let k = f(n). The longest proper prefix that is also a proper suffix is x1...xk.

  3. You are given a list L of n points in the plane, in ascending order by their x-coordinates. (That is, they are already sorted from left to right.) No two of the points in list L has the same x-coordinate. The points in L are guaranteed to be the vertices of a convex polygon. But notice that, since they are sorted from left to right, they are not given to you as they occur in order around the polygon.

    You are also given another point P in the interior of the convex polygon given by list L. You would like to sort the list of points of list L clockwise around P, starting with the leftmost point. Describe an O(n) time algorithm to solve this problem. Make sure that your algorithm runs in linear time. Assume that all standard arithmetic operations can be done in constant time each. (Hint: think about convex hulls.)

    Run the Graham scan algorithm to find the upper hull, and again to find the lower hull. Read off the points in the upper hull from left to right, followed by the points in the lower hull from right to left (backwards). That is the list of points, in clockwise order.

  4. You are given two lists of real numbers u1,..., um and x1,..., xn, possibly of different lengths. Both lists are presorted into ascending order. Neither list contains any number more than once.

    Each ui is thought of as standing for the set Si of all values y such that ui <= y <= ui + 1. That is, Si is the interval of the real line of length 1 with left endpoint ui.

    You would like to find the smallest collection of the given unit intervals whose union contains all of the numbers in list x1,..., xn. That is, you want to select as few of the given intervals as possible so that each of the xj values is in at least one of the intervals that you selected.

    You will need to select at least one interval that contains the smallest number, x1. A greedy principle would be to select the interval that is farthest to the right that contains x1, since that interval has the greatest potential to contain other points. (Any space to the left of x1 is wasted, since x1 is the smallest number in the x list.)

    Describe a greedy algorithm that uses this idea to obtain the entire collection of intervals. It is not necessary to prove that your algorithm works, but make sure that it makes sense in some examples, so that it might work in general. A correctly designed algorithm along these lines does work in general.

    Scan the intervals from left to right. Select each one if it is the rightmost interval that contains a point that is not contained in the previous intervals, or in the next interval. Here is the algorithm. For convenience, make um+1 = infinity.

        output list = empty list
        j = 1
        For i = 1,...,m
          If xj < ui+1 then
            Add ui to the output list.
            While j <= n and xj <= ui+1 do
              j = j + 1
            end while
            If j > n then stop the program, and produce the output list
          end if
        end for
      

  5. The longest increasing subsequence problem is as follows. The input is a list of n numbers. The problem is to remove some members of the list, without reordering the list, to end up with a list that is strictly increasing, and that is as long as possible. For example, if the input is 3,4,2,6,9,5,10 then the longest increasing subsequence is 3,4,6,9,10, of length 5, obtained by removing the 2 and 5. (There can be more than one longest increasing subsequence; any one that has the longest possible length is acceptable.)

    There is a scan algorithm for the longest increasing subsequence problem. To keep the problem simple, suppose that the goal is to find only the length of the longest increasing subsequence, not the sequence itself. At each point, what is remembered about the list seen so far is an array L, where L[k] is the smallest value that can be the last value of an increasing subsequence of length k. A length z of array L is also kept. For example, after reading sequence 3,4,2,6,9,5,10 the array L would hold the following numbers, and z would be 4. (The sequence that L[k] refers to is also shown for explanation, but it is not remembered by the algorithm.)

                  k          L[k]        (sequence)
                -----        ----        ----------
                  1           2           2
                  2           4           3,4
                  3           5           3,4,5
                  4          10           3,4,5,10
      
    Notice that the sequence for L[k] has length k, and ends on L[k].

    1. Explain how to update the array L when another value is added to the end of the current sequence. For example, what will L hold if number 7 is added, so that the input is 3,4,2,6,9,5,10,7? What if the next number were 3? What if it were 12? Suppose that the next input is X and give a general algorithm for updating the L array. It must work for any next input, and any current content of array L. Give a clear description that can be followed. Note that new entries sometimes need to be added to the end. It suffices to add 1 to z to add a new slot to the array.

      If the next number is x, search the L array for the first number L[k] that is >= x. Replace that member by x. (Think of x as being added to the end of the sequence on the prior line, k-1. This gives you a better sequence of length k.) If all members in the L array are less than x, then add a new member to the list by setting z = z + 1 and then setting L[z] = x. This extends the longest sequence that you have so far.

    2. In the worst case, how long does your update algorithm take to add the k-th number in the sequence, to within a constant factor? Your answer should be in terms of k. Explain briefly.

      When the k-th number is added, the longest increasing sequence seen so far has length at most k-1. So it takes O(k-1) time to scan the L array.

    3. How long does the scan algorithm take to scan the entire sequence of n numbers, to within a constant factor, in the worst case?

      0 + 1 + 2 + ... + (n-1 = O(n2).

    4. Explain briefly how the update algorithm can be improved to process the k-th number in time O(log(k)).

      Use a binary search instead of a linear search. The numbers in the L array have to be in ascending order, so you can do a binary search to find the smallest one that is >= x.