1. Suppose that the matrix below gives the distances between genes. Use the UPGMA method to build an (unrooted) phylogeny on these genes.
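As a rough illustration of the UPGMA procedure (not the answer to this problem), the sketch below repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The function name `upgma`, the four labels, and every distance in `example` are made up for illustration. Branch lengths are omitted, and the result is a nested string in the same style as ((1,2),3).

```python
from itertools import combinations


def upgma(names, dist):
    """Return a nested-parenthesis tree string built by UPGMA.

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) -> distance between a and b.
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)                          # working copy of the distances
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = f"({a},{b})"
        # Distance from the merged cluster to every other cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Hypothetical distances, chosen only so the merge order is unambiguous.
example = {
    frozenset(pair): value
    for pair, value in {
        ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
        ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
    }.items()
}
tree = upgma(["A", "B", "C", "D"], example)
print(tree)
```

With these made-up distances, A and B merge first, then C joins them, then D. Ties are broken by whichever pair is seen first, which is an arbitrary choice in this sketch.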



2. Suppose that these three sequences are on the leaves of a phylogeny with structure ((1,2),3):
   1: AACCC
   2: CCAAA
   3: ACAAC
We are interested in finding an optimally parsimonious set of subsequences of length k = 2 from each leaf sequence.
   a. Find the W[] tables on the leaves.
   b. Find the W[] tables on the two internal nodes.
   c. Find the X[] table on the edge between leaf 1 and its parent.
   d. Find B_{0}, B_{1}, B_{2}, ... on that same edge.

   a. Leaf 1: AA: 0, AC: 0, CA: infinity, CC: 0
      Leaf 2: AA: 0, AC: infinity, CA: 0, CC: 0
      Leaf 3: AA: 0, AC: 0, CA: 0, CC: infinity
   b. Parent of leaves 1 and 2: AA: 0, AC: 1, CA: 1, CC: 0
      At the root: AA: 0, AC: 1, CA: 1, CC: 1
   c. AA: 0, AC: 0, CA: 1, CC: 0
   d. B_{0} = {AA, AC, CC}
      B_{1} = {CA}
      All other B sets are empty.
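The tables above can be checked with a short script. This is a sketch under assumptions inferred from the answers, not definitions stated in the problem: W on a leaf is 0 for each 2-mer that occurs in the sequence and infinity otherwise, X[s] on an edge is the minimum over child 2-mers t of W[t] plus the Hamming distance between s and t, W on an internal node is the sum of the X tables of its child edges, and B_{i} collects the 2-mers whose X value is i. All function and variable names are hypothetical.

```python
from itertools import product

INF = float("inf")
K = 2
# The three sequences only use A and C, so the tables cover AA, AC, CA, CC.
KMERS = ["".join(p) for p in product("AC", repeat=K)]


def hamming(s, t):
    """Number of positions at which s and t differ."""
    return sum(a != b for a, b in zip(s, t))


def leaf_table(seq):
    """W on a leaf: 0 if the k-mer occurs in seq, infinity otherwise."""
    present = {seq[i:i + K] for i in range(len(seq) - K + 1)}
    return {m: 0 if m in present else INF for m in KMERS}


def edge_table(child_w):
    """X on the edge above a child: best child cost plus mutation cost."""
    return {s: min(child_w[t] + hamming(s, t) for t in KMERS) for s in KMERS}


def internal_table(*children_w):
    """W on an internal node: sum of the X tables of its child edges."""
    xs = [edge_table(w) for w in children_w]
    return {s: sum(x[s] for x in xs) for s in KMERS}


w1, w2, w3 = leaf_table("AACCC"), leaf_table("CCAAA"), leaf_table("ACAAC")
w12 = internal_table(w1, w2)    # parent of leaves 1 and 2
root = internal_table(w12, w3)  # root of ((1,2),3)
x1 = edge_table(w1)             # edge between leaf 1 and its parent

# Group the k-mers on that edge by their X value to get the B sets.
b_sets = {}
for kmer, cost in x1.items():
    b_sets.setdefault(cost, set()).add(kmer)

print(w12, root, x1, b_sets)
```

Under these assumptions the script reproduces the values listed in parts a through d.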

3. Build a "reasonable" secondary structure from the following RNA sequence: UCCAUAUUCUGGGCAAUUAUAAGCCU.
   * Hairpin loops may not have fewer than 3 bases.
   * You get "3" for each AU pairing, and "4" for each CG pairing.
Try to make your score as low as possible. Any fairly good attempt will get full credit.
Lots of possible answers. 
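One way to gauge how good a hand-built structure is: the following sketch computes the best achievable total with a Nussinov-style dynamic program. It is an illustration under assumptions, not necessarily the exact algorithm from class. Scores are kept as positive magnitudes and maximized, which is the same as making the negated score as low as possible; only AU and CG pairs score; and the names `best_score`, `PAIR_SCORE`, and `min_loop` are hypothetical.

```python
# Pair weights from the problem statement, written as positive magnitudes.
PAIR_SCORE = {("A", "U"): 3, ("U", "A"): 3, ("C", "G"): 4, ("G", "C"): 4}


def best_score(seq, min_loop=3):
    """Largest total pairing weight for seq.

    min_loop is the minimum number of unpaired bases that must lie
    between two paired positions (the hairpin-loop restriction).
    """
    n = len(seq)
    if n == 0:
        return 0
    s = [[0] * n for _ in range(n)]  # s[i][j]: best score for seq[i..j]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base i is unpaired. Case 2: base j is unpaired.
            best = max(s[i + 1][j], s[i][j - 1])
            # Case 3: i pairs with j, if they are compatible and far enough apart.
            bonus = PAIR_SCORE.get((seq[i], seq[j]), 0)
            if bonus and j - i > min_loop:
                inner = s[i + 1][j - 1] if j - i >= 2 else 0
                best = max(best, inner + bonus)
            # Case 4: split the interval into two independent pieces.
            for k in range(i + 1, j):
                best = max(best, s[i][k] + s[k + 1][j])
            s[i][j] = best
    return s[0][n - 1]


print(best_score("UCCAUAUUCUGGGCAAUUAUAAGCCU"))
```

Calling `best_score(seq, min_loop=0)` drops the hairpin restriction, so adjacent bases may pair.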

4. Below is an array like the ones we were filling in during class, with our RNA secondary structure prediction dynamic programming algorithm, without the loop considerations, using the scoring system above. Complete the top row, and describe how you found the rightmost value in that row. That is, list the values you considered, and which cases of the dynamic program they came from. Note that all of the values should be thought of as negative; I wrote them positively for clarity.

I corrected an entry below, highlighted in bold. It used to be "13", but it should have been "10", as it now appears.

