Computational Methods in Molecular Biology
Spring 2006
Final homework assignment

Note:  These problems are intended to serve as a study
guide for the sorts of questions I'll be asking on the Final
Exam.

1.  Suppose that the matrix below gives the distances
between genes.  Use the UPGMA method to build
an (unrooted) phylogeny on these species.
    A   B   C   D   E
A   0  11  21  53  60
B       0  13  10  17
C           0  33  44
D               0  62
E                   0
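For checking a hand computation, here is a minimal Python sketch of UPGMA run on the matrix above. The function name upgma, the frozenset-keyed distance dictionary, and the nested-parenthesis naming of clusters are illustrative choices, not notation from the course. The sketch averages over all leaf pairs between two clusters, which is one common way of stating the UPGMA update.

```python
from itertools import combinations

def upgma(labels, dist):
    """Return the list of merges as (cluster name, node height) pairs.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    # Each cluster name maps to the tuple of leaves it contains.
    clusters = {name: (name,) for name in labels}

    def avg(c1, c2):
        # UPGMA distance: mean over all leaf pairs between the two clusters.
        pairs = [(a, b) for a in clusters[c1] for b in clusters[c2]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        c1, c2 = min(combinations(clusters, 2), key=lambda p: avg(*p))
        d = avg(c1, c2)
        merged = f"({c1},{c2})"
        clusters[merged] = clusters.pop(c1) + clusters.pop(c2)
        # The new internal node sits at half the merge distance.
        merges.append((merged, d / 2))
    return merges

# Upper triangle of the matrix from problem 1.
labels = ["A", "B", "C", "D", "E"]
rows = {"A": [11, 21, 53, 60], "B": [13, 10, 17], "C": [33, 44], "D": [62]}
dist = {}
for i, a in enumerate(labels):
    for b, d in zip(labels[i + 1:], rows.get(a, [])):
        dist[frozenset((a, b))] = d

merges = upgma(labels, dist)
for name, height in merges:
    print(name, height)
```

Each printed line is one merge step, in order, with the height of the new internal node.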

2.  Suppose that these three sequences are on the leaves of
a phylogeny with structure ((1,2),3):
1:  AACCC
2:  CCAAA
3:  ACAAC
We are interested in finding an optimally parsimonious
set of subsequences of length k = 2 from each leaf
sequence.
a.  Find the W[] tables on the leaves.
b.  Find the W[] tables on the two internal nodes.
c.  Find the X[] table on the edge between leaf 1 and its parent.
d.  Find B0, B1, B2, ... on that same edge.

Answers to problem 2:
a.
Leaf 1:  AA-0, AC-0, CA-infinity, CC-0
Leaf 2:  AA-0, AC-infinity, CA-0, CC-0
Leaf 3:  AA-0, AC-0, CA-0, CC-infinity

b.
Parent of leaves 1 and 2:  AA-0, AC-1, CA-1, CC-0
At the root:  AA-0, AC-1, CA-1, CC-1

c.
AA-0, AC-0, CA-1, CC-0

d.
B0 = {AA, AC, CC}
B1 = {CA}
All other B sets are empty.
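
The answers to problem 2 can be reproduced with a short Python sketch. It rests on assumptions that the handout does not spell out: the alphabet is restricted to {A, C}, d(s, t) is Hamming distance, W on a leaf is 0 for substrings that occur in the leaf sequence and infinity otherwise, X[s] on an edge is the minimum over t of W_child[t] + d(s, t), W on an internal node is the sum of the X tables of its child edges, and B_i collects the substrings s with X[s] = i. All function names are illustrative.

```python
from itertools import product

INF = float("inf")

def hamming(s, t):
    """Number of positions where s and t differ (equal lengths assumed)."""
    return sum(a != b for a, b in zip(s, t))

def leaf_table(seq, kmers):
    """W on a leaf: 0 if the k-mer occurs in seq, infinity otherwise."""
    k = len(kmers[0])
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {s: 0 if s in present else INF for s in kmers}

def edge_table(child_w, kmers):
    """X on the edge above a child: X[s] = min over t of W_child[t] + d(s, t)."""
    return {s: min(child_w[t] + hamming(s, t) for t in kmers) for s in kmers}

def internal_table(children_w, kmers):
    """W on an internal node: sum of the X tables of its child edges."""
    tables = [edge_table(w, kmers) for w in children_w]
    return {s: sum(x[s] for x in tables) for s in kmers}

def b_sets(x):
    """Group k-mers by their X value: B_i = all s with X[s] = i."""
    sets = {}
    for s, cost in x.items():
        sets.setdefault(cost, []).append(s)
    return sets

kmers = ["".join(p) for p in product("AC", repeat=2)]  # AA, AC, CA, CC

w1 = leaf_table("AACCC", kmers)
w2 = leaf_table("CCAAA", kmers)
w3 = leaf_table("ACAAC", kmers)
parent = internal_table([w1, w2], kmers)    # parent of leaves 1 and 2
root = internal_table([parent, w3], kmers)  # tree ((1,2),3)
x1 = edge_table(w1, kmers)                  # edge from leaf 1 to its parent

print(parent, root, x1, b_sets(x1))
```

The printed tables can be compared entry by entry with parts b, c, and d above.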
3.  Build a "reasonable" secondary structure from the following
RNA sequence:  UCCAUAUUCUGGGCAAUUAUAAGCCU.
*  Hairpin loops may not have fewer than 3 bases
*  You get "-3" for each AU pairing, and "-4" for each CG pairing.
Try to make your score as low as possible.  Any fairly good
attempt will get full credit.
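
One way to judge how good a hand-built structure is: compute the lowest achievable score with a small dynamic program. The Python sketch below is a Nussinov-style recurrence under the scoring rules of problem 3. It is not necessarily the exact formulation used in class, and the names min_score and min_loop are illustrative. It assumes that only AU and CG pairs score and that requiring at least min_loop bases between the two ends of every pair is an acceptable way to enforce the hairpin rule.

```python
def min_score(seq, min_loop=3):
    """Lowest total pairing score for seq (more negative is better)."""
    pair_score = {("A", "U"): -3, ("U", "A"): -3,
                  ("C", "G"): -4, ("G", "C"): -4}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = lowest score achievable on seq[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired.
            value = best[i][j - 1]
            # Case 2: base j pairs with base k, leaving at least
            # min_loop bases strictly between them.
            for k in range(i, j - min_loop):
                s = pair_score.get((seq[k], seq[j]))
                if s is None:
                    continue
                left = best[i][k - 1] if k > i else 0
                inner = best[k + 1][j - 1] if k + 1 <= j - 1 else 0
                value = min(value, left + s + inner)
            best[i][j] = value
    return best[0][n - 1]

print(min_score("UCCAUAUUCUGGGCAAUUAUAAGCCU"))
```

Calling min_score with min_loop=0 drops the hairpin restriction, so adjacent bases are then allowed to pair.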
4.  Below is an array like the ones we were filling in during class, with
our RNA secondary structure prediction dynamic programming algorithm,
without the loop considerations, using the scoring system above.
Complete the top row, and describe how you found the rightmost value in
that row.  That is, list the values you considered, and which cases of the
dynamic program they came from.  Note that all of the values should be
thought of as negative...  I wrote them positively for clarity.
    A   C   G   U   C   A   U   G   U   G   C
A   0
C       0   4   4   4   7   7  15  15  15  19
G           0   0   4   4   7   7   7   7  15
U               0   0   3   3   7   7   7  11
C                   0   0   3   7   7   7  11
A                       0   3   3   3   3   7
U                           0   0   0   0   4
G                               0   0   0   4
U                                   0   0   4
G                                       0   4
C                                           0

I corrected an entry in the top row below. It used to be a "13," but it should have been a "10," as it now appears.

    A   C   G   U   C   A   U   G   U   G   C
A   0   0   4   7   7   7  10  14  18  18  22
C       0   4   4   4   7   7  15  15  15  19
G           0   0   4   4   7   7   7   7  15
U               0   0   3   3   7   7   7  11
C                   0   0   3   7   7   7  11
A                       0   3   3   3   3   7
U                           0   0   0   0   4
G                               0   0   0   4
U                                   0   0   4
G                                       0   4
C                                           0