|
Graphs
are Models
Many notions in the sciences, from Astronomy to Zoology, use the notion of a graph to demonstrate relationships between objects. A graph consists of nodes and edges. ("nodes" are also called "vertices.") The nodes are typically objects of some sort, and the edges depict the relationships between the objects. The figure to the right is an example of a graph. This graph has 9 nodes and 8 edges. It depicts the relationship between 8 species of animal which are represented by the 8 nodes of the graph. The five species at the bottom are extant species, and the four others, labeled A, B, C and D are hypothetical ancestral species. The edges of this graph show "evolved from" relationships. For example, this graph asserts that Mouse and Rat both evolved from common ancestor A, which evolved from D. |
![]() |
||||||
![]() From http://www.mcb.mcgill.ca/~hallett/GEP/PLecture6/PLecture6.html |
Protein Map
The graph shown here is a protein interaction network. (Note: The link is to a technical paper which is outside the scope of what we'll be doing.) It shows several yeast proteins, as nodes of the graph, with an edge drawn between the proteins if they interact "enough," according to the definition used by the authors. The numbers on the edges are called weights. Weights on edges usually say something about the relationship between the nodes they connect. Follow the link to find out what those weights mean. We will say more about weights below. |
||||||
|
Social
Network
The graph to the right, which looks better once you click on the link, is a social interaction network describing the relationships among attendees at various conferences. So, as you can see, graphs are used in a wide variety of contexts. |
![]() From http://joi.ito.com/archives/images/xpWed16740328.gif |
||||||
![]() |
Abstract Graph The graph to the left is an abstract graph. That is, it is not intended to model any particular real-world situation. Since graphs are used so widely to model real-world situations, the more we know about graphs in general, the more we can infer about the situations they model. The study of graphs is called graph theory. An understanding of some basic notions in graph theory will be an aid to understanding the introductory material in the DIMACS BioMath program. |
||||||
Trees
A tree is a graph which is connected (it's possible to walk from any vertex to any other, along edges in the graph) but has no cycles. The graph above is connected, but has many cycles, so it is not a tree. The graph shown to the right is a tree. Note that it is connected, but that if you start at any node and walk along edges from node to node, it is impossible to get back where you started without retracing your steps. As we saw in the first graphic at the top of this page, trees are frequently used to model evolutionary relationships. |
![]() |
||||||
![]() |
Shortest Path Suppose we have 12 sites in a city (labeled A-L in the graph to the left) and that the travel time between nearby sites is shown as a number along the edge connecting those sites. Then we might get a figure like that shown to the left. A question we might want to answer is: What route from A to B yields the shortest travel time? ***Assumption***
Let's assume that our trip from
A to B may not backtrack. That is,
each step must be either "right" or "down." |
||||||
|
What Would Help? What would you like to know in order to answer the question above? Suppose I gave you the two pieces of information shown to the right.
|
Free Facts:
|
||||||
|
Conclusions:
|
From Smaller Cases to Larger Cases There are only two ways into vertex B: From vertex J or from vertex L. Any path from A to B must pass through either J or L, and, what is more important, the shortest path from A to B must build on a shortest path to whichever of J or L it passes through. We thus obtain the conclusions shown to the left. |
||||||
|
Memoization The "29" in the figure to the right comes from the "21" and "22" as described above. But how did the "21" and "22" arise? In just the same fashion! The "21" came from taking the smaller of "18+6" vs. "17+4," and the 22 came from taking the smaller of "17+5" vs. "18+8." And so on... We see the larger cases building on smaller cases. So if we start at the beginning with a "0" at A (since there is no distance from A to A) and put a memo at each vertex indicating the length of the shortest path from A to that vertex, we can easily generate all the numbers shown in red in the figure to the right, including the "29," which is what we really want to know. |
|
||||||
![]() |
More
Memoization
In addition to recording at each node the length of the shortest path to that node, we could also record where the shortest path came from. For example, when we found the "29," we could have highlighted the edge from J to B to make a memo of where that shortest path came from. If we do this, then recovering the actual shortest path, and not just its length, is easy. A glance at the figure to the left reveals that the shortest path from A to B is A->E->H->I->J->B. |
||||||
| Graph
Theory Links |
Chris Caldwell's
Tutorials |
||||||
|
Introduction
to Couting
Another topic that will help us with the BioMath is Counting. DNA can be thought of as a string of letters from the set {A, C, T, G}. For example, ACCCAGATTTGTCCCTGCGTGGGT could be a section of some part of a person's DNA. One thing we might wish to do is count how many DNA strings there are of a certain length. For example, the figure to the right shows the 16 strings of length 2. This is called a tree diagram. It shows the 4 possible ways to select the first letter of our string, and then for each of those ways, the four ways to select the second letter. We can conclude, from counting the number of boxes at the right side of this tree, that there are 16 DNA strings of length 2. |
![]() |
||||||
![]() |
Just Multiply Of course, you might have suspected that there is an easier way to count DNA strings of length 2. Here's one way to think of it: The string we are trying to build has 2 positions, and each position can be filled in 4 ways. That is, there are 4 ways to fill the first position, and 4 ways to fill the second position. (This is indicated in the tree diagram above, as the "start" node has 4 edges leaving it to the right, and each of those four nodes has 4 edges leaving it to the right.) So the shortcut comes from imagining what the tree diagram would look like for this counting problem. "Start would have four children, and each of those nodes would have four children, and four fours is sixteen." In other words, 4x4=16. |
||||||
The
Multiplication Rule
In general, if a task consists of several parts, and each part must be performed to complete the task, then the number of ways the task can be accomplished is the product of the number of ways to do each part. Thus, the number of ways to complete the task shown schematically to the right is: a × b × c × d × e × f |
![]() |
||||||
| Some counting links |
Counting LEGO towers Omega Math |
||||||
|
Probability
We won't need much in the way of probability, but a little bit by way of introduction wouldn't hurt. An experiment is any activity which produces some outcome. The set of all possible outcomes is called the outcome set, and an event is any subset of the outcome set. To the left are some examples to illustrate. Notice that in each case, the outcomes within each outcome set are equally likely. For example, in the case of three coin tosses, the outcome "HHH" is just as likely to be the outcome from tossing a coin three times as the outcome "THT." |
||||||
Equally
Likely Outcomes
The probability of any event is an indication of how likely that event is. A probability of "0" means the event will not occur (for example, rolling a die and getting a "7"). A probability of "1" means the event is certain (such as the event "Getting Heads or Tails" = {H, T} when tossing a coin one time). If the possible outcomes of an experiment are all equally likely, and there are n of them, then each outcome has probability 1/n, and the probabilty of an event consisting of k elements would be k/n. |
![]() The probability of getting heads exactly twice out of three tosses is 3/8. |
||||||
![]() |
Multiplication
Rule for Probability
Suppose we wish to generate a random DNA string. We can imagine a four-sided die bearing the letters A, C, G and T (representing the four nucleotides) which is tossed repeatedly until we have generated a "DNA" string of the desired length. Suppose we do this to generate a DNA string of length 5. What is the probability that "AAAAA" is the sequence generated? We can answer this question as follows: For the first nucleotide, there is a 1/4 chance of getting an "A." For the second nucleotide, the chances of getting an "A" is also 1/4. And so on for the other three nucleotides. Therefore the probability of getting exactly the string shown would be 1/4 x 1/4 x 1/4 x 1/4 x 1/4 = 1/1024 which is about 0.0009765625. In fact, 1/1024 is the probability that any particular given string appears in this way. |
||||||
| Additional Probability Links |
Some
interesting probability problems Counting in genetics |
||||||
|
Matrices A matrix is a rectangular array of numbers. Two examples of matrices are shown to the right. These are square matrices, meaning that they have the same number of rows as columns. But matrices may be non-square also, as shown below. That's it. That's all a matrix is. They're very easy to understand. And there are lots of interesting things one can do with a matrix. |
![]() ![]() Taken from http://www.algebralab.org/ |
||||||
![]() ![]() ![]() |
Matrix
Multiplication
We will use matrices to store information about the relationships between pairs of DNA sequences, pairs of amino acids or pairs of nucleotides. In some situations it will be useful to multiply together two of these matrices. This is a straightforward process which is explained very well on this page. Two matrices, B and C, are shown to the left, together with their product, BC. |