Assignment 1 – Evolution Trees

Dean ZellerDue: Wednesday, March 15th by 9:00 pm

CS1005110 points

Spring, 2006

ObjectiveThe student will use a graphics package to create diagrams of binary evolution trees.

Background

This assignment deals with the cutting-edge topic of bioinformatics, also called computational biology. It is a complex field of graph theory with applications to mathematics, computer science, and genetics. An evolutionary tree (or phylogeny) is a tree-structure the demonstrates evolution of species over time. A tree consists of vertices (nodes) connected by edges (connections). In evolution, a node represents a point in which a species population “splits” into two genetically different species. Once a species splits, the two species created are genetically unable to produce offspring. Nodes not producing offspring are called the leaves of the tree, representing the extant (non-extinct) species. Two trees are isomorphic if they contain the same structure and are different only through symmetry of a node. In order to simplify the problem, isomorphic trees are considered the same for purposes of this research.

In order to easily show trees, they are given a text description called a classification label. This allows a text label instead of a visual picture to represent a tree structure. A good classification system has a unique name for each tree. For this assignment, the classification system is simply a listing of the number of nodes at each level. Trees A, B, C, and D above have the label (1242), indicating the first level has one node, the second has two nodes, the third has four nodes, and the fourth has two. Since the four trees are isomorphic, the same label represents all four trees. Ultimately, this classification system is incomplete. While simple to understand, it does not provide unique names for each tree at the higher levels. Trees E and F are not isomorphic, and thus are given separate labels (1244a and 1244b). This system will suffice for now, but can get confusing as the size of the trees increase.

Introductory study of phylogenies makes another assumption that greatly simplifies the problem. For purposes of this assignment, the only non-leaf structure possible are nodes with exactly two offspring, representing a point in time in which a population “splits” into two populations. A node with a single offspring is called a redundant node and does not significantly add to the tree structure. Nodes with three or more children can be isomorphically approximated, and thus can be ignored at this stage in the research. While all nodes in evolution trees are unique species, at this point only the leaf nodes need to be considered.

Task 1 – Draw Given Trees (5 points)

Given below are thirteen evolution trees. This represents the complete set of all non-isomorphic evolution trees of up to six leaves (11 nodes). Use a graphics package to recreate these trees. Give the classification for the tree and label its leaves with successive letters (a, b, c, etc…) Your tree style may differ from the trees below, but the structure must be correct.

Task 2 – Generate Trees (5 points)

Use a graphics package to create the eleven isomorphically unique evolutionary trees of 7 leaves (13 nodes). The following labels are the classifications for the unique trees: 1222222, 122224, 122242, 122422, 12244a, 12244b, 124222, 12424, 12442a, 12442b, and 1246.

Grading:

You will be graded on the following criteria:

AccuracyCorrectly drawing the evolutionary trees, attention to detail.

Extra Credit:

Extra credit will be given for including the following:

  • Create the sixteen 8-leaf trees: 12222222, 1222224, 1222242, 1222422, 1224222, 1242222, 124224, 124242, 124422a, 124422b, 12444aa, 12444ab, 12444ba, 12444bb, 12462a, 12462b, 1248.
  • Create the 9 leaf (17 node) non-isomorphic trees and classifications.
  • Create the trees without the redundant node and isomorphic approximation assumptions, allowing for any number of offspring from a node.

Assignment 2 – Phylogenetic Distance Graphs

Dean ZellerDue: Wednesday, March 22nd by 9:00 pm

CS1005110 points

Spring, 2006

ObjectiveThe student will construct distance graphs (k-leaf roots) for a series of evolution trees.

Background: Distance graphs, cliques

As an alternate to phylogenies, evolution can be modeled by a series of distance graphs, (also called k-leaf roots). Distance graphs contain visual information indicating the distances between species. The tree leaves serve as the graph vertices. Edges within the distance graph indicate the distance between the two species is no more than a specified k-value. Below each distance graph are the corresponding cliques within the graphs. A clique is defined as a subset of vertices such that all members are connected to all other members within the clique. Cliques can overlap with other cliques.

Lab Notes (5 points)

  1. Use a word processor to create a table similar to the example below.
  2. Set your document Page Setup to Landscape to give more space widthwise.
  3. The first cell in each row should contain the phylogenies from assignment 7.
  4. The column header should contain increasing k-values from 2 up to the number of leaves in the tree.
  5. Within each row…
  6. Draw the distance graphs for increasing k-values. Start at k=2 and end when the graph is fully connected. Draw a left arrow () for distance graphs that show no change from the previous k-value. (You do not need to use a graphics package to create these graphs, but you are being graded on neatness and readability. At minimum, use a ruler to draw straight lines.)
  7. Indicate the cliques created within each distance graph.

Lab Report (5 points)

Answer the following questions using a word processor and/or graphics package. Answer questions 1-3 before doing assignment 8, and questions 4-5 after.

  1. Create a phylogeny with 10 leaves. Give the corresponding distance graphs as completed in the lab notes.
  2. Same as question 1, but for a 13-leaf tree.
  3. Same as question 1, but for a 15-leaf tree.
  4. Review the distance graphs in your notes and questions 1-3. Which distance graphs do you think contain the most useful information for geneticists. Use examples in your answer.
  5. In your process of creating the phylogenies and distance graphs, you have learned a cutting-edge technique of phylogeny reconstruction. Write and answer your own question dealing with the work completed in this lab. Indicate conclusions you can reach that was not mentioned in the assignment.

Grading:

You will be graded on the following criteria:

AccuracyCorrectly drawing the evolutionary trees.

Graphic DesignAttention to detail in the diagrams.

Extra Credit:

Extra credit will be given for any of the following:

  • Use a graphics package for all graphs.
  • Create more complex phylogenies for the report.

2 leaves (3 vertices)
phylogeny / k = 2 / k = 3 / k = 4 / k = 5
3 leaves (5 vertices)
phylogeny / k = 2 / k = 3 / k = 4 / k = 5
4 leaves (7 vertices)
phylogeny / k = 2 / k = 3 / k = 4 / k = 5
5 leaves (9 vertices)
phylogeny / k = 2 / k = 3 / k = 4 / k = 5

Assignment 3 – Phylogeny Reconstruction

Dean ZellerDue: Wednesday, March 22nd by 9:00 pm

CS1005110 points

Spring, 2006

ObjectiveThe student will demonstrate an algorithm of phylogeny reconstruction from the results of experiments.

Background: Experiments, Reconstruction Algorithm

Genetic experiments can produce a wide range of results based on the similarities and differences between species. The experiment used for this assignment computes an integer number representing the distance between the given species. At this point, the experiments is completely hypothetical, but plausible enough to be converted from existing tests. The input to the test is two species, and the output is the distance between the species in the resulting phylogeny.

In order to use the reconstruction algorithm, is it helpful to create a summary table of the data. The table diagonal should be all 0’s, and the table itself will be diagonally symmetric. Reconstructing the phylogeny starts with the species pairs with a distance of k=2, identifying the direct neighbors within the tree. At k=3, connect branches to the logical pairs. Build the tree up from there until all leaves are connected.

Instructions

Answer the following questions using a word processor and/or graphics package.

  1. Reconstruct the phylogeny from data table 1. Draw the distance graphs and phylogenies for each k-value until the tree is complete. The reconstruction has been done for you – just follow the diagrams.
  2. Reconstruct the phylogeny from data table 2. Draw the distance graphs and phylogenies for each k-value until the tree is complete. The reconstruction has been done for you – just follow the diagrams.
  3. Reconstruct the phylogeny from data table 3. Draw the distance graphs and phylogeny at each stage until the tree is complete.
  4. Reconstruct the phylogeny from data table 4. Draw the distance graphs and phylogeny at each stage until the tree is complete.
  5. Create your own example data of at least 10 species and reconstruct the phylogeny. In order to make sure your data can correctly generate a tree, create the tree beforehand and generate the distance data.

Grading:

You will be graded on the following criteria:

AccuracyCorrectly drawing the evolutionary trees and distance graphs.

Graphic DesignAttention to detail in the diagrams.

Extra Credit:

Extra credit will be given for any of the following:

  • Create more complex examples of phylogeny reconstruction.
  • Use actual genetic data in your examples.

Data Table 1
a / b / c / d / e / f
a / 0 / 2 / 3 / 5 / 6 / 6
b / 0 / 3 / 5 / 6 / 6
c / 0 / 4 / 5 / 5
d / 0 / 3 / 3
e / 0 / 2
f / 0

Phylogeny reconstruction complete at k=4.

Data Table 2
a / b / c / d / e / f / g / h / i
a / 0 / 2 / 3 / 4 / 6 / 7 / 8 / 8 / 6
b / 0 / 3 / 4 / 6 / 7 / 8 / 8 / 6
c / 0 / 3 / 5 / 6 / 7 / 7 / 5
d / 0 / 4 / 5 / 6 / 6 / 4
e / 0 / 3 / 4 / 4 / 4
f / 0 / 3 / 3 / 5
g / 0 / 2 / 6
h / 0 / 6
i / 0

Phylogeny reconstruction complete at k=4.

Data Table 3

a / b / c / d / e / f / g / h / i / j / k / l / m / n / o
a / 0 / 2 / 4 / 6 / 7 / 7 / 5 / 6 / 6 / 9 / 9 / 8 / 8 / 8 / 6
b / 0 / 4 / 6 / 7 / 7 / 5 / 6 / 6 / 9 / 9 / 8 / 8 / 8 / 6
c / 0 / 4 / 5 / 5 / 3 / 6 / 6 / 9 / 9 / 8 / 8 / 8 / 6
d / 0 / 3 / 3 / 3 / 8 / 8 / 11 / 11 / 10 / 10 / 10 / 8
e / 0 / 2 / 4 / 9 / 9 / 12 / 12 / 11 / 11 / 11 / 9
f / 0 / 4 / 9 / 9 / 12 / 12 / 11 / 11 / 11 / 9
g / 0 / 7 / 7 / 10 / 10 / 9 / 9 / 9 / 7
h / 0 / 2 / 7 / 7 / 6 / 6 / 6 / 4
i / 0 / 7 / 7 / 6 / 6 / 6 / 4
j / 0 / 2 / 3 / 5 / 5 / 5
k / 0 / 3 / 5 / 5 / 5
l / 0 / 4 / 4 / 4
m / 0 / 2 / 4
n / 0 / 4
o / 0

Data Table 4

a / b / c / d / e / f / g / h / i / j / k / l / m / n / o / p
a / 0 / 2 / 4 / 4 / 6 / 6 / 6 / 6 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
b / 0 / 4 / 4 / 6 / 6 / 6 / 6 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
c / 0 / 2 / 6 / 6 / 6 / 6 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
d / 0 / 6 / 6 / 6 / 6 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
e / 0 / 2 / 4 / 4 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
f / 0 / 4 / 4 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
g / 0 / 2 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
h / 0 / 8 / 8 / 8 / 8 / 8 / 8 / 8 / 8
i / 0 / 2 / 4 / 4 / 6 / 6 / 6 / 6
j / 0 / 4 / 4 / 6 / 6 / 6 / 6
k / 0 / 2 / 6 / 6 / 6 / 6
l / 0 / 6 / 6 / 6 / 6
m / 0 / 2 / 4 / 4
n / 0 / 4 / 4
o / 0 / 2
p / 0