Biological Frameworks

Environmental Engineering Module

Homework #4

As necessary, refer to previous class materials, recommended reading(s)/web resources, and the attached chapter on molecular systematics for background information.

Suggested Readings

Woese, C.R. 1987. Bacterial evolution. Microbiol. Rev. 51: 221-271

Doolittle, W.F. 1999. Phylogenetic classification and the universal tree. Science 284: 2124-2128

As introduced in the first module of the course, comparative analysis of DNA sequences (or of the translated protein products) has become a fundamental tool in basic and applied biological sciences. Your final homework question asks you to consider the general themes of sequence conservation, organismal distribution, and genealogy in developing a more comprehensive appreciation of the genetic mechanisms that foster metabolic diversity among bacteria.

Although bacteria are not sexual organisms in the sense of many eukaryotes, they have developed a variety of very effective mechanisms for transferring large (>50 kb) and small (<1 kb) blocks of DNA between genetically disparate organisms (e.g., transfer between domains has been documented). In contrast to this genetic promiscuity, however, a number of genes have also been used to infer the genealogy of bacteria. Of these, the RNA components of the ribosome (the ribosomal RNAs, or rRNAs) have been applied most widely. Sequence divergence of these RNA polymers is generally thought to reflect the evolutionary divergence of the corresponding bacterial species. Thus, comparative sequencing of rRNAs (usually the 16S rRNAs) is now commonly used to identify and relate microorganisms.

However, there is now more controversy than there was several years ago about the utility of comparative gene sequencing for inferring microbial genealogy, in particular for determining the order of divergence of the major divisions of life. This controversy is mostly a consequence of the complete sequencing of a large number of microbial genomes. For the first time, biologists have a complete inventory of the genetic makeup of a variety of life forms. Comprehensive comparative genome analysis has revealed that many genes have been inherited through lateral transfer (exchange of genes between established lineages) rather than through vertical descent (parent to daughter cell, to daughter cell, etc.). The most extreme revisionist viewpoint is that there is no single molecular phylogeny that can fully describe the divergence of life on Earth.

At the extremes, we could suggest that there are two types of genes: 1) those that are never transferred between lineages, and 2) those that are highly promiscuous, easily moved between lineages (via one of several alternative general exchange mechanisms) and able to function within the receiving bacterium. The reality is something between these two extremes for most genes.

Consider the following comparative analysis of two genes derived from a collection of 9 phenol-degrading organisms. These phenol-degrading bacteria were isolated from the surface soil of a contaminated trucking facility and characterized by sequencing their 16S rRNAs and a key enzyme in the metabolic pathway for phenol degradation, phenol hydroxylase. Phenol hydroxylase is a monooxygenase that inserts a single atom of oxygen adjacent to the ring hydroxyl of phenol. The resulting product, catechol, is further degraded by a pathway common to most organisms capable of degrading aromatic pollutants (refer to your previous handouts that show catechol as a key intermediate in aromatic compound degradation). You are provided with three types of information for each set of genes:

- A sequence alignment (recall your earlier module using the BLAST algorithm provided by the NCBI web interface)

- A similarity matrix giving the % of identical residues between each pair of aligned sequences. For each matrix, consider only the % sequence identity values provided in the upper right half.

- A phylogenetic tree inferred from the similarity values. These trees are the best fit of the similarity data (or an evolutionary distance estimate derived from these data). Only the horizontal distances of the tree segments reflect the estimated divergence value between pairs of sequences. For the example below, the distance between organisms A & C is 6 units.
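To make the tree-distance rule concrete, here is a minimal Python sketch. The tree, its node names, and its branch lengths are hypothetical (they are not taken from the figures in this handout) and were chosen only so that the distance between A and C comes out to 6 units. The distance between two organisms is taken as the sum of the horizontal branch lengths along the path that joins them through their closest shared node.

```python
# Hypothetical rooted tree: each node maps to (parent, branch_length).
# Branch lengths stand for the horizontal segments of the drawing;
# vertical lines only connect branches and carry no distance.
TREE = {
    "root": (None, 0.0),
    "X": ("root", 1.0),  # internal node joining A and B
    "A": ("X", 2.0),
    "B": ("X", 1.0),
    "C": ("root", 3.0),
}


def path_to_root(tree, leaf):
    """Map each node on the way from leaf to root to its distance from leaf."""
    distances = {}
    node, total = leaf, 0.0
    while node is not None:
        distances[node] = total
        parent, length = tree[node]
        total += length
        node = parent
    return distances


def leaf_distance(tree, a, b):
    """Sum the branch lengths from a and b up to their closest shared node."""
    up_a = path_to_root(tree, a)
    node, total = b, 0.0
    while node not in up_a:
        parent, length = tree[node]
        total += length
        node = parent
    return up_a[node] + total


print(leaf_distance(TREE, "A", "C"))  # 2 + 1 + 3 = 6.0
print(leaf_distance(TREE, "A", "B"))  # 2 + 1 = 3.0
```

In this toy tree, A and B are closer to each other (3 units) than either is to C, which is the kind of relationship you should look for when comparing a tree against its similarity matrix.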

Only one phylogenetic tree is provided for the 16S rRNA sequence comparisons and closely reflects the data provided in the corresponding similarity matrix.

Question 1a. Which of the alternative trees provided for phenol hydroxylase most closely reflects the similarity data? Provide specific data from the similarity matrix to support your answer.

Question 1b. Compare the best-fit phenol hydroxylase tree to that inferred from the 16S rRNA alignment. Do the trees show the same topology? If not, what might account for the differences? If they share a common topology, what is the significance with respect to the evolution of new catabolic pathways for aromatic pollutant degradation among different bacterial lineages using this collection (family) of oxygenases? Be as specific as possible in your answer, e.g., by determining whether inheritance of the phenol hydroxylase by each named species was likely via lateral gene transfer or speciation.

The following are important attributes of a gene that could serve as a useful marker for phylogeny inference:

- Not subject to lateral gene transfer

- High sequence conservation

- Of sufficient length (sufficient number of residues) to provide for reasonable statistical treatment

- Shares a common cellular function

Question 2a. Compare and contrast the 16S rRNA and the phenol hydroxylase (or similar oxygenase) with respect to each of the above attributes. Be specific, considering the following:

- What specific cellular factors would restrict the exchange (or allow exchange) between genetically disparate lineages?

- What determines conservation of sequence?

- How might specific function (e.g., enzyme substrate) diverge or be retained?

Question 2b. Using appropriate resources, suggest two other genes that would likely serve as stable markers for phylogeny inference. Justify your answer using the general criteria listed above.

Sequence alignment of partial 16S rDNA sequences

* 20 * 40 * 60 * 80
n1.16S : CCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGG-TCTTCGG : 84
n5.16S : CCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGG-TCTTCGG : 84
n6.16S : CCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGG-TCTTCGG : 84
c4.16S : CCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCAATGCCGCGTGTGTGAAGAAGG-CCTTCGG : 84
n3.16S : CCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGCAGGATGAAGG-CCTTCGG : 84
n7.16S : CCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGCAGGATGAAGG-CCTTCGG : 84
P5.16S : CCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCAATGCCGCGTGCAGGATGAAGG-CCTTCGG : 84
n9.16S : CCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGCAGGATGAAGG-CCTTCGG : 84
p6.16S : CCTACGGGAGGCAGCAGTGACGAATATTGGTCAATGGGCGAGAGCCTGAACCAGCCAAGTCGCGTGAAGGAAGAAGGATCTATGG : 85

* 100 * 120 * 140 * 160 *
n1.16S : ATTGTAAAGCACTTTAAGTTGGGAGGAAGGGC-AGTAAGTTAATACCTTGCTGTTTTGACGTTACCGACAGAATAAGCACCGGCT : 168
n5.16S : ATTGTAAAGCACTTTAAGTTGGGAGGAAGGGC-AGTAAGCGAATACCTTGCTGTTTTGACGTTACCGACAGAATAAGCACCGGCT : 168
n6.16S : ATTGTAAAGCACTTTAAGTTGGGAGGAAGGGC-AGTAACCTAATACGTTGCTATTTTGACGTTACCGACAGAATAAGCACCGGCT : 168
c4.16S : GTTGTAAAGCACTTTAGGCTGGAAAGAAAAAG-CTTTGGCTAATATCCAAA-GCCTTGACGGTACCAGCAGAATAAGCACCGGCT : 167
n3.16S : GTTGTAAACTGCTTTTGTACGGAACGAAAAGGCTCTCT-CTAATACAGAGAGCCGATGACGGTACCGTAAGAATAAGCACCGGCT : 168
n7.16S : GTTGTAAACTGCTTTTGTACGGAACGAAAAGGCTCTCT-CTAATACAGAGAGCCGATGACGGTACCGTAAGAATAAGCACCGGCT : 168
P5.16S : GTTGTAAACTGCTTTTGTACGGAACGAAAAGACTCTCT-CTAATACAGGGGGTTGATGACGGTACCGTAAGAATAAGCACCGGCT : 168
n9.16S : GTTGTAAACTGCTTTTGTACGGAACGAAAAGA-TCTCTTCTAATAAAGGGGGTCCATGACGGTACCGTAAGAATAAGCACCAGCT : 168
p6.16S : TTCGTAAACTTCTTTTGCAGGGGAATATAGTGCAGGA--C----G-TGTCCTGTTTTGTATGTACCCTGAGAATAAGGATCGGCT : 163

180 *
n1.16S : AACTCTGTGCCAGCAGCCGCGGTAAT : 194
n5.16S : AACTCTGTGCCAGCAGCCGCGGTAAT : 194
n6.16S : ACCTCCGTGCCAGCAGCCGCGGTAAT : 194
c4.16S : AACTCTGTGCCAGCAGCCGCGGTAAT : 193
n3.16S : AACTACGTGCCAGCAGCCGCGGTAAT : 194
n7.16S : AACTACGTGCCAGCAGCCGCGGTAAT : 194
P5.16S : AACTACGTGCCAGCAGCCGCGGTAAT : 194
n9.16S : AACTACGTGCCAGCAGCCGCGGTAAT : 194
p6.16S : AGCTCCGTGCCAGCAGCCGCGGTAAT : 189

Similarity-matrix of partial 16S rDNA sequences

        n1.16S  n5.16S  n6.16S  c4.16S  n3.16S  n7.16S  P5.16S  n9.16S  p6.16S
n1.16S     194     98%     96%     86%     77%     77%     78%     78%     72%
n5.16S     192     194     96%     86%     77%     77%     78%     78%     72%
n6.16S     188     188     194     84%     78%     78%     78%     78%     72%
c4.16S     167     167     163     193     82%     82%     80%     81%     69%
n3.16S     152     152     153     161     194    100%     97%     94%     71%
n7.16S     152     152     153     161     194     194     97%     94%     71%
P5.16S     153     153     154     157     189     189     194     95%     71%
n9.16S     152     152     153     158     184     184     187     194     70%
p6.16S     142     143     143     136     139     139     139     138     189
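Percent identity can be converted into an evolutionary distance estimate before a tree is fitted. The Python sketch below uses the Jukes-Cantor correction, d = -(3/4) ln(1 - (4/3) p), where p is the fraction of sites that differ. This handout does not state which distance estimate was used to build the trees, so the choice of Jukes-Cantor here is an assumption made for illustration only. The three input values are taken from the upper right half of the 16S matrix.

```python
import math


def jukes_cantor_distance(percent_identity):
    """Estimate substitutions per site from % identity (Jukes-Cantor model)."""
    p = 1.0 - percent_identity / 100.0  # observed fraction of differing sites
    if p >= 0.75:
        # The correction is undefined once sequences look random to each other.
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Values read from the 16S similarity matrix (upper right half).
for pair, identity in [("n1 vs n5", 98), ("n1 vs c4", 86), ("n1 vs p6", 72)]:
    observed = 1.0 - identity / 100.0
    corrected = jukes_cantor_distance(identity)
    print(f"{pair}: observed difference {observed:.3f}, corrected distance {corrected:.3f}")
```

The corrected distance is always at least as large as the observed difference, and the gap widens as sequences diverge, because the model allows for multiple substitutions at the same site.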

Sequence alignment of partial phenol hydroxylase protein sequences

* 20 * 40 * 60
N6.PRO : IDELRHVQTQVHAMSHYNKHFNGLHDFSHMHDRVWFLSVPKSFFDDARTAGPFEFLTAISFSFEYVLTN : 69
P6.PRO : IDELRHVQTQVHAMSHYNKHFNGLHDFAHMHDRVWFLSVPKSFFDDARSAGPFEFLTAISFSFEYVLTN : 69
N1.PRO : IDELRHVQTQVHAMSHYNKHFNGLHDFAHMHDRVWFLSVPKSFFEDARTAGPFEFLTAISSSSEYVLTN : 69
N5.PRO : IDELRHVQTQVHAMSHYNKHFNGLHDFAHMHDRVWFLSVPKSFFEDARTAGPFEFLTAISSSFEYVLTN : 69
N9.PRO : IDELRHVQTQVHAMSHYNKHFDGLHDFAHMYDRVWYLSVPKSYMDDARTAGPFEFLTAVSFSFEYVLTN : 69
C4.PRO : IDELRHAQTQAHTISHYNKFFNGLHDYTHMHDRVWYLSVPKSYFEDAMTAGPFEFVTAISFSFEYVLTN : 69
N7.PRO : IDELRHFQTETHALSHYNKYFNGLHSATQWYDRVWFLSVPKSFFEDAMTAGPFEFLTAVSFSFEYVLTN : 69
N3.PRO : IDELRHFQTETHALSHYNKYFNGLHSATQWYDRVWFLSVPKSFFEDAMTAGPFEFLTAVSFSYEYVLTN : 69
P5.PRO : IDELRHFQTETHALSHYNKYFNGLHNATQWYDRVWFLSVPKSFFEDAMTAGPFEFLTAVSFSFEYVLTN : 69
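As a check on how similarity values are derived from an alignment, the following Python sketch counts identical residues between two aligned sequences. The two sequences are N6.PRO and P6.PRO, copied from the alignment above. The gap handling is an assumption: columns that are gaps in both sequences are skipped, and a gap opposite a residue counts as a mismatch. The program that produced the matrices in this handout may treat gaps differently.

```python
def identity(seq_a, seq_b):
    """Return (identical positions, compared positions, percent identity)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue  # assumption: ignore columns that are gaps in both rows
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return matches, compared, 100.0 * matches / compared


# Rows copied from the partial phenol hydroxylase alignment.
N6 = "IDELRHVQTQVHAMSHYNKHFNGLHDFSHMHDRVWFLSVPKSFFDDARTAGPFEFLTAISFSFEYVLTN"
P6 = "IDELRHVQTQVHAMSHYNKHFNGLHDFAHMHDRVWFLSVPKSFFDDARSAGPFEFLTAISFSFEYVLTN"

matches, compared, percent = identity(N6, P6)
print(matches, "of", compared, "residues identical")  # 67 of 69 residues identical
print(int(percent), "% identity (truncated)")
```

The two sequences differ at two positions (S/A and T/S), giving 67 identical residues out of 69, or about 97%.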

Similarity-matrix of partial protein-sequences

        N6.PRO  P6.PRO  N1.PRO  N5.PRO  N9.PRO  C4.PRO  N7.PRO  N3.PRO  P5.PRO
N6.PRO      69     97%     94%     95%     89%     82%     79%     78%     79%
P6.PRO      67      69     94%     95%     89%     81%     78%     76%     78%
N1.PRO      65      65      69     98%     86%     81%     78%     78%     78%
N5.PRO      66      66      68      69     88%     82%     79%     78%     79%
N9.PRO      62      62      60      61      69     79%     76%     75%     76%
C4.PRO      57      56      56      57      55      69     78%     76%     78%
N7.PRO      55      54      54      55      53      54      69     98%     98%
N3.PRO      54      53      54      54      52      53      68      69     97%
P5.PRO      55      54      54      55      53      54      68      67      69

Tree P1

Tree P2

Tree P3