BIO/CS 251 Bioinformatics final project
Spring 2006
Dr. James
Genehunt in a newly released fungal genome
Scientific objective:
You will use bioinformatic approaches to identify, map, and analyze the genes contained in an uncharacterized chunk of the genome of a dangerous pathogenic fungus, Histoplasma capsulatum. H. capsulatum is a dimorphic fungus that can exist in either filamentous or yeastlike form. This means it can form cells in long, branching chains (filamentous) or can switch its lifestyle to grow as single, yeastlike cells. The yeastlike form is the one it takes when it establishes infections in the lungs or other tissues, where it grows and spreads, causing great damage. More information on H. capsulatum and photographs of the organism can be found here:
http://botit.botany.wisc.edu/toms_fungi/jan2000.html
You will be issued a 50,000 bp (50 kb) segment of the recently sequenced genome of H. capsulatum. This genome sequencing effort was performed by the Broad Institute at Massachusetts Institute of Technology (MIT), as part of the Fungal Genome Initiative (FGI): http://www.broad.mit.edu/annotation/fungi/fgi/
The initial sequence release for this genome was in September 2005. The H. capsulatum genome sequence is so new that it has not yet been annotated, meaning no one has systematically identified and mapped all of its genes. Therefore, you may be the first human to discover the genes in this stretch of “virgin” DNA.
Project format:
Your portal of entry into the H. capsulatum genome is the Histoplasma capsulatum Database at http://www.broad.mit.edu/annotation/fungi/histoplasma_capsulatum/
This site is designed in much the same way as the Aspergillus nidulans website that you worked with in Laboratory 9 (http://www.broad.mit.edu/annotation/fungi/aspergillus_nidulans/). Refer to this lab exercise for general information about navigating the H. capsulatum website, and for discovering genes, etc.
You may choose one of the following 50 kb segments for your project:
1. Supercontig 1.8, nt 205000 – 255000 (rif1)
2. Sc 1.2, 340000 – 390000 (cdc7)
3. Sc 1.4, 60000 – 115000 (rad53)
4. Sc 1.7, 190000 – 240000 (rap1)
5. Sc 1.1, 2965000 – 3115000 (taz1)
6. Sc 1.7, 615000 – 665000 (nimO)
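Once a segment is chosen, its sequence can be cut out of a downloaded supercontig by coordinate. The following is a minimal Python sketch, assuming the supercontig is available as FASTA-format text and that the coordinates listed above are 1-based and inclusive. The function names `read_fasta` and `extract_segment` and the record name `supercont_demo` are hypothetical and are not part of the Broad website.

```python
def read_fasta(text):
    """Parse FASTA-format text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def extract_segment(records, name, start, end):
    """Return bases start..end (1-based, inclusive) of the named record."""
    seq = records[name]
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    return seq[start - 1:end]


# Toy example; a real supercontig would be read from a downloaded file.
toy = ">supercont_demo example record\nACGTACGTAC\nGGGTTTAAAC\n"
records = read_fasta(toy)
segment = extract_segment(records, "supercont_demo", 5, 12)
print(segment, len(segment))
```

For a real segment, the same call would take the supercontig name and the two coordinates from the list above, and the length of the returned string is a quick check that the expected number of bases was extracted.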
Project objectives:
A. Locate and identify all of the bona fide genes in the 50 kb stretch of H. capsulatum genomic
DNA. Produce a scale map showing the following information:
1. Gene location and gene direction.
2. Number of exons composing each gene.
3. Identity, or probable identity, of each gene. If the sequence is novel, list it as a novel hypothetical
protein.
B. For each gene:
1. Present the predicted amino acid sequence. If the gene is distributed into multiple exons, edit
the files so that they are merged to form a single, full-length protein sequence.
2. Present an alignment of the edited full-length protein sequence with (1) its closest relative (ortholog) in
Aspergillus nidulans, and (2) an ortholog that has a clear identity and function ascribed to it. (1)
and (2) may be the same alignment; where the best alignment is to a protein of
unknown function, show a second alignment to a protein of known function.
Make sure that each alignment reports the e-value, % identity, and % similarity.
3. Identify any conserved domains, and briefly explain the nature and likely function of each
conserved domain.
4. List the name or names of the protein encoded by the gene, and give a brief 2-3 line description
of the protein’s function.
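The exon merging asked for in B.1 can be sketched in code. This is a minimal Python illustration, not the method used by the Broad website. It assumes that exon coordinates are 1-based and inclusive relative to the genomic segment, that the first exon begins at the start codon, and that the standard genetic code applies. All function names and the toy sequence are hypothetical.

```python
# Standard genetic code, built in TCAG order (NCBI translation table 1).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def splice_exons(genomic, exons, strand="+"):
    """Join exons (1-based, inclusive genomic coordinates) into one coding sequence."""
    genomic = genomic.upper()
    coding = "".join(genomic[start - 1:end] for start, end in sorted(exons))
    # For a minus-strand gene, the joined sequence is reverse-complemented.
    return reverse_complement(coding) if strand == "-" else coding


def translate(coding):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding) - 2, 3):
        amino_acid = CODON_TABLE.get(coding[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy gene: exon 1 = ATGGCC, intron = GTAAGTTTTCAG, exon 2 = AAATGA.
genomic = "ATGGCCGTAAGTTTTCAGAAATGA"
protein = translate(splice_exons(genomic, [(1, 6), (19, 24)]))
print(protein)
```

A merged sequence that contains an internal stop codon, or whose length is not a multiple of three, usually signals a misplaced exon boundary and is worth rechecking against the gene prediction.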
C. Run a tBlastn search of the H. capsulatum genome with the predicted protein sequence of each gene that you discovered:
1. Determine whether each gene is unique, or whether it is paralogous with other members of a
gene family.
2. Provide the output from each of the paralog searches, including alignments.
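When a query protein is searched against its own genome, the strongest hit is normally the gene's own locus, so candidate paralogs are the significant hits that lie elsewhere. The sketch below illustrates that filtering step in Python. It assumes the hits are available in the 12-column tab-delimited layout produced by NCBI BLAST+ with `-outfmt 6`, which may differ from the output of the Broad website. The e-value cutoff of 1e-5 is an arbitrary illustrative choice, and the function names and example hits are hypothetical.

```python
def parse_tabular_hits(text):
    """Parse 12-column tab-delimited BLAST hits into a list of dicts."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append({
            "subject": fields[1],
            "percent_identity": float(fields[2]),
            "sstart": int(fields[8]),
            "send": int(fields[9]),
            "evalue": float(fields[10]),
        })
    return hits


def candidate_paralogs(hits, own_contig, own_start, own_end, max_evalue=1e-5):
    """Keep significant hits that do not overlap the query gene's own locus."""
    kept = []
    for hit in hits:
        # Subject coordinates run high-to-low for minus-strand hits.
        low, high = sorted((hit["sstart"], hit["send"]))
        is_self = (hit["subject"] == own_contig
                   and low <= own_end and high >= own_start)
        if not is_self and hit["evalue"] <= max_evalue:
            kept.append(hit)
    return kept


# Hypothetical hits: the gene's own locus, one strong hit elsewhere, one weak hit.
example = (
    "prot1\tsupercont1.8\t100.0\t300\t0\t0\t1\t300\t210000\t210899\t0.0\t600\n"
    "prot1\tsupercont1.2\t45.0\t280\t150\t4\t10\t290\t50500\t49661\t1e-40\t160\n"
    "prot1\tsupercont1.4\t30.0\t50\t35\t0\t5\t55\t100\t249\t2.5\t30\n"
)
paralogs = candidate_paralogs(parse_tabular_hits(example),
                              "supercont1.8", 209000, 211000)
for hit in paralogs:
    print(hit["subject"], hit["evalue"])
```

A hit that passes this filter is only a candidate; the alignment itself still has to be inspected to decide whether it represents a true paralog or merely a shared domain.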
D. Choose one gene family identified in C. above and use it to create a phylogram.
What is the minimum number of gene duplication events leading to the chosen family of genes in
H. capsulatum?
E. Orthology search: Choose a newly discovered H. capsulatum gene that is unique, i.e., for which no
similar or paralogous genes exist in H. capsulatum, but for which a blastp search of GenBank
reveals orthologous genes from other species. Determine the approximate time of origin of your
chosen gene by establishing the range of organisms in which it can be found. In other words, is this
a gene that appears to be fungal-specific? Present only in fungi and animals? Universal in eukaryotes
but absent from prokaryotes? Or universal to all life?
Depending upon the level of conservation and degree of functional constraint, it may be necessary
to perform iterative blastp searches; in other words, you may need to use the H. capsulatum gene
to obtain the orthologous worm protein, but then use the worm protein to find the human protein, etc.
1. Choose representative orthologs spanning the widest range of organisms possible, and present
a multiple alignment of the H. capsulatum gene and the chosen orthologs.
2. Create a phylogram that includes all of the chosen orthologs.
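The reasoning in E, going from the set of organisms with an ortholog to a statement about how widely the gene is distributed, can be written down as a small decision rule. The Python sketch below is only an illustration: the group labels and the function name `describe_range` are hypothetical, the categories mirror the questions posed above, and a missing hit is weak evidence, since a diverged ortholog may only be found by an iterative search.

```python
def describe_range(groups_with_orthologs):
    """Summarize gene distribution from the taxonomic groups in which orthologs were found."""
    groups = set(groups_with_orthologs)
    if groups & {"bacteria", "archaea"}:
        return "found in prokaryotes as well as eukaryotes"
    if groups & {"plants", "protists"}:
        return "widespread in eukaryotes, not found in prokaryotes"
    if "animals" in groups:
        return "found in fungi and animals"
    return "fungal-specific"


# Hypothetical search results for three different genes.
print(describe_range({"fungi"}))
print(describe_range({"fungi", "animals"}))
print(describe_range({"fungi", "animals", "plants", "bacteria"}))
```

The oldest group in which an ortholog is found sets a lower bound on the age of the gene, which is why the search should cover as wide a range of organisms as possible before a conclusion is drawn.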
F. Choose one conserved gene that has an ortholog in the budding yeast, Saccharomyces cerevisiae,
and perform in silico microarray analysis of the yeast ortholog, using the microarrays available at the
Saccharomyces Genome Database: as follows:
1. Search for the S. cerevisiae orthologs using the “Search SGD” box at
and then use the ‘GO annotation’ +
‘Function Junction’ to gather information about its function in budding yeast. This
information will be located on the SGD BASIC INFORMATION page for your chosen yeast
ortholog.
2. Genomic and proteomic analysis of one gene in Saccharomyces cerevisiae.
Your entry point for each of these questions will be the SGD
BASIC INFORMATION page for your chosen gene.
3. Is this gene essential in budding yeast? What is the phenotype of a systematic deletion, or
null allele?
4. Expression analysis (DNA microarray analysis): in yeast, DNA microarray analysis is like
performing 5000+ Northern blots simultaneously, on a surface no larger than a microscope
slide. This allows one to assay the expression of 5000+ different genes in a single
experiment.
Use the SGD to analyze the expression of your chosen gene. Use ‘Functional Analysis’ to
perform these analyses. Include in your answers the links to each microarray experiment.
How does the expression of the gene vary under the following conditions
that have been assayed for all or nearly all budding yeast genes?
-- expression in response to alpha factor (treatment of cells with
alpha factor synchronizes them at G1 phase of the cell cycle)
-- expression in response to agents that damage DNA
-- expression during diauxic shift (what is a diauxic shift?)
-- expression in response to environmental changes
-- expression during the cell cycle
-- expression during sporulation (= meiosis)
5. Protein-protein interaction: does your protein interact with any other proteins in
S. cerevisiae?
-- use the ‘Interactions’ link on the BASIC INFORMATION page, and examine the BIND, DIP,
and GRID databases to learn whether your protein has interacting partners. For each protein-protein
interaction, list the method used to detect the interaction (two-hybrid, affinity
chromatography, synthetic lethality, etc.)
-- Go to the following site:
Click on ‘PATHCALLING’, then ‘YEAST DATABASE’, enter your ‘Gene/Keyword’,
and press Enter. If an entry is found for the gene, click on the link under “__ entry(ies)
found for that keyword”. This link will take you to a two-dimensional interaction map
showing the various interactions between your protein and its interacting partners.
-- print out the interaction map, and incorporate it into your final project report.
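For the expression analysis in F.4, it helps to convert the plotted or downloaded values into fold changes before describing how the gene responds. The Python sketch below assumes the values are log2 expression ratios (treated sample versus reference), a common convention for two-colour microarray data; check the description of each data set before relying on it. The two-fold threshold, the function names, and the example values are hypothetical.

```python
def fold_change(log2_ratio):
    """Convert a log2 expression ratio into a linear fold change."""
    return 2.0 ** log2_ratio


def describe_response(log2_ratio, threshold=1.0):
    """Label a value as induced, repressed, or little change.

    The default threshold of 1.0 corresponds to a two-fold change.
    """
    if log2_ratio >= threshold:
        return "induced"
    if log2_ratio <= -threshold:
        return "repressed"
    return "little change"


# Hypothetical log2 ratios for one gene under three conditions.
example_values = {"condition A": 2.0, "condition B": -1.5, "condition C": 0.3}
for condition, value in example_values.items():
    print(condition, round(fold_change(value), 2), describe_response(value))
```

Because the scale is logarithmic, a value of -1 represents the same magnitude of change as +1 (two-fold), only in the opposite direction.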
Protein families involved in secretory function in Aspergillus nidulans:
Your task is to discover and characterize the following gene families according to their
function and phylogeny, using the guidelines presented above:
1. Rab family GTP-binding proteins: these are involved in membrane trafficking, i.e., docking
and fusion of transport vesicles and membranes.
2. SNARE family of proteins: these interact with Rabs and with vesicles to facilitate
membrane fusion events.
3. Mannose-6-phosphate receptors: these recognize the mannose-6-phosphate address label on lysosomal
enzymes synthesized in the ER and Golgi, ensuring that the enzymes arrive at the correct destination (the lysosome).
4. KDEL-containing proteins: proteins carrying a C-terminal KDEL (Lys-Asp-Glu-Leu)
address label are confined to the lumen of the Endoplasmic Reticulum. They include a
diverse variety of enzymes and chaperones whose task is to modify and fold proteins that will
eventually be secreted from the cell. KDEL proteins often hop a ride to the Golgi Apparatus,
in which case these errant children are rounded up by KDEL receptor proteins, which escort
them back to the ER.
5. ARF family of GTP-binding proteins: these proteins act as adaptors to facilitate the
formation/pinching off of vesicles from the Golgi Apparatus.
6. Do filamentous fungi possess Clathrin-coated vesicles and Golgin proteins?
7. Identify and characterize Golgi-specific enzymes that function in protein glycosylation.