Bioinformatics Project: the Genehunt

BIO/CS 251 Bioinformatics final project, Spring 2006

Dr. James

Genehunt in a newly released fungal genome

Scientific objective:

You will use bioinformatic approaches to identify, map, and analyze the genes contained in an uncharacterized chunk of the genome of a dangerous pathogenic fungus, Histoplasma capsulatum. H. capsulatum is a dimorphic fungus that can exist in either filamentous or yeastlike form. This means it can form cells in long, branching chains (filamentous) or can switch its lifestyle to grow as single, yeastlike cells. The yeastlike form is the one it takes when it establishes infections in the lungs or other tissues, where it grows and spreads, causing great damage. More information on H. capsulatum and photographs of the organism can be found here:

http://botit.botany.wisc.edu/toms_fungi/jan2000.html

You will be issued a 50,000 bp (50 kb) segment of the recently sequenced genome of H. capsulatum. This genome sequencing effort was performed by the Broad Institute at Massachusetts Institute of Technology (MIT), as part of the Fungal Genome Initiative (FGI): http://www.broad.mit.edu/annotation/fungi/fgi/

The initial sequence release for this genome was in September 2005. The H. capsulatum genome sequence is so new that it has not yet been annotated, meaning no one has systematically identified and mapped all of its genes. Therefore, you may be the first human to discover the genes in this stretch of “virgin” DNA.

Project format:

Your portal of entry into the H. capsulatum genome is the Histoplasma capsulatum Database at http://www.broad.mit.edu/annotation/fungi/histoplasma_capsulatum/

This site is designed in much the same way as the Aspergillus nidulans website that you worked with in Laboratory 9 (http://www.broad.mit.edu/annotation/fungi/aspergillus_nidulans/). Refer to this lab exercise for general information about navigating the H. capsulatum website, and for discovering genes, etc.

You may choose one of the following 50 kb segments for your project:

1. Supercontig 1.8, nt 205000 – 255000 (rif1)

2. Sc 1.2, 340000 – 390000 (cdc7)

3. Sc 1.4, 60000 – 115000 (rad53)

4. Sc 1.7, 190000 – 240000 (rap1)

5. Sc 1.1, 2965000 – 3115000 (taz1)

6. Sc 1.7, 615000 – 665000 (nimO)
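
If you download a supercontig as a FASTA file, the coordinates above can be used to cut out your segment. The following is a minimal sketch, assuming 1-based inclusive coordinates; the record name `supercont_demo` and the toy sequence are hypothetical placeholders, not real H. capsulatum data.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def extract_segment(sequence, start, end):
    """Return nucleotides start..end, using 1-based inclusive coordinates."""
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


# Toy example; a real run would read the downloaded supercontig file instead.
toy = ">supercont_demo hypothetical record\nACGTACGTAC\nGGTTAACC\n"
contigs = read_fasta(toy)
segment = extract_segment(contigs["supercont_demo"], 5, 12)
print(segment)
```

Check which coordinate convention the database uses before relying on the output, since an off-by-one error shifts every gene position on your map.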

Project objectives:

A. Locate and identify all of the bona fide genes in the 50 kb stretch of H. capsulatum genomic DNA. Produce a scale map showing the following information:

1. Gene location and gene direction.

2. Number of exons composing each gene.

3. Identity, or probable identity, of each gene. If the sequence is novel, list it as a novel hypothetical protein.
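
One way to rough out the scale map in A before drawing the final version is a text rendering. This is only a sketch: the gene names, coordinates, strands, and exon counts below are invented for illustration, and the function name `draw_scale_map` is hypothetical.

```python
def draw_scale_map(genes, segment_length, width=50):
    """Return one text line per gene, drawn to scale across `width` columns.

    Each gene is a tuple: (name, start, end, strand, exon_count), with
    1-based coordinates relative to the start of the segment.
    """
    scale = segment_length / width  # nucleotides represented by one column
    lines = []
    for name, start, end, strand, exons in genes:
        first = int((start - 1) / scale)
        last = min(max(first, int((end - 1) / scale)), width - 1)
        body = "=" * (last - first + 1)
        # Mark the direction of transcription with an arrowhead.
        if strand == "+":
            body = body[:-1] + ">"
        else:
            body = "<" + body[1:]
        track = "." * first + body + "." * (width - last - 1)
        lines.append(f"{track}  {name} ({strand}, {exons} exon(s))")
    return lines


# Invented example genes on a 50,000 bp segment.
example_genes = [
    ("geneA", 1001, 6000, "+", 3),
    ("geneB", 20001, 30000, "-", 1),
]
map_lines = draw_scale_map(example_genes, 50000)
for line in map_lines:
    print(line)
```

With a width of 50 columns, each character stands for 1,000 bp of the segment.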

B. For each gene:

1. Present the predicted amino acid sequence. If the gene is distributed into multiple exons, edit the files so that they are merged to form a single, full-length protein sequence.

2. Present an alignment of the edited full-length protein sequence with (1) its closest relative (ortholog) in Aspergillus nidulans, and (2) an ortholog that has a clear identity and function ascribed to it. Alignments (1) and (2) may be the same; if the best alignment is to a protein of unknown function, show a second alignment to a protein of known function. Make sure that each alignment reports the E-value, % identity, and % similarity.

3. Identify any conserved domains, and briefly explain the nature and likely function of each conserved domain.

4. List the name or names of the protein encoded by the gene, and give a brief 2-3 line description of the protein’s function.
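
The exon merging described in B.1 can be checked with a short script. This is a sketch under stated assumptions: exon coordinates are 1-based and inclusive on the forward strand of your segment, the standard genetic code applies, and the toy sequence is invented. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def splice_exons(genomic, exons, strand="+"):
    """Join exon intervals into one coding sequence, read 5' to 3'."""
    coding = "".join(genomic[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(coding) if strand == "-" else coding


def translate(coding):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding) - 2, 3):
        amino_acid = CODON_TABLE.get(coding[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Invented toy gene: two exons separated by a six-base intron.
toy_genomic = "CC" + "ATGGCT" + "GTAAGT" + "AAATGA" + "GG"
toy_exons = [(3, 8), (15, 20)]
toy_protein = translate(splice_exons(toy_genomic, toy_exons))
print(toy_protein)
```

Comparing the translated result with the protein sequence predicted by the database is a quick way to catch a misplaced exon boundary.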

C. Search the H. capsulatum genome with tblastn, using each gene that you discovered as the query:

1. Determine whether each gene is unique, or whether it is paralogous with other members of a gene family.

2. Provide the output from each of the paralog searches, including alignments.
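
If you save tblastn results in the 12-column tabular format, the paralog check in C can be sketched as below. The E-value cutoff, the contig names, and the toy rows are assumptions made for illustration; hits that overlap the query gene's own locus are treated as self-hits and set aside.

```python
def candidate_paralogs(tabular_text, own_contig, own_start, own_end,
                       max_evalue=1e-10):
    """List significant hits that do not overlap the query gene's own locus.

    Expects the standard 12 tab-separated columns: query, subject, % identity,
    length, mismatches, gap opens, q.start, q.end, s.start, s.end, E-value,
    bit score. The default E-value cutoff is an arbitrary assumption.
    """
    hits = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        subject = fields[1]
        # Subject coordinates are reversed for hits on the minus strand.
        s_start, s_end = sorted((int(fields[8]), int(fields[9])))
        evalue = float(fields[10])
        if evalue > max_evalue:
            continue
        overlaps_self = (subject == own_contig
                         and s_start <= own_end and s_end >= own_start)
        if not overlaps_self:
            hits.append((subject, s_start, s_end, evalue))
    return hits


# Invented rows: a self-hit, a strong second locus, and a weak hit.
toy_rows = [
    ["geneX", "supercont_1.8", "100.00", "300", "0", "0", "1", "300",
     "210000", "210899", "0.0", "610"],
    ["geneX", "supercont_1.2", "54.10", "290", "120", "4", "5", "294",
     "88870", "88001", "2e-85", "270"],
    ["geneX", "supercont_1.4", "31.00", "60", "38", "2", "40", "99",
     "5000", "5179", "0.5", "30"],
]
toy_output = "\n".join("\t".join(row) for row in toy_rows)
paralogs = candidate_paralogs(toy_output, "supercont_1.8", 209500, 211500)
print(paralogs)
```

A gene with no remaining hits is a candidate unique gene; any remaining hits still need to be inspected by eye, since a multi-exon gene produces several hit rows at a single locus.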

D. Choose one gene family identified in C. above and use it to create a phylogram. What is the minimum number of gene duplication events leading to the chosen family of genes in H. capsulatum?
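
Phylogram branch lengths reflect how different the sequences are. As a rough illustration of the kind of input a distance-based tree method works from, the sketch below computes uncorrected pairwise distances (p-distances) from an existing multiple alignment. The sequences and names are invented, and the tree-building program you use may apply a different distance correction.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues, ignoring columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aligned):
    """Return {(name_a, name_b): p-distance} for every pair of sequences."""
    return {
        (name_a, name_b): p_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }


# Invented aligned fragments for three hypothetical family members.
toy_alignment = {
    "paralog1": "MKTAYIAK",
    "paralog2": "MKTAYLAK",
    "paralog3": "MRT-YLGK",
}
distances = distance_matrix(toy_alignment)
for pair, value in distances.items():
    print(pair, round(value, 3))
```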

E. Orthology search: Choose a newly discovered H. capsulatum gene that is unique, i.e., for which no similar or paralogous genes exist in H. capsulatum, but for which a blastp search of GenBank reveals orthologous genes from other species. Determine the approximate time of origin of your chosen gene by establishing the range of organisms in which it can be found. In other words, is this a gene that appears to be fungal-specific? Present only in fungi and animals? Universal in eukaryotes but absent from prokaryotes? Or universal to all life?

Depending upon the level of conservation and degree of functional constraint, it may be necessary to perform iterative blastp searches. For example, you may need to use the H. capsulatum gene to obtain the orthologous worm protein, and then use the worm protein to find the human protein, etc.

1. Choose representative orthologs spanning the widest range of organisms possible, and present a multiple alignment of the H. capsulatum gene and the chosen orthologs.

2. Create a phylogram that includes all of the chosen orthologs.
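
The distribution question in E can be organized as a simple lookup once you have recorded which major groups yielded a convincing ortholog. The group labels and the function name below are hypothetical bookkeeping choices, and a missing hit does not prove that a gene is absent, so treat the result as a starting hypothesis.

```python
EUKARYOTES = {"fungi", "animals", "plants", "protists"}
PROKARYOTES = {"bacteria", "archaea"}


def classify_distribution(groups):
    """Summarize where orthologs were found, using the categories in E."""
    found = set(groups)
    unknown = found - EUKARYOTES - PROKARYOTES
    if unknown:
        raise ValueError(f"unrecognized group labels: {sorted(unknown)}")
    if found >= EUKARYOTES | PROKARYOTES:
        return "universal to all life"
    if found & PROKARYOTES:
        return "found in some prokaryotes; inspect hits individually"
    if found >= EUKARYOTES:
        return "universal in eukaryotes, absent from prokaryotes"
    if found == {"fungi", "animals"}:
        return "fungi and animals only"
    if found == {"fungi"}:
        return "fungal-specific"
    return "patchy eukaryotic distribution; inspect hits individually"


print(classify_distribution(["fungi", "animals"]))
```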

F. Choose one conserved gene that has an ortholog in the budding yeast, Saccharomyces cerevisiae, and perform in silico microarray analysis of the yeast ortholog, using the microarrays available at the Saccharomyces Genome Database (SGD), as follows:

1. Search for the S. cerevisiae ortholog using the “Search SGD” box, and then use ‘GO annotation’ and ‘Function Junction’ to gather information about its function in budding yeast. This information will be located on the SGD BASIC INFORMATION page for your chosen yeast ortholog.

2. Genomic and proteomic analysis of one gene in Saccharomyces cerevisiae. Your entry point for each of these questions will be the SGD BASIC INFORMATION page for your chosen gene.

3. Is this gene essential in budding yeast? What is the phenotype of a systematic deletion, or null allele?

4. Expression analysis (DNA microarray analysis): in yeast, DNA microarray analysis is like performing 5000+ Northern blots simultaneously, on a surface no larger than a microscope slide. This allows one to assay the expression of 5000+ different genes in a single experiment.

Use the SGD to analyze the expression of your chosen gene. Use ‘Functional Analysis’ to perform these analyses. Include in your answers the links to each microarray experiment.

How does the expression of the gene vary under the following conditions, which have been assayed for all or nearly all budding yeast genes?

-- expression in response to alpha factor (treatment of cells with alpha factor synchronizes them at G1 phase of the cell cycle)

-- expression in response to agents that damage DNA

-- expression during diauxic shift (what is a diauxic shift?)

-- expression in response to environmental changes

-- expression during the cell cycle

-- expression during sporulation (= meiosis)

5. Protein-protein interaction: does your protein interact with any other proteins in S. cerevisiae?

-- Use ‘Interactions’ on the BASIC INFORMATION page, and examine the BIND, DIP, and GRID databases to learn if your protein has interacting partners. For each protein-protein interaction, list the method used to detect the interaction (two-hybrid, affinity chromatography, synthetic lethality, etc.).

-- Go to the following site:

Click ‘PATHCALLING’, then ‘YEAST DATABASE’, enter your ‘Gene/Keyword’, and click enter. If an entry is found for the gene, click on the link under “__ entry(ies) found for that keyword”. This link will take you to a two-dimensional interaction map showing the various interactions between your protein and its interacting partners.

-- Print out the interaction map, and incorporate it into your final project report.
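
For the expression analysis in F.4, two-color microarray results are commonly reported as ratios of experimental to reference signal and compared on a log2 scale. The sketch below assumes you have read such ratios off the expression displays; the ratio values and the two-fold threshold are illustrative assumptions, not values taken from SGD.

```python
import math


def summarize_expression(ratios, fold_threshold=2.0):
    """Label each condition as induced, repressed, or unchanged.

    `ratios` maps a condition name to an experimental/reference ratio.
    The default two-fold threshold is an arbitrary, commonly used cutoff.
    """
    cutoff = math.log2(fold_threshold)
    summary = {}
    for condition, ratio in ratios.items():
        log_ratio = math.log2(ratio)
        if log_ratio >= cutoff:
            call = "induced"
        elif log_ratio <= -cutoff:
            call = "repressed"
        else:
            call = "unchanged"
        summary[condition] = (round(log_ratio, 2), call)
    return summary


# Invented ratios for one hypothetical yeast gene.
toy_ratios = {"alpha factor": 4.0, "diauxic shift": 0.25, "sporulation": 1.2}
expression_summary = summarize_expression(toy_ratios)
for condition, (log_ratio, call) in expression_summary.items():
    print(f"{condition}: log2 ratio {log_ratio}, {call}")
```

On the log2 scale a four-fold induction and a four-fold repression are symmetric (+2 and -2), which makes increases and decreases easier to compare across conditions.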

Protein families involved in secretory function in Aspergillus nidulans:

Your task is to discover and characterize the following gene families according to their function and phylogeny, using the guidelines presented above:

1. Rab family GTP-binding proteins: these are involved in membrane trafficking, i.e., docking and fusion of transport vesicles and membranes.

2. SNARE family of proteins: these interact with Rabs and with vesicles to facilitate membrane fusion events.

3. Mannose-6-phosphate receptors: these recognize the mannose-6-phosphate address label on lysosomal enzymes, ensuring that enzymes synthesized in the ER and Golgi arrive at the correct destination (the lysosome).

4. KDEL-containing proteins: proteins carrying a C-terminal KDEL (Lys-Asp-Glu-Leu) address label are confined to the lumen of the Endoplasmic Reticulum. They include a diverse variety of enzymes and chaperones whose task is to modify and fold proteins that will eventually be secreted from the cell. KDEL proteins often hop a ride to the Golgi Apparatus, in which case these errant children are rounded up by KDEL receptor proteins, who escort them back to the ER.

5. ARF family of GTP-binding proteins: these proteins act as adaptors to facilitate the formation/pinching off of vesicles from the Golgi Apparatus.

6. Do filamentous fungi possess Clathrin-coated vesicles and Golgin proteins?

7. Identify and characterize Golgi-specific enzymes that function in protein glycosylation.
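
For the KDEL-containing proteins in item 4, candidate proteins can be screened for the C-terminal signal with a few lines of code. This is a sketch only: the protein sequences are invented, and HDEL (the variant of the signal used in budding yeast) is included as an assumption about what may be worth checking in fungi.

```python
def er_retention_signal(protein, signals=("KDEL", "HDEL")):
    """Return the C-terminal retention signal found, or None if absent."""
    sequence = protein.strip().upper().rstrip("*")  # drop a trailing stop symbol
    for signal in signals:
        if sequence.endswith(signal):
            return signal
    return None


# Invented sequences: only the position at the C-terminus counts.
toy_proteins = {
    "candidate1": "MKLSAAVLGKDEL",
    "candidate2": "MKDELAAVLGSTR",
    "candidate3": "MRFSTLLAAHDEL*",
}
signal_calls = {name: er_retention_signal(seq) for name, seq in toy_proteins.items()}
print(signal_calls)
```

A match at the C-terminus is only suggestive; the protein should also carry a signal peptide and show similarity to a known ER-resident protein before you describe it as one.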