ECS 129
Assignment:Option1 (No programming)
Due:Thursday, February27, 2014
Genes and proteins
Problem1: Gene Finder
You have just sequenced a shot segment of DNA. You wish to analyze this DNA sequence to determine whether it could encode a protein.
5’ TCAATGTAACGCGCTACCCGGAGCTCTGGGCCCAAATTTCATCCACT 3’
1)Find the longest possible coding region (also called open reading frame, or ORF). Remember that there are six possibilities to check
2)Label which strand on the DNA will be the coding strand, and which will be the template strand when this DNA is transcribed
3)Transcribe this ORF into mRNA, indicating the 5’ and 3’ ends
4)Translate this mRNA into a protein sequence, indicating the N and C termini.
Problem 2: Identification of a disease
A short fragment of a protein has been identified as a marker for a human genetic disease. This fragment is:
GLASLGTPDEYIEKLAT
1)Identify the wild type human protein associated with this fragment. (Blast can be run at:
2)Obtain the one letter AA code sequence of the full wild type protein
3)Describe the single mutation between thewild type sequence and the marker given above.
4)This mutation has been associated with an inherited disorder. Find the name of that disorder, and write a small paragraph on the nature, and consequences of this disease.
5)Visit the Genome pages at Ensembl ( Search the Human Genome using the wild type sequence corresponding to the fragment of protein given above (make sure to use the wild type fragment of the same size; use BLASTP, and use “exact match”). Show the location(s) of the corresponding gene projected on a human karyotype.
Problem 3: Accessing and Analyzing Structures from the Protein Data Bank
The Protein Data Bank, or PDB, is the best place to obtain 3-Dimensional structures of proteins, nucleic acids and other macromolecules. From the PDB site you can locate proteins by keyword searching or by entering the PDB accession number for the structure file, like 5PTI. Details on the molecule (how the structure was determined, pertinent research articles, position of secondary structures, unusual amino acids, etc) can be found on the RCSB web site but also in the PDB file itself.
PDB files are just formatted text files, so you can open them in a text editor or even Word and read them. There is a wealth of information there! The molecular viewing programs use the ATOM records in the file, which contain residue name, residue number, atom name and atom number. Often, the structure files include other molecules besides the protein, such as water molecules, nucleic acid and bound ligands.
1)Find a protein that contains the unusual amino acid HYP. What is this amino acid?
2)We first look at the structure of an HIV protease. We choose the PDB file 1A30. Download this PDB file, and display its content using RASMOL or PYMOL. This structure shows HIV protease as a dimer (chain A and B), bound to a small tri-peptide (chain C, sequence EDL). Generate an image in which the ligand is shown in spacefilling mode (CPK), and the protein dimer is shown in cartoon mode. Save this picture in GIF or PNG format, and include it in your report
3)Now we look at the PDB file 1CWA, which contains the structure of the complex between two proteins. Find the protein names, sizes and secondary structure contents. Identify the non standard amino acids in chain C.
a)Describe briefly the importance of this structure
b)Generate a GIF or PNG image containing a CPK representation of the complex
Good Luck !