BIT150 – Fall 2009
Homework 9
Due on Tuesday, December 1st, by email to the TA as Hwk9_Lastname
1. 35 points Follow the steps we used in Lab9 to prepare, inside the Hw9 folder, the directory structure needed to run Phred, Phrap, Consed, and PolyPhred on the mountain pine (Pinus mugo) files located in your folder in ‘plantgenome’. The identification name of the locus is 0_771_01.
Note: you will not need to run the fasta2Phd.perl script to create a phd.1 file (you will just create an empty phd_dir). Simply export the consensus generated during the phredPhrap run AFTER completion.
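As a reminder of the layout, here is a minimal Python sketch that creates the project directories. It assumes the conventional Consed/phredPhrap arrangement of three sibling folders (chromat_dir for the trace files, phd_dir for Phred output, and edit_dir, from which phredPhrap is normally run). Only phd_dir is named in this handout, so check the other two names against your Lab9 notes. The function name make_consed_layout is hypothetical.

```python
from pathlib import Path


def make_consed_layout(project_root):
    """Create the sibling directories conventionally used by phredPhrap.

    Assumption: the layout is chromat_dir / phd_dir / edit_dir, as in the
    standard Consed setup. Adjust the names if Lab9 used a different layout.
    """
    root = Path(project_root)
    created = []
    for name in ("chromat_dir", "phd_dir", "edit_dir"):
        directory = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created
```

Calling make_consed_layout("Hw9") would create the three folders under Hw9. phd_dir is simply left empty, as described in the note above.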
Run Phred and Phrap.
1.1. What does Phred do?
1.2. What does Phrap do?
Run Consed.
1.3. What is Consed needed for?
Open the ace1 file.
1.4. How many contigs are generated?
1.5. How many reads are included in each contig?
2. 50 points Now identify polymorphisms.
Run PolyPhred with the homozygote SNP flag, with tagged polymorphisms, with the indel flag, and with 50 bp of flanking sequence on each side. Make sure to direct the PolyPhred output to an output file.
2.1. What does PolyPhred do? In general, what does PolyPhred consider to determine a true positive SNP (HINT:
2.2. What file is modified by PolyPhred and read by Consed to display SNP positions graphically?
2.3. What is a SNP? What is an INDEL?
2.4. According to the PolyPhred results, which contig(s) contain(s) SNP identifications?
2.5. Copy and paste the SNP information of the “POLY” section of the PolyPhred report (output file) here.
2.6. How many total SNPs are identified? How many total indels? What are their consensus positions?
2.7. How many SNPs have a score greater than 80?
2.8. What rank is a score of 80 and what color is the tag for this rank?
2.9. Attach here a screenshot (using Shift + Print Screen) of one Consed view of a trace file with a correct SNP identification.
3. 15 points Annotate the consensus sequence.
3.1. Export the consensus sequence(s) of the contig(s) that contain(s) SNP identifications.
3.2. What is the length of the consensus sequence (in base pairs)?
3.3. Use BLASTX to annotate the consensus sequence(s). Report the accession/version number of the top 5 hits.
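For step 3.2, one way to double-check the length is to count the bases in the exported consensus file. The sketch below is illustrative only. It assumes the consensus was exported in FASTA format (header lines starting with ">"), and the function name fasta_lengths is hypothetical.

```python
def fasta_lengths(text):
    """Return a dict mapping each FASTA record name to its sequence length.

    Assumption: `text` is the content of a FASTA file, where each record
    starts with a '>' header line followed by one or more sequence lines.
    """
    lengths = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths
```

To use it, read the exported file into a string (for example with open(path).read()) and pass that string to fasta_lengths. Each value is the number of sequence characters in that record, which includes any pad or ambiguity symbols present in the export.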