BIT150 – Fall 2010 – Homework 3 KEY
Due Thursday, October 14th, by email to the TA as Hwk3_Lastname, BEFORE the lab
1. 10 points The following are two orthologous proteins from different members of the grass family:
>Tm_Vrn1
MGRGKVQLKRIENKINRQVTFSKRRSGLLKKAHEISVLCDAEVGLIIFSTKGKLYEFSTESCMDKILERYERYSYAEKVLVSSESEIQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLESSLKHIRSRKNQLMHESISELQKKERSLQEENKVLQKELVEKQKAHAAQQDQTQPQTSSSSSSFMLRDAPPAANTSIHPAAAGERAEDAAVQPQAPPRTGLPPWMVSHING
>Os_AP1-Like
MGRGKVQLKRIENKINRQVTFSKRRSKLLKKANEISVLCDAEVALIIFSTKGKLYEYATDSCMDKILERYERYSYAEKVLISAESDTQGNWCHEYRKLKAKVETIQKCQKHLMGEDLESLNLKELQQLEQQLENSLKHIRSRKSQLMLESINELQRKEKSLQEENKVLQKELVEKQKVQKQQVQWDQTQPQTSSSSSSFMMREALPTTNISNYPAAAGERIEDVAAGQPQHERIGLPPWMLSHING
1.1. Use the appropriate blast program to perform an alignment between these two protein sequences, from wheat and rice.
- What is the percentage of identity between them? 87% Identity
- Can you identify a region that is more conserved? Does it correspond to any known conserved domain? The first part of the protein is more conserved; it corresponds to the MADS-box domain.
1.2. Use one of the dynamic-programming methods discussed in lecture to align both protein sequences.
- Would you perform a global or a local alignment? Global. The proteins are the same length, making Global alignment more appropriate in this case.
- Which BLOSUM matrix would you use? BLOSUM62, greater than 30% identity (EMBOSS only goes up to BLOSUM62)
- Answer these questions and based on your answers run the alignment. Present your results reporting the length of the sequence aligned, similarity, identity, number of gaps, and final score.
ANSWER
Global – BLOSUM62
Length: 247
Identity: 84.2%
Similarity: 90.7%
Gaps: 4 (1.6%)
Score: 1025.5
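As an illustration of the dynamic-programming idea behind the global alignment in 1.2, the following is a minimal Needleman-Wunsch sketch. It is not the EMBOSS implementation: it assumes a simple match/mismatch score and a linear gap penalty instead of BLOSUM62 with affine gaps, so it will not reproduce the score reported above. The function name and the toy sequences are hypothetical.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of a and b.

    Assumption: simple match/mismatch scoring and a linear gap penalty,
    not BLOSUM62 with affine gaps as used for the answer above.
    """
    # prev[j] = best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Toy sequences (hypothetical, not the homework proteins)
toy_score = needleman_wunsch_score("GATTACA", "GCATGCU")
print(toy_score)
```

A global method fills the whole matrix and reads the score from the last cell, which is why it suits two sequences of similar length such as these orthologs.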
2. 15 points Using the following sequence:
>YFG_DNA
CGGCAACGGCGCCGATCAGAGGTGGAGTAGATCTAATCGCCGGTGAGTCTTTTCTTGGGAGGAGGAAAGAACCGAGTCCTCTCTTATAAAATATAAAGGCACCCATAGTGTGTTGCTGCCAAATTTGCCCATTATTCAGAATTTGGCATTGTCTCCACAACGTATGTGGCTCCCAATTACGGTATGCTGGGTCCCACCAGATGAGTACGGTGGGACGACCAAAAAGAATAAGAGATCTGGTGCATTTGGTAATGTGATCAGACGGTTAAGGAGTCTAAATCGCTCCGAGGGCTCCTAGCTAACTCTGTTTTTTTGGCAGCAACACACTATGGGTGCCTTTATATTTTAACCAATTGATAATGGTGTGGAAACAGTTATGGTGACTAGATGGTATTTATGGTGGGATCGCCGGAAAATTAGTCACCAGGAGAATGTTGGTTGCCCGGTTCGTTCAGCTATGAGCATTCGTGGTATCGTGGCAAACTCATTGGCTGCCAAGGCAAAGGAACCTACGAAACAGATTAGATGGGTGAGACCAGCTATAGGGGTGATTAAACTCAACGTTGATGCC
2.1. Use Dotter to align the sequence with itself.
- Using Shift+PrintScreen, present a plot of the alignment in your homework.
- Report the Dotter parameters used (window size and stringency).
- What kind of repeats are you observing? Indicate their approximate coordinates.
ANSWER
Stringency: 50/100, Window: 22
Inverted repeat at 87-123 and 313-350.
>YFG_DNA
CGGCAACGGCGCCGATCAGAGGTGGAGTAGATCTAATCGCCGGTGAGTCTTTTCTTGGGAGGAGGAAAGAACCGAGTCCTCTCTTATAAAATATAAAGGCACCCATAGTGTGTTGCTGCCAAATTTGCCCATTATTCAGAATTTGGCATTGTCTCCACAACGTATGTGGCTCCCAATTACGGTATGCTGGGTCCCACCAGATGAGTACGGTGGGACGACCAAAAAGAATAAGAGATCTGGTGCATTTGGTAATGTGATCAGACGGTTAAGGAGTCTAAATCGCTCCGAGGGCTCCTAGCTAACTCTGTTTTtttggcagcaacacactatgggtgcctttatattttaACCAATTGATAATGGTGTGGAAACAGTTATGGTGACTAGATGGTATTTATGGTGGGATCGCCGGAAAATTAGTCACCAGGAGAATGTTGGTTGCCCGGTTCGTTCAGCTATGAGCATTCGTGGTATCGTGGCAAACTCATTGGCTGCCAAGGCAAAGGAACCTACGAAACAGATTAGATGGGTGAGACCAGCTATAGGGGTGATTAAACTCAACGTTGATGCC
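The repeat reported in 2.1 can be checked without a dot plot: in an inverted repeat, the reverse complement of one copy occurs elsewhere in the same strand. The sketch below assumes the lowercase stretch in the sequence above marks the downstream copy, and takes a short excerpt around the upstream copy from the same sequence. The helper and variable names are hypothetical.

```python
# Translation table for complementing DNA (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Excerpt of YFG_DNA around the upstream copy (copied from the sequence above)
upstream = "CTCTCTTATAAAATATAAAGGCACCCATAGTGTGTTGCTGCCAAATTTGCCC"
# Downstream copy: the lowercase stretch of the sequence above, upper-cased
downstream_repeat = "TTTGGCAGCAACACACTATGGGTGCCTTTATATTTTA"

# True if the downstream copy, reverse-complemented, occurs in the upstream excerpt
is_inverted = reverse_complement(downstream_repeat) in upstream
print(is_inverted)
```

A direct repeat would instead match without reverse-complementing, which is what separates the two cases on a Dotter plot (diagonals parallel to the main diagonal versus perpendicular to it).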
2.2. Name the type of repeat in each of the dotter alignments below, with an estimate of their position.
ANSWERS
A) Microsatellite
B) Inverted repeat
C) Direct repeat
3. 30 points The following is a multiple sequence alignment of a 41-bp fragment from a putative plant cytochrome P450 gene from rice, maize, sorghum, and rye:
3.1. Using Kimura's two-parameter model shown below, scoring transitions (A<->G and C<->T) as 1 unit of distance and transversions as 2 units of distance,
- Calculate pair-wise distances between the sequences and construct BY HAND a distance matrix.
- Show your calculations.
- Present the distance matrix.
ANSWER
Pair-wise distances
Rice-Maize: 10
Rice-Wheat: 12
Rice-Rye: 10
Maize-Wheat: 14
Maize-Rye: 12
Wheat-Rye: 4
Distance matrix:
 / Maize / Wheat / Rye / Rice
Maize / - / - / - / -
Wheat / 14 / - / - / -
Rye / 12 / 4 / - / -
Rice / 10 / 12 / 10 / -
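The scoring rule in 3.1 can be sketched as a small function. The 41-bp alignment itself is not reproduced in this key, so the example runs on short toy sequences that are purely hypothetical. Skipping gapped columns is also an assumption, since the question does not say how gaps are scored.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def weighted_distance(a, b, transition=1, transversion=2):
    """Sum of per-site costs: transitions cost 1, transversions cost 2."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == y or "-" in (x, y):
            continue  # identical site, or gapped column (skipped by assumption)
        # A<->G and C<->T stay within one chemical class: transitions
        same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
        total += transition if same_class else transversion
    return total


# Toy example (hypothetical): A->G is a transition, G->T a transversion
print(weighted_distance("ACGT", "GCTT"))
```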
3.2. Using the distance-based method UPGMA,
- Construct BY HAND a phylogenetic tree based on the distance matrix created in 3.1.
- Provide distances for all the branches.
- Include all your intermediate matrices.
- Show your calculations.
- Manually draw the phylogenetic tree.
ANSWER.
Wheat and rye are the most closely related. Branch length: 4/2= 2
Merge wheat and rye.
Calculate average distance of Maize and Rice to Wheat-Rye.
Maize to Wheat-Rye = (Maize to Wheat + Maize to Rye)/2 = (14+12)/2 = 13
Rice to Wheat-Rye = (Rice to Wheat + Rice to Rye)/2 = (12+10)/2 = 11
 / Maize / Wheat-Rye
Wheat-Rye / 13 / -
Rice / 10 / 11
Rice and Maize are the next closest. Branch length: 10/2= 5
Average distance between Wheat-Rye and Maize-Rice = (13+11)/2= 12
Branch group Wheat-Rye: 12/2 = 6 → 6 - 2 = 4
Branch group Maize-Rice: 12/2 = 6 → 6 - 5 = 1
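The hand calculation above can be cross-checked with a short UPGMA sketch run on the distance matrix from 3.1. This is an illustrative implementation, not a course-provided tool. The function name and the cluster labels are hypothetical, and ties are broken by whichever pair is found first.

```python
from itertools import combinations


def upgma(names, dist):
    """Return the merges as (cluster_a, cluster_b, node_height) tuples.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    merges = []
    while len(sizes) > 1:
        # Pick the closest pair of current clusters
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2  # node sits at half the distance
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        new = f"({a},{b})"
        for c in sizes:
            # Size-weighted average distance to the merged cluster
            d[frozenset((new, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[new] = size_a + size_b
        merges.append((a, b, height))
    return merges


# Distance matrix from 3.1
pairs = {
    ("Maize", "Wheat"): 14, ("Maize", "Rye"): 12, ("Maize", "Rice"): 10,
    ("Wheat", "Rye"): 4, ("Wheat", "Rice"): 12, ("Rye", "Rice"): 10,
}
dist = {frozenset(k): v for k, v in pairs.items()}
merges = upgma(["Maize", "Wheat", "Rye", "Rice"], dist)
for a, b, h in merges:
    print(a, b, h)
```

The node heights correspond to the branch lengths above: the internal branch leading to each group is the root height minus that group's own node height (6 - 2 = 4 and 6 - 5 = 1).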
4. 15 points From the following trees (A, B, C, D):
4.1. Construct BY HAND:
- a strict consensus tree (groups present in ALL trees);
- a 50% majority-rule consensus tree (groups in >50% of the trees).
4.2. What are consensus trees used for?
ANSWER
Strict Consensus
50% Consensus
ANSWER
Consensus trees, because they are composite trees that summarize information from different trees, are used to present results from a tree-construction method that produces several equally parsimonious trees, and also to combine results from different tree-construction methods.
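The two consensus rules in 4.1 can be sketched by treating each tree as a set of groups (clades) and counting how often each group occurs. Trees A-D are not reproduced in this key, so the four trees below are hypothetical examples over taxa A to E, and the function names are illustrative only.

```python
from collections import Counter


def clade_counts(trees):
    """Count in how many trees each clade (a frozenset of taxa) appears."""
    return Counter(clade for tree in trees for clade in tree)


def strict_consensus(trees):
    """Clades present in ALL trees."""
    return {c for c, n in clade_counts(trees).items() if n == len(trees)}


def majority_consensus(trees, cutoff=0.5):
    """Clades present in more than `cutoff` (as a fraction) of the trees."""
    return {c for c, n in clade_counts(trees).items() if n / len(trees) > cutoff}


def clade(taxa):
    return frozenset(taxa)


# Four hypothetical trees, each written as its set of non-trivial clades
trees = [
    {clade("AB"), clade("ABC"), clade("DE")},
    {clade("AB"), clade("ABC"), clade("DE")},
    {clade("AB"), clade("DE"), clade("CDE")},
    {clade("AC"), clade("ABC"), clade("DE")},
]
strict = strict_consensus(trees)
majority = majority_consensus(trees)
print(sorted("".join(sorted(c)) for c in strict))
print(sorted("".join(sorted(c)) for c in majority))
```

In this toy case only the D+E group survives the strict rule, while the majority rule also keeps A+B and A+B+C, each found in three of the four trees.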
5. 30 points Given the following 4 PRR domain protein sequences:
>wheat
mdrhhhhhqqqqqqppspqeehaaqprcweeflhrktirvllvetddstrqvvtallrhcmyqvipaenghqawaylqdmqsnidlvltevfmhgglsgidllgrimnhevckdipvimmsshdsmgtvlsclsngaaeflakpirknelknlwahvwrrshsssgsgsgsaiqtqkctksksgddsnnnsnnrnddasmglnardgsdngsgtqsswtkraveidspqdmspdqsidppdstcahvshlkseicsnrlrgtdnkkcqkpketngdefkgkeleigapgnlntddqsspnessvkptdgrceylpqnnsndtvmensdepivraadligsmaknmdaqqaaraidapncssqvpegkdadrenampylelslkrsrstadgadaaiqeeqrnvvrrsdlsaftryntcavsnqggagfvgscspngngseaaktdaaqmkqgsngssnnndmgsttksvvtkpaggnnkvspingnthtsafhrvqpwtpataagkdkadetskknaataaaaakdmsgeaqskhpcaaahdanggsaggtaqsslvnpsgpveghaanygsnsgsnnntnngstaataagaaaavhaetggidkrsnmmhmkrerrvaavnkfrekrkernfgkkvryqsrkrlaeqrprvrgqfvrqppppaaver*
>hordeum
mfplgarqpppsamdnqqqqqpprgehaaqprcweeflhrktirvllvetddstrqvvaallrhcmyqvipvenghqawaylqdmqsnidlvltevfmhgglsgidllgrimnhevckdipvimmsshdsmgtvlsclsngaadflakpirknelknlwahvwrrshsssgsgsgsaiqtqkctksksgddsnnnsndrnddasmglnardgsdngsgtqaqsswtkraveidspqdmspdqsadppegtcahvnhpkseicsnrwlpgtnnkkcqkpkettngdgfkgkeleigapgnlntddqsspnessvkptdgrceylpqnnssdtvmenleepivraadligsmaknmdaqqaaaraidapncslqapegkdtdgenampylelslkrsrstgegagpiqeeqrnvvrrsdlsaftrynmcavsnqggagfvgscspngdsseaaktvaaqmkqgsngssnnndmgsttksvvtkpcgnnkvspingnthtsafhrvqpwtpataagkdkadevskknvaaaaaaakemgceaqskhpcaaaddvnggsaggtaqssvvnpsgpveghaanygsnsfsnnntnngstaattaaaaavhaetgggvdkrsnmmnmkrerrvaavnkfrekrkernfgkkvryqsrkrlaeqrprvrgqfvrqppapaaver
>rice
mmgtahhnqtagsalgvgvgdandavpgaggggysdpdggptsgvqpppqvcwerfiqkktikvllvdsddstrqvvsallrhcmyevipaengqqawtyledmqnsidlvltevvmpgvsgisllsrimnhnicknipvimmssndamgtvfkclskgavdflvkpirknelknlwqhvwrrchsssgsgsesgiqtqkcaksksgdesnnnngsnddddddgvimglnardgsdngsgtqaqsswtkraveidspqamspdqladppdstcaqvihlksdicsnrwlpctsnknskkqketnddfkgkdleigsprnlntayqsspnersikptdrrneyplqnnskeaamenleessvraadligsmaknmdaqqaaraanapncsskvpegkdknrdnimpslelslkrsrstgddanaiqeeqrnvlrrsdlsaftryhtpvasnqggtgfvgscsphdniseamktdsaynmksnsdaapikqgsngssnnndmgsttknvvtkpstnkervmspsavkanghtsafhpaqhwtspanttgkektdevannaakraqpgevqsnlvqhprpilhyvhfdvsrenggsgapqcgssnvfdppveghaanygvngsnsgsnngsngqngsttavdaerpnmeiangtinksgpgggngsgsgsgndmylkrftqrehrvaavikfrqkrkernfgkkvry
>arabidopsis
mnaneegegsrypitdrktgetkfdrvesrtekhseeektngitmdvrngssgglqiplsqqtaatvcwerflhvrtirvllvenddctryivtallrncsyevveasngiqawkvledlnnhidivltevimpylsgigllckilnhksrrnipvimmsshdsmglvfkclskgavdflvkpirknelkilwqhssgsgsesgthqtqksvksksikksdqdsgssdenengsiglnasdgssdgsgaqsswtkkavdvddspravslwdrvdstcaqvvhsnpefpsnqlvappaeketqehddkfedvtmgrdleisirrncdlalepkdeplskttgimrqdnsfekssskwkmkvgkgpldlssespsskqmhedggssfkamsshlqdnrepeapnthlktldtneasvkiseelmhvehsskrhrgtkddgtlvrddrnvlrrsegsafsrynpasnankisggnlgstslqdnnsqdlikkteaaydchsnmneslphnhrshvgsnnfdmssttennaftkpgapkvssagsssvkhssfqplpcdhhnnhasynlvhvaerkklppqcgssnvynetiegnnntvnysvngsvsgsghgsngpygssngmnaggmnmgsdngagkngngdgsgsgsgsgsgnladenkisqreaaltkfrqkrkercfrkkvryqsrkklaeqrprvrgqfvrktaaatddndiknieds
5.1. Use tCOFFEE to produce an alignment of conserved protein regions.
- Do you see any well conserved regions? Please identify their approximate locations.
5.2. Produce a multiple sequence alignment using ClustalW. Using BOXSHADE, prepare a publishable alignment for these sequences. Paste the alignment into your homework document.
- Between tCOFFEE and ClustalW, which program seems to have better identified conserved regions between these genes? tCOFFEE provides the better alignment.
- Use the appropriate BLAST program to identify any known conserved domains in the conserved regions. List their names and putative functions.
ANSWER
REC domain: receives the signal from the sensor partner in two-component systems; contains a phosphoacceptor site that is phosphorylated by histidine kinase homologs; usually found N-terminal to a DNA-binding effector domain.
CCT domain: contains a putative nuclear localization signal within the second half of the CCT motif, important in interactions among proteins involved in flowering regulation.
5.3. Construct a phylogenetic tree with the NJ method (using Number of differences as the substitution model) with bootstrap values. Include the tree here.
- What do the bootstrap values indicate in phylogenetic trees?
ANSWER.
Bootstrap values indicate the percentage of bootstrap replicates in which the same group of sequences was joined at a given node of the phylogenetic tree. For example, maize and sorghum were joined 100% of the time according to both the NJ tree and the UPGMA tree, while other relationships are less certain from these alignments.
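The resampling behind bootstrap values can be sketched as follows: alignment columns are drawn with replacement to build each replicate, a tree is built from every replicate, and the support for a group is the percentage of replicate trees that contain it. The tree-building step is omitted here. The toy alignment, the function names, and the list of replicate clades are all hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def support(replicate_clades, clade):
    """Percentage of replicate trees (given as sets of clades) containing clade."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


# Toy alignment (hypothetical, not the PRR proteins)
alignment = {"s1": "ACGTAC", "s2": "ACGTTC", "s3": "AGGTTC"}
replicate = bootstrap_alignment(alignment, random.Random(0))

# Clades recovered from four imagined replicate trees
pair = frozenset({"s1", "s2"})
replicate_clades = [{pair}, {pair}, set(), {pair}]
print(support(replicate_clades, pair))
```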
tCOFFEE: [alignment output images (pg3, pg4) not reproduced]
BOXSHADE:
wheat 1 ------MDRHHHHHQ-----QQQQQPP------SPQE
barley 1 ------MFPLGARQPPPSAM-----DNQQQQQ------PPRG
rice 1 ------MMGTAHHNQTAGSALGVGVGDANDAVPGAGGG------GYSDPDGGPTSG
arabidopsis 1 MNANEEGEGSRYPITDRKTGETKFDRVESRTEKHSEEEKTNGITMDVRNGSSGGLQIPLS
wheat 21 EHAAQPRCWEEFLHRKTIRVLLVETDDSTRQVVTALLRHCMYQVIPAENGHQAWAYLQDM
barley 26 EHAAQPRCWEEFLHRKTIRVLLVETDDSTRQVVAALLRHCMYQVIPVENGHQAWAYLQDM
rice 45 VQPPPQVCWERFIQKKTIKVLLVDSDDSTRQVVSALLRHCMYEVIPAENGQQAWTYLEDM
arabidopsis 61 QQTAATVCWERFLHVRTIRVLLVENDDCTRYIVTALLRNCSYEVVEASNGIQAWKVLEDL
wheat 81 QSNIDLVLTEVFMHGGLSGIDLLGRIMNHEVCKDIPVIMMSSHDSMGTVLSCLSNGAAEF
barley 86 QSNIDLVLTEVFMHGGLSGIDLLGRIMNHEVCKDIPVIMMSSHDSMGTVLSCLSNGAADF
rice 105 QNSIDLVLTEVVMPG-VSGISLLSRIMNHNICKNIPVIMMSSNDAMGTVFKCLSKGAVDF
arabidopsis 121 NNHIDIVLTEVIMPY-LSGIGLLCKILNHKSRRNIPVIMMSSHDSMGLVFKCLSKGAVDF
wheat 141 LAKPIRKNELKNLWAHVWRRSHSSSGSGSGSA-IQTQKCTKSKSGDDSNNN--SNNRNDD
barley 146 LAKPIRKNELKNLWAHVWRRSHSSSGSGSGSA-IQTQKCTKSKSGDDSNNN--SNDRNDD
rice 164 LVKPIRKNELKNLWQHVWRRCHSSSGSGSESG-IQTQKCAKSKSGDESNNNNGSNDDDDD
arabidopsis 180 LVKPIRKNELKILWQH------SSGSGSESGTHQTQKSVKSKSIKKSDQDSGSSDENEN
wheat 198 --ASMGLNARDGSDNGSGT--QSSWTKRAVEID-SPQDMSPDQSIDPPDSTCAHVSHLKS
barley 203 --ASMGLNARDGSDNGSGTQAQSSWTKRAVEID-SPQDMSPDQSADPPEGTCAHVNHPKS
rice 223 DGVIMGLNARDGSDNGSGTQAQSSWTKRAVEID-SPQAMSPDQLADPPDSTCAQVIHLKS
arabidopsis 233 --GSIGLNASDGSSDGSGA--QSSWTKKAVDVDDSPRAVS---LWDRVDSTCAQVVHSNP
wheat 253 EICSNR-LRGTDNKKCQKPKE-TNGDEFKGKELEIGAPGNLNTDDQSSPNESSVKPTDGR
barley 260 EICSNRWLPGTNNKKCQKPKETTNGDGFKGKELEIGAPGNLNTDDQSSPNESSVKPTDGR
rice 282 DICSNRWLPCTSNKNSKKQKE--TNDDFKGKDLEIGSPRNLNTAYQSSPNERSIKPTDRR
arabidopsis 286 EFPSNQLVAPPAEKETQEHDD-KFEDVTMGRDLEISIRRNCDLALEPKDEPLSKTTGIMR
wheat 311 CEYLPQNNSNDTVMENSDEPIVRAADLIG------SMAKNMDAQQAAR-AIDAPNCS
barley 320 CEYLPQNNSSDTVMENLEEPIVRAADLIG------SMAKNMDAQQAAARAIDAPNCS
rice 340 NEYPLQNNSKEAAMENLEESSVRAADLIG------SMAKNMDAQQAAR-AANAPNCS
arabidopsis 345 QDNSFEKSSSKWKMKVGKGPLDLSSESPSSKQMHEDGGSSFKAMSSHLQDNREPEAPNTH
wheat 361 SQVPEGKDADRENAMPYLELSLKRSRSTADGADAAIQEEQRNVVRRSDLSAFTRYNTCAV
barley 371 LQAPEGKDTDGENAMPYLELSLKRSRSTGEGAG-PIQEEQRNVVRRSDLSAFTRYNMCAV
rice 390 SKVPEGKDKNRDNIMPSLELSLKRSRSTGDDAN-AIQEEQRNVLRRSDLSAFTRYHTPVA
arabidopsis 405 LKTLDTNEASVKISEELMHVEHSSKRHRGTKDDGTLVRDDRNVLRRSEGSAFSRYNPASN
wheat 421 SNQGGAGFVGSCSPNGNGSEAAKT------DAAQMKQGSNGSSNNNDMGSTTKSVV
barley 430 SNQGGAGFVGSCSPNGDSSEAAKT------VAAQMKQGSNGSSNNNDMGSTTKSVV
rice 449 SNQGGTGFVGSCSPHDNISEAMKTDSAYNMKSNSDAAPIKQGSNGSSNNNDMGSTTKNVV
arabidopsis 465 ANKISGGNLGSTSLQDNNS------QDLIKKTEAAYDCHSNMNESLPHNH
wheat 471 TKPAGGNNKVSP---INGNTHTSAFHRVQPWT-PATAAGKDKADETSKKNAATAAAAAKD
barley 480 TKPCG-NNKVSP---INGNTHTSAFHRVQPWT-PATAAGKDKADEVSKKNVAAAAAAAKE
rice 509 TKPSTNKERVMSPSAVKANGHTSAFHPAQHWTSPANTTGKEKTDEVAN-NAAKRAQPGEV
arabidopsis 509 RSHVGSNNFDMS-----STTENNAFTKPGAPKVSSAGSSSVKHSSFQPLPCDHHNNHASY
wheat 527 MSGEAQSKHP-----CAAAHDANGGSAGGTAQSSLVNPSGPVEGHAANY---GSNSGSNN
barley 535 MGCEAQSKHP-----CAAADDVNGGSAGGTAQSSVVNPSGPVEGHAANY---GSNSFSNN
rice 568 QSNLVQHPRPILHYVHFDVSRENGGSGAPQCGSSNVFDP-PVEGHAANYGVNGSNSGSNN
arabidopsis 564 NLVHVAERKK------LPPQCGSSNVYNETIEGNNNTVNYSVNGSVSGSGH
wheat 579 NTN--NGSTAATAAG------AAAAVH--AETGG-IDKRS---NMMHMK----RERRVAA
barley 587 NTN--NGSTAATTA------AAAAVH--AETGGGVDKRS---NMMNMK----RERRVAA
rice 627 GSNGQNGSTTAVDAERPNMEIANGTINK-SGPGGGNGSGSGSGNDMYLKRFTQREHRVAA
arabidopsis 609 GSNGPYGSSNGMNAGGMNMGSDNGAGKNGNGDGSGSGSGSGSGNLADENK---ISQREAA
6. 10 points From the following induced multiple sequence alignment:
Induced multiple sequence alignment of a segment of geneX (‘-‘ indicates a gap).
H1 / A / C / T / A / C / T / G / A / C
H2 / T / C / T / - / C / G / G / A / C
H3 / T / G / T / A / C / G / - / - / T
H4 / A / C / T / G / T / G / A / A / C
6.1. Calculate BY HAND the ‘sum-of-pairs’ distance score, scoring transitions (A<->G and C<->T) and transversions equally (Jukes and Cantor 1-parameter model) and using affine gap penalties: gap opening 4; gap extension 2.
 / A / C / G / T
A / - / 1 / 1 / 1
C / 1 / - / 1 / 1
G / 1 / 1 / - / 1
T / 1 / 1 / 1 / -
- Indicate all your calculations within the table provided:
ANSWER.
 / Jukes-Cantor 1-parameter
H1 vs. H2 / 1+4+1=6
H1 vs. H3 / 1+1+1+4+2+1=10
H1 vs. H4 / 1+1+1+1=4
H2 vs. H3 / 1+4+4+2+1=12
H2 vs. H4 / 1+4+1+1=7
H3 vs. H4 / 1+1+1+1+4+2+1=11
Sum of Pairs / 50
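The table above can be verified with a short sketch of the sum-of-pairs calculation: every mismatch costs 1, the first column of a gap costs 4, and each further column of the same gap costs 2. Function and variable names are hypothetical. Columns gapped in both sequences of a pair are skipped by assumption, although none occur in this alignment.

```python
from itertools import combinations


def pair_distance(a, b, mismatch=1, gap_open=4, gap_extend=2):
    """Distance between two aligned rows with affine gap penalties."""
    total = 0
    prev_gap = None  # which row ("a" or "b") was gapped in the previous column
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gapped in both rows: ignored (assumption)
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            # Extending a gap in the same row is cheaper than opening one
            total += gap_extend if prev_gap == side else gap_open
            prev_gap = side
        else:
            total += mismatch if x != y else 0
            prev_gap = None
    return total


# Rows of the induced alignment given in the question
rows = {
    "H1": "ACTACTGAC",
    "H2": "TCT-CGGAC",
    "H3": "TGTACG--T",
    "H4": "ACTGTGAAC",
}
scores = {
    (p, q): pair_distance(rows[p], rows[q]) for p, q in combinations(rows, 2)
}
total = sum(scores.values())
for (p, q), s in scores.items():
    print(p, "vs.", q, s)
print("Sum of pairs:", total)
```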