BIT150 – Fall 2009 – Homework 4

Due on Thursday, October 20th, by email to the TA as Hwk4_Lastname, BEFORE the lab.

1. 20 points Using Pregap4 and Gap4, assemble the 21 sequences that are part of Triticum monococcum L. BAC clone 322N9.

In the ‘Configure Modules’ tab of Pregap4:

·  General Configuration: For Get entry names from trace files, select No.

·  Estimate Base Accuracies: Select Logarithmic (Phred) scale.

·  Trace Format Conversion: Leave default parameters.

·  Initialize Experiment Files: You cannot modify it.

·  Augment Experiment Files: Do not add any extra information.

·  Quality Clip: Leave default parameters.

·  Sequencing Vector Clip: In Select Vector-primer subset, select pBS/HindIII.

·  Screen for Unclipped Vector Clip: OK.

·  Cloning Vector Clip: Unselect this option.

·  Gap4 Shotgun Assembly: In Gap4 database name, put a name for your output, and make sure you change the version every time you perform a new assembly with different parameters. Create new database, and RUN.
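The Logarithmic (Phred) scale selected under Estimate Base Accuracies expresses each base call's confidence as Q = -10 log10(P), where P is the estimated probability that the call is wrong. The short sketch below converts between the two. The function names are illustrative only and are not part of Pregap4 or Gap4.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score for an error probability p, with 0 < p <= 1."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Each step of 10 on the Phred scale is a tenfold drop in error probability.
    for q in (10, 20, 30, 40):
        print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

On this scale a confidence value of 20 corresponds to roughly one expected error in 100 calls, and a value of 30 to roughly one in 1,000.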

Answer the following questions:

ANSWERS

-  According to the database information:

1.1.  Were all the sequences provided used to perform the assembly?

Yes. The 21 sequences were used to perform the assembly.

1.2.  How many contigs were created? How many sequences were included in each contig? What is the length (bp) of each contig?

Two contigs were created. The larger contig included 20 sequences and was 3,600 bp long; the smaller contig included 1 sequence and was 54 bp long.

-  Look at the confidence values used for the base-calling:

1.3. Present a confidence value graph for all contigs.

1.4. Present a quality plot for the larger contig and indicate, approximately,

a.  for how many bases the quality of the consensus sequence was OK on both strands, and

For ~1,400 bp the quality of the consensus sequence was OK on both strands.

b.  for how many bases the quality of the consensus sequence was OK only on the plus strand and only on the minus strand.

The quality of the consensus sequence was OK only on the minus strand for ~750 bp, and only on the plus strand for ~1,400 bp.

-  The sequence below corresponds to the assembly of the same 21 sequences by Phred Phrap Consed, which yielded only one contig.

>Tm322N9 by PhredPhrapConsed

GactcaaggtaTtaaaatgatAgggatcttaatcttACattagaAGGCATCAAgggtggcaaTgagtgtaggtTctacaAGGTTAAAtgAaCacatttacactgAAAACAgaaACatgTAAtacgATacattAGtctatattAattcccatgttaCCAtGAGCAcactgAgttcaaGAAAGAAGAAAAAAAtatcataCCagtgtaaCgGGAtggCCCCCCTCCTTGTCCAATTCGGACTTGGGGGGAGGGGCACGCGGCAGCCCCTAGGCCTCCTCTCCTCTTCCACCACTAGGCCCATTAAGGCCCATTAGGTTACCGGGGGGTTCCGGTAACCTCCCGGTACTCCGGTAAAATGCCGATTTCACCCGGAACACTTTCGATGTTCAAACATAGGCTTCTAATATATCAATCTTCATGTCTCGGCCATTTCGAGACTCCTCGTCATGTCCGTGATCACATCCGGGACTCCGAACAACCTTCGGTACATCAAAATATATAAACTCATAATGAAACTGTCATCGTAACGTTAAGCGTGCGGACCCTACGGGTTCGAGAACAATGTAGACATGACCGAGACACGTCTCCGGTCAATAACCAATAGCGGAACCTGGATGCTCATATTGGCTCCTACATATTCTACGAAGATCTTTATCGGTCAGACCGCATAACAACATACGTTGCTCCCTTTGTCATCGGTATGTTACTTGTCCGAGATTCGATCGTCGGTATCTCAATACCTAGTTCAATCTCGTTACCGGCAAGTCTCTTTACTCGTTCTGTAATACATCATCTCGCAACTAACTCATTAGTTGCAATGCTTGCAAGGCTTAAGTGATGTGCATTACCGAGAGGGCCCAGAGATACCTCTCCGACAATCGAAGTGACAAATATATTTTTTGTGCTGGCTCATTATTTCCAGTCATATATGTTAAATGCATTGTTAACTCCAGATTGAGTTCCAGTCTTTGCATCGCTATGCTGCTCTAACTGGGAGCTTTGGAACTGGTAACAGCCCTTGTCTTGATGTATTGTGCATGCTTGGCACCAAGCTGGGCGCAGTTGGGTGCCAGTATAAATTAGACCCAGCAACATCTGAGTTAAACTGCTCTGCTGGATTGTCAGTGCAGACGCCGTGCACCTCTCTTGGACTTATAATGTGAGTTTGAATTTTTACTTCTCTTGACACTACTCTTGGATGATTGAACTAGTTTAAATTCTCGGACTTGTTTGTATTGTTCTTTAGGCCGGTCACAATGGGCAAGAACATAAGCTAGTAACTTACACACTTCTCTAGACTATGTTACTACCTCCATAGTGGGTAGGAACATCTATGTTGTGTCATGCAACGATGTATTTATTAGGTTATAGACTCATTGTTTCTTGGAGTGTGTGATGTTCCGTTAACTTAGCTAGTTACCACAAGCACCTCTCTTTTCATTAAATATATGCCACATAAGCAAAGTTGTATTGGAGTGTGTGATGTTACTCCTAAGTTCCTCCCCATTGTGACCAGCCTTAGTATTGTGTGTCTTCCTTTATCTGTAGACCCTTCTTCGCATTGATGTCTTTGACCTCATGTTGTTATTTATTGCTTGTGCTTTTATCCATGGTAGCTTTACTTGAAACATGCAATTTCAGGTGCTGGTGCTCAAGGCGCGTGTGACGAGCGATGGTGCTGCTGCTGTTGTTGTGAAGAAGAGTTTCTGGGGCCACATCTCGCTTGCTGGCAGCGCTGAAGTTGACCTGTGTCGGATCCGTGGCAAACCATCGGTGGGGCTTGTTGTGTCTGTTTTCTAGTTAGTCTTGAGTCGCTTGTCGCCTAACATTGATGCTGTGTCCCCTATTATATTCTGGCTCTGAATGTTGGCATGTTTTGCCTGCTAAGTTTTGTGCCAAACATTGGCTTAAAGTTGATTTTAACTTCCCAGACATGTAGGATGGATGAATGAAATTTGTGCTCATTTGCTTTTCCCAGTTTTTCTTACTGTTGTTGTTGCC
TGATTTTATATAAACAAGTTAAGTAGCGGGATTTCTGCAGATGCTGGAGCAAGAATTTGGGCAGGAGAATGCCATGATGGGATTTCTGCAGCCAAGTTTGCAAGATGTGGAGGCTGAGGCATTGCTCATCAAATGGGAAGACGTGCTTGGCCTGTCAGAAAAAAAAACCTGGAGAAGCAGTACTGTTGACTGGAACAGGAGGTGTGAGGCAGTAAACATAAGTGCAGCAACACTCCAGCAGACACTTAAGAAAAAAAAACGTGTGCATTTTCGAATATAAGTTGTTGAAACTGAATCCCTGAAGGAGATGATAAAAGTAGTGGAAACTTGAGAAAGCTCTCTGAAATATGATGTTTGCTTTACATGGGCATGGATAGTTCTGAGATCCTCAAACCTAAATATTATGGTATTTCGGAGCAAAAGGGAGTCCATGCTGGAGATATTCATCCTCCAAACCAATGCGAATTCTGCGACAGTAGAACTGAGAGTGCAGCTACAAGATTCAGAAGAATAAATTGCCAATAAAGGACCGCCGCCTCGCTCTTCTTCCTCGCCGCCCCGGACTGCCTCCGCCATCCTTTCCGCGTGGTCTTCCCTCCCTTTCACCGTAGAGGAAGACGCCGGGTTCTGGGGCGAGCTGACCTCGATGCATGCGCTACTTCTTTCTCCCTGGAATGGAATGGAATCCACACACCATTCCAGCGTGCTCGGACGCTGGGCTGGCTCTGGGAGTAAAGTAAGTAATCCAAGACCTTGCTGCATGATTCGCAGCTCAAACTGTGGAGTGGGCGCACTGATGCACAGTTGACAGCCTCTGATAGCAATGTTGACGGTGCTCCAGTAGCTTGCACTGATGCTTGGACATTCTCAAGTGATACTGCAATTTCTTGAGATGGTTGTGCAGAAGCAGACACCAGAGCAGATGCACCGCAAGCAGCTTGATACAGTAGCTTGGAAACAGTTGCTGTCTCATGAATTGTACTAATGGAGGGTAATGTCTCGTCCACAAATACAAATATGTTGTCCAGCAGAATTAGTGATTCTCATCATCATCATCTTGATATACTCTGGTTCAACTGGACCATACTTGTTGGAATGATTGCCCTTGGATAGTAGTGTAGTGACATGAATTACAGTTGAGCTCGTTCCATAAAATGCCCCCACTTGCAACAACAACTTCCTTTTTGCATACATGGTGTAGGCTTGAAAGGTAGTTTAGCATCAAATTCATGGGATATAAATGTCCACTATCAATGAGATTTTTGTTCCCATTTGAATGTTAATAGTAAAGGTGTTCTTGGCTGCTCCAATACCTAGGACTGCATGCATATGCATTCAACTCCCTCTTCTTGTTGCTGTGATTGAAATTCAGCCACATCCTCTTCATTGTCAAGAATGTGTATTAGTTCAGAGTTATCAAGTGGTTGCACGCACTTGTAGTGCTTGGACCTGATTTTTTCTGGCTTAATTTACAAACTTGTCTATGCCCAGGTGCCCATGGCTCTTTACACCTCCAGCATATCCCTTTTTGCCTTGCTAGTTTTACCAGGGCTTGATTGGGAGGTAGTGCCTGTGCAGTGCAGGTTTTGGAAGGTCAAAATTAACCTGCCTCCTGACTAAAGCAGGTTTCTTCTTGCCATCCACATGTCATTCTGTAGGTCTTTTGGCTTGTCACATTGGAGATGTGCTTGGATGTAATCAGCTAGCCCTGATATAA

1.6. Save the consensus sequence of the larger contig obtained by Gap4 and perform a blast2sequences with the assembly provided above, highlighting mismatches.

a.  How many mismatches can you see? Where are they located (beginning, middle, end of the assembly) in the consensus sequence?

I can see 4 mismatches. They are located at the beginning of the consensus sequence.
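As a cross-check on the blast2sequences result, mismatches can also be counted directly once the two sequences are aligned. The sketch below is a minimal illustration and not a replacement for BLAST: it assumes the two strings are already aligned to the same length (gaps written as "-"), ignores case because the PhredPhrapConsed sequence mixes upper and lower case, and splits the alignment into thirds to label each mismatch as beginning, middle, or end. The function names and the toy sequences are hypothetical.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return 1-based positions where two aligned sequences differ.

    Both strings must have the same length. Case is ignored, and a gap
    character ('-') opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = zip(a.upper(), b.upper())
    return [i for i, (x, y) in enumerate(pairs, start=1) if x != y]


def region(pos: int, length: int) -> str:
    """Label a 1-based position as beginning, middle, or end (by thirds)."""
    third = length / 3
    if pos <= third:
        return "beginning"
    if pos <= 2 * third:
        return "middle"
    return "end"


if __name__ == "__main__":
    # Toy aligned fragments, not the real Gap4 consensus.
    gap4_fragment = "GACTTAAGGTATTAAA"
    phrap_fragment = "GactcaaggtaTtaaa"
    for pos in mismatch_positions(gap4_fragment, phrap_fragment):
        print(pos, region(pos, len(gap4_fragment)))
```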

b.  Inspect the chromatograms to see their quality for each mismatch. Using Shift/PrintScreen show, for one mismatch, the quality of the chromatograms with the confidence values, and indicate if Gap4 performed well in the calling of this particular base or not.

Judging by the quality of the chromatograms and the confidence value with which T was called, Gap4 called this base well.


2. 20 points Use GeneSeqer (http://www.plantgdb.org/PlantGDB-cgi/GeneSeqer/PlantGDBgs.cgi) with the correct PhyC protein sequence provided below to predict the exon-intron structure of the PhytochromeC gene present in this genomic region of T. aestivum. Select the rice splice site model.

>Ta PhyC genomic DNA

AACGGTCAACATTTTCTGACAGCGACGCTTGCCACCCCATTTCTGTGCGAAAAGCAAAGCACGCACGCCGCATGTTGGCTCGAAGCATCACGCACCACACAGTTTTGGGAAAGGAATCGGGGAGCCCCGCGGACGATCTGGGCCACACGCGTCCAATCCACCGGCCAAAGATGGACAGCCTGCTGGGCACTGGTACTACCGCTAGTTACTACGCGATGGCCCCGACAAGCCAACCCATTCTTCAATTATTCCGATCAAGTGGGAAGCGCCATCCAAATTAGCCGAGCTCGCACACTACTGCCGCGTCAGCTCTTGCGGGGAAGACGAGCCGCGCCGGAGTGACGTCGTACGGCTTCCCGTTCCCCTCTCGGTTTCCCGACGCCTCTCTTGGCTCACCCGCCCGCCCGCCGCCGCCCTGCCACTTCCTCCGCGCGTGAAAGCCCACCGCCTATTCCCCTTCCCTTCGCTCTCCGACGCCGGGCGCCACCCCGGCGGATCGAGCGGGCGGGCGGTTAGTTAGTTGCGCATCGCTGTTGCTTGCTTCTTCTACCGTTTGGCGCAGGGAGGAGGAGCGTGGGGGTAACATCGCTGTTCCACTCCCACCCGGGTGCTGCCCCCTCCTGTTCCCTTCTCACTCACTGCGTGTGCTTATCCGCGCCGGGCGAATCCAATCCCCCACTCTCCCCCGTCCTTCTCCAGAAAAGTCGCGGCTTTCCCCCCGCCCCCTCATGATTCCCGTCGATTCCTCCTCCGCCCATTTGCCCCTCCGCGTCGCAGAATCCCCCGCGCCACCGCTGCTGACCGTCGCGCGGTAGGGGGAGGGGCAGGAGCGAGGAGCCTAGCTCGGGGGTGGTCGTGGTGGCGACCGGCGGCGAGATGTCGTCGTCGCGGTCCAACAACCGGCCGGCGTGCTCGCGGGGGAGCTCGGCGCGCTCCAAGCACAGCGAGCGGGTGGTGGCGCAGACGCCCGTGGACGCGCGCCTGCACGCCGAGTTCGAGGGCTCGCAGCGCCACTTCGACTATTCCTCCTCGGTCAGCGCGCTCAACCGCTCCGGGGCCAGCACCAGCTCCGCCGTCTCCGCCTACCTCCAGAACATGCAGCGGGGCCGCTACATCCAGCCCTTCGGCTGCCTGCTCGCGATCCACCCGGAGTCCTTCGCGCTGCTCGCCTACAGCGAGAACGCCGCCGAGATACTCGACCTCACGCCGCACGCCGTGCCCACCATCGACCAGCGCGACGCGCTCGCCGTCGGCGCCGACGTGCGCACGCTCTTTCGCTCCCAGAGCGCCGTCGCCCTGCACAAGGCCGCCGTCTTCGGGGAGGTCAACCTGCTCAACCCAATCCTCGTCCACGCCAGGACCTCCGGGAAGCCCTTCTACGCCATCTTGCACCGCATCGACGTCGGCCTCGTCATCGACCTCGAACCGGTCAACCCCGCCGACGTGCCCGTCACCGCCGCCGGCGCGCTCAAGTCGTACAAGCTCGCCGCCAAGGCCATCTCCAGGCTGCAGTCCCTGCCCAGTGGCAACCTCTCCCTGCTCTGCGATGTGCTGGTCCGGGAGGTAAGCGAGCTCACTGGCTATGACAGGGTGATGGCATACAAGTTCCATGAGGACGAGCATGGTGAGGTCATTGCCGAGTGCAGGAGGTCTGATTTGGAGCCGTATCTTGGCCTGCACTACCCGGCAACTGACATCCCACAAGCATCCAGGTTTCTGTTTATGAAGAACAAAGTGCGGATGATATGTGATTGTGCTGCAAGTCCTGTGAAGCTCATTCAGGATGACAACCTGTCACAGCCTATCAGCCTCTGTGGCTCGACCATGAGAGCACCCCATGGTTGCCATGCCCAGTACATGGCCAACATGGGCTCCATCGCGTCGCTGGTGATGTCGATCACTATAAACGAGGACGACGATGAGGATGGAGACACTGGGAGTGACCAGCAGCCGAAAGGCAGGAAGCTGTGGGGGCTGGTGGTTTGCCATCACACGAGCCC
GAGGTTCGTCCCCTTCCCTCTCAGGTATGCTTGCGAATTTCTCTTGCAAGTGTTCGGCATACAGCTCAACAAGGAGGTGGAACTTGCTTCTCAGGCAAAGGAGAGGCACATCCTCCGCACGCAGACGCTTCTGTGTGATATGCTCCTCAGGGATGCTCCTGTTGGGATATTTACCCAGTCTCCCAATGTAATGGATCTGGTGAAGTGCGATGGTGCAGCATTGTGTTACCAAAACCAGATTATGGTGCTGGGATCAACACCCTCTGAAGGAGAGATAAAGAAGATTGTCGCGTGGCTGCTGGAGTGCCATGATGGCTCTACTGGGCTAAGTACGGACAGCTTATTGGAAGCAGGATATCCTGGTGCGTCTGCGCTCGGTGAGGTTGTCTGTGGCATGGCAGCTATAAAGATCTCTTCCAAAGGATTTATCTTCTGGTTCCGGTCGCACACAGCAAAGGAGATCAAGTGGGGTGGAGCTAAGCATGAACCAGGTGATGCAGATGACAATGGCAGGAGGATGCATCCACGTTCTTCGTTCAGGGCCTTCTTGGAGGTAGTTAAATGGAGGAGTGTTCCTTGGGAGGATGTTGAAATGGACGCAATCCATTCTCTCCAGCTAATATTGCGTGGCTCCCTGCAAGATGAAGATGCTAACGACAACAATGCAAGGTCAATTGTTGAAGCTCCATCTGATGACATCAAGAAGATACAGGGGCTACTTGAACTGAAAATTGTGACAAATGAGATGGTGCGCCTAATTGAGACAGCAACTGCTCCTATATTGGCTGTCGACATCGTTGGTAACATAAACGGATGGAATAATAAAGTTGCAGAAATTACTGGATTACCCACCACGGAAGCCATAGGGATGCTTCTGGTAGATCTCGTGGAGGGTGATTCTGTTGAAGTGGTTAAGCAAATGTTGAACTCAGCTCTGCAAGGTTGTCACTATGTCTTGAATCTGGTCCTTTTATGCTTTCCAGCTTAGTTATAACATGTACATCATTTTTTGGTGACACATCTGTGTTTCATTTCTCCTTTATATTTCAGGAACCGAAGAGCAGAATTTGGAAATCAAGCTTAAAACAATGCATCAACAGGAAAGTAAGGGCCCTGTAGTCTTGATGGTTAACGCCTGTTGTAGTCGTGACCTTTCAGACAAAGTTGTTGGGGTATGTTTCGTAGCACAAGATTTGACAGGGCACAAGATGGTTATGGATAAGTATACCCGGATACAGGGTGACTATGTTGCAATAGTAAAGAACCCCAATGAGCTCATACCCCCTATATTTATGATCAATGATCTTGGTTCTTGCTTAGAATGGAATGAAGCTATGCAAAAGATTACTGGTATAAAGAGGGAAGATGCAATAGATAAGTTGCTAATCGGGGAGCTTTTCACTCTTCATGATTATGGATGCAGGGTAAAAGATCAAGTTACTCTAACCAAACTTAGCATACTGATGAACACGGTGATCTCTGGTCAAGAACCCGAGAAGCTTGCTTTTGGTTTCTTCAACACAGATGGCAAGTACATGGAATCACTGCTGACAGCAAACAAGAGGACAGATGCTGAGGGTAAGATCACCGGCGCTCTTTGCTTTTTGCATGTGGCCAGCCCCGAGCTTCAGCATGCTCTTCAGGTGCAGAAAATGTCTGAACAAGCTGCTACACACAGCTTTAAGGAATTGACATATATTCGTCAAGAATTAAAGAACCCACTCAATGGCATGCAATTTACCCGTAAGTTGTTGGAACCGTCTGACTTGACAGAGGAGCAGAGGCAACTTTTTGCATCAAATGTTCTCTGCCAAGAACAGTTGAAAAAGATTTTACATGACAATGATCTAGAAGGCATTGAACAGTGGTAAGTGCTATTTTTTCAGCATCGTCTAAAAGCTGATAAGCTATTTTTTTTCACTAACCCCAGTAAATTGTGTCTGTGCTGATGCAGAACTGAAACTACTAATATCTCATGTTCTTTTCCCTATCAATTGCAT
TTAATCTATGGAGTACACATGATATGCACTACTGGTTGCAAATTCTTAGACATGAGAACTTCCACCATTAACAAAATGCTTATAGTGCTAGAAAATCAATTCACACAACAGTTTAGCATGCCTTTTCCATTTACTTCTACATATGATAGAAAAATGATAGATTAAATATTAATTAATACTTTTTTTACAGCTACATGGAGATGAACACGGTGGAATTCAACCTTGAAGAAGCTCTGAATACGGTCCTAATGCAAGGCATGTCTGTGAGCAAGGAAAAACAAATTTCTCTTGATCGTGATTGGCCTGTAGAAGTATCATCAATGTACTTATATGGGGATAACTTAAGGCTTCAGCAAGTCCTAGCAGACTACTTGGCATGCACGCTTCAATTTACTCGGCCAGCTGAAGGGCCTATTGTACTCCAGGTCATTCCCAAGAAGGAACACATTGGTTCTGGCATGCAGATTGCTCATCTAGAATTCAGGTTAGCCCCTCCCACTTCTGTCCCTCCGCGCATGCCATAAATTGAACTAACTATATATATCAGAAGAACACATAGGGAACATGAGACACTTACTTTACGTGTACAGCATAAATATTTTTGGGATGGCGACTGTGCAAGTCCGGAGCCATCCGGGTCAGATAAAAATTATAGTGGTCAGCTTGTAAGTGACAGGAGGTTGCCCTTGCTCTTCTTTGGTAGTGTTTAGATTGCCTGATGAGTTTTCTGGGATGCAATGTCACTTAACTGAAAACAATCTGAAAGCGGTTTGCTGGATAGGCCAAATCTCCTTGAAAATAACAGCTGTTTAACCCATGAATCCATGCAGACTTGTCCACCCAGCACCAGGCGTCCCGGAGGCACTGATACAGGAGATGTTCCGCCACGGCCCAGGGGTATCCCGAGAAGGCCTCGGCCTGCACATAAGCCAGAAGCTGGTGAAGACGATGAGCGGCACGGTACAGTACCTCCGAGAAGCGGAGAGCTCGTCGTTCATCGTTCTGGTAGAGTTCCCGGTGGCGCAGCTCAATAGCAAGAGGTCCAGGCCTTCGACGAGCAAGAGTAACTTCTGAATCCCGACGACGGGTGTCCTCGAGATGTGAACCAGGGCTATTGATTGGGTTGCGCAAAGAGCAAGCCCTTTGGGAGAGCTTAGCTGATGTATCA

Ta_PhyC protein

MSSSRSNNRPACSRGSSARSKHSERVVAQTPVDARLHAEFEGSQRHFDYSSSVSALNRSGASTSSAVSAYLQNMQRGRYIQPFGCLLAIHPESFALLAYSENAAEILDLTPHAVPTIDQRDALAVGADVRTLFRSQSAVALHKAAVFGEVNLLNPILVHARTSGKPFYAILHRIDVGLVIDLEPVNPADVPVTAAGALKSYKLAAKAISRLQSLPSGNLSLLCDVLVREVSELTGYDRVMAYKFHEDEHGEVIAECRRSDLEPYLGLHYPATDIPQASRFLFMKNKVRMICDCAASPVKLIQDDNLSQPISLCGSTMRAPHGCHAQYMANMGSIASLVMSITINEDDDEDGDTGSDQQPKGRKLWGLVVCHHTSPRFVPFPLRYACEFLLQVFGIQLNKEVELASQAKERHILRTQTLLCDMLLRDAPVGIFTQSPNVMDLVKCDGAALCYQNQIMVLGSTPSEGEIKKIVAWLLECHDGSTGLSTDSLLEAGYPGASALGEVVCGMAAIKISSKGFIFWFRSHTAKEIKWGGAKHEPGDADDNGRRMHPRSSFRAFLEVVKWRSVPWEDVEMDAIHSLQLILRGSLQDEDANDNNARSIVEAPSDDIKKIQGLLELKIVTNEMVRLIETATAPILAVDIVGNINGWNNKVAEITGLPTTEAIGMLLVDLVEGDSVEVVKQMLNSALQGTEEQNLEIKLKTMHQQESKGPVVLMVNACCSRDLSDKVVGVCFVAQDLTGHKMVMDKYTRIQGDYVAIVKNPNELIPPIFMINDLGSCLEWNEAMQKITGIKREDAIDKLLIGELFTLHDYGCRVKDQVTLTKLSILMNTVISGQEPEKLAFGFFNTDGKYMESLLTANKRTDAEGKITGALCFLHVASPELQHALQVQKMSEQAATHSFKELTYIRQELKNPLNGMQFTRKLLEPSDLTEEQRQLFASNVLCQEQLKKILHDNDLEGIEQCYMEMNTVEFNLEEALNTVLMQGMSVSKEKQISLDRDWPVEVSSMYLYGDNLRLQQVLADYLACTLQFTRPAEGPIVLQVIPKKEHIGSGMQIAHLEFRLVHPAPGVPEALIQEMFRHGPGVSREGLGLHISQKLVKTMSGTVQYLREAESSSFIVLVEFPVAQLNSKRSRPSTSKSNF*

2.1. In the provided genomic DNA sequence, highlight each predicted exon with a different color.

2.2. Using corresponding colors, mark the locations of the exon regions in the protein translation as well.

2.3 Use bold red letters to mark the ATG start codon, TGA stop codon, GT and AG splicing sites.
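The bases to mark in 2.3 can be located from the exon coordinates alone. The sketch below is one possible way to do it, assuming 1-based inclusive exon coordinates on the forward strand and assuming that the last exon ends with the stop codon. The function name `splice_signals` and the toy sequence are hypothetical, chosen only for illustration.

```python
def splice_signals(genomic: str, exons: list[tuple[int, int]]) -> dict:
    """Extract the start codon, stop codon, and intron boundary dinucleotides.

    exons: 1-based inclusive (start, end) pairs, sorted 5' to 3' on the
    forward strand. Assumes the last exon ends with the stop codon.
    """
    seq = genomic.upper()
    first_start = exons[0][0]
    last_end = exons[-1][1]
    signals = {
        "start_codon": seq[first_start - 1:first_start + 2],
        "stop_codon": seq[last_end - 3:last_end],
        "introns": [],
    }
    for (_, end), (next_start, _) in zip(exons, exons[1:]):
        donor = seq[end:end + 2]                       # first two intron bases
        acceptor = seq[next_start - 3:next_start - 1]  # last two intron bases
        signals["introns"].append((donor, acceptor))
    return signals


if __name__ == "__main__":
    # Toy gene: exon 3-8, intron 9-19, exon 20-25 (lowercase = non-coding).
    toy = "ccATGGCGgtaagtttcagGCTTGAtt"
    print(splice_signals(toy, [(3, 8), (20, 25)]))
```

For a canonical intron the donor is GT and the acceptor is AG, so any other pair in the output is worth a second look in the alignment.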

ANSWER

Predicted gene structure (within gDNA segment 1 to 5168):

Exon 1 877 2941 (2065 n); Protein 1 688 (688 aa); score: 1.000

Intron 1 2942 3050 (109 n); Pd: 0.901 Pa: 0.999

Exon 2 3051 3867 (817 n); Protein 689 961 (273 aa); score: 1.000

Intron 2 3868 4190 (323 n); Pd: 0.992 Pa: 1.000

Exon 3 4191 4484 (294 n); Protein 962 1059 (98 aa); score: 1.000

Intron 3 4485 4830 (346 n); Pd: 0.999 Pa: 0.937

Exon 4 4831 5074 (244 n); Protein 1060 1139 (80 aa); score: 1.000
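The reported coordinates can be sanity-checked with a little arithmetic. The sketch below recomputes the exon and intron lengths from the exon coordinates listed above and checks that the total coding length is a whole number of codons. The variable names are illustrative, and reading the final codon as the stop codon is an assumption rather than something GeneSeqer states.

```python
# Exon coordinates copied from the GeneSeqer output (1-based, inclusive).
exons = [(877, 2941), (3051, 3867), (4191, 4484), (4831, 5074)]


def span_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval."""
    return end - start + 1


exon_lengths = [span_length(s, e) for s, e in exons]

# Each intron runs from the base after one exon to the base before the next.
intron_lengths = [
    span_length(prev_end + 1, next_start - 1)
    for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
]

cds_length = sum(exon_lengths)
codons, remainder = divmod(cds_length, 3)

print(exon_lengths)                   # [2065, 817, 294, 244]
print(intron_lengths)                 # [109, 323, 346]
print(cds_length, codons, remainder)  # 3420 1140 0
```

The 3,420 coding bases give 1,140 codons, which matches the 1,139 aligned amino acids plus one final codon, assumed here to be the stop codon.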

>PhyC_gDNA

AACGGTCAACATTTTCTGACAGCGACGCTTGCCACCCCATTTCTGTGCGAAAAGCAAAGCACGCACGCCGCATGTTGGCTCGAAGCATCACGCACCACACAGTTTTGGGAAAGGAATCGGGGAGCCCCGCGGACGATCTGGGCCACACGCGTCCAATCCACCGGCCAAAGATGGACAGCCTGCTGGGCACTGGTACTACCGCTAGTTACTACGCGATGGCCCCGACAAGCCAACCCATTCTTCAATTATTCCGATCAAGTGGGAAGCGCCATCCAAATTAGCCGAGCTCGCACACTACTGCCGCGTCAGCTCTTGCGGGGAAGACGAGCCGCGCCGGAGTGACGTCGTACGGCTTCCCGTTCCCCTCTCGGTTTCCCGACGCCTCTCTTGGCTCACCCGCCCGCCCGCCGCCGCCCTGCCACTTCCTCCGCGCGTGAAAGCCCACCGCCTATTCCCCTTCCCTTCGCTCTCCGACGCCGGGCGCCACCCCGGCGGATCGAGCGGGCGGGCGGTTAGTTAGTTGCGCATCGCTGTTGCTTGCTTCTTCTACCGTTTGGCGCAGGGAGGAGGAGCGTGGGGGTAACATCGCTGTTCCACTCCCACCCGGGTGCTGCCCCCTCCTGTTCCCTTCTCACTCACTGCGTGTGCTTATCCGCGCCGGGCGAATCCAATCCCCCACTCTCCCCCGTCCTTCTCCAGAAAAGTCGCGGCTTTCCCCCCGCCCCCTCATGATTCCCGTCGATTCCTCCTCCGCCCATTTGCCCCTCCGCGTCGCAGAATCCCCCGCGCCACCGCTGCTGACCGTCGCGCGGTAGGGGGAGGGGCAGGAGCGAGGAGCCTAGCTCGGGGGTGGTCGTGGTGGCGACCGGCGGCGAGATGTCGTCGTCGCGGTCCAACAACCGGCCGGCGTGCTCGCGGGGGAGCTCGGCGCGCTCCAAGCACAGCGAGCGGGTGGTGGCGCAGACGCCCGTGGACGCGCGCCTGCACGCCGAGTTCGAGGGCTCGCAGCGCCACTTCGACTATTCCTCCTCGGTCAGCGCGCTCAACCGCTCCGGGGCCAGCACCAGCTCCGCCGTCTCCGCCTACCTCCAGAACATGCAGCGGGGCCGCTACATCCAGCCCTTCGGCTGCCTGCTCGCGATCCACCCGGAGTCCTTCGCGCTGCTCGCCTACAGCGAGAACGCCGCCGAGATACTCGACCTCACGCCGCACGCCGTGCCCACCATCGACCAGCGCGACGCGCTCGCCGTCGGCGCCGACGTGCGCACGCTCTTTCGCTCCCAGAGCGCCGTCGCCCTGCACAAGGCCGCCGTCTTCGGGGAGGTCAACCTGCTCAACCCAATCCTCGTCCACGCCAGGACCTCCGGGAAGCCCTTCTACGCCATCTTGCACCGCATCGACGTCGGCCTCGTCATCGACCTCGAACCGGTCAACCCCGCCGACGTGCCCGTCACCGCCGCCGGCGCGCTCAAGTCGTACAAGCTCGCCGCCAAGGCCATCTCCAGGCTGCAGTCCCTGCCCAGTGGCAACCTCTCCCTGCTCTGCGATGTGCTGGTCCGGGAGGTAAGCGAGCTCACTGGCTATGACAGGGTGATGGCATACAAGTTCCATGAGGACGAGCATGGTGAGGTCATTGCCGAGTGCAGGAGGTCTGATTTGGAGCCGTATCTTGGCCTGCACTACCCGGCAACTGACATCCCACAAGCATCCAGGTTTCTGTTTATGAAGAACAAAGTGCGGATGATATGTGATTGTGCTGCAAGTCCTGTGAAGCTCATTCAGGATGACAACCTGTCACAGCCTATCAGCCTCTGTGGCTCGACCATGAGAGCACCCCATGGTTGCCATGCCCAGTACATGGCCAACATGGGCTCCATCGCGTCGCTGGTGATGTCGATCACTATAAACGAGGACGACGATGAGGATGGAGACACTGGGAGTGACCAGCAGCCGAAAGGCAGGAAGCTGTGGGGGCTGGTGGTTTGCCATCACACGAGCCC
GAGGTTCGTCCCCTTCCCTCTCAGGTATGCTTGCGAATTTCTCTTGCAAGTGTTCGGCATACAGCTCAACAAGGAGGTGGAACTTGCTTCTCAGGCAAAGGAGAGGCACATCCTCCGCACGCAGACGCTTCTGTGTGATATGCTCCTCAGGGATGCTCCTGTTGGGATATTTACCCAGTCTCCCAATGTAATGGATCTGGTGAAGTGCGATGGTGCAGCATTGTGTTACCAAAACCAGATTATGGTGCTGGGATCAACACCCTCTGAAGGAGAGATAAAGAAGATTGTCGCGTGGCTGCTGGAGTGCCATGATGGCTCTACTGGGCTAAGTACGGACAGCTTATTGGAAGCAGGATATCCTGGTGCGTCTGCGCTCGGTGAGGTTGTCTGTGGCATGGCAGCTATAAAGATCTCTTCCAAAGGATTTATCTTCTGGTTCCGGTCGCACACAGCAAAGGAGATCAAGTGGGGTGGAGCTAAGCATGAACCAGGTGATGCAGATGACAATGGCAGGAGGATGCATCCACGTTCTTCGTTCAGGGCCTTCTTGGAGGTAGTTAAATGGAGGAGTGTTCCTTGGGAGGATGTTGAAATGGACGCAATCCATTCTCTCCAGCTAATATTGCGTGGCTCCCTGCAAGATGAAGATGCTAACGACAACAATGCAAGGTCAATTGTTGAAGCTCCATCTGATGACATCAAGAAGATACAGGGGCTACTTGAACTGAAAATTGTGACAAATGAGATGGTGCGCCTAATTGAGACAGCAACTGCTCCTATATTGGCTGTCGACATCGTTGGTAACATAAACGGATGGAATAATAAAGTTGCAGAAATTACTGGATTACCCACCACGGAAGCCATAGGGATGCTTCTGGTAGATCTCGTGGAGGGTGATTCTGTTGAAGTGGTTAAGCAAATGTTGAACTCAGCTCTGCAAGGTTGTCACTATGTCTTGAATCTGGTCCTTTTATGCTTTCCAGCTTAGTTATAACATGTACATCATTTTTTGGTGACACATCTGTGTTTCATTTCTCCTTTATATTTCAGGAACCGAAGAGCAGAATTTGGAAATCAAGCTTAAAACAATGCATCAACAGGAAAGTAAGGGCCCTGTAGTCTTGATGGTTAACGCCTGTTGTAGTCGTGACCTTTCAGACAAAGTTGTTGGGGTATGTTTCGTAGCACAAGATTTGACAGGGCACAAGATGGTTATGGATAAGTATACCCGGATACAGGGTGACTATGTTGCAATAGTAAAGAACCCCAATGAGCTCATACCCCCTATATTTATGATCAATGATCTTGGTTCTTGCTTAGAATGGAATGAAGCTATGCAAAAGATTACTGGTATAAAGAGGGAAGATGCAATAGATAAGTTGCTAATCGGGGAGCTTTTCACTCTTCATGATTATGGATGCAGGGTAAAAGATCAAGTTACTCTAACCAAACTTAGCATACTGATGAACACGGTGATCTCTGGTCAAGAACCCGAGAAGCTTGCTTTTGGTTTCTTCAACACAGATGGCAAGTACATGGAATCACTGCTGACAGCAAACAAGAGGACAGATGCTGAGGGTAAGATCACCGGCGCTCTTTGCTTTTTGCATGTGGCCAGCCCCGAGCTTCAGCATGCTCTTCAGGTGCAGAAAATGTCTGAACAAGCTGCTACACACAGCTTTAAGGAATTGACATATATTCGTCAAGAATTAAAGAACCCACTCAATGGCATGCAATTTACCCGTAAGTTGTTGGAACCGTCTGACTTGACAGAGGAGCAGAGGCAACTTTTTGCATCAAATGTTCTCTGCCAAGAACAGTTGAAAAAGATTTTACATGACAATGATCTAGAAGGCATTGAACAGTGGTAAGTGCTATTTTTTCAGCATCGTCTAAAAGCTGATAAGCTATTTTTTTTCACTAACCCCAGTAAATTGTGTCTGTGCTGATGCAGAACTGAAACTACTAATATCTCATGTTCTTTTCCCTATCAATTGCAT
TTAATCTATGGAGTACACATGATATGCACTACTGGTTGCAAATTCTTAGACATGAGAACTTCCACCATTAACAAAATGCTTATAGTGCTAGAAAATCAATTCACACAACAGTTTAGCATGCCTTTTCCATTTACTTCTACATATGATAGAAAAATGATAGATTAAATATTAATTAATACTTTTTTTACAGCTACATGGAGATGAACACGGTGGAATTCAACCTTGAAGAAGCTCTGAATACGGTCCTAATGCAAGGCATGTCTGTGAGCAAGGAAAAACAAATTTCTCTTGATCGTGATTGGCCTGTAGAAGTATCATCAATGTACTTATATGGGGATAACTTAAGGCTTCAGCAAGTCCTAGCAGACTACTTGGCATGCACGCTTCAATTTACTCGGCCAGCTGAAGGGCCTATTGTACTCCAGGTCATTCCCAAGAAGGAACACATTGGTTCTGGCATGCAGATTGCTCATCTAGAATTCAGGTTAGCCCCTCCCACTTCTGTCCCTCCGCGCATGCCATAAATTGAACTAACTATATATATCAGAAGAACACATAGGGAACATGAGACACTTACTTTACGTGTACAGCATAAATATTTTTGGGATGGCGACTGTGCAAGTCCGGAGCCATCCGGGTCAGATAAAAATTATAGTGGTCAGCTTGTAAGTGACAGGAGGTTGCCCTTGCTCTTCTTTGGTAGTGTTTAGATTGCCTGATGAGTTTTCTGGGATGCAATGTCACTTAACTGAAAACAATCTGAAAGCGGTTTGCTGGATAGGCCAAATCTCCTTGAAAATAACAGCTGTTTAACCCATGAATCCATGCAGACTTGTCCACCCAGCACCAGGCGTCCCGGAGGCACTGATACAGGAGATGTTCCGCCACGGCCCAGGGGTATCCCGAGAAGGCCTCGGCCTGCACATAAGCCAGAAGCTGGTGAAGACGATGAGCGGCACGGTACAGTACCTCCGAGAAGCGGAGAGCTCGTCGTTCATCGTTCTGGTAGAGTTCCCGGTGGCGCAGCTCAATAGCAAGAGGTCCAGGCCTTCGACGAGCAAGAGTAACTTCTGAATCCCGACGACGGGTGTCCTCGAGATGTGAACCAGGGCTATTGATTGGGTTGCGCAAAGAGCAAGCCCTTTGGGAGAGCTTAGCTGATGTATCA

TaPhyC_protein

MSSSRSNNRPACSRGSSARSKHSERVVAQTPVDARLHAEFEGSQRHFDYSSSVSALNRSGASTSSAVSAYLQNMQRGRYIQPFGCLLAIHPESFALLAYSENAAEILDLTPHAVPTIDQRDALAVGADVRTLFRSQSAVALHKAAVFGEVNLLNPILVHARTSGKPFYAILHRIDVGLVIDLEPVNPADVPVTAAGALKSYKLAAKAISRLQSLPSGNLSLLCDVLVREVSELTGYDRVMAYKFHEDEHGEVIAECRRSDLEPYLGLHYPATDIPQASRFLFMKNKVRMICDCAASPVKLIQDDNLSQPISLCGSTMRAPHGCHAQYMANMGSIASLVMSITINEDDDEDGDTGSDQQPKGRKLWGLVVCHHTSPRFVPFPLRYACEFLLQVFGIQLNKEVELASQAKERHILRTQTLLCDMLLRDAPVGIFTQSPNVMDLVKCDGAALCYQNQIMVLGSTPSEGEIKKIVAWLLECHDGSTGLSTDSLLEAGYPGASALGEVVCGMAAIKISSKGFIFWFRSHTAKEIKWGGAKHEPGDADDNGRRMHPRSSFRAFLEVVKWRSVPWEDVEMDAIHSLQLILRGSLQDEDANDNNARSIVEAPSDDIKKIQGLLELKIVTNEMVRLIETATAPILAVDIVGNINGWNNKVAEITGLPTTEAIGMLLVDLVEGDSVEVVKQMLNSALQGTEEQNLEIKLKTMHQQESKGPVVLMVNACCSRDLSDKVVGVCFVAQDLTGHKMVMDKYTRIQGDYVAIVKNPNELIPPIFMINDLGSCLEWNEAMQKITGIKREDAIDKLLIGELFTLHDYGCRVKDQVTLTKLSILMNTVISGQEPEKLAFGFFNTDGKYMESLLTANKRTDAEGKITGALCFLHVASPELQHALQVQKMSEQAATHSFKELTYIRQELKNPLNGMQFTRKLLEPSDLTEEQRQLFASNVLCQEQLKKILHDNDLEGIEQCYMEMNTVEFNLEEALNTVLMQGMSVSKEKQISLDRDWPVEVSSMYLYGDNLRLQQVLADYLACTLQFTRPAEGPIVLQVIPKKEHIGSGMQIAHLEFRLVHPAPGVPEALIQEMFRHGPGVSREGLGLHISQKLVKTMSGTVQYLREAESSSFIVLVEFPVAQLNSKRSRPSTSKSNF*

3. 20 points Use FGENESH+, GENSCAN, and GenomeScan to predict genes within the same sequence of T. monococcum containing the PhytochromeC gene you annotated above (Q2.). Compare the predictions made by each program with the correct annotation you made in Q2. For each program describe:

a.  How many of the exons were identified?

b.  How many of those were predicted correctly without errors, according to the correct annotation you made in Q2.?
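Questions a and b amount to comparing two lists of exon coordinates. The sketch below shows one way to count exact matches, treating an exon as correct only when both its start and end agree with the reference. The reference coordinates are the GeneSeqer exons from Q2; the prediction list is entirely hypothetical and stands in for whatever a gene finder reports, and the function name `compare_exons` is an assumption made for this example.

```python
Exon = tuple[int, int]  # 1-based inclusive (start, end)


def compare_exons(reference: list[Exon], predicted: list[Exon]) -> dict:
    """Split predicted exons into exact matches and everything else."""
    ref = set(reference)
    exact = sorted(p for p in predicted if p in ref)
    inexact = sorted(p for p in predicted if p not in ref)
    return {"identified": len(predicted), "exact": exact, "inexact": inexact}


# Reference annotation from the GeneSeqer result in Q2.
reference = [(877, 2941), (3051, 3867), (4191, 4484), (4831, 5074)]

# Hypothetical prediction, for illustration only (not real program output).
hypothetical_prediction = [
    (877, 2941), (3051, 3867), (4191, 4400), (4831, 5074), (5100, 5150),
]

if __name__ == "__main__":
    result = compare_exons(reference, hypothetical_prediction)
    print(result["identified"], "exons predicted")
    print(len(result["exact"]), "match the reference exactly")
    print("boundaries differ or exon is extra:", result["inexact"])
```

Requiring both boundaries to match is a strict criterion. An exon with one correct splice site and one wrong one still shifts or truncates the protein, so it is counted as incorrect here.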

When asked to provide a protein sequence use the rice ortholog provided below:

>Rice_PhyC

MSSSRSNNRATCSRSSSARSKHSARVVAQTPMDAQLHAEFEGSQRHFDYSSSVGAANRSGATTSNVSAYLQNMQRGRFVQPFGCLLAVHPETFALLAYSENAAEMLDLTPHAVPTIDQREALAVGTDVRTLFRSHSFVALQKAATFGDVNLLNPILVHARTSGKPFYAIMHRIDVGLVIDLEPVNPVDLPVTATGAIKSYKLAARAIARLQSLPSGNLSLLCDVLVREVSELTGYDRVMAYKFHEDEHGEVIAECKRSDLEPYLGLHYPATDIPQASRFLFMKNKVRMICDCSATPVKIIQDDSLTQPISICGSTLRAPHGCHAQYMASMGSVASLVMSVTINEDEDDDGDTGSDQQPKGRKLWGLMVCHHTSPRFVPFPLRYACEFLLQVFGIQINKEVELAAQAKERHILRTQTLLCDMLLRDAPVGIFTQSPNVMDLVKCDGAALYYQNQLWVLGSTPSEAEIKNIVAWLQEYHDGSTGLSTDSLVEAGYPGAAALGDVVCGMAAIKISSKDFIFWFRSHTAKEIKWGGAKHEPIDADDNGRKMHPRSSFKAFLEVVKWRSVPWEDVEMDAIHSLQLILRGSLQDEDANKNNNAKSIVTAPSDDMKKIQGLLELRTVTNEMVRLIETATAPILAVDITGSINGWNNKAAELTGLPVMEAIGKPLVDLVIDDSVEVVKQILNSALQGIEEQNLQIKLKTFNHQENNGPVILMVNACCSRDLSEKVVGVCFVAQDMTGQNIIMDKYTRIQGDYVAIVKNPSELIPPIFMINDLGSCLEWNEAMQKITGIKREDAVDKLLIGEVFTHHEYGCRVKDHGTLTKLSILMNTVISGQDPEKLLFGFFNTDGKYIESLMTATKRTDAEGKITGALCFLHVASPELQHALQVQKMSEQAAMNSFKELTYIRQELRNPLNGMQFTRNLLEPSDLTEEQRKLLASNVLCQEQLKKILHDTDLESIEQCYTEMSTVDFNLEEALNTVLMQAMPQSKEKQISIDRDWPAEVSCMHLCGDNLRLQQVLADFLACMLQFTQPAEGPIVLQVIPRMENIGSGMQIAHLEFRLVHPAPGVPEALIQEMFRHSPGASREGLGLYISQKLVKTMSGTVQYLRESESSSFIVLVEFPVAQLSTKRCKASTSKF

3.1. Use FGENESH+ (http://linux1.softberry.com/berry.phtml?topic=fgenes_plus&group=programs&subgroup=gfs)

ANSWER

FGENESH+ found six exons. It appears to have assigned an alternative splicing site and incorrectly called the exons.

3.2. Use GENSCAN (http://genes.mit.edu/GENSCAN.html)

ANSWER

GENSCAN found 7 exons. This program uses a maize model to identify splice sites, and incorrectly assigned the exons.

3.3 Now use the updated version called GenomeScan using the protein sequence provided above in Q2. (http://genes.mit.edu/genomescan.html).

-Did you get the correct exon prediction? Why might the prediction program work better when provided with the protein information?

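Yes, GenomeScan with the maize model and the rice orthologous protein gave the correct prediction. A reference protein lets the program use similarity to a known protein as additional evidence alongside its splice-site and coding models, which makes the exon assignments much more accurate.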


4. 20 points In the following sequence from T. monococcum annotate the repeat elements present. The Triticeae Repeat Database TREP (http://wheat.pw.usda.gov/ggpages/Repeats/blastrepeats3.html) will help you to identify the elements. In TREP, select blastn program and Cereal repeat sequences, complete set database.

>Tm_43j16

AAATATTAATCAAGAATTTGAAAATTTTTGAACAAAAAAAGATTAAATGTGTATAGAAAATTTGTTGACCATGTACTTAAAAAACTATGATTTCTTATAATGTACTTAAAAAAGGGTTAGTCATAACAATATGCAACGGGGACCTCACTCTTCTAGAACCAACAATATGCAACATTAGGTAGATGGCTCTCCATGTTGTTTGCGGCAGTTCATCTTTGGCAAACATGTCCTTAGAGCATCTCCAGCCGTTCAGCCCATAGGACGCCGAAGAAGAGCCGCTTGGGGCTTTAAGGTTGGAGTAAAATGCAAGAAACCCACTTTGAAGGCTAGGTTTGCAGATAACACTACGTTACATTTTTTGGACAGAAAGCACCACGTCTCGTGTAATCTTGTTGCAAATAACACTGATTGGCGGATTAGAGCGCTTAAAGTATGTTTATGACAGGAGGGTCCAGATTTTGCCCTTGTGGCGTAACGGCTTGGCCATGACGGCGTTAGATGGTGAGTTTGTCTAGATTTACAGACAAGTCCCTGTAATTTTAAAGGAACACCCTTCAGTGAAAATCAAAAAAGCAATCAGCTCACTGGCATACCCAAGCACCCAGAGCACAGAACATCGAGGCCACGATGGTCGCTGGCGAGGGGGCGCGCCATCAGGTCAAGGAGGCGGGGGCAGTGGTCGTCGGGCTCTGGTATGCGGCGGCGGCGGTCGTCGGGCTCCGGGATGCGGCAGCGTAGCGGTCGTCGGGGTCTGGGAGAGCGGCGGCGGTGTTGGAAATATGCCCTAGAGACAATAATAAATTGATTATTATTATATTTCCTTGTTCATGATAATCGTTTATTATCTATGCTAGAATTGTTTTGATAGGAAACTCAGATACATGTGTGGATACATAGACAACACCATGTCCCTAGTAAGCCTCTAGTTGACTAGCTCGTTGATCAATAGATGGTTACGGTTTCCTAACCATGGACATTAGATGTCGTTAATAATGGGATCACATCATTAGGAGAATGATGTGATGGACAAGACCCAATCCTAAGCCTAGCACAAAGATCCTGTAGTTCGTTCGCTAAAGCTTTTCTAATGTCAAGTATTATTTCCTTAGACCATGAGATTGTGCAACTCCCAGATACCGTAGGAATACTTTGCCTGTGCCAAACGTCACAACATAACTAGGTGACTATAAAGGTGCACTACGGGTATCTCCGAAAGTGTCTGTTGGGTTGGCACGAATCGAGACTGGGATTTGTCACTCCGTGTGATGGAGAGGTATCTCTGGGCCCACTCGATAGGACATCATCATAATGTGCACAATGTGATCAAGGATTTGATCACGGGATGATGTGTTACGGAACGAGTAAAAGAGACTTGCCGGTAACGAGATTGAACAAGGTATCGACATACCGACGATCGAATCTCGGGCAAGTACAATACCGATAGACAAAGGGAATTGTATATGGGATAGATCGAATCCTCGACATCGTGGTTCATCCGATGAGATCATCGTGGAACATGTGGGAACCAACATGGGTATCCAGATCCCGCTGTTGGTTATTGACCGGAGAACGTCTCGGTCATGTCTGCATGTCTCCCGAACCCGTAGGGTCTACACACTTAAGGTTCGATGACGCTAGGGTTATAAAGGAAGTTTGTATGTGGTTACCGAATGTTGTTCGGAGTCCCGGATGAGATCCCGGACGTCACGAGGAGTTCCGGAATGGTCCGGAGGTAAAGATTTATATATGGGAAGTCCTGTTTTGGTCACCGAAAAAGTTTCGGGGTTTATCGGTAACGTACCGGGACCACCGGGAGGGTCCCGGGGGTCCACCAAGTGGGGCCACAAGCCCCAGAGGCATACATGGGCCAAGTGTGAGAAGGCACCAGCCCCTAGGTGGGCTAGGGCGCCTCCCACCATGGCCCATGCACCTAGAAGGGGAGAGGGGGCAAACCCTAAGGGCAGATGGTGTTGGGGAACGTAGTAATTTCAAAAAATTTC
CTACGCACACGCAAGATCATGGTGATGCATAGCAACGAGGGGAGAGTCTGATCTACGTACCCTTGTAGATCGCAACGGAAGAGTTGACACAACGTAGATGAAGTAGTCGTACGTCTTCTTCCCGATCCGACCGATCCAAGCACCGTTACTCCGGCACCTCCGAGTTCTTAGCACACGTTCAGCTCGATGTGATACGTCTCCGTCGTATCTACTTTTTCAAACACTTTTGCCTTTGTTTTGGACTCTAACTTGTATGATTTGAATGGAACTAACCCTGACTGACACTGTTTTCAGCAAAACTGCCATGATGTTGTTTTATGTGCAGAAAACAAAAGTTCTCGGAATGACCTGAAACTCCACGAAATATATTACAATAAATAATAAAAAATCCTCACCAAAGATGAAGACCAGGGGGCCCACACCCTGTCCACGAGGGTGGGGGGCGCCCCCCCAGGGCGCGCCCCCTACCTCGTGGCCCCCCTAGAGACCCCCCGACTCCAACTCCAACTCTATATATCTGCTTTCGGGAAGAAAAAAATAGGAGAGAAGAATTCATCGCGTTTTACGATATGGAGCCGCCGCCAAGCCCTAAAACCTCTCGGGAGGGCTGATCTGGAGTCCGTTCGGGGCTCCGGAGAGGGGGATTCGTCGCCGTTGTCATCATCAACCATCCTCCATCACCAATTTCATGATGCTCACCGCCGTGCGTGAGTAATTCCATCGTAGGCTTGCTGGACGGTGATGGGTTGGATGAGATTTATCATGTAATCGAGTTAGTTTTGTTAGGATTTGATCCCTAGTATCCACTATGTTCTGAGATTGATGTTGCTATGACTTTGCTATGCTTAATGCTTGTCACTAGGGCCCGAGTGCCATGATTTCAGATCTGAACCTATTATGTTTTCATCAATATATGAGAGTTCTTGATCCTATTTTGCAAGTCTATAGTCACCTATTATGTGTTACGATCTGTTAACCTCGAAGTGACAATAATCGGGATACTTACCGGTGATGACCGTAGTTTCAGGAGTTCATGTATTCACTATGTGCTAATGCTTTGTTCTGGTTCTCTATTAAAAGGAGGCCTTAATATCCCTTAGTTTCCATTAGGACCCCGCTGCCACGGGAGGGTAGGACAAAAGATGTCATGCAAGTTCTTTTCCATAAGCACGTATGACTATATTCGGAATACATGCCTACATTACATTGATGAATTGGAGTTAGTTCTGTGTCACCCTATGTTATGACTGTTACATGATGAAACCACATCCGGCATAATTATCCATCACTGATCCGATGCCTACGAGCTTTCCATATATTGGTTTACGCTTATTTACTTTCCCGTTACTTTTGTTACAACCACTACAAAATACCAAAAACATTACTTTGCTTTCGTTACTCTTTTGTTACCGTTACCATCACTATCATATTACTTTGCTACTAAACACTTTGCTGCAGATACTATGTTTCCAGGTGTGGTTGAATTGACAACTCAGCTGCTAATACTTGAGAATATTCTTTGGCTCCCCTTGTGTCGAATCAATAAATTTGGGTTGAATACTCTACCCTCGAAAGCTGTTGCGATCCCCTATACTTGTGGGTTATCAAGACTAATTTCTGCCGCTGTTGCCGGGGAGCATAGCTCCATTCTTTGTGTCACTTGGGATTTATATCTGCTGGACACTATGAAGAACTTGAAAGATGCTAAGGCAACAATTTATCGCTCAACTACAAGGGGAGGTAAGGAACTGTCATCTAGCTCTGCACTTGATTCACCTTCTGTTATGAGTAAGCTTGCGACACCTAAACCTGCTTCTGCTATTCGTTCTGATATGCCGCATGTTATTGATGATGCTACTTCTGCTATACATGATACTTATGATGAAACTACTTCTATGCTTGATACTACTGTGCCACTTGGTGATTTTCTTGAGGAACGACTTGCTAGGGCTAGAGAGATTGAAAATATTGAATATGATTATGATGATGAAAGTGATGAT
GAAGAATCACTTATTATTCTTGAGGGTTATTTATTTGATCAAGAAGCTTCTTTAGCTATTTTAGCTTGCAAAGATAGATATGAACTCAAAAGGTTATTAGCTAAATGGAATAAGCAATCTCTAAATGCTAGGATGAAACCCGACCCTGCTTTTGCTACTTCACCTATCTGTGTTACTGATAAGGATTATGAATTCTCTGTTGATCCTGATATAATTACTTTAGTTGAATCTGATCCTTTTCATGGCTATGAATCTGAAACTGTTGTGGCATATCTTACTAAATTAAATGATATAGCCACCCTGTTCACTAATGATGAGAGATCTCGCTACTTTTATATCCTTCAAATATATCCGTTCTCATTAAAGGGTGATGCTAAGATATGGTTTAATTCTCTTGATCTTGGTTGTGTGCGTAGTCCCCAAAATATGATTTACTACTTCTCTGCTAAATATTTCCCTGCTCATAAGAAACAAGCTGCTTTAAAGGAAATATACAACTTTGTGCAAATTAAAGAAGAGAGTCTCCCACAAGCTTGGGGGAGGCTTCTCAAGTTACTTAATGCTTTGCCTGATCATCCTCTTAAGAAAAATGAAATACTTGATATATTTTATAATGGACTAACCGATGCTTCCAGAGATTACCTGGATAGTTGTGCTGGTTCTCTTTTCAGGGAAAGAACACCGGATGAAGCTGAAATCATATTGAATAATATGTTGACAAATGAAAATAATTGGGCACCTCTGAGCCAGCTCCTGCTCCAATTAATGATCCTATTCCTAAACAAACTCCGAAGAAGAGAGGTGTTCTATTTCTCAGTCCCGAAGATATGCAAGAGGCAAAGAAATCTATGAAAGAAAAAGGTATTAAAGCTGAAGATGTTAAGAATTTACCTCCTATTGAAGAAATACATGGTCTTTATTTACCGCCTATTGAAGAAATATATGATCTTAATCCTTTATTTATTGAAGAACCTCCAGACCTCGATAACCCGACTCATGTAGTAAAGGTAAATTCTCTCTATAGATATGATAAAGCTGAAATCCCTCCTGCTAAAATTGCTAGTCAATGCTTGGATGAGTTTGATAATTTTATGTTTAAGCAACAAGACTTCAATGCTTATTTTGGTAGACAATTAAAACAAGATGCTGATATATTAAATATTTGGGTGATTATATGGCTAATATTAGAGGTGAACTTAAACTTGTTAGCAAACATGCTTCTATGGTTACTACTCAAGTAGAACAAGTACTTAAAGCTCAAAAGGAATTGCTCAATGAGATGAATAGTAAGAAAAATAATTATGCTGTTAAAGTGGCTACTAGAACTGGTAGAATGACTCAGGAACCTTTGTATCCTGAAGACCACCCTAAGAGAATTAAGCAAGATTCTCAGAGAAATAATATTGATGCACCTAGTTCTTCTGAAAGGAAGAAAAAGAAAAATGATAGAACTGTGCAAACTTCTAGTGAACCTATTGCTGAACCACCTGATAATCCAAATGGTATATCTATGTCCGATGCCGAAACACAATCTGGCAATGAACATGAACCTAATGAAAATGTTAATGATGATGTTCATAATGATGCTCAACCTAGTAATGATAATGATGTAGAAATTGAATCTGCTGTTGATCTTGATAACCCACAATCAAAGAATCAACGTTATGATAAGAAAGACTTTGTTGCTAGGAAACATGGTAAAGAAAGAGAGTCTTGGGTTCAGAAACCCATGCCTTTTCCTCCCAAACCATCCAAGAAAAAGGATGATGGGGATTTTGAGCGCTTTGCTGAAATGATTAGACCTATCTTTTTGCGTATGCGATTGACTGATATGCTCAAAATGAATCCTTATGCTAAGTACATGAAAGATATTGTTACGAATAAAAGAAAGATACCAGAAGCTGAAATTTCCACCATGCTTGCTAATTATAATTTTAAGGGTGGAATACCAAAGAAACTTGGAGATCCAGGAGTACCTACTATACCATGCTCCATTAAAA
GAAACTATGTTAAAGCTGCTTTATGTGATCTTGGAGCCGGTGTTAGTGTTATGCCTCTCTCTTTATATCGTAGACTTGACTTGAATAAGTTGACACCTACTGAAATATCTTTGCAAATGGCTGATAAATCAACTGTTATACCTGTCGGTATTTGTGAGGATGTGCCTGTTGTGGTTGCAAACGTTACTATTTTAACGGACATTGTTATTCTTGATATTCCTGAGGACGATAGTATGTCTATTATTCTTGGAAGACCTTTTCTTAATACTGCAGGGGCTGTTATTGATTGCAACAAAGGCAATGTCACTTTTCATGTTAATGGTAATGAGCATACAGTACACTTTCCGAGGAAACAACCTCAAGTTCATAGTATCAATTCTATCGGAAAAATTCCATCGATTATATTTGGAGGTTTTGAATTTCCTCTTCCTACTGTCAAGAAGAAATATGATATACTTATTATTGGGGATGTGCATATTTATTCGAAATTTCGCCGGTTTCATGTTAATCGGAATGAGTTTGTTAACAAGACTTGATCAACCTTGTTAGTGGATTCCTTTTGACGAGCATGAGATGGATGAAACTAGAAGCACAACCTTCTGTACTCTCTCTATACTTTCTGTTATTTAGTTGAAATAAAGTAAAAATAGTATTTTTCTGTCTGTTTCCGGATTTATCCGTGCAATATAAAAATGTCCAGAAAATAAAAGTCCTCAGAATGACATGCCAATTTAATATGATTTTTTCAGGAATATTTGAGGATTTATGGTGCAAAAATTACCGCGGGGGGAGCTGCCACCTGGTCACGAGGGTGGAGGGCACGCTCTATCCCCTAGGCGCGCCCCCCTGCCTCGTGGGCCCATGGTGGCCCTCCTCCACTTATTCCTGCACCCATCCTCTTCTTCTTCCTCCCACAAATACGAAAAACCGGCTCAAGCACGAGTTCCAGCCCTCTTTGCTGCCATTTTCGATCTCCTTGCTCAAAGCACCTCCCACAAAACTGCTTGGGGAGATTGTTCCTTGGTATGTGACTCCTCCATTGGTCCAATTAGTTTTTGTTCTAGTGCTTTATTCATTGCAAATTTTTGCTGCTTAGGTGACCCTGTTCTTGAGCTTGCATGTCAAATTTATATGGTCAAAAGTAGTTTTGATGCATGATACAGGCTCTAGGCACTTGTAGGAGTGGTTGCTATAAATATTATTGAGTTTGGTTTACTTTTATTTTGAAGTTACTAAAAATTTCAGAATTTTTCAGAAAATGATGAGGAGACTTTTGAGGGGCTCATCGAGCCAAAGCTCGAAGGATAAGGCACCGAAGCCTAAGTATAATTTTCCTCGCACCGCAGAGGTTCGGGCGTGTGAATGGCCTTCCGAGGATTTCTTAAGAGCAGCCGGGATTTATGAAGATTTTCATGAGTTGGCTCAGACTGCAGGCCTCACCGCTTTCCTCCACGACCAATGTGATCAGTATCTCTTGCTCACAAATACCTTTATGCAAAATTTTCATTTTCATTCTAAGAACTCACCACCTACGGTGGAATTTCATTTATATGATGAGCATAAGGAGATGTCACTTTATGATTTCTGTCAGATTTGTTTAGTCCCTTTTGGGGGCAAGACAGAAGAACCACATCGTGATGATGTAGAGGGATTTATTGACACTATCACTATAGGGGGAAAAAGGAAGGTTTTCGATGCACGAATCACTAGCATACATTTTCCTGTTTTATGTTATTTTGCATTATTTGCTAGTCGTTGCTTAATTGGTCGCGGAAATTGTGGAAACCTGAGTATCCCTGATATTATCATTTTGCACCACGGCTTATACGGTGATAACTCTTTTAGTATGGGCGACATTATTGCTAGACGGTTAAGTATGAACCGTACTAAGGGTCCCATCTTTGTAGGCATTTATGCTTCACGCCTCGCTGCACATTTTAACATACCTATTAGGCATTATGAGAAGGAAGAAAATATGCTGCCTCGTGTTTATTTAGATT
ATAAAAGTATGGTAGCACATGATTTTATTGTTAAGAATAGGGAAGAAGAGCTTAAATATCAATTGTTCTTTAATAAACATCATCCTGAGACTATTACCTTGCCTGCTCCTTCTTTGTTTGATTTGACTGTAGGCACATACCTTGTTCCATTGGCGGCTATTCGCGCCTACCGGAACCCTGCACCAGCCCCGGAATCGGAACCACAAGAGGAGCCTCCACGACAGTCTGTTTATTCTTGGGATCCAGAGATGGCTGTCAGCCAATGGCAGTCAGAGTCTTCTTCATCATAGTACGACCCCAACAACTATTATTATGGATATCAGCCAGGCCAGCCGTGGCCATAGACCAACTTAGGTGAAAAGCCTAAGCTTGGGGGAGTACGTATTTCTCACTGACATTACATTTATGTTCACACACACTCATTGCTAGATGTCGGTGCTCATACTTTTTCATTGTATCATCCATGCTAGTTTATTTCCTTTTATGCTTTCTTTTTGTGTGTTTAATAAACCTTAAGAAAAAAATTAGTAGTAGTTTACTTTTCTTGCCGTAGTAATAATTAAAAAGAAAACCCAAAAATATTTCCCGTTCTTCTTTTGCTTGTTGGGAGCTTTCCTGTGTAAATAGTTTTATTTCTTTTTCTTTTCTTTGGTGGTTGGTAGGAGAAGACCATGATTAAATTGTTGGAGTGGCTCTTATATGCTTTATTGTTGGTTTAACCAAGAGCCCATATTGCTTTGTCTTCTCGTGTTTATTGAATGCTCACAGATTCCAGCTTAGTCCAATGCATGTGCACTCTTATTATTATTCACATTGTTCGCTTATGCAAGTGAAAGGCAATTATGATGATATATGATGGACTGACTGAGAGGAGAAAAGCGGGTATGAACTCGACCTCTTTTGTTTTTGTAAATATGATGAGTTCATCGTTCTTGATTTAGCTTGTTATGAATAAACATGTTTGCAATGACAATTAGATATCATAGTTGCTTGTGCCATGCTTGATTATCTATGAGTTATAATGGTTTACCTTGCGTGCCAACATGCTATTGAGATGGTTATGATGTGGTATGATGGGGTGGTATCCTACTTTGAATGATTTAAGTGACTTGACTTGGCACATGTTCACGCATGTAGTTGAAACAAATCAACATAGTCTTCACGATATTTATGTTCATGGTGGATTATATCCTACTCATGCTTGCATCCAATGTTTATTAATTTTAATGCATGTACATGGCTGTTGTCGCTCTCTAGTTGATCGCTTCCCAGTCTTTTGCTAGCCTTCACATGTACTAAGCAGGAATACTGGTTGTGCATCCAATCCCTTAAACCCCAAAGTTATTCCAGATAAGTCCACTATACCTTCCTATATGTGGTATATATCTGCCGTTTCAAGTAAATTTGTATGTGCCAAACTCTAAACCTTCAAATAATCATCATGTTTTGTATGCTCGAATAGCTCATGTATCAACTAGGGCTGTCTGTATCTTCCACGCTAGGCGGGTTATTCTCAAGAGGAGTGGACTCCGCTCCTCACTCACGAGAAAATGGCTGTTCACGGGATGCCCAGTCCCATGCTTTATGAAAACTAAATCAAAATAATTGCAAAACAAAACTCCCCCTGGGACTGTTGTTAGTTGGAGGCACTTGTTGTTTCGAGCAAGCCATGGATTGATGTTTGTTGGTGGAGGGGGAGTATAAACTTTACCATTCTGTTTGGGAACCGCCTATAATGTGTGTAGCATGGAAGATATCGCCATCTCTTGGTTGTTATGTTGACAATGAAAGTATACCGCTCAAAATGTTAATTATCTCTATTTCAAAACCGAGCTCTGGCACCTCTACAAATCCCTGCTTCCCTCTGCGAAGGGCCTATCTATTTACTTTTATGTTGAGTCATCACCCTCTTATTAAAAGCACTAGCTGGAGAGGGCAGCTGTCATTTGCATTCATCACGGTTAATTTATATTGGGTGTGACTATGATTGGATCTCTTTT
ACCATGAATTACAATGTCTAGTCAGTCCTTGATCTTTAAAGGTGCTCTGCATTTATGTTTTGCGGTCTCAGAAAGGGCTAGTGGGATACCATCTTGTTATATCATATTATGATTGTTTTGAGAAAGTGTTGTCATCCGAGATTTATTATTATGGCTTGCTAGTTGATTATGCTATTGATATGGGTAATCATGAGACCTGAGAACTATTGCAAATATGGTTAGTTATAATCTTTGCTGAAAACTTGAATGCTAGCTTGACATACTTACAACAACAAGAGCAAACATAGTTTGTAAAAGTTTTTCTTTATTTCTTTCAGTTTGTCAACTAAATTGCTTGAGGACAAGCAAGGGTTTAAGCTTGGGAGAGTTGATACGTCTCCGTGTATCTACTTTTCCAAACACTTTTGCCCTTGTTTTGGACTCTAACTTGTATGATTTGAATGGAACTAACCCGGACTGACGTTGTTTTCAGCAGAACTGCCATGATGTTGTTTTATGTGCAGGAAACAAAAGTTCTCGGAATGACCTGAAACTCCACGGAATATATTACAATAAATAATAAAAAATCCTCGCCAAAGATGAAGACCAGGGGCCCACACCCTGTCCATGAGGGTGGGGGACGCCCCCCCTCTAGGGCGCGCCCCCTACCTCGTGGGCCCCTTGGAGACCTCCCGATTCCAACTCCAACTCTATATATCTGCTTTCGGGGAGAAAAAATAGGAGAGAAGAATTCATCGCGTTTTACGATACGGAGCCATCACCAAGCCTAAAACCTCTCGGGAGGGCTGATCTGGAGTCCGTTCGGGGCTCCGGAGAGGGGTATTCATCGCCGTCGTCATCATCAACCATCCTCCATCACCAATTTCATGATGCTCACCATCGTGCGTGAGTAATTCCATCATAGGCTTGCTGGACGGTGATGGATTGGATGAGATTTATCATGTAATCGAGTTAGTTTTGTTAGGATTTGATCCCTAGTATCCACTATGTTCTGAGATTGATATTGCTATGACTTTGCTATGCTTAATGCTTGTCACTAGGGCCCTAGTGCCATGATTTCAGATCTGAACCTATTATGTTTTCATCAATATATGATAGTTCTTGATCCTATCTGGCAAGTCTATAGTCACCTATTATGTGTTATGATCCGTTAACCCCGAAGTGACAATAATCGGGATACTTACCGGCGATGACCATAGTTTGAGGAGTTCATGCATTCACTATGTGCTAATGCTTTGTTCCGGTTCTCTCTTAAAAGGAGGCCTTAATATTCCTTAGTTTCCATTAGGACCCCGCTGCCACGGGAGGATAGGACAAAAGATGTCATGCAAGTTCTTTTCCATAAGCATGTATGACTATATTCGGAATACATGCCTACATTACATTGATAAATTGGAGCTAGTTCTGTGTCACCTTATGTTATGACTGTTACATGATGAAACCACATCCAGCATAATTATCCATCACTGATCCGATGCCTACGAGCTTTCCATATATTGGTTTATGCTTATTTACTTTCCCGTTGCTATTGTTACAACCACTACAAAATACCAAAAACATTACTTTGCTTTCGTTACTCTTTTGTTACCGTTACCATCACTATCATATTACTTTGTTATTAAACACTTTGCTGCAGATACTAAGTTTCCAGGTGTGGTTGAATTGACAACTCAGCTGCTAATACTTGAGAATATTCTTTGGCTCCCCTTGTGTCGAATCAATAAATTTGGGTTGAATACTCTACCCTCAAAAGCTGTTGCGATCCCCTATACTTATGGGTTATCACGATGGCCACAAGCCCCAGAGGCATACATGGGCCAAGTGTGAGAAGGCACCAGCCCCTAGGTGGGCTAGGGCACCTCCCACCATGGCCCATGCGCCTAGAAGGGGAGAGGGGGCAAACCCTAAGGGCAGATGGGCCCTAAGGCCCATGCTCCCTGCGCCTCCCTCTCCTCCCCTCTTGGCCGCCCCCCTCAAACCCATCTAGGGCTGCCGCCCCT
CTAGGGATGGGAACCCTAGAGCGGGCGCACCCTCCTCCCCTTCCCCTATATATACTTGAGGCCTAGGGGCTTCCCAAACACGCGATTTGATCTCTCCCTGTTGGTGCTGCCCTACCTCTCTTTTCCTCATATCTAGCAGTGCTTGGCGAAGCCCTGCTGGATCACCACGCTCCTCCACCACCACCACGCCGTTGTGCTGCTGCTGGATGGAGTCTTCCTCAACCTCTCCCTCACTCCTTGCTGGATCAACGCATGGGAGACATCACCGGGCTGTATGTGTGTTGAACGCAGAGGTGCCATCCGTTCGGCACTAGGATCATCGGTGATTTGGATCACGACGAGTACGACTCCATCAACCCCGTTCTCTTGAACGCTTCTGCTTAGCAATCTACAAGGGTATGTAGATGCACTCTCCTTCCCCTCGTTGCTGGTTTCTCCATAGATAGATCTTGGTGACACGTAAAAAAATTTTGAATTTCTGCTACGTTCCCCAACAGGCGGCGGCGGCAGTCGTCGGGATCTGGGAGGTGACGGCGAGCGAATCGCGCGATGACGGGCCATAGGTGGCCGTGCCAGCGCGGCGGGGGCGCGCCAACGGGAACTGGGAGGCGGCGGAGAGAGATCGATGCTCATCCATGGCGGCGACGTCCTTGGCGCTCGACCATGGCGGCGGGACGCTCGTCCAGCGTCCATGGTGATCCAGGCGGCCGGGAGGAGTAGAGGGAGGGCTGGGCTCGTCGCTGGCACCGGCGGTCACGGAGCAGTCCGGCCGTGTGGTGGAGGTTGGGATCTGGCGTGGGTGGCGCCTGCTTCTCCACCACCAGAAGAAGGATCGAGGAGAAATTTCTTGTGAGGGAAAAAGATAACGGAAGAATCCGGCTGATAGGTGGACCCCATGTCTACTCTGACGCCGTTTCGCCCCAAGCCGTTATGTCACATGGGCAAAATCTGGAACCACCTGTCATAAACACACTTAAAGCACTCTAATCCGCCAATCAGTGTTATTTGCAATAAGATTACACAAAACGTGATGCTTTTTGTTCAAAAAATGTAACGTAGTGTTATCTGCAAACCTAGCCTTCAAAATGGTGGTTTTATGCAATTTACTCTTAAGGTTGGCTCGTCGGCGAGAAGGAGGGAGTGTGTGTTGCTTTGTGTGGCGATGCGAGACGATGTGTGGCAGCTGTGACGATCACCGGCGGGGAGATCGTCTTTTATAGCCCTAGCGGGGCGTCGAGAGTCTGCATGTCGGGCGACACGTGGCGGGAGCGGGCGGGGCGCGTATCGGTGGCGCTTCACTGCGCCGCCTGTGAGGCCTTAATGAAGGCTGACCGGCGTGGCAGCGCAGCAACCTTGGCATTGATTCCCGCGGGAACCGAGGCGATGAGGACGACGAAGCGGCGTCTCGCTGACTCGGCAGGCCCATCCCATTCCCGTCAAAAATGCGTTCCCCAGCGCCCCCAGACGCCACCAGCACGCCGGGTTCGGCCTGAATCCGCCGGTGCCAGTTTCGGTCTAAATCGGCAAAAAATGGGCTTCTGAGGGCGCGACCCGCCGGCGCAAAGAAACGCCTGGGGAGGGGGGAGGGGGGCTGTTGGGGGCACGGCTGAAGA

4.1. Use Dotter to align the DNA sequence with itself and have a graphic view of the organization of the repeat elements.
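To see what a self-comparison dot plot shows before opening Dotter, the idea can be sketched in a few lines of Python. This is not Dotter's algorithm (Dotter scores a sliding window); it is a simplified illustration that records only exact word matches, and the function name, the `word` parameter, and the toy sequence are all assumptions made for the example.

```python
def self_dot_matches(seq, word=6):
    """Return sorted (i, j) pairs, i < j, where the same exact word starts at i and j.

    Each pair is one off-diagonal dot in a self dot plot. Runs of dots parallel
    to the main diagonal indicate direct repeats (for example, the two LTRs of
    a retroelement).
    """
    positions = {}
    for i in range(len(seq) - word + 1):
        positions.setdefault(seq[i:i + word], []).append(i)

    matches = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                matches.append((starts[a], starts[b]))
    return sorted(matches)


# Toy sequence (hypothetical): an 8 bp direct repeat separated by a 4 bp spacer.
toy = "ACGTTGCA" + "GGGG" + "ACGTTGCA"
print(self_dot_matches(toy, word=6))
```

In the toy sequence every match has the same offset (12 bp), which is what a short diagonal line parallel to the main diagonal means in the Dotter window.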


4.2. After identifying the different repeat elements with TREP, highlight each of them with a different color in the DNA sequence.

4.3. Bold and underline the LTRs of the nested elements.

4.4. Bold and highlight the inversions flanking the LTRs, and use bold red letters to show the Target Site Duplication (TSD) or Host Duplication.

- Also list the accessions and names of the elements you found and their host duplications, and whether they are complete, SOLO LTRs, or partial sequences.

Example:

Accession / Name / Status / Host Duplication
gnl|TREP|TREP3000 / Helitron "DHH_Helios_42j2-1" / fragment / CAACC

4.5. Describe, in one sentence, your interpretation of this particular arrangement of repeat elements (Which of the detected elements inserted first and which one last?).

ANSWER

This sequence contains three repetitive elements nested inside one another. The first element to insert was a complete Mutator DNA transposon, "DTM_Annie_AY951945-1". A WIS Copia retroelement, now present as a SOLO LTR ("RLC_WIS_AF339051-1"), inserted into it second, and a complete Gypsy retroelement, "RLG_WHAM_consensus-1", inserted into the WIS element most recently.

TREP Accession / Name / Status / Host Duplication
gnl|TREP|TREP2217 / TIR, Mutator, "DTM_Annie_AY951945-1" / Complete / TTAAGGTTG
gnl|TREP|TREP96 / Copia, "RLC_WIS_AF339051-1" / SOLO LTR / GGCGG
gnl|TREP|TREP3269 / Gypsy, "RLG_WHAM_consensus-1" / Complete / CGATG
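A host duplication listed in a table like this one can be double-checked once the element boundaries are known: the same short motif should sit immediately to the left and to the right of the element. The sketch below is only an illustration under that assumption. The function name, the 0-based half-open coordinates, and the toy sequence are hypothetical; real coordinates would come from your TREP/BLAST alignments.

```python
def flanking_tsd(seq, start, end, max_len=10):
    """Return the longest direct repeat that immediately flanks seq[start:end].

    start and end are 0-based, half-open coordinates of the inserted element.
    The left flank is seq[start-k:start] and the right flank is seq[end:end+k].
    Returns an empty string when no flanking duplication is found.
    """
    for k in range(max_len, 0, -1):
        if k > start or end + k > len(seq):
            continue  # the flank would run off the sequence
        if seq[start - k:start] == seq[end:end + k]:
            return seq[end:end + k]
    return ""


# Toy example (hypothetical): an element flanked by the 5 bp motif GGCGG.
left, tsd, element, right = "GATTACA", "GGCGG", "TGTTGAAACAACA", "TTCA"
toy = left + tsd + element + tsd + right
start = len(left) + len(tsd)
end = start + len(element)
print(flanking_tsd(toy, start, end))
```

If the function returns an empty string for boundaries you trust, the element may be a fragment, or the boundaries may be off by a few bases.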

>Tm_43j16

AAATATTAATCAAGAATTTGAAAATTTTTGAACAAAAAAAGATTAAATGTGTATAGAAAATTTGTTGACCATGTACTTAAAAAACTATGATTTCTTATAATGTACTTAAAAAAGGGTTAGTCATAACAATATGCAACGGGGACCTCACTCTTCTAGAACCAACAATATGCAACATTAGGTAGATGGCTCTCCATGTTGTTTGCGGCAGTTCATCTTTGGCAAACATGTCCTTAGAGCATCTCCAGCCGTTCAGCCCATAGGACGCCGAAGAAGAGCCGCTTGGGGCTTTAAGGTTGGAGTAAAATGCAAGAAACCCACTTTGAAGGCTAGGTTTGCAGATAACACTACGTTACATTTTTTGGACAGAAAGCACCACGTCTCGTGTAATCTTGTTGCAAATAACACTGATTGGCGGATTAGAGCGCTTAAAGTATGTTTATGACAGGAGGGTCCAGATTTTGCCCTTGTGGCGTAACGGCTTGGCCATGACGGCGTTAGATGGTGAGTTTGTCTAGATTTACAGACAAGTCCCTGTAATTTTAAAGGAACACCCTTCAGTGAAAATCAAAAAAGCAATCAGCTCACTGGCATACCCAAGCACCCAGAGCACAGAACATCGAGGCCACGATGGTCGCTGGCGAGGGGGCGCGCCATCAGGTCAAGGAGGCGGGGGCAGTGGTCGTCGGGCTCTGGTATGCGGCGGCGGCGGTCGTCGGGCTCCGGGATGCGGCAGCGTAGCGGTCGTCGGGGTCTGGGAGAGCGGCGGCGGTGTTGGAAATATGCCCTAGAGACAATAATAAATTGATTATTATTATATTTCCTTGTTCATGATAATCGTTTATTATCTATGCTAGAATTGTTTTGATAGGAAACTCAGATACATGTGTGGATACATAGACAACACCATGTCCCTAGTAAGCCTCTAGTTGACTAGCTCGTTGATCAATAGATGGTTACGGTTTCCTAACCATGGACATTAGATGTCGTTAATAATGGGATCACATCATTAGGAGAATGATGTGATGGACAAGACCCAATCCTAAGCCTAGCACAAAGATCCTGTAGTTCGTTCGCTAAAGCTTTTCTAATGTCAAGTATTATTTCCTTAGACCATGAGATTGTGCAACTCCCAGATACCGTAGGAATACTTTGCCTGTGCCAAACGTCACAACATAACTAGGTGACTATAAAGGTGCACTACGGGTATCTCCGAAAGTGTCTGTTGGGTTGGCACGAATCGAGACTGGGATTTGTCACTCCGTGTGATGGAGAGGTATCTCTGGGCCCACTCGATAGGACATCATCATAATGTGCACAATGTGATCAAGGATTTGATCACGGGATGATGTGTTACGGAACGAGTAAAAGAGACTTGCCGGTAACGAGATTGAACAAGGTATCGACATACCGACGATCGAATCTCGGGCAAGTACAATACCGATAGACAAAGGGAATTGTATATGGGATAGATCGAATCCTCGACATCGTGGTTCATCCGATGAGATCATCGTGGAACATGTGGGAACCAACATGGGTATCCAGATCCCGCTGTTGGTTATTGACCGGAGAACGTCTCGGTCATGTCTGCATGTCTCCCGAACCCGTAGGGTCTACACACTTAAGGTTCGATGACGCTAGGGTTATAAAGGAAGTTTGTATGTGGTTACCGAATGTTGTTCGGAGTCCCGGATGAGATCCCGGACGTCACGAGGAGTTCCGGAATGGTCCGGAGGTAAAGATTTATATATGGGAAGTCCTGTTTTGGTCACCGAAAAAGTTTCGGGGTTTATCGGTAACGTACCGGGACCACCGGGAGGGTCCCGGGGGTCCACCAAGTGGGGCCACAAGCCCCAGAGGCATACATGGGCCAAGTGTGAGAAGGCACCAGCCCCTAGGTGGGCTAGGGCGCCTCCCACCATGGCCCATGCACCTAGAAGGGGAGAGGGGGCAAACCCTAAGGGCAGATGGTGTTGGGGAACGTAGTAATTTCAAAAAATTTC
CTACGCACACGCAAGATCATGGTGATGCATAGCAACGAGGGGAGAGTCTGATCTACGTACCCTTGTAGATCGCAACGGAAGAGTTGACACAACGTAGATGAAGTAGTCGTACGTCTTCTTCCCGATCCGACCGATCCAAGCACCGTTACTCCGGCACCTCCGAGTTCTTAGCACACGTTCAGCTCGATGTGATACGTCTCCGTCGTATCTACTTTTTCAAACACTTTTGCCTTTGTTTTGGACTCTAACTTGTATGATTTGAATGGAACTAACCCTGACTGACACTGTTTTCAGCAAAACTGCCATGATGTTGTTTTATGTGCAGAAAACAAAAGTTCTCGGAATGACCTGAAACTCCACGAAATATATTACAATAAATAATAAAAAATCCTCACCAAAGATGAAGACCAGGGGGCCCACACCCTGTCCACGAGGGTGGGGGGCGCCCCCCCAGGGCGCGCCCCCTACCTCGTGGCCCCCCTAGAGACCCCCCGACTCCAACTCCAACTCTATATATCTGCTTTCGGGAAGAAAAAAATAGGAGAGAAGAATTCATCGCGTTTTACGATATGGAGCCGCCGCCAAGCCCTAAAACCTCTCGGGAGGGCTGATCTGGAGTCCGTTCGGGGCTCCGGAGAGGGGGATTCGTCGCCGTTGTCATCATCAACCATCCTCCATCACCAATTTCATGATGCTCACCGCCGTGCGTGAGTAATTCCATCGTAGGCTTGCTGGACGGTGATGGGTTGGATGAGATTTATCATGTAATCGAGTTAGTTTTGTTAGGATTTGATCCCTAGTATCCACTATGTTCTGAGATTGATGTTGCTATGACTTTGCTATGCTTAATGCTTGTCACTAGGGCCCGAGTGCCATGATTTCAGATCTGAACCTATTATGTTTTCATCAATATATGAGAGTTCTTGATCCTATTTTGCAAGTCTATAGTCACCTATTATGTGTTACGATCTGTTAACCTCGAAGTGACAATAATCGGGATACTTACCGGTGATGACCGTAGTTTCAGGAGTTCATGTATTCACTATGTGCTAATGCTTTGTTCTGGTTCTCTATTAAAAGGAGGCCTTAATATCCCTTAGTTTCCATTAGGACCCCGCTGCCACGGGAGGGTAGGACAAAAGATGTCATGCAAGTTCTTTTCCATAAGCACGTATGACTATATTCGGAATACATGCCTACATTACATTGATGAATTGGAGTTAGTTCTGTGTCACCCTATGTTATGACTGTTACATGATGAAACCACATCCGGCATAATTATCCATCACTGATCCGATGCCTACGAGCTTTCCATATATTGGTTTACGCTTATTTACTTTCCCGTTACTTTTGTTACAACCACTACAAAATACCAAAAACATTACTTTGCTTTCGTTACTCTTTTGTTACCGTTACCATCACTATCATATTACTTTGCTACTAAACACTTTGCTGCAGATACTATGTTTCCAGGTGTGGTTGAATTGACAACTCAGCTGCTAATACTTGAGAATATTCTTTGGCTCCCCTTGTGTCGAATCAATAAATTTGGGTTGAATACTCTACCCTCGAAAGCTGTTGCGATCCCCTATACTTGTGGGTTATCAAGACTAATTTCTGCCGCTGTTGCCGGGGAGCATAGCTCCATTCTTTGTGTCACTTGGGATTTATATCTGCTGGACACTATGAAGAACTTGAAAGATGCTAAGGCAACAATTTATCGCTCAACTACAAGGGGAGGTAAGGAACTGTCATCTAGCTCTGCACTTGATTCACCTTCTGTTATGAGTAAGCTTGCGACACCTAAACCTGCTTCTGCTATTCGTTCTGATATGCCGCATGTTATTGATGATGCTACTTCTGCTATACATGATACTTATGATGAAACTACTTCTATGCTTGATACTACTGTGCCACTTGGTGATTTTCTTGAGGAACGACTTGCTAGGGCTAGAGAGATTGAAAATATTGAATATGATTATGATGATGAAAGTGATGAT
GAAGAATCACTTATTATTCTTGAGGGTTATTTATTTGATCAAGAAGCTTCTTTAGCTATTTTAGCTTGCAAAGATAGATATGAACTCAAAAGGTTATTAGCTAAATGGAATAAGCAATCTCTAAATGCTAGGATGAAACCCGACCCTGCTTTTGCTACTTCACCTATCTGTGTTACTGATAAGGATTATGAATTCTCTGTTGATCCTGATATAATTACTTTAGTTGAATCTGATCCTTTTCATGGCTATGAATCTGAAACTGTTGTGGCATATCTTACTAAATTAAATGATATAGCCACCCTGTTCACTAATGATGAGAGATCTCGCTACTTTTATATCCTTCAAATATATCCGTTCTCATTAAAGGGTGATGCTAAGATATGGTTTAATTCTCTTGATCTTGGTTGTGTGCGTAGTCCCCAAAATATGATTTACTACTTCTCTGCTAAATATTTCCCTGCTCATAAGAAACAAGCTGCTTTAAAGGAAATATACAACTTTGTGCAAATTAAAGAAGAGAGTCTCCCACAAGCTTGGGGGAGGCTTCTCAAGTTACTTAATGCTTTGCCTGATCATCCTCTTAAGAAAAATGAAATACTTGATATATTTTATAATGGACTAACCGATGCTTCCAGAGATTACCTGGATAGTTGTGCTGGTTCTCTTTTCAGGGAAAGAACACCGGATGAAGCTGAAATCATATTGAATAATATGTTGACAAATGAAAATAATTGGGCACCTCTGAGCCAGCTCCTGCTCCAATTAATGATCCTATTCCTAAACAAACTCCGAAGAAGAGAGGTGTTCTATTTCTCAGTCCCGAAGATATGCAAGAGGCAAAGAAATCTATGAAAGAAAAAGGTATTAAAGCTGAAGATGTTAAGAATTTACCTCCTATTGAAGAAATACATGGTCTTTATTTACCGCCTATTGAAGAAATATATGATCTTAATCCTTTATTTATTGAAGAACCTCCAGACCTCGATAACCCGACTCATGTAGTAAAGGTAAATTCTCTCTATAGATATGATAAAGCTGAAATCCCTCCTGCTAAAATTGCTAGTCAATGCTTGGATGAGTTTGATAATTTTATGTTTAAGCAACAAGACTTCAATGCTTATTTTGGTAGACAATTAAAACAAGATGCTGATATATTAAATATTTGGGTGATTATATGGCTAATATTAGAGGTGAACTTAAACTTGTTAGCAAACATGCTTCTATGGTTACTACTCAAGTAGAACAAGTACTTAAAGCTCAAAAGGAATTGCTCAATGAGATGAATAGTAAGAAAAATAATTATGCTGTTAAAGTGGCTACTAGAACTGGTAGAATGACTCAGGAACCTTTGTATCCTGAAGACCACCCTAAGAGAATTAAGCAAGATTCTCAGAGAAATAATATTGATGCACCTAGTTCTTCTGAAAGGAAGAAAAAGAAAAATGATAGAACTGTGCAAACTTCTAGTGAACCTATTGCTGAACCACCTGATAATCCAAATGGTATATCTATGTCCGATGCCGAAACACAATCTGGCAATGAACATGAACCTAATGAAAATGTTAATGATGATGTTCATAATGATGCTCAACCTAGTAATGATAATGATGTAGAAATTGAATCTGCTGTTGATCTTGATAACCCACAATCAAAGAATCAACGTTATGATAAGAAAGACTTTGTTGCTAGGAAACATGGTAAAGAAAGAGAGTCTTGGGTTCAGAAACCCATGCCTTTTCCTCCCAAACCATCCAAGAAAAAGGATGATGGGGATTTTGAGCGCTTTGCTGAAATGATTAGACCTATCTTTTTGCGTATGCGATTGACTGATATGCTCAAAATGAATCCTTATGCTAAGTACATGAAAGATATTGTTACGAATAAAAGAAAGATACCAGAAGCTGAAATTTCCACCATGCTTGCTAATTATAATTTTAAGGGTGGAATACCAAAGAAACTTGGAGATCCAGGAGTACCTACTATACCATGCTCCATTAAAA
GAAACTATGTTAAAGCTGCTTTATGTGATCTTGGAGCCGGTGTTAGTGTTATGCCTCTCTCTTTATATCGTAGACTTGACTTGAATAAGTTGACACCTACTGAAATATCTTTGCAAATGGCTGATAAATCAACTGTTATACCTGTCGGTATTTGTGAGGATGTGCCTGTTGTGGTTGCAAACGTTACTATTTTAACGGACATTGTTATTCTTGATATTCCTGAGGACGATAGTATGTCTATTATTCTTGGAAGACCTTTTCTTAATACTGCAGGGGCTGTTATTGATTGCAACAAAGGCAATGTCACTTTTCATGTTAATGGTAATGAGCATACAGTACACTTTCCGAGGAAACAACCTCAAGTTCATAGTATCAATTCTATCGGAAAAATTCCATCGATTATATTTGGAGGTTTTGAATTTCCTCTTCCTACTGTCAAGAAGAAATATGATATACTTATTATTGGGGATGTGCATATTTATTCGAAATTTCGCCGGTTTCATGTTAATCGGAATGAGTTTGTTAACAAGACTTGATCAACCTTGTTAGTGGATTCCTTTTGACGAGCATGAGATGGATGAAACTAGAAGCACAACCTTCTGTACTCTCTCTATACTTTCTGTTATTTAGTTGAAATAAAGTAAAAATAGTATTTTTCTGTCTGTTTCCGGATTTATCCGTGCAATATAAAAATGTCCAGAAAATAAAAGTCCTCAGAATGACATGCCAATTTAATATGATTTTTTCAGGAATATTTGAGGATTTATGGTGCAAAAATTACCGCGGGGGGAGCTGCCACCTGGTCACGAGGGTGGAGGGCACGCTCTATCCCCTAGGCGCGCCCCCCTGCCTCGTGGGCCCATGGTGGCCCTCCTCCACTTATTCCTGCACCCATCCTCTTCTTCTTCCTCCCACAAATACGAAAAACCGGCTCAAGCACGAGTTCCAGCCCTCTTTGCTGCCATTTTCGATCTCCTTGCTCAAAGCACCTCCCACAAAACTGCTTGGGGAGATTGTTCCTTGGTATGTGACTCCTCCATTGGTCCAATTAGTTTTTGTTCTAGTGCTTTATTCATTGCAAATTTTTGCTGCTTAGGTGACCCTGTTCTTGAGCTTGCATGTCAAATTTATATGGTCAAAAGTAGTTTTGATGCATGATACAGGCTCTAGGCACTTGTAGGAGTGGTTGCTATAAATATTATTGAGTTTGGTTTACTTTTATTTTGAAGTTACTAAAAATTTCAGAATTTTTCAGAAAATGATGAGGAGACTTTTGAGGGGCTCATCGAGCCAAAGCTCGAAGGATAAGGCACCGAAGCCTAAGTATAATTTTCCTCGCACCGCAGAGGTTCGGGCGTGTGAATGGCCTTCCGAGGATTTCTTAAGAGCAGCCGGGATTTATGAAGATTTTCATGAGTTGGCTCAGACTGCAGGCCTCACCGCTTTCCTCCACGACCAATGTGATCAGTATCTCTTGCTCACAAATACCTTTATGCAAAATTTTCATTTTCATTCTAAGAACTCACCACCTACGGTGGAATTTCATTTATATGATGAGCATAAGGAGATGTCACTTTATGATTTCTGTCAGATTTGTTTAGTCCCTTTTGGGGGCAAGACAGAAGAACCACATCGTGATGATGTAGAGGGATTTATTGACACTATCACTATAGGGGGAAAAAGGAAGGTTTTCGATGCACGAATCACTAGCATACATTTTCCTGTTTTATGTTATTTTGCATTATTTGCTAGTCGTTGCTTAATTGGTCGCGGAAATTGTGGAAACCTGAGTATCCCTGATATTATCATTTTGCACCACGGCTTATACGGTGATAACTCTTTTAGTATGGGCGACATTATTGCTAGACGGTTAAGTATGAACCGTACTAAGGGTCCCATCTTTGTAGGCATTTATGCTTCACGCCTCGCTGCACATTTTAACATACCTATTAGGCATTATGAGAAGGAAGAAAATATGCTGCCTCGTGTTTATTTAGATT
ATAAAAGTATGGTAGCACATGATTTTATTGTTAAGAATAGGGAAGAAGAGCTTAAATATCAATTGTTCTTTAATAAACATCATCCTGAGACTATTACCTTGCCTGCTCCTTCTTTGTTTGATTTGACTGTAGGCACATACCTTGTTCCATTGGCGGCTATTCGCGCCTACCGGAACCCTGCACCAGCCCCGGAATCGGAACCACAAGAGGAGCCTCCACGACAGTCTGTTTATTCTTGGGATCCAGAGATGGCTGTCAGCCAATGGCAGTCAGAGTCTTCTTCATCATAGTACGACCCCAACAACTATTATTATGGATATCAGCCAGGCCAGCCGTGGCCATAGACCAACTTAGGTGAAAAGCCTAAGCTTGGGGGAGTACGTATTTCTCACTGACATTACATTTATGTTCACACACACTCATTGCTAGATGTCGGTGCTCATACTTTTTCATTGTATCATCCATGCTAGTTTATTTCCTTTTATGCTTTCTTTTTGTGTGTTTAATAAACCTTAAGAAAAAAATTAGTAGTAGTTTACTTTTCTTGCCGTAGTAATAATTAAAAAGAAAACCCAAAAATATTTCCCGTTCTTCTTTTGCTTGTTGGGAGCTTTCCTGTGTAAATAGTTTTATTTCTTTTTCTTTTCTTTGGTGGTTGGTAGGAGAAGACCATGATTAAATTGTTGGAGTGGCTCTTATATGCTTTATTGTTGGTTTAACCAAGAGCCCATATTGCTTTGTCTTCTCGTGTTTATTGAATGCTCACAGATTCCAGCTTAGTCCAATGCATGTGCACTCTTATTATTATTCACATTGTTCGCTTATGCAAGTGAAAGGCAATTATGATGATATATGATGGACTGACTGAGAGGAGAAAAGCGGGTATGAACTCGACCTCTTTTGTTTTTGTAAATATGATGAGTTCATCGTTCTTGATTTAGCTTGTTATGAATAAACATGTTTGCAATGACAATTAGATATCATAGTTGCTTGTGCCATGCTTGATTATCTATGAGTTATAATGGTTTACCTTGCGTGCCAACATGCTATTGAGATGGTTATGATGTGGTATGATGGGGTGGTATCCTACTTTGAATGATTTAAGTGACTTGACTTGGCACATGTTCACGCATGTAGTTGAAACAAATCAACATAGTCTTCACGATATTTATGTTCATGGTGGATTATATCCTACTCATGCTTGCATCCAATGTTTATTAATTTTAATGCATGTACATGGCTGTTGTCGCTCTCTAGTTGATCGCTTCCCAGTCTTTTGCTAGCCTTCACATGTACTAAGCAGGAATACTGGTTGTGCATCCAATCCCTTAAACCCCAAAGTTATTCCAGATAAGTCCACTATACCTTCCTATATGTGGTATATATCTGCCGTTTCAAGTAAATTTGTATGTGCCAAACTCTAAACCTTCAAATAATCATCATGTTTTGTATGCTCGAATAGCTCATGTATCAACTAGGGCTGTCTGTATCTTCCACGCTAGGCGGGTTATTCTCAAGAGGAGTGGACTCCGCTCCTCACTCACGAGAAAATGGCTGTTCACGGGATGCCCAGTCCCATGCTTTATGAAAACTAAATCAAAATAATTGCAAAACAAAACTCCCCCTGGGACTGTTGTTAGTTGGAGGCACTTGTTGTTTCGAGCAAGCCATGGATTGATGTTTGTTGGTGGAGGGGGAGTATAAACTTTACCATTCTGTTTGGGAACCGCCTATAATGTGTGTAGCATGGAAGATATCGCCATCTCTTGGTTGTTATGTTGACAATGAAAGTATACCGCTCAAAATGTTAATTATCTCTATTTCAAAACCGAGCTCTGGCACCTCTACAAATCCCTGCTTCCCTCTGCGAAGGGCCTATCTATTTACTTTTATGTTGAGTCATCACCCTCTTATTAAAAGCACTAGCTGGAGAGGGCAGCTGTCATTTGCATTCATCACGGTTAATTTATATTGGGTGTGACTATGATTGGATCTCTTTT
ACCATGAATTACAATGTCTAGTCAGTCCTTGATCTTTAAAGGTGCTCTGCATTTATGTTTTGCGGTCTCAGAAAGGGCTAGTGGGATACCATCTTGTTATATCATATTATGATTGTTTTGAGAAAGTGTTGTCATCCGAGATTTATTATTATGGCTTGCTAGTTGATTATGCTATTGATATGGGTAATCATGAGACCTGAGAACTATTGCAAATATGGTTAGTTATAATCTTTGCTGAAAACTTGAATGCTAGCTTGACATACTTACAACAACAAGAGCAAACATAGTTTGTAAAAGTTTTTCTTTATTTCTTTCAGTTTGTCAACTAAATTGCTTGAGGACAAGCAAGGGTTTAAGCTTGGGAGAGTTGATACGTCTCCGTGTATCTACTTTTCCAAACACTTTTGCCCTTGTTTTGGACTCTAACTTGTATGATTTGAATGGAACTAACCCGGACTGACGTTGTTTTCAGCAGAACTGCCATGATGTTGTTTTATGTGCAGGAAACAAAAGTTCTCGGAATGACCTGAAACTCCACGGAATATATTACAATAAATAATAAAAAATCCTCGCCAAAGATGAAGACCAGGGGCCCACACCCTGTCCATGAGGGTGGGGGACGCCCCCCCTCTAGGGCGCGCCCCCTACCTCGTGGGCCCCTTGGAGACCTCCCGATTCCAACTCCAACTCTATATATCTGCTTTCGGGGAGAAAAAATAGGAGAGAAGAATTCATCGCGTTTTACGATACGGAGCCATCACCAAGCCTAAAACCTCTCGGGAGGGCTGATCTGGAGTCCGTTCGGGGCTCCGGAGAGGGGTATTCATCGCCGTCGTCATCATCAACCATCCTCCATCACCAATTTCATGATGCTCACCATCGTGCGTGAGTAATTCCATCATAGGCTTGCTGGACGGTGATGGATTGGATGAGATTTATCATGTAATCGAGTTAGTTTTGTTAGGATTTGATCCCTAGTATCCACTATGTTCTGAGATTGATATTGCTATGACTTTGCTATGCTTAATGCTTGTCACTAGGGCCCTAGTGCCATGATTTCAGATCTGAACCTATTATGTTTTCATCAATATATGATAGTTCTTGATCCTATCTGGCAAGTCTATAGTCACCTATTATGTGTTATGATCCGTTAACCCCGAAGTGACAATAATCGGGATACTTACCGGCGATGACCATAGTTTGAGGAGTTCATGCATTCACTATGTGCTAATGCTTTGTTCCGGTTCTCTCTTAAAAGGAGGCCTTAATATTCCTTAGTTTCCATTAGGACCCCGCTGCCACGGGAGGATAGGACAAAAGATGTCATGCAAGTTCTTTTCCATAAGCATGTATGACTATATTCGGAATACATGCCTACATTACATTGATAAATTGGAGCTAGTTCTGTGTCACCTTATGTTATGACTGTTACATGATGAAACCACATCCAGCATAATTATCCATCACTGATCCGATGCCTACGAGCTTTCCATATATTGGTTTATGCTTATTTACTTTCCCGTTGCTATTGTTACAACCACTACAAAATACCAAAAACATTACTTTGCTTTCGTTACTCTTTTGTTACCGTTACCATCACTATCATATTACTTTGTTATTAAACACTTTGCTGCAGATACTAAGTTTCCAGGTGTGGTTGAATTGACAACTCAGCTGCTAATACTTGAGAATATTCTTTGGCTCCCCTTGTGTCGAATCAATAAATTTGGGTTGAATACTCTACCCTCAAAAGCTGTTGCGATCCCCTATACTTATGGGTTATCACGATGGCCACAAGCCCCAGAGGCATACATGGGCCAAGTGTGAGAAGGCACCAGCCCCTAGGTGGGCTAGGGCACCTCCCACCATGGCCCATGCGCCTAGAAGGGGAGAGGGGGCAAACCCTAAGGGCAGATGGGCCCTAAGGCCCATGCTCCCTGCGCCTCCCTCTCCTCCCCTCTTGGCCGCCCCCCTCAAACCCATCTAGGGCTGCCGCCCCT
CTAGGGATGGGAACCCTAGAGCGGGCGCACCCTCCTCCCCTTCCCCTATATATACTTGAGGCCTAGGGGCTTCCCAAACACGCGATTTGATCTCTCCCTGTTGGTGCTGCCCTACCTCTCTTTTCCTCATATCTAGCAGTGCTTGGCGAAGCCCTGCTGGATCACCACGCTCCTCCACCACCACCACGCCGTTGTGCTGCTGCTGGATGGAGTCTTCCTCAACCTCTCCCTCACTCCTTGCTGGATCAACGCATGGGAGACATCACCGGGCTGTATGTGTGTTGAACGCAGAGGTGCCATCCGTTCGGCACTAGGATCATCGGTGATTTGGATCACGACGAGTACGACTCCATCAACCCCGTTCTCTTGAACGCTTCTGCTTAGCAATCTACAAGGGTATGTAGATGCACTCTCCTTCCCCTCGTTGCTGGTTTCTCCATAGATAGATCTTGGTGACACGTAAAAAAATTTTGAATTTCTGCTACGTTCCCCAACAGGCGGCGGCGGCAGTCGTCGGGATCTGGGAGGTGACGGCGAGCGAATCGCGCGATGACGGGCCATAGGTGGCCGTGCCAGCGCGGCGGGGGCGCGCCAACGGGAACTGGGAGGCGGCGGAGAGAGATCGATGCTCATCCATGGCGGCGACGTCCTTGGCGCTCGACCATGGCGGCGGGACGCTCGTCCAGCGTCCATGGTGATCCAGGCGGCCGGGAGGAGTAGAGGGAGGGCTGGGCTCGTCGCTGGCACCGGCGGTCACGGAGCAGTCCGGCCGTGTGGTGGAGGTTGGGATCTGGCGTGGGTGGCGCCTGCTTCTCCACCACCAGAAGAAGGATCGAGGAGAAATTTCTTGTGAGGGAAAAAGATAACGGAAGAATCCGGCTGATAGGTGGACCCCATGTCTACTCTGACGCCGTTTCGCCCCAAGCCGTTATGTCACATGGGCAAAATCTGGAACCACCTGTCATAAACACACTTAAAGCACTCTAATCCGCCAATCAGTGTTATTTGCAATAAGATTACACAAAACGTGATGCTTTTTGTTCAAAAAATGTAACGTAGTGTTATCTGCAAACCTAGCCTTCAAAATGGTGGTTTTATGCAATTTACTCTTAAGGTTGGCTCGTCGGCGAGAAGGAGGGAGTGTGTGTTGCTTTGTGTGGCGATGCGAGACGATGTGTGGCAGCTGTGACGATCACCGGCGGGGAGATCGTCTTTTATAGCCCTAGCGGGGCGTCGAGAGTCTGCATGTCGGGCGACACGTGGCGGGAGCGGGCGGGGCGCGTATCGGTGGCGCTTCACTGCGCCGCCTGTGAGGCCTTAATGAAGGCTGACCGGCGTGGCAGCGCAGCAACCTTGGCATTGATTCCCGCGGGAACCGAGGCGATGAGGACGACGAAGCGGCGTCTCGCTGACTCGGCAGGCCCATCCCATTCCCGTCAAAAATGCGTTCCCCAGCGCCCCCAGACGCCACCAGCACGCCGGGTTCGGCCTGAATCCGCCGGTGCCAGTTTCGGTCTAAATCGGCAAAAAATGGGCTTCTGAGGGCGCGACCCGCCGGCGCAAAGAAACGCCTGGGGAGGGGGGAGGGGGGCTGTTGGGGGCACGGCTGAAGA

5. 20 points This is a genomic DNA sequence from a wheat BAC clone. Use all the tools you have learned to identify the genes and repetitive elements present.

5.1. How many genes are present in this sequence? Highlight the exons of each gene with different colors and indicate which color corresponds to each exon and each gene.

5.2. Are there any repetitive elements? Highlight them with different colors and indicate which color corresponds to each repetitive element.

ANSWER

There are two genes in this sequence, one with two exons at the beginning, and one with three exons at the end. A Gypsy retroelement called Fatima sits between them.

>BAC_Clone

AGAGCATAGTAGAGTTAGCCGGCCTTGGACCGACAATGTGAGAGGAGAGGTCGGCGACGGTAGGTTGTCTCATCATGGAGAAGAATGAAAAATCCATGGCGACGTGAGGTGGGAGGAGGGGACGAGTGGCTAGAAATAGACAAGGATAAGGCAACCATGCACACGGGGCGACCGATTCCATCGCGCCACCAACTGATGTGAATCGTTTACCTGTATTTATGTGCATGCGCCCATATTTATGCGCAATCGGCCACACACTGCACTGCACAATACTCCTACCTGCAACAAACAAAGAAACCTAGTAGCAGCTAACCAAACCATGGACCACAGCGTGCTTCTCCTGCTCGCCTCCTTGGCCGCAGTCGCCGTCGCGGCTGTCTGGCACCTCCGAAGCCATGGCAGACGAACAAAGCTGCCTCTGCCGCCGGGGCCGAGGGGTTGGCCGGTGCTGGGCAACCTGCCGCAGCTAGGGGCCATGCCGCATCACACCATGGCTGCTCTCGCCCGCCAGCATGGCCCCCTCTTCCGCCTCCGCTTCGGCAGCGTCGAGGTCGTCGTCGCAGCGTCGGCCAAGGTCGCCCGCAGCTTCCTCCGCGCGCACGACGCCAACTTCAGCGACCGCCCGCCTACCTCCGGCGCCGAGCACCTCGCCTACAACTACCAGGACCTCGTCTTCGCGCCCTATGGCGCCCGCTGGCGCGCCCTCCGCAAGCTCTGCGCGCTCCACCTCTTCTCCGCCCGTGCCCTCGACGCCCTCCGCACCATACGGCAGGACGAGGCCCGACTCATGGTCACGCACTTGCTCTCTTCCTCCTCGCCGGCCGGGGTGGCGGTCAACCTGTGCGCCATCAACGTGTGTGCTACCAACGCGCTGGCACGGGCCGCCATCGGGAGGCGCATGTTCGGCGACGGCGTCGGCGAGGGTGCCAGGGAGTTCAAGGACATGGTGGTCGAGCTCATGCAGCTCGCCGGCGTCCTCAATATCGGCGACTTCGTGCCCGCGCTCCGCTGGCTTGACCCGCAGGGCGTCGTCGCCAAGATGAAGAGGCTGCACCGCCGCTACGACCGCATGATGGACGGCTTCATCAGCGAGAGGGGCCAGCATGCCGGAGAGATGGAAGGGAACGACCTGCTGAGCGTGATGCTGGCGACGATGCGGTGGCAGTCGCCCGCAGATGCCGGCGAAGAGGACGGGATCAAGTTCACCGAGATTGACATCAAGGCTCTCCTCCTGGTATGCACAAATTGTTACATGCCCATTTGTTTGGCCATTCATATTTTGTACGTCTAGGTAAGGTATTTGTTGATGTCAAGTCAAAGATTTTGGATTGTCATAGCTATATTTTTCATTTTAATTAATGGGATACAAATATTGGTTCTTTTAGAATTTATTCACGGCCGGGACAGACACGACGTCGAGCACAGTGGAGTGGGCGCTGGCAGAGCTCATACGAGACCCTTGCATCCTCAAGCAGCTGCAGCACGAGCTCGATGGCGTAGTGGGAAATGACCGTCTTGTCACGGAAGCCGACCTGCCACGCCTCACTTTCCTCGCCGCCGTCATCAAGGAGACATTCCGTCTACACCCGGCAACGCCGCTCTCCCTTCCCCGGGTGGCCGCTGAGGACTGCGAGGTAGACGGCTACCATGTTTCCAAGGGCACCACCCTCATCATGAACGTGTGGGCCATCGCCCGTGACCCGGCCTCATGGGGCCCCGACCCATTGGAGTTCCGGCCGGTCCGCTTCCTCCCGGGCGGATTGCATGAGAGCGCGGATGTGAAGGGGGGCGACTATGAGCTCATCCCGTTTGGGGCGGGTCGGAGGATATGCGCAGGCCTCGGCTGGGGCCTTCGGATGGTGACACTCATGACTGCCATGCTGGTGCACGCATTCGACTGGTCCTTGGTTGATGGAACGACGCCCGAAAAACTTAACATGGAGGAGGCCTATGGTCAGACCCTGCAAAGGGCCGTGCCTCTAGTGGTTCAGCCTGTGCCT
AGGTTGTTGTCGTCGGCGTACACAGTGTGACGCATGTTTTATCATGGTTCTTTGTTGTCTTTCAGTTCCTTTCAGTTTGATGTACCTTGTTTAACCAATTCATAATATTCAAATCATTTGTAAGAAAAGTGTACACGGCACGATTATTACTCTATTTTCATTTATCTCTTTGCAAATGGATGGTAAGGAAGATTGGAGAGCTATGTGTATGTCACTCTCGTGATATTTAGGTTTTTGTTGCACACAGCCGGAACTTACCCCAAATAGGACTTACGCCTCCAAGCACACCATAGGGTCGAGAACCAACAAACTCACAACCAAATTGAAGATCTTATTGAGCATCATTGGCGATTTCATATCACTTGATTCTTATGCTTTTATCATGTTTACCCGCTGAGGGAGTCCTGGATTAGGGGGTGTCCGGATGGCCGGACTATACCTTCAGCCGGACTCCTGGACTATGAAGATACAAGATTGAAGACTTCGTCCCGTGTCCGGAAGGGACTTTCCATGGCGTGGAAGGCAAGCTTGGCGATACGGATATGCAGATCTCCTACCATTGTAACCGACTTTGTGTAACCCTAACCCTCTCCGGTGTCTATATAAACCGGAGGGTTTTAGTCCGTAGGACAACATACACAACAACAATCATACCATAGGCTAGCTTCTAGGGTTTAGCCTCTCCGATCTCATGGTAGATCTACTCTTGTACTGCCCATATCATCAATATTAATCAAGCAGGACGTAGGGTTTTACCTCCATCGGGAGGGCCCGAACCTGGGTAAAACTTCGTGTCCCTTGCCTCATGTTACCATCTGGCCTAGACGCACAGTTCGGGACCCCCTACCCGAGATCCACCGGTTTTGACACCGACATTGGTGCTTTCATTGAGAGTTCCTCTGTGTCGTCGACGTCAGGCTTGATGGCTCCTACGATCATCAATGGCGATGCAGTCCAGGGTGAGACCTTCCTCCCTGGATAGATCTTCGTCTTCGGCGGCTTCGCACTGCGGGCCAATTCACTTGGCCATCTGGAGCAGATCGAAAGCTATGCCCCTGGCCGTCAGGTCAGATTTGGAAGTTTAAACTACACGGCTAACATCCACGGGGACTTGATCTTCGACGGATTCGAGCCACTGCCGAGCGCGCCGCACTGTCACGACGAGCATGATCTAGCTCTGCCGCTGAACAGTGCCCTGGAGGCCACACCCGCATCGGCTTAGACCCTTAATTCGGAGCCAACTGCGCCGATCGAGGATGGGTGGTTGGACGCTGCCTTAGGGCTGCGATCCCAACGGCGATCGAGCCGAACACCAGCCCCGCACTCTGCGAGACTCATGACTCCAAGGAGCCGGACTCCTCTCCGGACCCCGAACCCTCCACGCCCCTGCCGATCGAATCCGATTGGGCACCGATTATGGAGTTTACTGCCGCGGACATCTTTCAGCACTCGCCTTTCGGCGATATTCTAAAGACACTAAAGTCTCTCTCTTTATCAGGAGAGCCCTAGGCGGACTACGGTCAGCAAGGTTGGGATTCGGACAATGAAGAAATTCAAAGCCCACCCACCACCCACTTTGTAGCCACTATTGACGACTTAACCGACATGCTCGACTTCGACTCCGAAGACATCAACGGTATGGACGCCGATGAAGGACACGATGAAGAACCAGCGCCTATCAGGCCCTGGAAAGCCACCTCGTCATATGACATCTACATGGTGGACACCCCAAAAGAAGGATATCCGATGGAACAACGGAGGATGACCCCTCCAAGAAACAGCCTAAGCGCCGGCGTCAGCGGCGCCACTCAAAATCCCGCCAAAACAAATACGGTGATTCCAGCATGGGAGATAATAATACTCTGGAAAGTGCCGAAGAAAACCCCCTCCAGCAAGATTTAGCACAGGAGGATGGAGAAACTAGCCCTCATGAGAGAGCAGCAGACAGAGAGATTGAGGACGATAATTATATGCCTCCCTACGAAGACGAGGCAAGCCTC
GACGATGACGAATTCGTCGTGCCAGAGGATCCCGTCGAGCAACAGCATTTTCAACGCAGGCTTATGGCCACGACAAGAAGCCTCAAGAAAAAACAGCAACAGCTTAGAGCTGATCAAGATTTGCTAGTCGACAGATGGACTGAAGTCCTTGCGGCCGAAGAGCATAAACTCGAATGCCCCTCCAAGAGCTACCCAAAGCGCAGGTTGCTACCCCGATTAGAGGAGGAAGCACTTGATGCGGCCGACCAGCCACCTCGTGGCCGCGACAGAGAGGCCTCTCGGCCCTCCACTCAAGCCGCACCCCGTACCAAGGCACGGGAATATGCGCCGGACTTACGAGATATGTTGGAGGACAAGGCAAGGGAAACAAGATCGATCCACGGATCGCGTGGGAGCCCCACGGCTCGAGATGGTAACCGTCACGCCGGCCACAAATTCGGCAGGGCCGAGCACAGTAGACAAAGCTCATTGGAGCTACGTCATGACATAGCCCAGTATAGAGGCACCGCACACCCACTATGCTTCACAGATGAAATAATGGATCATCAAATCCCCGAAGGTTTCAAACCCGTAAATATCGAATCATATGATGGCACAACAGATCCTGCGGTATGGATCGAGGATTATCTCCTTCATATCCACATGGCCCGCGACGATGATTTCCACGCCATCAAATACCTCCCACTCAAGCTTAAAGGACCAGCTTGGCATTGGCTTAACAGCTTGCCAACAGGATCAATCAGTTGTTGGGAGGACCTGGAAGCCGCATTCCTCGACAATTTCCAGGGCACTTATGTGCGACCCCCAGATGCCGATGACCTAAGTCACGTAATTCAGTAGCCAGAGGAATCGGCCAGGCAATTCTGGACACGGTTCCTAACAAATAAAAATCAAATAGTCGACTGTCCGGACGCTGAGGCCCTAGCAGCCTTCAAGCACAATATCCGTGACGAGTGGCTTGCCCGGCACCTTGGATAGGAAAAGCCAAAGTCTATGGCAGCACTCACGACACTCATGACCCGCTTTTGCGCAGGAGAAGACAGCCGGCTTGCTCGTAGCAACAATATGACCAAGAACCCTGGTAATTCGGATACCAGGGACAGTAGTGGCAGGTCGCATCGCAACAAGCAGAAGCGCCGCATTAACGGCGACAATGCTGAGGATACGATAGTTAATGCCGGATTCAGAGGCTCTAAATCCGGTCAGTGGAAAAAGCCATTCAAAAGGAACCCTAGGGGCCCGTCCAGTTTGGACTGAATACTCGACCGCTTGTGCCGGATACATGGCACCCCTGAAAAGCCGGCCAATCACACCAACAGGGATTGTTGGGTGTTCAAGCAGGCAGGCAAGTTAAACACCGAAAATGAAGACAAGGGGTTGCATAGCGATGACGACGAGGAGCCCCGGTCGCTGAACAACAGTGAACAGAAGGGTTTTCCCCCACAAGTGCGGACGGTGAACATGATATACGCAACCCACGTCCCCAAAAGGGAGCGGGAAGCGCGCGTTAAGGGACGTATACGCAGTAGAGCCAGTCGCCCCAAAGTTCAACCCATGGTCCTCCTGCCCGATCACCTTCGATCGGAGGGACCATCCCACTAGCATCCGTCATGGCGGATTCGCCGCATTGGTTCTAGACCCAATTATTGATGGATTTCATCTCACTAGAGTCCTTATGGACGGCGGCAGCAGCCTGAACCTGCTTTACCAGGATACAGTGCGGAAACTGGGCATAGATCCCTCGAGGATTAAGCCCACCAAAACGACCTTTAAAGGCATAATACCAGGTGTAGAGGCCAACTGTACAGGCTCAGTCAAACTTGAAGTGGTCTTCGGATCTCTGGATAATTTCCGAAGCGAGGAGTTAATCTTCGACATAGTCCCATTCCGTAGTGGCTATCACGCACTGCTCGGGCGAACCGCATTCGCCAAGTTCAATGCGGTACCGCACTACGCATACCTTAAGCTCAAAATGCCAGGCCCTTGAGGAGTAATTACGG
TCAATGGAAACACCGAACGCTCCCTCCAAACGGAGGAGCACACGGCGGCCCTTGCAGCAGAAGTACAAAGCAGCCTCTCCAGGCAGTTCTCCAGTCCGGCCATTAAACGACCGGACACCGTCAAGCGCGCCCGGAGTAACCTACAATAAGACCGCCTGGCACGACCCGAGCAGGTGTAGCAATGCGGCCCCAACCCCAGCCCTCACAAAAATGCGATACCAGTACTTCACGTACATAACTACACTCTAGAAATACCATGGGTACAGGAGGGAGGGGCACCATCACGGCACGCCCGAAACACGGCTTAAACCGCACCAGGGGCTGCCGATTTTTTAATTCTCTCTTACTTTCAGGACTCCACTCTTCGGAAGGCCCGTTCGGCAGTTCCATTGCCGAACAAACGATGCAAGAACCAGGGAAGCAGACAAGCCACACCGCATTACGGAACTCCCAGGTGGTCTCTATCACGAGCAGTATACCTGTTTCGCATACTATTCCACAGCCTGCCCCTAGAATGGACACGTTAAATAGTCCAACCTTTTGCTTATTGCATTATTTATATCGTTCTGCTTTGATCGCAGCCCTCTTTAATAAACAATGAATTGCTTTTGTCTATTTTTGCATTACTCTTTTTTAATATATGTTCCTTAACGACATGTTGCATCTGTACACTTTGGTACGACCAAAATACGCCAGGGGCTTTAGTACCCCTCAATATGGTGTGAGAAGTCCGAACACTTTAACAAGTGCGGCTCCCCGAACTTATAGCATTATATGCATCGGCTCCGAATCATGTCTTGGGTCAATAGTTGGGTTTGCCCGGCTCCTATGTTTTGGTGCCTTACGTTCCGCTATATCGGCTAAGGTAGCACTAGGAGAACCACTGCGATTGTGCCCCAGTTGAGCTGGGTCGAGCACCTCAGTGGAGAAAGCTAAAACTGACTGTCATGATGAAGCGAGAGCTGGTCGTTGTTCGAGAGATTTTTCGAGTCCCTAAAGACTTATGCCGCTTAGAGTGAGGATCCGGCTCTGTCCGGCCAAGGTGTGGATAGCGCCCCGAACTCGGTTTTCCGAATACCAGGGGCTTCGCCGAAATTTAAAATTATAGAATTCTATGGCTAAGTGAGAGTGTTCACGCATTATAGTCCGATTACCTTGTTCGTTGGGCTGAGCGCCTCCCTTGAAGGACCCAAACATGGGAAAAAGAGCGCTCAGGTTTATCCCTGAACACCCCAGCACTAGCGGCACGGGGGCAGAAGCCGACGACTTGCCATCTCTCAGAATTGATAAACAGCCGCACAGAAGGTAATATTTTAAATTCCAACAGCATTGCTTAGCGCATATGAACAAGTTTTCAGCGCACAGGACAAAACGAGCAAGTTTCACTCAAAAATTACATACCTGGAACATTCATCCGCCACAAGGCAGACACCCTTCAGAACATCCTTATAATAATTCTCGGGCTTGCGATGCTCTTTCCCCGGCGGTGGCCCATCCTTCACAAGCTTCTTAGCATCCAGCTTGCCCCAGTGCACCTTAGCACGGGCAAGGGCCCTACGGGCACCTTCAATGCAGACGGAGCGCTTGCTGACTTCGAGCCTTGGACAGGCCTCCACCTGCCGCCGCACCAGCCCGAAATAGCTCCCAGGCAGAGCCCCTCCAGGCCACAGCCGAACTATGAGGCCCTTCATGGCCTGTTCGGCCGCCTTGTGGAGCTCGACCAGTTGCTTCAGCTGGTCGCTCAGGGGCACGGGGTGTCCGGCCTCAGCATACTAAGACCAGAACACCTTCTCTGTCGAGCTGCCCTCCTCGGCTCGATAGAATGCGGCGGCATTGGACACACTCCGGGGAAGATCTACGAACGCTCCTGGAGAGCTCCGGATTCGGGTAAGTAACAAGTAACTCATGTTTATGTGTTTGCTTTTCATAAAGAATGCCTTACCTGCCGCTATCTTCTTCACCTCATCCAACTCCTAGAGGGTCTTCTAGGATTCGGCCTT
GGCAGACTTGGCATTTTCAATAGCCACCGTGAGCTCGGACGCTCGCGTCTTTGAGTCAAGCTCCAAACTCTCATGTTTTTCCATGAGAGCCTGGAGCTCTTGCCGTACCTTGCCAACCTCGGCCTCATACTTCTCCCGCTCGGTGCGCTCCGTGGCCGCCCTCTTCTCGGCCTCGACCAGCGCTTGCTTAAGGGTCGCCACCTCAGATGTGGCCCCTACAATAGCCACAATAATCCTGTCATTTTTTGCAATTGCACCTTTTTATATATATTTAAACAAGGTATTTCTTACCTTCATTGTCCTCGAGCTGCTTCTTGGCACGCCCGAGCTCTTGCTCGGACCGCTCGAGGTTCTGCTTTAGCGTGTCCACTTCCGCAGTCAGTGTGGCGGACGCAAGCAGAGAAGCCTGCATACACATATTGACTCCTTTTTGTTAGACTCCTGCGAATTTTATTTGATCCTCTATTCGGCTTTTCTTCTCGAACACCAAACAGAGCATCAGGGGCTACTGTCTATGCGGTAATATTATTTACACATTCTTTACTTACCTCAAAGCCTGTTAAAAGGCTAGTACAAGCTTCAGTCAGTCCGCTCTTGGCGGACTGAACCTTCTGGACCACCGCACTCATAATGGTGCGGTGCTCCTCGTCGATGGAGGCGCCTTGGAGCGCCTCCAACAGATTGTCAGGCGCCTCCGGTTGGACGGGGGTCACCAACACGGTGGGCTTGCTCCTCTTGGAAGGAGGTCCCCCGCCGGACTCTGGAACCTTTGAAGGTTCCGGAGCGGTGTCCGGCCTAGAGCTGGACTTGGAGCCCTGGGGGGTTTCATCCCATTTACTCCTGGAGTCCGGGAGGTCGCTTTGCGGCGCCTCCAGGACCACCTCCTCCCCGGCTTGGAACCTGTTGGGACAACACTTTGGTGTCGTCCGTAGGGCGGGGGGAGGAAGCCGTCGGAAGCGAGTTCACATCCGACGAGTCCAGTGAGCCTCTCAACGACGCGTCGAGCCGGTCTTTGGGTGGGCTGCATAAACATATTCGACATAAGGGAAAGCTGTGCAATGAGCGAATACTATGAATGACTCTGGTATCCGGATATTTACGATTTTGTCAGGGGCTTGGCCCTGGGCGGCCACTCCTCTCCGCCGTCATTGGCATTGGTAGAGTAGTCCGGAGGAAGGGTTTTCCCCTTCTTGGACCCTCCGGCCTCCCCAAAGAGGGCGGCCTTCCTTTTCTTCTCTTCCCCCGCTGGAGGGGGAGAGGTCTCTTCTTCTTCCTCCTCGTCTTCACGGGAGGAATGTGCCTCGGAACCGTCGGATGATGGATCCGACACCTCCTGGCGCCGGGCACTCTTTAGAGTCCCCGTGGCCTTTTTCGCGACCTTCTTCTCCGGCACCACATAGGGTGCCGGAACTAGCAGCTTCACCAAGTGAGTGTTCGCTGGGCCTTCGGGCAAAGGAGCCGGACAGTTGATCTGTCCGAACGTCACCTGCCAGTCCTGTCAAAGGTAAGGGAGCTTAGATCCCGCATGGAATCAAACTATGAAAAATGGATACCCTATAAAAGGAAAAAACAGCTTACCGCATGAGCGTGATGCTGCGTACTGAATCCGCGATCCTCGGTAGCGGATGCGGGGGCCTCGGCGCCCTTGAAAAGCACCTTCCAGGCATCTTTGTACATTGTGTCGAAGAGCCTGATCAGAGTTTGCTGCCGCGCCGGGTCGAACTCCCACAGGTTGAAAGCCCGTTGTTGACACGGGAGAATCAGGCGGATGAGCATAACCTGGACTACGTTGACAAGTTTGAGCTTCTTGTCCACCAGGGTTTGGACGCATGTTTGGAGTCCGGTCAGCTCTCCTTTTTTACCCCACGACAGGCCCGTCTCCTTCTAGGAGGTGAGCCGCGTAGGGGGTCCGGATCGAAACTCAGGGGCTGCGACCCATTCGGGGTCGCGCGGCTCGGTGATGTAAAACCACCCTGATTGCCACCCCTTCAGGGTCTCCA
CAAAGGAGCCCCCGAACCATAAGACGTTGGCCATCTTGCCCACCATGGCGCCTCCGCACTCCACCTGGTTGCTGCGCACCACCTTCGGCTTGACATTGAAAGTCTTGAGCCATAGGCCGAAATGGGGGCGGATGCGGAGGAAAGCCTCGCACACGACGATAAACGCCGAGATGTTGAGGACAAAGTTCGGGGCCAGATCGTGGAAATCCAGGCCATAGTATAACATGAGCCCCCGGACAAATGGGTGAAGAGGGAAGCCCAGTCCGCGGAGGAAATGGGGGAGGAACACCACCCTCTCATGGGGCCTAGGGGTGGGGATGAGCTGCCCCTTTTCGAGAAGCCGGTACGCAATGTCGATAGACAAGTATCCGGCCTTCCTCAGTTTTTTGATGTGTCCCTACATGACGGAGGAGACCATCCACTTGCCTCCCGCTCCGGACATGGCTGGAGAAGGTTGAGGTGGGGAATGCGGACTTGGGCGCTGCAGCTCGAGTGTGTGATAATGGATAAGCAAGGGAGGAAGAAGGCGTAGGTGAAAAGGTGGATCCTTATCCCCTTATATGGGTGGACGCAACTACGTGTCCCCACCAGCCTGGTAAAACTCGCTTATCTCCAAGTAACATAATCAATGGCGCGGTTGGGTTACCCACGCCCGTATTGATGAGAATCCCGGAATAAGGGGACACGATCTCTGCTTTAACAAGACGTGCCAAGGAAACCGCCTCGCATGATGCGCTGAGGTGGGATAATGAAATGACTCGGATAAAGGCTTGGCCGTGGTGTGTCACGCTACAGAATACGTCAGCAGATTAGATTTGTGTAAATATTATTCTCTCTATGGCAATATGTGGAAACTTATTTTGCAGAGCCAGACACTATCTTTGTGTTCAAAATCTTCTATGAAGTACTTGGAGGAGGAACCCGCCTTGCAATGCCGAAGACAATCTGCGCGCCGGACTCGTCGTCATTGAAGCCTGGTTTAGGGGCTACTAAGGGAGTCCTGGATTAGGGGGTGTCCGGATGCCCGGACTATACCTTCAGCCGGACTCCTAGACTATGAAGATACAAGATTGAAGACTTCGTCTCGTGTCCGGAAGGGACTTTCCTTGGCGTGGAAGGCAAGCTTGGCGATACGGATATGCAGATCTCCTACCATTGTAACCGACTTTGTGTAACCCTAACCCTCTCCGGTGTCTATATAAACCGGAGGGTTTTAGTCCGTAGGACAACATACATAACAACAATCATACCATAGGCTAGCTTCTAGGGTTTAGCCTCTACGATCTCGTGGTAGATCTACTCTTGTACTACCCATATCATCAATATTAATCAAGCGGGACGTAGGGTTTTACCTCCATCGAGAGGACCCGAACCTGGGTAAAACTTCGTGTCCCTTGCCTCCTGTTACCATCCGGCCTAGACGCACAGTTCGGGACCCCCTACCCGAGATCCGCCGGTTTTGACACCGACACCCGCATCTTGCTCTCCCTCCCCGTCGTCACCATGCACGTGAAACGGCCGGGCTTTCGACACCTTCCCAGCTACGGCCGGCGGCAGCTGATGAAGCTTACATCAAATCGAGCCAAGGAAGCCTGCACCCCAGTCACCGTCGGCCGCTAGCTAATTGGCAGACATTCCCTGTGCCCGCTTGCCGGCCGGCCGCGGCGTGACCGCCGGTCGGCCCAGAGCCCCGGACGCAACGCAAACCTACACCCCAGCAGGCACGGACAGAAACCACCATTAATTTGCGTGGTGATCATGATCAGGAGCTTATTACGGCAGACAGATGCATCCATCGGTCTCGCTTCTGCCTGTGGGGGTCAAAAGCGCTGCCGGTTACACCACATCCACAGAACCAATTGTACAGAGGGAGGCGACGAGATTCCGTGGCCACGCCAGCTCGGCAGCGCCAAGGAGTACTAGAGCGGCGAGCAGCGGCTGAACTGGTCTGGACATGGACATGTACCCTGCGTGAGCTTTTCGGCCCTATATAAAGTGGC
CACCGGCCGTGGGGCAACACTCATCATCACCACTTCCTCAATTCACAGCTTACTCCTGCTCCAGAGAACTTCTGCTTGCTGCCTCGTACCCTAGCTAGCAAGGCAAGCTAGCCGGTCGATCTATACTAGGAAGGAAGGGCTAATGGCCGGTAGGGATAGGGACCCGCTGGTGGTTGGCAGGGTTGTGGGGGACGTGCTGGACCCCTTCGTCCGGACCACCAACCTCAGGGTGACCTTCGGGAACAGGACCGTGTCCAACGGCTGCGAGCTCAAGCCGTCCATGGTCGCCCAGCAGCCCAGGGTTGAGGTGGGCGGCAATGAGATGAGGACCTTCTACACACTCGTACGTACACAGTCACTATCTAATGCCAATTTATCTCTGAAAGTGCTCACCACACGCACATGATCGATCGAGCTCGATCTATAGTACGTGAGGGAAATTGATTTTCGATGCTTCTGTTCACATGTTTGCCTCAGCAAGCACATGACTAATGCTCCATCTTGCATATGTCTCTGTGCCCTCTGGTGTTGATCATGATTTTTCTATGCTTCTTCTATGTTCGGGGAGCATTTATTTTTTATGCTTCTCTTGACATGTTTCATGTTTGTCCTAGCAAGCACACGAGTAATTAAAGCTCGATCTTAAATACTCTCTCCGTCCGAATAAATGTACTTCTAGCTTTTGTCTTAAGTCAAAGTTTTAAAATTTTGACCAACTTTATAGGAAAAAGTAGCAGCATTTATGACACTAAATTAGTATCACTAGATTCGTTTTGAAATGTATTTTCATAATATATCAATTTGATATTATATATGTTACTACTTATTTGTATATAGTTGGTCAAAGTTTTAAAACTTTGACTTAGGATAAAAACTAGAAGTACACTTATTCGTGGACGGAGGGAGTATATGCTTATGTAGGTAGTACTCTCTACTTTGATCATGATGTGCACGCGTTTACTGCCCGCAGGTGATGGTAGACCCAGATGCTCCAAGTCCAAGCGATCCCAACCTTAGGGAGTATCTCCACTGGTAAGTACTAAATTTGTAACTCAGTTGAATAATTTCTCTGTCCCTAGATATACACACTAGCTCATGTGTGCGTGTGTGTGTCTACATGTGTGTGCAGGCTTGTGACAGATATCCCCGGTACAACTGGTGCGTCGTTCGGGCAGGAGGTGATGTGCTACGAGAGCCCTCGTCCGACCATGGGGATCCACCGCTTCGTGCTCGTACTCTTCCAGCAGCTCGGGCGGCAGACGGTGTACGCCCCCGGGTGGCGCCAGAACTTCAACACCAGGGACTTCGCCGAGCTCTACAACCTCGGCCCGCCTGTCGCCGCCGTCTACTTCAACTGCCAGCGTGAGGCCGGCTCCGGCGGCAGGAGGATGTACAATTGATCTACCCACGGCCCTCGTACGCCACCAGCCCGCCGCCAAGTCAGCAAATTATCCAACGTGGCTAGTTTACTAGTATATAGTTTGTGATAAGAAGCCAGCCACGAATTAAGCATTACCTATATATTGGCAACACATACACTACATATATGCATACTATGATCGATGTATAACTAGCCGCATGCATATATGCAATCAACGGCTAATTAAGGGGGGGTGAACCCTAGATCAATGGCTTGGTACTGCACTATATAAATATAGTCTGCAATAAACTGATGCCAATAGTATACAGCACACAAATATTGGAGGAGCTACACGCCATGTGCAACTTAGTGCTACCTGGTACATATCTGCAGGTTGGTCTTGTGCGTTCACTTATGCGTGCATGAACATCAGTCAATCATATAGACATAGTTATGCATGGGAGACAAACATGTAACTGACAGCGACTGCTACAAGACACAGTGATATGTACGCACTTAGCGCAGTAGCAAAGCACATGCATGTGTTGGGTCTTGTACAACTTCCTCATGATT


STEP 1) ANNOTATION OF RETROELEMENT

gnl|TREP|TREP1414 Retrotransposon, LTR, Gypsy, "RLG_Fatima_... 1.801e+04 0.0

STEP 2) BLASTX

FIRST GENE:

SECOND GENE:

STEP 3) BLASTN (est-others)


NOTE: I have shown the results of the prediction programs run on the entire sequence. This is not how you should run them! Run them only on segments that do not contain retroelements. If you look at the genes predicted within the retroelement sequence, you will understand why it is necessary to split the sequence into segments before using these programs.
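One way to prepare the segments described in this note is to take the repeat coordinates from the TREP step and compute the stretches in between. The sketch below is a minimal illustration, not part of any of the course tools. The function name, the `min_len` parameter, and the coordinates in the example are hypothetical; intervals are assumed to be 0-based and half-open.

```python
def non_repeat_segments(seq_len, repeats, min_len=1):
    """Return (start, end) intervals of the sequence not covered by any repeat.

    repeats is a list of (start, end) intervals, 0-based and half-open. They may
    be unsorted and may overlap (for example, nested elements). Segments shorter
    than min_len are dropped.
    """
    segments = []
    cursor = 0  # first position not yet assigned to a repeat or a segment
    for start, end in sorted(repeats):
        if start - cursor >= min_len:
            segments.append((cursor, start))
        cursor = max(cursor, end)
    if seq_len - cursor >= min_len:
        segments.append((cursor, seq_len))
    return segments


# Hypothetical coordinates: a 100 bp sequence with two overlapping repeat hits.
print(non_repeat_segments(100, [(35, 60), (20, 40)]))
```

Each returned interval can then be sliced out of the sequence (`seq[start:end]`) and submitted separately to the gene prediction programs and to BLASTX.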
