NCBI BLAST Laboratory
We will use BLAST to solve an interesting problem and then two real-world ones. Three different BLAST programs will be used in this lab: BLASTn, BLASTx, and BLASTp.
Example 1: Dinosaur DNA
(Adapted from NCBI Problem Set of NCBI Field Guide.)
Part A.
Michael Crichton's fantasy about cloning dinosaurs, Jurassic Park, contains a putative dinosaur DNA sequence. Use BLASTn, the nucleotide-nucleotide BLAST, against the default nucleotide database, nr, to identify the real source of the following sequence. Select, copy, and paste it into the BLAST form window. Hit the BLAST! button. When a new window pops up, hit the Format! button.
This is probably the most common use of nucleotide-nucleotide BLAST: sequence identification, establishing whether an exact match for a sequence is already present in the database.
>DinoDNA from JURASSICPARK p. 103 nt 1-1200
GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC
GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG
TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC
TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG
CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA
AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG
ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT
CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT
GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG
CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA
CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG
CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA
CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA
GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG
CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG
ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA
ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC
GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG
CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG
CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT
What is your discovery?
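For reference, the web-form search in Part A can also be scripted. The following is a minimal sketch, assuming NCBI's public BLAST URL API (a `Blast.cgi` endpoint that accepts `CMD=Put`, `PROGRAM`, `DATABASE`, and `QUERY` parameters) and assuming the nucleotide collection is named `nt` there; check the current NCBI documentation before relying on either. The function name `build_blast_request` is hypothetical. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint of the NCBI BLAST URL API; verify against current NCBI docs.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(fasta_text, program="blastn", database="nt"):
    """Build (but do not send) a search-submission request.

    The parameter names CMD, PROGRAM, DATABASE, and QUERY are assumptions
    about the BLAST URL API, as is the database name "nt".
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # blastn = nucleotide query vs. nucleotide database
        "DATABASE": database,
        "QUERY": fasta_text,   # the pasted FASTA record, header line included
    }
    body = urlencode(params).encode("ascii")
    # Supplying a request body makes urllib issue a POST.
    return Request(BLAST_URL, data=body)


if __name__ == "__main__":
    # First line of the Part A sequence, used here only as a short stand-in.
    query = (
        ">DinoDNA fragment\n"
        "GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC\n"
    )
    request = build_blast_request(query)
    print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) and retrieving the results later is left out on purpose; for this lab the web form is the intended route.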
Part B.
NCBI scientist Mark Boguski noticed this obvious "contaminant" and supplied Crichton with a better sequence, shown below, for the sequel, The Lost World. Identify the most likely source of this sequence using nucleotide-nucleotide BLAST as in Part A.
>DinoDNA from THE LOST WORLD p. 135
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACG
GACGTGTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCC
ATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAA
GCCGGAGCCTTCCTGGGGCTGGGGGGGGGCGAGAGGACGGAGGCGGGGGGGCTGCTGGCC
TCCTACCCCCCCTCAGGCCGCGTGTCCCTGGTGCCGTGGGCAGACACGGGTACTTTGGGG
ACCCCCCAGTGGGTGCCGCCCGCCACCCAAATGGAGCCCCCCCACTACCTGGAGCTGCTG
CAACCCCCCCGGGGCAGCCCCCCCCATCCCTCCTCCGGGCCCCTACTGCCACTCAGCAGC
GGGCCCCCACCCTGCGAGGCCCGTGAGTGCGTCATGGCCAGGAAGAACTGCGGAGCGACG
GCAACGCCGCTGTGGCGCCGGGACGGCACCGGGCATTACCTGTGCAACTGGGCCTCAGCC
TGCGGGCTCTACCACCGCCTCAACGGCCAGAACCGCCCGCTCATCCGCCCCAAAAAGCGC
CTGCTGGTGAGTAAGCGCGCAGGCACAGTGTGCAGCCACGAGCGTGAAAACTGCCAGACA
TCCACCACCACTCTGTGGCGTCGCAGCCCCATGGGGGACCCCGTCTGCAACAACATTCAC
GCCTGCGGCCTCTACTACAAACTGCACCAAGTGAACCGCCCCCTCACGATGCGCAAAGAC
GGAATCCAAACCCGAAACCGCAAAGTTTCCTCCAAGGGTAAAAAGCGGCGCCCCCCGGGG
GGGGGAAACCCCTCCGCCACCGCGGGAGGGGGCGCTCCTATGGGGGGAGGGGGGGACCCC
TCTATGCCCCCCCCGCCGCCCCCCCCGGCCGCCGCCCCCCCTCAAAGCGACGCTCTGTAC
GCTCTCGGCCCCGTGGTCCTTTCGGGCCATTTTCTGCCCTTTGGAAACTCCGGAGGGTTT
TTTGGGGGGGGGGCGGGGGGTTACACGGCCCCCCCGGGGCTGAGCCCGCAGATTTAAATA
ATAACTCTGACGTGGGCAAGTGGGCCTTGCTGAGAAGACAGTGTAACATAATAATTTGCA
CCTCGGCAATTGCAGAGGGTCGATCTCCACTTTGGACACAACAGGGCTACTCGGTAGGAC
CAGATAAGCACTTTGCTCCCTGGACTGAAAAAGAAAGGATTTATCTGTTTGCTTCTTGCT
GACAAATCCCTGTGAAAGGTAAAAGTCGGACACAGCAATCGATTATTTCTCGCCTGTGTG
AAATTACTGTGAATATTGTAAATATATATATATATATATATATATCTGTATAGAACAGCC
TCGGAGGCGGCATGGACCCAGCGTAGATCATGCTGGATTTGTACTGCCGGAATTC
What is your discovery?
Now, use the translating BLAST (blastx) page with the sequence. The proper use of the translating BLAST services is to look for similar proteins (identify potential homologues) in other species.
What is your discovery? Compare the result to that obtained from BLASTn.
(By the way, Mark embedded his name in the sequence he provided. Look for MARK WAS HERE NIH.)
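To see what the translating search does with a nucleotide query, here is a minimal sketch of six-frame translation using the standard genetic code. It is an illustration only, not how BLASTx is implemented; the function names are hypothetical, and any codon containing a character outside A, C, G, T is rendered as X by assumption. The demo fragment is one 60-base line copied from the Lost World sequence above.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # X = codon with an unknown base
        for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return translations of the three forward and three reverse frames."""
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    fragment = "GGGCCCCCACCCTGCGAGGCCCGTGAGTGCGTCATGGCCAGGAAGAACTGCGGAGCGACG"
    for name, protein in six_frames(fragment).items():
        note = "  <- contains MARK" if "MARK" in protein else ""
        print(name, protein + note)
```

Running the same functions over the full sequence and searching each frame for WAS, HERE, and NIH is one way to locate the rest of the embedded message.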
Example 2: Identify the organism a given DNA sequence probably belongs to.
>CHEBGLI
ATCCTGAGCATGTTTTTGTCCCTGTCACCATCTCACAATCCTTAACTAAT
ACTAAAATGGTTTGAATAGAACTCCTTCCTAGTCTGATGCTTAGCTACAC
TCCATGGATCATTCTGGAAGAAGGAGAATGGGAGAGATGGATGCCATCGT
GGAGGATGAGAGGGGAAGAACAGAGGGTGAAAATGGGGAATGAGTGGAAA
ATAATGTTGAGGATACAGGAGGACCTAACAAACGAAAGATGTCTCAGGAA
CCAACAGGTTATCATACACTCAATATACATCTCCTTGCTGACCATCAGCT
GACCTGACTCCACCCCTGAGGGACACAGCCTAACCTTGACCAATGACTTC
AAAGGACAAGGGGGAGCAAGGGGGCAGAAGTTCAGCAGTAAAGAATAAAA
GGCCACAGCATCCAGCAGCAGCACAGACTTGCTTCTGATGCTTCTGTGAT
CACCTGTAAGCTCCACGACTTGACATCATGGTGCATTTTACTGCCGAGGA
GAAGGCTGCTATCACTGGCCTGTGGGGCAAAGTCAATGTGGAAGAGGCTG
GAGGCGAGGCTCTGGGCAGGTAGAAAGTGGACTTCATGGGGGAGGATGGT
GAATATGAGCCTGGCAAATCGGCCAGAAAAATTCTTCAAAAATCTGAGTT
GCTGATTTTCCATCTGCTATGTTTCCATCTCATAGGCTCCTGGTTGTCTA
CCCCTGGACCCAGAGGTTCTTTGATAGCTTTGGCAACCTGTCCTCTGCCT
CTGCCATAATGGGAAACCCCAAGGTCAAGGCCCACGGCAAGAAGGTGCTG
ACCTCCTTTGGAGAAGCTATTAAGAATTTGGACAACCTCAAAGGTGCCTT
CGCTAAGCTGAGTGAGCTGCACTGTGACAAGTTGCACGTGGATCCTGAGA
ACTTCAGGGTGAGTTCAGGAAGTGTTCATGCGTTCCCTTTGGCTTTTTAC
CTTGCAATAATAATGGAAGTTGAGTGTTTTATTGGAAAGACTAGAAAGAC
CTCAGAAATCATAGATCAAACTAGGTGTTAGGAGGACAGACTTCCAGTGG
GCATACCGAGCCCACTTGATTCAGGACTAGTGACATAAAGAGCTATGGGC
AGCCTTACTGTGCATGCATGGCTAAGTCGCTTCAGGTGTCAGACTCTTTG
TGACCCCATGGCTGTAGCCACCAGGTCCCTCTGTCCATGGGATTCTCCAG
GCTAGGATACAGGTATGTGTTGCCATTTCTTTTTCCAGGGGATCTACCCA
GCCCAAGGATCATATCTGTATCTCTTACATCTCCTTCAATAGCAGGCATG
TTCTTTATCACTAGCACCATGATGAGCATCCATAAGTTTGCTTAAAAGTT
TTCTGGAACTTCTGTCAGAACTGGATGTATTTACCCCAGAGAATATCAAA
GAATAGCATATTTGTTCTGGGAGAAATGAAATCTGGCTTTTGAAAGAATA
AGTCCAGTCTCTAGGAGGGAGAATTATCCTATGTGANTCCCGATGACTGA
AGTTTAGGAAGATATTTGGGAGAATAATTATTAGCCAGATCATCTCAAAG
AAAAATTGATCAATATCTCAAGGAATTACCCATCAGAACTGTGACTAGGT
GGAGGCTTATTGTTGCATTGAATTGAGGGTTTACTAAGCTCATTCTAACA
ACCCATGCAGCCCTGAATCCTATGAATATAAAATTAGAAGGAGGGAAAAG
GCAACTAAAAATAGTGAAATAGGAGAGAGGCAAGGGATATAGGCAGACAA
AATATTGTATGGAGGGCTCATAGGATTTAAATTAAATTGAAGGACAAGCT
CATCTGAGTTTATTGTATAGGTACAACCCATGGAGAAGTTTAAGATGTGG
ACTTGGGAGTGGTTTAGGTACTAAGCCATTTTCTGTAACTCTTTTAGCAA
ACTTCAACTTGGCCTACCTAATTCTTATTCTGTCTCTCACCCAACAGCTC
CTGGGCAATGTGATTGTGATTATTCTGGCTACTCATTTTGGCAGAGAATT
CACCCCTGACGTGCAGGCTGCCTGGCAGAAGCTGGTGTCTGGTGTTGCCA
CTGCTCTGGCCCACAAGTACCACTGAATTCTCTTTACAATTCACCATTTT
GTGTCCCCAGTGCCTTCCTTCTGCCCCTTGGGACTGGGGTTTGGCCTTGT
GAACCCAGATTCTGTTTAATAAAATACATTCTATTCAGTGATCAAAAATT
AAAATTGTACCTTCTCTATCA
Source: 14th, 2007
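Before pasting a long record such as CHEBGLI into the BLAST form, it can help to check that the copy is complete and to see how many ambiguous bases it contains (the record above includes at least one N). The following is a minimal sketch using only the Python standard library; the function names `parse_fasta` and `summarize` are hypothetical, and treating every character other than A, C, G, T as ambiguous is a simplifying assumption.

```python
from collections import Counter


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize(sequence):
    """Report length, ambiguous-base count, and GC fraction of a DNA string."""
    counts = Counter(sequence)
    unambiguous = sum(counts[base] for base in "ACGT")
    gc = counts["G"] + counts["C"]
    return {
        "length": len(sequence),
        "ambiguous": len(sequence) - unambiguous,
        # GC fraction is computed over unambiguous bases only.
        "gc_fraction": gc / unambiguous if unambiguous else 0.0,
    }


if __name__ == "__main__":
    # Two lines copied from the CHEBGLI record; the second contains an N.
    demo = (
        ">CHEBGLI (two lines only)\n"
        "ATCCTGAGCATGTTTTTGTCCCTGTCACCATCTCACAATCCTTAACTAAT\n"
        "AGTCCAGTCTCTAGGAGGGAGAATTATCCTATGTGANTCCCGATGACTGA\n"
    )
    for header, sequence in parse_fasta(demo):
        print(header, summarize(sequence))
```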
Example 3: Investigate Unknown Gene
The following sequence was conserved among family members with a certain disease. Run BLASTp on this sequence to see if you can diagnose the disease. (Go to NCBI's BLAST homepage and click on Protein-protein BLAST (blastp) in the Protein area to access BLASTp.)
VAAAETAKHQAKCNICKECPIIGFRYRSLKHFNYDICQSCFFSGRVAKGHKMHYPMVEYCTPTTSGEDVR
Answer the following questions:
a) What is that disease? Use one or two sentences to describe the symptoms of the disease.
b) Are humans the only species that have the gene? What other species share the gene with humans? (Find at least two more in addition to humans.)
c) Find the name and locus of the gene in each of those species.
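The three programs used in this lab differ only in the type of query and the type of database they compare. The following is a minimal sketch of that choice; the function names are hypothetical, and the rule for guessing the molecule type (at least 90% of letters drawn from A, C, G, T, U, N means nucleotide) is an assumed heuristic, not an NCBI rule.

```python
NUCLEOTIDE_LETTERS = set("ACGTUN")

# (query type, database type) -> program, limited to the programs in this lab.
# Other combinations (for example tblastn) exist but are not covered here.
PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "protein"): "blastp",
}


def guess_molecule(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from letter composition (heuristic)."""
    letters = [c for c in sequence.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_LETTERS for c in letters) / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


def choose_program(query, database_type):
    """Pick the BLAST program for a query and a database type."""
    key = (guess_molecule(query), database_type)
    if key not in PROGRAMS:
        raise ValueError(f"combination not covered in this lab: {key}")
    return PROGRAMS[key]


if __name__ == "__main__":
    example3 = (
        "VAAAETAKHQAKCNICKECPIIGFRYRSLKHFNYDICQSCFFSGRVAKGHKMHYPMVEYCTPTTSGEDVR"
    )
    print(choose_program(example3, "protein"))
    print(choose_program("ATGGCCAGGAAGAACTGC", "protein"))
    print(choose_program("ATGGCCAGGAAGAACTGC", "nucleotide"))
```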