NCBI BLAST Laboratory

We will use BLAST to solve an interesting problem and then two real-world ones. Three different BLAST programs will be used in this lab: BLASTn, BLASTx, and BLASTp.
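Which of these programs fits a job depends on whether the query and the database are nucleotide or protein. The sketch below encodes that rule. The helper names and the 90% threshold used to guess the query type are assumptions for illustration, not part of any NCBI tool, and tblastx (translated query against a translated database) is left out.

```python
# Hypothetical helpers: choose a BLAST program from the query type and the
# type of database searched. The 0.9 threshold is an illustrative assumption.
NUCLEOTIDE_CHARS = set("ACGTUN")


def looks_like_nucleotide(seq: str, threshold: float = 0.9) -> bool:
    """Guess the query type: mostly A/C/G/T/U/N letters means nucleotide."""
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    fraction = sum(c in NUCLEOTIDE_CHARS for c in letters) / len(letters)
    return fraction >= threshold


def choose_program(query: str, database_type: str) -> str:
    """Return the BLAST program for a query against a 'nucleotide' or 'protein' database."""
    query_is_nt = looks_like_nucleotide(query)
    if database_type == "nucleotide":
        return "blastn" if query_is_nt else "tblastn"
    if database_type == "protein":
        return "blastx" if query_is_nt else "blastp"
    raise ValueError(f"unknown database type: {database_type!r}")
```

In this lab, Example 1 Part A is the first case (nucleotide query, nucleotide database), the second half of Part B is the translated case, and Example 3 is protein against protein.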

Example 1: Dinosaur DNA

(Adapted from the NCBI Problem Set of the NCBI Field Guide.)

Part A.

Michael Crichton's fantasy about cloning dinosaurs, Jurassic Park, contains a putative dinosaur DNA sequence. Use BLASTn, the nucleotide-nucleotide BLAST, against the default nucleotide database, nr, to identify the real source of the following sequence. Select the sequence, copy it, and paste it into the BLAST form window. Hit the BLAST! button. When a new window pops up, hit the Format! button.

This is probably the most common use of nucleotide-nucleotide BLAST: sequence identification, establishing whether an exact match for a sequence is already present in the database.
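The sequences in this handout are split across lines with blank lines in between. A minimal sketch for turning such FASTA text into clean (header, sequence) pairs before pasting or checking them is shown below; `parse_fasta` is a hypothetical helper, not part of BLAST, and it simply skips blank lines and joins the remaining sequence lines.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, ignoring blank lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines between sequence lines carry no data
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data before the first '>' header")
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

For the Jurassic Park record, the length of the joined sequence can be compared with the "nt 1-1200" range given in its header.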

>DinoDNA from JURASSICPARK p. 103 nt 1-1200

GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC

GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG

TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC

TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG

CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA

AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG

ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT

CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT

GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG

CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA

CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG

CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA

CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA

GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG

CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG

ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA

ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC

GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG

CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG

CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT

What is your discovery?

Part B.

NCBI scientist Mark Boguski noticed this obvious "contaminant" and supplied Crichton with a better sequence, shown below, for the sequel, The Lost World. Identify the most likely source of this sequence using nucleotide-nucleotide BLAST as in Part A.

>DinoDNA from THE LOST WORLD p. 135

GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACG

GACGTGTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCC

ATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAA

GCCGGAGCCTTCCTGGGGCTGGGGGGGGGCGAGAGGACGGAGGCGGGGGGGCTGCTGGCC

TCCTACCCCCCCTCAGGCCGCGTGTCCCTGGTGCCGTGGGCAGACACGGGTACTTTGGGG

ACCCCCCAGTGGGTGCCGCCCGCCACCCAAATGGAGCCCCCCCACTACCTGGAGCTGCTG

CAACCCCCCCGGGGCAGCCCCCCCCATCCCTCCTCCGGGCCCCTACTGCCACTCAGCAGC

GGGCCCCCACCCTGCGAGGCCCGTGAGTGCGTCATGGCCAGGAAGAACTGCGGAGCGACG

GCAACGCCGCTGTGGCGCCGGGACGGCACCGGGCATTACCTGTGCAACTGGGCCTCAGCC

TGCGGGCTCTACCACCGCCTCAACGGCCAGAACCGCCCGCTCATCCGCCCCAAAAAGCGC

CTGCTGGTGAGTAAGCGCGCAGGCACAGTGTGCAGCCACGAGCGTGAAAACTGCCAGACA

TCCACCACCACTCTGTGGCGTCGCAGCCCCATGGGGGACCCCGTCTGCAACAACATTCAC

GCCTGCGGCCTCTACTACAAACTGCACCAAGTGAACCGCCCCCTCACGATGCGCAAAGAC

GGAATCCAAACCCGAAACCGCAAAGTTTCCTCCAAGGGTAAAAAGCGGCGCCCCCCGGGG

GGGGGAAACCCCTCCGCCACCGCGGGAGGGGGCGCTCCTATGGGGGGAGGGGGGGACCCC

TCTATGCCCCCCCCGCCGCCCCCCCCGGCCGCCGCCCCCCCTCAAAGCGACGCTCTGTAC

GCTCTCGGCCCCGTGGTCCTTTCGGGCCATTTTCTGCCCTTTGGAAACTCCGGAGGGTTT

TTTGGGGGGGGGGCGGGGGGTTACACGGCCCCCCCGGGGCTGAGCCCGCAGATTTAAATA

ATAACTCTGACGTGGGCAAGTGGGCCTTGCTGAGAAGACAGTGTAACATAATAATTTGCA

CCTCGGCAATTGCAGAGGGTCGATCTCCACTTTGGACACAACAGGGCTACTCGGTAGGAC

CAGATAAGCACTTTGCTCCCTGGACTGAAAAAGAAAGGATTTATCTGTTTGCTTCTTGCT

GACAAATCCCTGTGAAAGGTAAAAGTCGGACACAGCAATCGATTATTTCTCGCCTGTGTG

AAATTACTGTGAATATTGTAAATATATATATATATATATATATATCTGTATAGAACAGCC

TCGGAGGCGGCATGGACCCAGCGTAGATCATGCTGGATTTGTACTGCCGGAATTC

What is your discovery?

Now, run the same sequence through the translating BLAST (BLASTx) page. The proper use of the translating BLAST services is to look for similar proteins (potential homologues) in other species.

What is your discovery? Compare the result to that obtained from BLASTn.

(By the way, Mark embedded his name in the sequence he provided. Look for MARK WAS HERE NIH.)
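BLASTx translates the nucleotide query in all six reading frames (three on each strand) and compares each translation with a protein database. The sketch below reproduces only the translation step, using the standard genetic code, which is also a convenient way to hunt for the phrase Mark embedded. The function names are hypothetical, and translating codons that contain N or other ambiguous symbols as X is an assumption of this sketch.

```python
# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from the first base; '*' is a stop, unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict[str, str]:
    """Return {frame: protein} for frames +1..+3 (forward) and -1..-3 (reverse)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_peptide(seq: str, peptide: str) -> list[tuple[str, int]]:
    """List (frame, amino-acid position) for each frame whose translation contains peptide."""
    hits = []
    for frame, protein in six_frame_translation(seq).items():
        pos = protein.find(peptide)
        if pos != -1:
            hits.append((frame, pos))
    return hits
```

For example, `find_peptide(seq, "MARK")` lists every frame whose translation contains that fragment. Real BLASTx goes further: it scores local alignments of each frame against database proteins with a substitution matrix instead of looking for exact substrings.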

Example 2: Identify the organism a given DNA sequence probably belongs to.

>CHEBGLI

ATCCTGAGCATGTTTTTGTCCCTGTCACCATCTCACAATCCTTAACTAAT

ACTAAAATGGTTTGAATAGAACTCCTTCCTAGTCTGATGCTTAGCTACAC

TCCATGGATCATTCTGGAAGAAGGAGAATGGGAGAGATGGATGCCATCGT

GGAGGATGAGAGGGGAAGAACAGAGGGTGAAAATGGGGAATGAGTGGAAA

ATAATGTTGAGGATACAGGAGGACCTAACAAACGAAAGATGTCTCAGGAA

CCAACAGGTTATCATACACTCAATATACATCTCCTTGCTGACCATCAGCT

GACCTGACTCCACCCCTGAGGGACACAGCCTAACCTTGACCAATGACTTC

AAAGGACAAGGGGGAGCAAGGGGGCAGAAGTTCAGCAGTAAAGAATAAAA

GGCCACAGCATCCAGCAGCAGCACAGACTTGCTTCTGATGCTTCTGTGAT

CACCTGTAAGCTCCACGACTTGACATCATGGTGCATTTTACTGCCGAGGA

GAAGGCTGCTATCACTGGCCTGTGGGGCAAAGTCAATGTGGAAGAGGCTG

GAGGCGAGGCTCTGGGCAGGTAGAAAGTGGACTTCATGGGGGAGGATGGT

GAATATGAGCCTGGCAAATCGGCCAGAAAAATTCTTCAAAAATCTGAGTT

GCTGATTTTCCATCTGCTATGTTTCCATCTCATAGGCTCCTGGTTGTCTA

CCCCTGGACCCAGAGGTTCTTTGATAGCTTTGGCAACCTGTCCTCTGCCT

CTGCCATAATGGGAAACCCCAAGGTCAAGGCCCACGGCAAGAAGGTGCTG

ACCTCCTTTGGAGAAGCTATTAAGAATTTGGACAACCTCAAAGGTGCCTT

CGCTAAGCTGAGTGAGCTGCACTGTGACAAGTTGCACGTGGATCCTGAGA

ACTTCAGGGTGAGTTCAGGAAGTGTTCATGCGTTCCCTTTGGCTTTTTAC

CTTGCAATAATAATGGAAGTTGAGTGTTTTATTGGAAAGACTAGAAAGAC

CTCAGAAATCATAGATCAAACTAGGTGTTAGGAGGACAGACTTCCAGTGG

GCATACCGAGCCCACTTGATTCAGGACTAGTGACATAAAGAGCTATGGGC

AGCCTTACTGTGCATGCATGGCTAAGTCGCTTCAGGTGTCAGACTCTTTG

TGACCCCATGGCTGTAGCCACCAGGTCCCTCTGTCCATGGGATTCTCCAG

GCTAGGATACAGGTATGTGTTGCCATTTCTTTTTCCAGGGGATCTACCCA

GCCCAAGGATCATATCTGTATCTCTTACATCTCCTTCAATAGCAGGCATG

TTCTTTATCACTAGCACCATGATGAGCATCCATAAGTTTGCTTAAAAGTT

TTCTGGAACTTCTGTCAGAACTGGATGTATTTACCCCAGAGAATATCAAA

GAATAGCATATTTGTTCTGGGAGAAATGAAATCTGGCTTTTGAAAGAATA

AGTCCAGTCTCTAGGAGGGAGAATTATCCTATGTGANTCCCGATGACTGA

AGTTTAGGAAGATATTTGGGAGAATAATTATTAGCCAGATCATCTCAAAG

AAAAATTGATCAATATCTCAAGGAATTACCCATCAGAACTGTGACTAGGT

GGAGGCTTATTGTTGCATTGAATTGAGGGTTTACTAAGCTCATTCTAACA

ACCCATGCAGCCCTGAATCCTATGAATATAAAATTAGAAGGAGGGAAAAG

GCAACTAAAAATAGTGAAATAGGAGAGAGGCAAGGGATATAGGCAGACAA

AATATTGTATGGAGGGCTCATAGGATTTAAATTAAATTGAAGGACAAGCT

CATCTGAGTTTATTGTATAGGTACAACCCATGGAGAAGTTTAAGATGTGG

ACTTGGGAGTGGTTTAGGTACTAAGCCATTTTCTGTAACTCTTTTAGCAA

ACTTCAACTTGGCCTACCTAATTCTTATTCTGTCTCTCACCCAACAGCTC

CTGGGCAATGTGATTGTGATTATTCTGGCTACTCATTTTGGCAGAGAATT

CACCCCTGACGTGCAGGCTGCCTGGCAGAAGCTGGTGTCTGGTGTTGCCA

CTGCTCTGGCCCACAAGTACCACTGAATTCTCTTTACAATTCACCATTTT

GTGTCCCCAGTGCCTTCCTTCTGCCCCTTGGGACTGGGGTTTGGCCTTGT

GAACCCAGATTCTGTTTAATAAAATACATTCTATTCAGTGATCAAAAATT

AAAATTGTACCTTCTCTATCA

Source: 14th, 2007
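After BLASTn has run, the organism is usually read from the descriptions of the top hits. If the hits are downloaded as a tab-separated hit table, the sketch below ranks them. It assumes the 12-column layout used by BLAST+ `-outfmt 6` (query, subject, % identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) with '#' comment lines; `parse_hit_table` and `best_hits` are hypothetical helpers. The table holds accessions only, so the organism names still come from the descriptions in the BLAST report.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One row (one HSP) of a 12-column tabular BLAST report."""
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    evalue: float
    bit_score: float


def parse_hit_table(text: str) -> list[Hit]:
    """Parse whitespace-separated 12-column rows; '#' lines are comments."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 12:
            raise ValueError(f"expected 12 columns, got {len(fields)}: {line!r}")
        hits.append(Hit(
            query=fields[0],
            subject=fields[1],
            percent_identity=float(fields[2]),
            alignment_length=int(fields[3]),
            evalue=float(fields[10]),
            bit_score=float(fields[11]),
        ))
    return hits


def best_hits(hits: list[Hit], n: int = 5) -> list[Hit]:
    """Keep the best HSP per subject, ranked by bit score (ties: lower E-value)."""
    best: dict[str, Hit] = {}
    for hit in sorted(hits, key=lambda h: (-h.bit_score, h.evalue)):
        best.setdefault(hit.subject, hit)  # first seen is the highest-scoring HSP
    return list(best.values())[:n]
```

Keeping one row per subject matters because a single database sequence can contribute several HSPs to the table.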

Example 3: Investigate Unknown Gene

The following sequence was conserved among family members with a certain disease. Run BLASTp on this sequence to see if you can diagnose the disease. (Go to NCBI's BLAST homepage and click on Protein-protein BLAST (blastp) in the Protein area to access BLASTp.)

VAAAETAKHQAKCNICKECPIIGFRYRSLKHFNYDICQSCFFSGRVAKGHKMHYPMVEYCTPTTSGEDVR

Answer the following questions:

a) What is the disease? Describe its symptoms in one or two sentences.

b) Is human the only species that has the gene? What other species share the gene with humans? (Find at least two in addition to human.)

c) Find the name and locus of the gene in each of those species.
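For questions b) and c), BLAST's alignment view reports Identities and Gaps for each hit as counts over the alignment length. A minimal sketch of that arithmetic follows, assuming the query and subject rows of one alignment have been copied out and joined into equal-length strings with '-' marking gaps; `alignment_stats` is a hypothetical helper.

```python
def alignment_stats(query_row: str, subject_row: str) -> dict:
    """Count identities and gaps in two equal-length aligned rows ('-' = gap)."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have equal length")
    pairs = list(zip(query_row.upper(), subject_row.upper()))
    length = len(pairs)
    # An identity is the same residue in both rows (a gap never counts).
    identities = sum(q == s and q != "-" for q, s in pairs)
    # A gap column has '-' in either row.
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100.0 * identities / length if length else 0.0,
    }
```

For protein alignments BLAST also reports Positives, which count similar as well as identical residues and depend on the scoring matrix; that figure is not reproduced here.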