DNA BLAST Search
Spring 2009
General Directions:
READ ALL THE DIRECTIONS (both sides) and practice the example BLAST search BEFORE STARTING YOUR OWN DATA SEARCH
This assignment is designed to help you learn how to access and use an online research tool to identify a gene, protein, or disease from a nucleotide sequence. The database is a national resource offered free of charge by the National Institutes of Health.
www.ncbi.nlm.nih.gov provides the database you will use to identify your unknown DNA sequences. An example of how to access this database will be given in class and is also provided at the end of these instructions. Please hand in your results typed and organized as outlined below.
HOW TO ASSEMBLE YOUR SEARCH RESULTS:
a) The FIRST page must be the original numbered list of unknowns that you received in class, with your name on it. Use that sheet itself as the first page of your report; do not retype it.
b) Page 2 of your report is a summary table set up as follows:
Column 1: the number of the unknown (1 through 10)
Column 2: name of the gene
Column 3: name of the associated disease (if provided); sometimes the results indicate a protein that has not been associated with any disease
Column 4: name of the protein (if provided)
A gene name should always be provided, usually the protein name as well, and sometimes the disease. You will need to check the information returned by the search; to find some of it, you will probably have to look at a few article titles and keywords. Because the lists of gene sequences were randomly arranged, some students may by chance have the same gene for more than one search; that is fine.
c) Page 3 of your report: a copy of the “bit-match” and the reference you obtained for each sequence. Copy and paste this information into a Word document; otherwise you will end up with huge printouts that are difficult to interpret. I will show you an example of this step in class, and an example is also provided at the end of these instructions.
d) Please do not place your search results in a folder of any kind; simply staple them together BEFORE handing them in.
e) Please follow the directions; earning all the points for this assignment requires following them.
Please read through these points before beginning your search.
· Finding the name of the gene, protein and/or disease often requires reading the article title or other information provided in the reference; it may not be evident until you look at the reference information.
· Note that BLAST will give you a long list of possible matches for your DNA sequence. Select just one of these matches (otherwise you will be printing pages of information!). For each sequence, I recommend a simple “copy and paste” of the match you use and its corresponding reference into a Word document. This will save you lots of paper and headaches.
· Lastly, you must do your work independently. You may not hand in a photocopy (Xerox copy) of someone else's work; the work must be a printout from your own search. (Each student is given different sequences to search.)
Detailed Steps for performing your BLAST search:
Go to the following website for the National Center for Biotechnology Information:
a. www.ncbi.nlm.nih.gov
b. Click on “BLAST” near the top of the page.
c. Click on nucleotide search (approximately halfway down the page).
d. A blank search box labeled “Enter Query Sequence” will now appear. Type one of your unknown sequences into this box. (You can perform only one search at a time.)
e. Scroll to the bottom of the page and click on the blue button called “BLAST!”
f. A list of all matches will now appear.
g. Scroll down to the “bit-matches.” These show how well the sequence you entered matches each sequence in the database: the higher the bit score and the lower the Expect (E) value, the better the match (see the example).
h. Scroll down further to “transcripts” and click on the accession number; this will take you to the references. The example at the end of these instructions shows this step.
i. Open a blank Word document for the pertinent information you have just retrieved. These databases give you more information than you need to submit, so copy and paste only the required information into this separate document; otherwise you will print out a huge volume of information with each search and waste a lot of work and paper.
j. When a list of several matches appears, which one should you use?
k. Select the highest-scoring match. Sometimes the same top score is given to more than one reference; this usually means the same sequence has been deposited in the database more than once, often by different researchers. On a practical note, some of these equally scoring references may contain more information about your gene sequence, so I HIGHLY RECOMMEND clicking on more than one of the highest-scoring matches if the identity of your gene is not immediately evident. Copy and paste the score match into your Word document.
l. Reference information. Now copy and paste the reference information into your Word document beneath the bit-match score: definition, accession number, author(s), article title, journal name, pages, and year of publication.
m. You are not expected to find and read the original article; however, you are expected to look at the information returned by your BLAST search, much of which is found within the reference information. Be sure to check the article title, since it can provide helpful information.
n. As with any database, it may take you some time to read through the information from a couple of these references before you have a good idea of which gene you have identified. You will get faster with each search, and soon you will quickly pick up on keywords and other helpful hints the authors have provided about their sequence.
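The selection rule in step k (take the highest bit score, then look at every reference tied for it) can be sketched in a few lines of Python for readers who want to see the logic spelled out. This is only an illustration; it is not part of the assignment and does not contact NCBI. The `Hit` record and the `top_hits` function are hypothetical names, and apart from NM_019616 (taken from the practice example) the hits are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One row of a BLAST hit list (hypothetical structure for illustration)."""
    accession: str
    bit_score: float
    evalue: float


def top_hits(hits: list[Hit]) -> list[Hit]:
    """Return every hit tied for the highest bit score, lowest E-value first."""
    if not hits:
        return []
    best = max(h.bit_score for h in hits)
    # Exact comparison is reasonable here because scores are copied as displayed.
    tied = [h for h in hits if h.bit_score == best]
    # sorted() is stable, so hits with equal E-values keep their original order.
    return sorted(tied, key=lambda h: h.evalue)


if __name__ == "__main__":
    hits = [
        Hit("NM_019616", 111.0, 4e-23),       # from the practice example
        Hit("HYPOTHETICAL_A", 111.0, 4e-23),  # made-up tie, illustration only
        Hit("HYPOTHETICAL_B", 84.2, 3e-15),   # made-up weaker hit
    ]
    for h in top_hits(hits):
        print(h.accession, h.bit_score, h.evalue)
```

Returning all tied hits, rather than just one, mirrors the advice in step k to open more than one equally scoring reference when the gene's identity is not obvious.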
A Practice Example of a BLAST Search:
Practice sequence: aagcgctcct gtcggtgcca cgaggggtac tctctgctgg cagacggggt gtcctgcaca
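Because step d asks you to type the sequence by hand, it can help to check what you typed. The Python sketch below (the function name `clean_sequence` is hypothetical) strips spaces, line breaks, and position numbers, converts to uppercase, and rejects anything other than A, C, G, T, or N. It assumes an unambiguous DNA sequence, so other IUPAC codes (R, Y, and so on) would be flagged. The practice sequence should come out to 60 bases, matching Query positions 1 to 60 in the bit match.

```python
import re


def clean_sequence(raw: str) -> str:
    """Remove whitespace and digits, uppercase, and allow only A, C, G, T, N."""
    seq = re.sub(r"[\s\d]", "", raw).upper()
    bad = set(seq) - set("ACGTN")
    if bad:
        raise ValueError(f"unexpected characters: {sorted(bad)}")
    return seq


practice = "aagcgctcct gtcggtgcca cgaggggtac tctctgctgg cagacggggt gtcctgcaca"
seq = clean_sequence(practice)
print(len(seq))
print(seq)
```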
Bit Match: Score = 111 bits (60), Expect = 4e-23
Identities = 60/60 (100%), Gaps = 0/60 (0%)
Strand=Plus/Plus
Query  1    AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA  60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  490  AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA  549
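The Identities and Gaps lines summarize the Query and Sbjct rows: each vertical bar marks a position where the two bases are identical, and a dash in either row would mark a gap. The Python sketch below (the `summarize_alignment` name is hypothetical) recomputes these numbers from the two rows. It assumes a Plus/Plus alignment like this one, so Sbjct coordinates increase, and it rounds percentages to the nearest whole number, which may not match BLAST's rounding in every case.

```python
def summarize_alignment(query: str, sbjct: str, sbjct_start: int) -> dict:
    """Recount identities, gaps, the match line, and the Sbjct end coordinate."""
    if not query or len(query) != len(sbjct):
        raise ValueError("Query and Sbjct rows must be non-empty and equal length")
    pairs = list(zip(query.upper(), sbjct.upper()))
    length = len(pairs)
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    # One "|" per identical position, as in the middle row of BLAST output.
    bars = "".join("|" if q == s and q != "-" else " " for q, s in pairs)
    # Plus/Plus strand assumed: count only real bases (not "-") in the Sbjct row.
    sbjct_end = sbjct_start + sum(1 for s in sbjct if s != "-") - 1
    return {"identities": identities, "gaps": gaps, "length": length,
            "bars": bars, "sbjct_end": sbjct_end}


query = "AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA"
sbjct = "AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA"
r = summarize_alignment(query, sbjct, 490)
n = r["length"]
print(f"Identities = {r['identities']}/{n} ({round(100 * r['identities'] / n)}%), "
      f"Gaps = {r['gaps']}/{n} ({round(100 * r['gaps'] / n)}%)")
print("Sbjct end:", r["sbjct_end"])
```

For the practice example this reports 60/60 identities and no gaps, and the subject ends at base 549, agreeing with the output above.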
Click on the accession number
The reference and additional information below were obtained by clicking on the accession number. Notice that the DEFINITION line identifies the sequence as matching Homo sapiens coagulation factor VII.
LOCUS       NM_019616               2412 bp    mRNA    linear   PRI 08-OCT-2006
DEFINITION  Homo sapiens coagulation factor VII (serum prothrombin conversion
            accelerator) (F7), transcript variant 2, mRNA.
ACCESSION   NM_019616
VERSION     NM_019616.1  GI:10518502
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 2412)
  AUTHORS   Smith,S.A., Comp,P.C. and Morrissey,J.H.
  TITLE     Traces of factor VIIa modulate thromboplastin sensitivity to
            factors V, VII, X, and prothrombin
  JOURNAL   J. Thromb. Haemost. 4 (7), 1553-1558 (2006)
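Step l asks you to copy the definition, accession number, authors, title, journal, pages, and year from a record like the one above. The Python sketch below shows how those fields sit in the GenBank flat-file layout: each keyword occupies the first 12 columns, and a line whose keyword columns are blank continues the previous field. The function names are hypothetical, the sample is rebuilt from the practice record, and only the first occurrence of each field (that is, the first reference) is kept; it is a simplified reader, not a full GenBank parser.

```python
import re

# Fields requested in step l (AUTHORS/TITLE/JOURNAL come from the first REFERENCE).
WANTED = ("DEFINITION", "ACCESSION", "AUTHORS", "TITLE", "JOURNAL")


def parse_genbank_header(record: str, wanted=WANTED) -> dict:
    """Collect selected fields, assuming keywords in columns 1-12."""
    fields = {}
    current = None
    for line in record.splitlines():
        if not line.strip():
            continue
        key, value = line[:12].strip(), line[12:].strip()
        if key:  # a new keyword (or sub-keyword such as "  TITLE") starts here
            if key in wanted and key not in fields:
                fields[key] = value
                current = key
            else:
                current = None  # unwanted keyword, or a repeat from a later reference
        elif current:  # blank keyword columns: continuation of the current field
            fields[current] += " " + value
    return fields


def pages_and_year(journal: str):
    """Pull 'first-last' pages and a 4-digit year from a JOURNAL line ending in '(YYYY)'."""
    m = re.search(r"(\d+-\d+)\s+\((\d{4})\)\s*$", journal)
    return (m.group(1), m.group(2)) if m else (None, None)


rows = [
    ("DEFINITION", "Homo sapiens coagulation factor VII (serum prothrombin conversion"),
    ("", "accelerator) (F7), transcript variant 2, mRNA."),
    ("ACCESSION", "NM_019616"),
    ("REFERENCE", "1  (bases 1 to 2412)"),
    ("  AUTHORS", "Smith,S.A., Comp,P.C. and Morrissey,J.H."),
    ("  TITLE", "Traces of factor VIIa modulate thromboplastin sensitivity to"),
    ("", "factors V, VII, X, and prothrombin"),
    ("  JOURNAL", "J. Thromb. Haemost. 4 (7), 1553-1558 (2006)"),
]
# Pad each keyword to 12 columns so the sample follows the flat-file layout.
record = "\n".join(f"{key:<12}{value}" for key, value in rows)

info = parse_genbank_header(record)
for name in WANTED:
    print(f"{name}: {info[name]}")
print("Pages, year:", pages_and_year(info["JOURNAL"]))
```

For anything beyond a quick check, Biopython's `SeqIO.read(handle, "genbank")` parses the complete format, including every reference.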
EXAMPLE OF SUMMARY TABLE
Number of unknown / Name of gene / Name of disease (if available) / Name of protein
1 / Coagulation factor VII gene (F7) / N/A / Coagulation factor VII
2
3
4
5
6
7
8
9
10