DNA BLAST Search Directions

FALL SEMESTER 2011

General Directions:

READ ALL THE DIRECTIONS (both sides) and practice the example BLAST search BEFORE STARTING YOUR OWN DATA SEARCH

This assignment will help you learn how to access and use an online research tool to identify a gene and its associated disease from a nucleotide sequence. The database is a national resource offered free of charge by the National Institutes of Health.

www.ncbi.nlm.nih.gov provides the database that you will use to search your DNA unknowns. An example of how to access this database will be provided in class and is also given at the end of these instructions. Please hand in your results typed and as outlined below.

HOW TO ASSEMBLE YOUR SEARCH RESULTS:

a) The FIRST page must be the original list of unknowns that you received in class, with your name on it. Use the sheet you received as the first page of your report; do not retype it.

b) Page 2 of your report is a summary table that should be set up as follows:

Column 1: simply the number of the unknown, i.e. 1 through 10

Column 2: name of the gene, protein and/or name of the associated disease (if provided)

You will have to check the information returned by the search, and you will probably need to read a couple of the article titles and keywords to find some of this information. Because the lists of gene sequences were randomly arranged, some students may by chance have the same gene for more than one search; that is fine.

c) Page 3 of your report is a copy of the “bit match” and reference you obtained for each sequence. Copy and paste this information into a Word document; otherwise you will end up with huge printouts that are difficult to interpret. I will show you an example of this step in class, and an example is also provided at the end of these instructions.

d) Please do not place your search results in a folder of any kind; simply staple the pages together BEFORE handing them in.

e) Please follow the directions; earning all the points for this assignment requires following them.

Please read through these points before beginning your search.

·  Finding the name of the gene and/or disease often requires reading the title of the article or other information provided in the reference. The information may not be evident until you look at the reference. You do not have to read the article, but it is very helpful to read the title and keywords or to look for other identifying information in the BLAST search.

·  Note that the computer will give you a long list of possible matches for your DNA sequence. You should select just one of these matches (otherwise you will be printing pages of information!). For each sequence, I recommend doing a simple “copy and paste” of the match that you use, along with its corresponding reference, into a Word document. This will save you lots of paper and headaches.

·  Lastly, you are required to do your work independently. You may not hand in a photocopy of someone else's work; the work must be a printout from your own search. (Each of you is given different sequences to search.)


Detailed Steps for performing your BLAST search:

Go to the following web site for the National Center for Biotechnology Information:

a.  www.ncbi.nlm.nih.gov

b.  Click on “BLAST” near the upper part of the page

c.  Click on nucleotide search (approximately half way down the page)

d.  A blank search box will now appear that is labeled “Enter Query Sequence”. Type one of your unknown sequences into this box. (You can only perform one search at a time.)

e.  Scroll to the bottom of the page and click on the blue button called “BLAST!”

f.  Prepare to collect the BLAST information by opening a blank Word document into which you can copy and paste the pertinent information you have just retrieved. These databases give you more information than you need to submit, so copy only the required information into this separate document; otherwise you will print out a huge volume of information with each search. Doing so will save you a great deal of work and paper.

g.  After clicking on “BLAST!”, scroll to Taxonomy Reports; this link provides the specific references for the matches that you found. “Taxonomy Reports” is easy to overlook at first, but it is approximately one third of the way down the page.

h.  Taxonomy Reports provides two useful links. The first is the identifier to the left of the reference, which will look similar to this: ref|NM_000948.3|. Click on it and you will be linked to the specific reference. Copy and paste the reference information, as shown below, into your Word document.

i.  The reference information should include: definition, accession number, author(s), title of article, name of journal, citation pages, and year of publication. Place it beneath the corresponding bit match. You only need to submit one of these references, since it is very common for there to be multiple references for each sequence that you submit.

LOCUS NM_019616 3075 bp mRNA linear PRI 18-OCT-2009

DEFINITION Homo sapiens coagulation factor VII (serum prothrombin conversion

accelerator) (F7), transcript variant 2, mRNA.

ACCESSION NM_019616

VERSION NM_019616.2 GI:116805323

KEYWORDS .

SOURCE Homo sapiens (human)

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;

Catarrhini; Hominidae; Homo.

REFERENCE 1 (bases 1 to 3075)

AUTHORS Matarin,M., Brown,W.M., Dena,H., Britton,A., De Vrieze,F.W.,

Brott,T.G., Brown,R.D. Jr., Worrall,B.B., Case,L.D., Chanock,S.J.,

Metter,E.J., Ferruci,L., Gamble,D., Hardy,J.A., Rich,S.S.,

Singleton,A. and Meschia,J.F.

TITLE Candidate Gene Polymorphisms for Ischemic Stroke

JOURNAL Stroke (2009) In press

PUBMED 19729601

REMARK GeneRIF: Observational study of gene-disease association. (HuGE

Navigator)

Publication Status: Available-Online prior to print

j.  Taxonomy Reports also provides a second critical piece of information that must be included. Notice the number at the far right, for example:

ref|NM_000948.3| Homo sapiens prolactin (PRL), mRNA 111

The number 111 is a “score”; click on this score and it will take you to a “bit match”. Scores vary, so do not always expect to see “111”.

You will now see for example:

Homo sapiens prolactin (PRL), mRNA

Length=1384

Score = 111 bits (60), Expect = 4e-23

Identities = 60/60 (100%), Gaps = 0/60 (0%)

Strand=Plus/Plus

Query 1 AGATATCAAAGGTTTATAAAGCCAATATCTGGGAAAGAGAAAACCGTGAGACTTCCAGAT 60

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct 421 AGATATCAAAGGTTTATAAAGCCAATATCTGGGAAAGAGAAAACCGTGAGACTTCCAGAT 480

What you are viewing is a “bit match”: a visual comparison between the sequence that you entered and the matching sequence found in the database. Please copy and paste this information into your Word document.

k.  You are not expected to find the original article and read it; however, you are expected to look at the details returned by your BLAST search, which are often found within the reference information. Be sure to check the title of the article, since this can provide helpful information.

l.  As with any database, it may take you some time to read through the information provided in a couple of these references before you have a good idea of what gene you have identified. With each search you will get faster, and soon you will be able to pick up quite quickly on key words or other helpful hints the authors have provided about their sequence.

m.  Lastly, please prepare your report logically and keep it organized. Present your references and bit matches in sequential order: give the reference and bit match for sequence 1, then copy and paste the reference and bit match for sequence 2, and so on. If the information is disorganized, it is not possible to receive full credit, since it is virtually impossible for me to sort out the results of all these searches if they are not in order.
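If you would like to double-check the numbers in a bit match before they go into your report, they can be read out of the pasted text. The short Python sketch below is only an illustration and is not required for the assignment. The function name parse_alignment_header is made up for this handout, and the sketch assumes the text was pasted exactly in the layout shown in step j.

```python
import re


def parse_alignment_header(text):
    """Read the score, E-value, identities, and gaps from pasted bit-match text.

    Hypothetical helper: assumes the 'Score = ... Expect = ...' and
    'Identities = ... Gaps = ...' layout shown in step j.
    """
    score = re.search(r"Score\s*=\s*([\d.]+)\s*bits\s*\((\d+)\)", text)
    expect = re.search(r"Expect\s*=\s*([\d.eE+-]+)", text)
    ident = re.search(r"Identities\s*=\s*(\d+)/(\d+)", text)
    gaps = re.search(r"Gaps\s*=\s*(\d+)/(\d+)", text)
    if not (score and expect and ident and gaps):
        raise ValueError("text does not look like a pasted bit match")
    return {
        "bit_score": float(score.group(1)),
        "raw_score": int(score.group(2)),
        "expect": float(expect.group(1)),
        "identities": (int(ident.group(1)), int(ident.group(2))),
        "gaps": (int(gaps.group(1)), int(gaps.group(2))),
    }


# Text copied from the prolactin example in step j.
pasted = """Score = 111 bits (60), Expect = 4e-23
Identities = 60/60 (100%), Gaps = 0/60 (0%)
Strand=Plus/Plus"""

header = parse_alignment_header(pasted)
matched, aligned = header["identities"]
print(f"{matched} of {aligned} bases identical, bit score {header['bit_score']}")
```

A match in which every base is identical and there are no gaps, as in this example, is the easiest kind to interpret.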

A Practice Summary Example of BLAST search:

Practice sequence: aagcgctcct gtcggtgcca cgaggggtac tctctgctgg cagacggggt gtcctgcaca

Bit Match: Score = 111 bits (60), Expect = 4e-23

Identities = 60/60 (100%), Gaps = 0/60 (0%)

Strand=Plus/Plus

Query 1 AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA 60

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Sbjct 490 AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA 549
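The “Identities” figure in the bit match above can be recounted by hand or with a few lines of Python. The sketch below is an optional illustration under stated assumptions: the helper names (clean_sequence, parse_row, count_identities) are invented for this handout, and the position check at the end only holds for an alignment without gaps, such as this one.

```python
def clean_sequence(raw):
    """Keep only the letters of a typed sequence and uppercase them."""
    return "".join(ch for ch in raw if ch.isalpha()).upper()


def parse_row(row):
    """Split a 'Query 1 ACGT... 60' style row into (start, bases, end)."""
    _label, start, bases, end = row.split()
    return int(start), bases.upper(), int(end)


def count_identities(query_row, sbjct_row):
    """Return (matching positions, aligned length) for two alignment rows."""
    _, query, _ = parse_row(query_row)
    _, sbjct, _ = parse_row(sbjct_row)
    if len(query) != len(sbjct):
        raise ValueError("the two rows must be the same length")
    # A gap character ('-') never counts as a match.
    matches = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    return matches, len(query)


# The practice sequence as printed, and the two rows of its bit match.
practice = "aagcgctcct gtcggtgcca cgaggggtac tctctgctgg cagacggggt gtcctgcaca"
query_row = "Query 1 AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA 60"
sbjct_row = "Sbjct 490 AAGCGCTCCTGTCGGTGCCACGAGGGGTACTCTCTGCTGGCAGACGGGGTGTCCTGCACA 549"

matches, length = count_identities(query_row, sbjct_row)
print(f"Identities = {matches}/{length}")

# The Query row should be exactly what you typed, minus the spaces.
print("Query row equals typed sequence:",
      parse_row(query_row)[1] == clean_sequence(practice))

# Without gaps, the database positions span exactly the aligned length.
s_start, s_bases, s_end = parse_row(sbjct_row)
print("Database positions consistent:", s_end - s_start + 1 == len(s_bases))
```

If the Query row does not equal the sequence you typed, the most likely cause is a typing error in the search box, and the search should be repeated.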

Click on “Taxonomy Reports” and the accession number that corresponds to the reference:

LOCUS NM_019616 2412 bp mRNA linear PRI 08-OCT-2006

DEFINITION Homo sapiens coagulation factor VII (serum prothrombin conversion

accelerator) (F7), transcript variant 2, mRNA.

ACCESSION NM_019616

VERSION NM_019616.1 GI:10518502

KEYWORDS .

SOURCE Homo sapiens (human)

ORGANISM Homo sapiens

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;

Catarrhini; Hominidae; Homo.

REFERENCE 1 (bases 1 to 2412)

AUTHORS Smith,S.A., Comp,P.C. and Morrissey,J.H.

TITLE Traces of factor VIIa modulate thromboplastin sensitivity to

factors V, VII, X, and prothrombin

JOURNAL J. Thromb. Haemost. 4 (7), 1553-1558 (2006)

Plus any remarks, if present.
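The fields required for the report (definition, accession number, authors, title, and journal) can be picked out of a pasted reference like the one above. The Python sketch below is one possible way to do this and is not part of the assignment. The names FIELD_NAMES and parse_reference are invented for this handout, and the sketch assumes each field begins with one of the capitalized labels shown in the example, with any following lines continuing that field.

```python
# Field labels as they appear in the example reference (an assumed list).
FIELD_NAMES = {
    "LOCUS", "DEFINITION", "ACCESSION", "VERSION", "KEYWORDS", "SOURCE",
    "ORGANISM", "REFERENCE", "AUTHORS", "TITLE", "JOURNAL", "PUBMED", "REMARK",
}


def parse_reference(text):
    """Collect each labeled field of a pasted reference into a dictionary."""
    fields = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip the blank lines left over from pasting
        first, _, rest = line.partition(" ")
        if first in FIELD_NAMES:
            if first == "REFERENCE" and "REFERENCE" in fields:
                break  # only one reference is needed for the report
            current = first
            fields[current] = rest.strip()
        elif current is not None:
            # A line without a label continues the previous field.
            fields[current] = (fields[current] + " " + line).strip()
    return fields


pasted = """ACCESSION NM_019616
REFERENCE 1 (bases 1 to 2412)
AUTHORS Smith,S.A., Comp,P.C. and Morrissey,J.H.
TITLE Traces of factor VIIa modulate thromboplastin sensitivity to
factors V, VII, X, and prothrombin
JOURNAL J. Thromb. Haemost. 4 (7), 1553-1558 (2006)"""

record = parse_reference(pasted)
for label in ("ACCESSION", "AUTHORS", "TITLE", "JOURNAL"):
    print(f"{label}: {record[label]}")
```

Reading the TITLE field this way makes it easy to spot the gene or disease name, which is what the summary table asks for.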

EXAMPLE OF SUMMARY TABLE (using prolactin example)

Number of unknown / Name of gene and/or protein / Name of disease (if available)
1 / Prolactin / Rheumatoid arthritis
2
3
4
5
6
7
8
9
10
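If you keep your results in a small script while you work, the summary table can be printed in the layout shown above. This Python sketch is optional and purely illustrative; the function name summary_table and the dictionary layout are assumptions made for this handout, and the only entry filled in is the prolactin example.

```python
def summary_table(results, total=10):
    """Lay out one row per unknown; unknowns not yet searched stay blank."""
    lines = ["Number of unknown / Name of gene and/or protein / "
             "Name of disease (if available)"]
    for number in range(1, total + 1):
        gene, disease = results.get(number, ("", ""))
        # Trailing separators are dropped when a column is still empty.
        lines.append(f"{number} / {gene} / {disease}".rstrip(" /"))
    return "\n".join(lines)


# Only the prolactin example is filled in; the rest are still to be searched.
results = {1: ("Prolactin", "Rheumatoid arthritis")}
table = summary_table(results)
print(table)
```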
