Class 6
Essentials of Next Generation Sequencing 2013
Sequence Alignment: BLAST
Goal: Be able to install and use the Basic Local Alignment Search Tool (BLAST) to align and compare sequences. Search the NCBI non-redundant BLAST database with a query file.
Input: BLAST/MoTeR_retrotransposons.fasta
BLAST/MoRepeats.fasta
BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta
Output: BLAST/MoTeRs.nrBLASTn
BLAST/MoRepeats.Moryzae_genomeBLASTn1
ncbi-blast-2.2.29+/db/Moryzae_genome.fasta
6.1 Installing BLAST
First, we will download the BLAST binaries directly from the NCBI website.
Go to the NCBI homepage.
Click the “Data and Software” link (left-hand panel)
Click the “Downloads” tab
Click the “BLAST (Stand-alone)” link
Click the ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ link under “BLAST+ executables”.
Note that the latest executable for 64-bit Linux is: ncbi-blast-2.2.29+-x64-linux.tar.gz
If we were to click on this link, it would download the file to the machine that we are working on, not the Linux server where the program needs to be installed.
Instead, we will copy the link to the file to make downloading it via the command line easier. We have a 64-bit (“x64”) Linux system, so right-click the ncbi-blast-2.2.29+-x64-linux.tar.gz link and select “Copy Link Address” or the equivalent in your browser.
Now use PuTTY to connect to your server via SSH.
Download the latest BLAST executables to your home directory from the NCBI FTP server using the link you copied. Right-clicking pastes it into PuTTY. The command should look like:
- wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.29+-x64-linux.tar.gz
After downloading, unpack the executables:
- tar zxvpf ncbi-blast-2.2.29+-x64-linux.tar.gz
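The letters in zxvpf are separate tar options: z filters the archive through gzip, x extracts, v lists each file as it is processed, p preserves file permissions, and f names the archive file. The sketch below tries them on a throwaway archive so that nothing real is touched. The names demo_pkg and demo.tar.gz are made up for illustration and are not part of the BLAST download.

```shell
# Work in a temporary directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a tiny stand-in for the BLAST archive (hypothetical names).
mkdir -p demo_pkg/bin
printf '#!/bin/sh\necho hello\n' > demo_pkg/bin/hello
chmod 755 demo_pkg/bin/hello
tar zcf demo.tar.gz demo_pkg      # c = create a gzip-compressed archive
rm -r demo_pkg

tar ztf demo.tar.gz               # t = list the contents without extracting
tar zxvpf demo.tar.gz             # same flags as above: extract verbosely
ls -l demo_pkg/bin/hello          # shows the permissions after extraction
```

Running tar ztf on the real BLAST archive first is a quick way to see which directory it will create before you unpack it.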
Add the directory with the executables to your system PATH:
- PATH=/home/yourusername/ncbi-blast-2.2.29+/bin:$PATH
- export PATH
Make sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10)!
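To confirm that the PATH change worked, you can ask the shell where it finds blastn. This is a minimal sketch that assumes BLAST was unpacked directly in your home directory, so $HOME stands in for /home/yourusername; the variable name BLAST_BIN is chosen only for this example.

```shell
# Assumed install location; adjust if you unpacked the archive elsewhere.
BLAST_BIN="$HOME/ncbi-blast-2.2.29+/bin"

# Add the directory to PATH only if it is not already there.
case ":$PATH:" in
  *":$BLAST_BIN:"*) ;;
  *) PATH="$BLAST_BIN:$PATH"; export PATH ;;
esac

# Report whether the shell can now find blastn.
if command -v blastn >/dev/null 2>&1; then
  blastn -version
else
  echo "blastn not found on PATH; check the directory name in BLAST_BIN"
fi
```

A PATH set this way lasts only for the current login session. If your login shell is bash, adding the same two PATH lines to ~/.bashrc should make the change permanent.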
6.2 Run a Remote BLAST Search
Goal: Search the NCBI non-redundant BLAST database with a query file
Input: BLAST/MoTeR_retrotransposons.fasta
Output: BLAST/MoTeRs.nrBLASTn
Now, use your locally installed blastn program to search the NCBI database using the query file MoTeR_retrotransposons.fasta:
- blastn -db nr -query BLAST/MoTeR_retrotransposons.fasta -out BLAST/MoTeRs.nrBLASTn -evalue 1e-20 -outfmt 1 -remote
BLAST takes several parameters here:
-db: specifies the database to be searched (we will use the NCBI “nr” database)
-query: specifies the local query sequence file (include the path to the file)
-out: name of the output file
-evalue: tells the program to report only matches with an E-value ≤ the specified value
-outfmt: specifies the format of the output (values can range from 0 to 11). Output formats are listed below.
-remote: tells the program to search a remote (NCBI) database instead of a local one
Here are the possible parameters to -outfmt:
- 0 = pairwise
- 1 = query-anchored showing identities
- 2 = query-anchored no identities
- 3 = flat query-anchored, show identities
- 4 = flat query-anchored, no identities
- 5 = XML Blast output
- 6 = tabular
- 7 = tabular with comment lines
- 8 = Text ASN.1
- 9 = Binary ASN.1
- 10 = Comma-separated values
- 11 = BLAST archive format (ASN.1)
You may want to experiment with this parameter to see which format suits your needs.
Examine the BLAST output with less.
6.3 Create and Search a Custom BLAST Database
Goal: Create a BLAST nucleotide database from the genome assembly and perform a query against it
Input: BLAST/MoRepeats.fasta
BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta
Output: BLAST/MoRepeats.Moryzae_genomeBLASTn1
ncbi-blast-2.2.29+/db/Moryzae_genome.fasta
Next, we will create a BLAST database from our existing genome data that we can search against. First, we need to tell BLAST where to look for the custom database by creating a .ncbirc file.
Use the vim text editor to create a file named .ncbirc (yes, the leading period should be included) inside your home directory. This file should contain the text:
[BLAST]
BLASTDB=/home/yourusername/ncbi-blast-2.2.29+/db
Again, be sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10).
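If you would rather not type the file in vim, the same two lines can be written from the command line. This is a sketch under the assumption that BLAST was unpacked in your home directory as in section 6.1; the variable names RC_FILE and DB_DIR are chosen only for this example. It refuses to overwrite a .ncbirc that already exists.

```shell
# Assumption: BLAST was unpacked in your home directory (section 6.1).
RC_FILE="$HOME/.ncbirc"
DB_DIR="$HOME/ncbi-blast-2.2.29+/db"

if [ -e "$RC_FILE" ]; then
  echo "$RC_FILE already exists; leaving it untouched"
else
  # $HOME expands to your real home directory, so there is no user name to edit.
  printf '[BLAST]\nBLASTDB=%s\n' "$DB_DIR" > "$RC_FILE"
fi

cat "$RC_FILE"    # print the file so you can check what BLAST will read
```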
Create a subdirectory named db within the ncbi-blast-2.2.29+ directory.
Copy the BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta genome file into your newly created db directory.
Change to the db directory, and use makeblastdb to create a new BLAST database:
- makeblastdb -in magnaporthe_oryzae_70-15_8_supercontigs.fasta -dbtype nucl -out Moryzae_genome.fasta
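The directory, copy, and makeblastdb steps can also be run as one short script. The sketch below assumes that BLAST was unpacked in your home directory and that the course data sits in $HOME/BLAST; the variable names are chosen only for this example. It prints a message instead of failing if the genome file or makeblastdb cannot be found.

```shell
# Assumptions: BLAST unpacked in $HOME, course data in $HOME/BLAST.
BLAST_HOME="$HOME/ncbi-blast-2.2.29+"
GENOME="magnaporthe_oryzae_70-15_8_supercontigs.fasta"

mkdir -p "$BLAST_HOME/db"                      # create the db subdirectory

if [ -f "$HOME/BLAST/$GENOME" ] && command -v makeblastdb >/dev/null 2>&1; then
  cp "$HOME/BLAST/$GENOME" "$BLAST_HOME/db/"   # copy the genome into db
  (
    # The subshell keeps your current directory unchanged afterwards.
    cd "$BLAST_HOME/db" || exit 1
    makeblastdb -in "$GENOME" -dbtype nucl -out Moryzae_genome.fasta
    ls Moryzae_genome.fasta.*                  # database files written by makeblastdb
  )
else
  echo "Genome file or makeblastdb not found; check the paths and your PATH"
fi
```

With this BLAST version, a nucleotide database is typically written as several files that share the Moryzae_genome.fasta prefix (for example with the extensions .nhr, .nin and .nsq), so the name given to -db later is the prefix, not a single file.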
Now change back to your home directory and run a blastn search using the sequence in MoRepeats.fasta as the query and your new genome as the database:
- blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta -out BLAST/MoRepeats.Moryzae_genomeBLASTn1 -evalue 1e-20 -outfmt 1
Examine your output file with less.
Now is a good time to try running the search using a few different output format options (0 through 11). Try -outfmt 6, for example!
- blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta -out BLAST/MoRepeats.Moryzae_genomeBLASTn6 -evalue 1e-20 -outfmt 6
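Tabular output (format 6) is convenient because each hit is one tab-separated line that standard Unix tools can filter. The sketch below shows the idea on three invented rows; the sequence IDs and numbers are hypothetical and only illustrate the default column layout, they are not results from the course data. In practice you would point awk and cut at BLAST/MoRepeats.Moryzae_genomeBLASTn6 instead of the temporary file.

```shell
# Hypothetical rows standing in for a real format 6 output file;
# the IDs and numbers are invented purely to show the column layout.
hits=$(mktemp)
printf 'repeatA\tcontig_1\t98.50\t400\t6\t0\t1\t400\t1200\t1599\t1e-180\t650\n'  > "$hits"
printf 'repeatA\tcontig_2\t91.00\t200\t18\t0\t1\t200\t5000\t5199\t2e-70\t260\n' >> "$hits"
printf 'repeatB\tcontig_1\t99.00\t300\t3\t0\t1\t300\t9000\t9299\t1e-150\t540\n' >> "$hits"

# Default format 6 columns:
#   qseqid sseqid pident length mismatch gapopen
#   qstart qend sstart send evalue bitscore

# Keep hits with at least 95% identity; print query, subject, identity, length.
strong=$(awk -F'\t' '$3 >= 95 { print $1, $2, $3, $4 }' "$hits")
echo "$strong"

# Count the number of hits per query sequence.
counts=$(cut -f1 "$hits" | sort | uniq -c | awk '{ print $2 "=" $1 }')
echo "$counts"
```

Format 7 writes the same columns with comment lines starting with #, so a filter like this would need to skip those lines first.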
BLAST comes in many flavors, not just blastn.
- blastn: nucleotide-nucleotide alignment
- blastp: protein-protein alignment
- blastx: does six-frame translation of query nucleotide sequence and aligns against a protein database
- and many more!
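Which flavor to use follows from the type of the query and the type of the database. As a rough sketch, the choice can be written as a small shell function; the function name choose_blast is made up for this example and is not part of BLAST. It also includes tblastn, one of the other flavors, which searches a protein query against a translated nucleotide database.

```shell
# Pick a BLAST+ program from the query type and the database type.
# Types are "nucl" or "prot"; choose_blast is a hypothetical helper.
choose_blast() {
  case "$1:$2" in
    nucl:nucl) echo blastn ;;    # nucleotide query, nucleotide database
    prot:prot) echo blastp ;;    # protein query, protein database
    nucl:prot) echo blastx ;;    # translated nucleotide query, protein database
    prot:nucl) echo tblastn ;;   # protein query, translated nucleotide database
    *) echo "unknown combination: $1 vs $2" >&2; return 1 ;;
  esac
}

choose_blast nucl nucl   # prints blastn
choose_blast nucl prot   # prints blastx
```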