Class 6

Essentials of Next Generation Sequencing 2013

Sequence Alignment: BLAST

Goal: Be able to install and use the Basic Local Alignment Search Tool (BLAST) to align and compare sequences. Search the NCBI non-redundant BLAST database with a query file.

Input: BLAST/MoTeR_retrotransposons.fasta
BLAST/MoRepeats.fasta
BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta

Output: BLAST/MoTeRs.nrBLASTn
BLAST/MoRepeats.Moryzae_genomeBLASTn1
ncbi-blast-2.2.29+/db/Moryzae_genome.fasta

6.1 Installing BLAST

First, we will download the BLAST binaries directly from the NCBI website.

Go to the NCBI homepage.

Click the “Data and Software” link (left-hand panel)

Click the “Downloads” tab

Click the “BLAST (Stand-alone)” link

Click the ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ link under “BLAST+ executables”.

Note that the latest executable for 64-bit Linux is: ncbi-blast-2.2.29+-x64-linux.tar.gz

If we were to click on this link, it would download the file to the machine that we are working on, not the Linux server where the program needs to be installed.

Instead, we will copy the link to the file to make downloading it via the command line easier. We have a 64-bit (“x64”) Linux system, so right-click the ncbi-blast-2.2.29+-x64-linux.tar.gz link and select “Copy Link Address” or the equivalent in your browser.

Now use PuTTY to connect to your server via SSH.

Download the latest BLAST executables to your home directory from the NCBI FTP server using the link you copied. Right-clicking pastes it into PuTTY. The command should look like:

  • wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/ncbi-blast-2.2.29+-x64-linux.tar.gz

After downloading, unpack the executables:

  • tar zxvpf ncbi-blast-2.2.29+-x64-linux.tar.gz

Add the directory with the executables to your system PATH:

  • PATH=/home/yourusername/ncbi-blast-2.2.29+/bin:$PATH
  • export PATH

Make sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10)!
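As a quick check that the PATH change took effect, something like the following sketch can be used. It assumes BLAST was unpacked into your home directory and uses $HOME in place of /home/yourusername; the variable names BLAST_BIN and first_entry are only illustrative.

```shell
# Assumed install location: adjust if you unpacked BLAST somewhere else.
BLAST_BIN="$HOME/ncbi-blast-2.2.29+/bin"

# Put the BLAST directory at the front of the search path.
PATH="$BLAST_BIN:$PATH"
export PATH

# The first PATH entry should now be the BLAST bin directory.
first_entry="${PATH%%:*}"
echo "First PATH entry: $first_entry"

# If blastn can be found, print its version; otherwise say so.
if command -v blastn >/dev/null 2>&1; then
    blastn -version
else
    echo "blastn not found on PATH; check the directory name above"
fi
```

A PATH set this way lasts only for the current login session. If your login shell is bash (an assumption), adding the same two PATH lines to ~/.bashrc makes the change persistent.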

6.2 Run a Remote BLAST Search

Goal: Search the NCBI non-redundant BLAST database with a query file

Input: BLAST/MoTeR_retrotransposons.fasta

Output: BLAST/MoTeRs.nrBLASTn

Now, use your locally installed blastn program to search the NCBI database using the query file MoTeR_retrotransposons.fasta:

  • blastn -db nr -query BLAST/MoTeR_retrotransposons.fasta -out BLAST/MoTeRs.nrBLASTn -evalue 1e-20 -outfmt 1 -remote

BLAST takes several parameters here:

-db: specifies the database to be searched (we will use the NCBI “nr” database)

-query: specifies the local query sequence file (include the path to the file)

-out: name of the output file

-evalue: tells the program to report only matches with an E-value ≤ the specified value

-outfmt: specifies the format of the output (values range from 0 to 11). Output formats are listed below.

-remote: tells the program to search a remote (NCBI) database

Here are the possible values for -outfmt:

  • 0 = pairwise
  • 1 = query-anchored showing identities
  • 2 = query-anchored no identities
  • 3 = flat query-anchored, show identities
  • 4 = flat query-anchored, no identities
  • 5 = XML Blast output
  • 6 = tabular
  • 7 = tabular with comment lines
  • 8 = Text ASN.1
  • 9 = Binary ASN.1
  • 10 = Comma-separated values
  • 11 = BLAST archive format (ASN.1)

You may want to experiment with this parameter to see which format suits your needs.

Examine the BLAST output with less.

6.3 Create and Search a Custom BLAST Database

Goal: Create a BLAST nucleotide database from the genome assembly and perform a query against it

Input: BLAST/MoRepeats.fasta
BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta

Output: BLAST/MoRepeats.Moryzae_genomeBLASTn1
ncbi-blast-2.2.29+/db/Moryzae_genome.fasta

Next, we will create a BLAST database from our existing genome data that we can search against. First, we will need to tell BLAST where to look for your custom database with a .ncbirc file.

Use the vim text editor to create a file named .ncbirc (yes, the leading period should be included) inside your home directory. This file should contain the text:

[BLAST]
BLASTDB=/home/yourusername/ncbi-blast-2.2.29+/db

Again, be sure to replace ‘yourusername’ with your actual user name (e.g. ngs13u10).
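If you would rather not use an editor, the same file can be written from the command line. The following is a minimal sketch, assuming BLAST was unpacked into your home directory; $HOME expands to /home/yourusername automatically, so no manual substitution is needed. Be aware that it overwrites any existing .ncbirc file.

```shell
# Target file in the home directory (note the leading period).
ncbirc="$HOME/.ncbirc"

# Write the two configuration lines; $HOME is expanded by the shell
# because the here-document delimiter (EOF) is not quoted.
cat > "$ncbirc" <<EOF
[BLAST]
BLASTDB=$HOME/ncbi-blast-2.2.29+/db
EOF

# Show the result so the path can be checked by eye.
cat "$ncbirc"
```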

Create a subdirectory named db within the ncbi-blast-2.2.29+ directory.

Copy the BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta genome file into your newly created db directory.
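These two steps might look like the sketch below. It assumes you are in your home directory, that the BLAST folder with the course data sits there, and that BLAST was unpacked there as well; the variable names db_dir and genome are only illustrative.

```shell
# Assumed locations, relative to the home directory.
db_dir="$HOME/ncbi-blast-2.2.29+/db"
genome="BLAST/magnaporthe_oryzae_70-15_8_supercontigs.fasta"

# -p creates missing parent directories and does not fail if db already exists.
mkdir -p "$db_dir"

# Copy the genome only if it is where we expect it to be.
if [ -f "$genome" ]; then
    cp "$genome" "$db_dir/"
    ls -lh "$db_dir"
else
    echo "Genome file not found: $genome (are you in your home directory?)" >&2
fi
```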

Change to the db directory, and use makeblastdb to create a new BLAST database:

  • makeblastdb -in magnaporthe_oryzae_70-15_8_supercontigs.fasta -dbtype nucl -out Moryzae_genome.fasta

Now change back to your home directory and run a blastn search using the sequence in MoRepeats.fasta as the query and your new genome as the database:

  • blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta -out BLAST/MoRepeats.Moryzae_genomeBLASTn1 -evalue 1e-20 -outfmt 1

Examine your output file with less.

Now is a good time to try running the search using a few different output format options (0 through 11). Try -outfmt 6, for example!

  • blastn -db Moryzae_genome.fasta -query BLAST/MoRepeats.fasta -out BLAST/MoRepeats.Moryzae_genomeBLASTn6 -evalue 1e-20 -outfmt 6
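The tabular format is convenient to post-process with standard command-line tools. By default, -outfmt 6 writes twelve tab-separated columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. The sketch below shows one possible way to filter and summarize such a file with awk. It runs on three made-up rows (the sequence names and numbers are placeholders, not real results), and the thresholds of 90% identity and 100 aligned positions are arbitrary choices for illustration.

```shell
# Hypothetical sample rows in the default -outfmt 6 column order:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
sample=$(mktemp)
printf 'queryA\tcontig1\t98.50\t200\t3\t0\t1\t200\t501\t700\t1e-90\t350\n'  > "$sample"
printf 'queryA\tcontig2\t85.00\t120\t18\t0\t1\t120\t10\t129\t1e-25\t110\n' >> "$sample"
printf 'queryB\tcontig1\t99.00\t80\t1\t0\t5\t84\t900\t979\t1e-30\t140\n'   >> "$sample"

# Keep hits with at least 90% identity (column 3) over at least
# 100 aligned positions (column 4); print the query and subject names.
strong_hits=$(awk -F'\t' '$3 >= 90 && $4 >= 100 {print $1, $2}' "$sample")
echo "Strong hits:"
echo "$strong_hits"

# Count the number of hits reported for each query sequence.
hits_per_query=$(awk -F'\t' '{n[$1]++} END {for (q in n) print q, n[q]}' "$sample" | sort)
echo "Hits per query:"
echo "$hits_per_query"

# Remove the temporary sample file.
rm -f "$sample"
```

To apply the same commands to your own results, replace "$sample" with BLAST/MoRepeats.Moryzae_genomeBLASTn6 and skip the printf and rm lines.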

BLAST comes in many flavors, not just blastn.

  • blastn: nucleotide-nucleotide alignment
  • blastp: protein-protein alignment
  • blastx: does six-frame translation of query nucleotide sequence and aligns against a protein database
  • and many more!
