blast_and_plast


Here is an example shell script to run PLAST:

#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -o logo
#$ -cwd
#$ -pe threaded 10
cd $PWD
# Number of threads; keep it equal to the slot count requested with -pe above
CPUs=10
# Database to search (nt) and the query FASTA file
DB=/db1/nr-nt-fasta-oct-2020/nt
QF=yourquery.fasta
plast -e 1e-10 -max-hit-per-query 1 -outfmt 1 -a "$CPUs" -p plastn -max-database-size 10000000000 -i "$QF" -d "$DB" -o "$QF.plout" -force-query-order 1000

To parse the output, see http://129.173.88.134:81/dokuwiki/doku.php?id=dayana_salas_-_utility_scripts_taxonomy_coloring_trees_phylogenetics_mixture_models_domain_architecture_and_more

Here is an example shell script to run BLAST:

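#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -pe threaded 1
#$ -cwd
# Activate the environment (named blast) that provides the BLAST programs
source activate blast
# Directory in which BLAST looks up the database named by -db
export BLASTDB=/db1/nr-nt-oct-2020-v5/
DB=nt
query=your_query.fasta
# Tabular output (format 6) with a custom column list
blastn -db "$DB" -query "$query" -out yourqueryresults.blout -num_threads 1 -outfmt "6 qseqid sseqid evalue pident qcovs length slen qlen qstart qend sstart send stitle"
source deactivate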

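Both scripts search the NCBI nt database (e.g., /db1/nr-nt-jan-2019/nt.nal; the scripts above point to the October 2020 copies), but BLAST and PLAST specify the database differently: the PLAST script passes the full path to -d, whereas the BLAST script passes only the database name to -db and sets the BLASTDB environment variable to the directory that contains it.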
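
The tabular output requested by the -outfmt string in the BLAST script can be post-processed with standard tools. The following is a minimal sketch, not part of the original scripts: filter_hits is a hypothetical helper name, and the default thresholds (90% identity, 50% query coverage) are illustrative assumptions. It relies only on the column order given in the BLAST script (column 4 is pident, column 5 is qcovs).

```shell
#!/bin/bash
# Sketch: filter BLAST tabular output written with the -outfmt string above.
# Column order: 1 qseqid, 2 sseqid, 3 evalue, 4 pident, 5 qcovs, 6 length, ...
# filter_hits is a hypothetical helper; the thresholds are illustrative only.
filter_hits() {
    local blout=$1            # BLAST output file, e.g. yourqueryresults.blout
    local min_pident=${2:-90} # minimum percent identity (assumed default)
    local min_qcovs=${3:-50}  # minimum query coverage (assumed default)
    # Print only the rows whose pident and qcovs reach both thresholds
    awk -F '\t' -v p="$min_pident" -v c="$min_qcovs" \
        '$4 + 0 >= p && $5 + 0 >= c' "$blout"
}
```

For example, filter_hits yourqueryresults.blout 95 80 would keep only hits with at least 95% identity and 80% query coverage.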

Guide for BLAST usage

  1. blastp: search a protein database (e.g., SwissProt db, NCBI-nr) with a protein sequence query
  2. blastn: search a nucleotide database (e.g., NCBI-nt, MMETSP_DB_clean.v2018.fa) with a nucleotide sequence query
  3. blastx: search a protein database with a translated nucleotide sequence query
  4. tblastn: search a translated nucleotide database with a protein sequence query
  5. tblastx: search a translated nucleotide database with a translated nucleotide sequence query

Note: blastp and blastx usually give better hit alignments than blastn, especially for distantly related species. This is because amino acid sequences are more conserved than nucleotide sequences (Koonin and Galperin, 2002).
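
The program choices listed in the guide can be written down as a small lookup. This is only a sketch: pick_blast is a hypothetical helper, and the type labels "nucl" and "prot" are assumed names for nucleotide and protein data.

```shell
#!/bin/bash
# pick_blast is a hypothetical helper: it maps the query type and the database
# type ("nucl" or "prot") to the matching BLAST program from the guide.
pick_blast() {
    local query_type=$1 db_type=$2
    case "$query_type:$db_type" in
        prot:prot) echo blastp ;;
        nucl:nucl) echo blastn ;;   # tblastx is the translated alternative
        nucl:prot) echo blastx ;;
        prot:nucl) echo tblastn ;;
        *) echo "unknown combination: $query_type / $db_type" >&2
           return 1 ;;
    esac
}
```

For example, pick_blast prot nucl prints tblastn, the program for searching a nucleotide database with a protein query.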

Common errors

These errors appear when the BLAST program (e.g., blastn or blastp) does not match the type of the query or database sequences (amino acid or nucleotide sequences):

Error 1:
FASTA-Reader: Ignoring invalid residues at position(s): On line 7: 4, 8, 10, 13, 27-29, 32, 42, 45, 51, 53, 56, 63, 66-67, 70, 78
FASTA-Reader: Ignoring invalid residues at position(s): On line 8: 6, 9, 15, 19-20, 22, 28, 34-39, 45-48, 52

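Solve 1: The wrong BLAST program was used for the query, typically a protein query given to a program that expects nucleotides (such as blastn). Use the program that matches the query type.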
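
To catch Error 1 before submitting a job, the query type can be estimated from its residue composition. The following is a rough sketch under stated assumptions: guess_seq_type is a hypothetical helper, and the 90% cutoff is an arbitrary choice, not a rule used by BLAST.

```shell
#!/bin/bash
# guess_seq_type is a hypothetical helper: it prints "nucl" when at least 90%
# of the residues in a FASTA file are A, C, G, T, U or N, and "prot" otherwise.
# The 90% cutoff is an arbitrary assumption, not a BLAST rule.
guess_seq_type() {
    local fasta=$1
    # Drop header lines, strip whitespace, gap and stop characters, uppercase
    grep -v '^>' "$fasta" | tr -d ' \t\r*-' | tr '[:lower:]' '[:upper:]' |
        awk '{
                 total += length($0)           # all residues on this line
                 nucl  += gsub(/[ACGTUN]/, "") # residues that look nucleotide
             }
             END {
                 if (total == 0) { print "empty"; exit 1 }
                 if (nucl / total >= 0.9) print "nucl"; else print "prot"
             }'
}
```

A query reported as "prot" should go to blastp or tblastn rather than blastn, blastx or tblastx.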

Error 2:
BLAST Database error: No alias or index file found for protein database [XXX.fa] in search path [/misc/scratch2/XXX:]

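Solve 2: A nucleotide database was treated as a protein database, for example by running blastp or blastx against it. Use a program that searches nucleotide databases (blastn, tblastn or tblastx).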
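
The alias or index file named in Error 2 can be checked directly on disk. The following is a minimal sketch: blast_db_type is a hypothetical helper that assumes the usual BLAST file extensions, where .nal and .nin belong to nucleotide databases and .pal and .pin to protein databases.

```shell
#!/bin/bash
# blast_db_type is a hypothetical helper: given a database path prefix such as
# /path/to/nt, it reports which kind of BLAST alias or index file exists.
# .nal/.nin belong to nucleotide databases, .pal/.pin to protein databases.
blast_db_type() {
    local prefix=$1
    if [ -e "$prefix.nal" ] || [ -e "$prefix.nin" ]; then
        echo nucl
    elif [ -e "$prefix.pal" ] || [ -e "$prefix.pin" ]; then
        echo prot
    else
        echo "no BLAST alias or index file found for $prefix" >&2
        return 1
    fi
}
```

If BLAST+ is installed, blastdbcmd -db with the database name and the -info option should report similar information.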
