blast_and_plast

Differences

This shows you the differences between two versions of the page.

blast_and_plast [2021/09/02 15:59] 38.20.199.40
blast_and_plast [2024/11/04 10:14] (current) 110.239.172.216
Line 10:
 cd $PWD
 CPUs=10
-DB=/db1/nr-nt-fasta-oct-2020/nt
+DB=/scratch3/rogerlab_databases/other_dbs/nr_March252023/plast/nr.fasta
 QF=yourquery.fasta
-plast -e 1e-10 -max-hit-per-query 1 -outfmt 1 -a $CPUs -p plastn -max-database-size 10000000000 -i $QF -d $DB -o $QF.plout -force-query-order 1000
+plast -e 1e-10 -max-hit-per-query 1 -outfmt 1 -a $CPUs -p plastp -max-database-size 10000000000 -i $QF -d $DB -o $QF.plout -force-query-order 1000
 </code>
 
-to parse the output see http://129.173.88.134:81/dokuwiki/doku.php?id=dayana_salas_-_utility_scripts_taxonomy_coloring_trees_phylogenetics_mixture_models_domain_architecture_and_more
+to parse the output see https://perun.biochem.dal.ca/user-wiki/doku.php?id=taxonomy_recovery
  
  
Line 26:
 #$ -S /bin/bash
 . /etc/profile
-#$ -pe threaded 1
+#$ -pe threaded 10
 #$ -cwd
 source activate blast
-export BLASTDB=/db1/nr-nt-oct-2020-v5/
+export BLASTDB=/db1/blast-may-2024/
 DB=nt
 query=your_query.fasta
-blastn -db $DB -query $query -out yourqueryresults.blout -num_threads -outfmt "6 qseqid sseqid evalue pident qcovs length slen qlen qstart qend sstart send stitle"
+blastn -db $DB -query $query -out yourqueryresults.blout -num_threads 10 -outfmt "6 qseqid sseqid evalue pident qcovs length slen qlen qstart qend sstart send stitle"
-source deactivate
+conda deactivate
 </code>
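
The blastn call above writes a tab-delimited file whose 13 columns follow the order given in its -outfmt string. As a rough illustration of how such a file could be read, here is a minimal Python sketch that keeps the lowest-E-value hit per query. The function names (read_hits, best_hit_per_query) and the two sample rows are invented for this example and are not part of the wiki page or of BLAST itself.

```python
import csv
import io

# Column order copied from the -outfmt "6 ..." string in the blastn command.
COLUMNS = ("qseqid sseqid evalue pident qcovs length slen qlen "
           "qstart qend sstart send stitle").split()


def read_hits(handle):
    """Yield one dict per tab-delimited BLAST hit line (hypothetical helper)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, row))
        # Convert the numeric fields used for ranking or filtering.
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        hit["qcovs"] = float(hit["qcovs"])
        yield hit


def best_hit_per_query(hits):
    """Return {qseqid: hit}, keeping the hit with the lowest E-value."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["qseqid"]] = hit
    return best


if __name__ == "__main__":
    # Two made-up rows standing in for yourqueryresults.blout.
    sample = (
        "q1\tsA\t1e-50\t98.5\t100\t200\t1000\t200\t1\t200\t5\t204\tTitle A\n"
        "q1\tsB\t1e-10\t80.0\t90\t180\t900\t200\t1\t180\t1\t180\tTitle B\n"
    )
    for qseqid, hit in best_hit_per_query(read_hits(io.StringIO(sample))).items():
        print(qseqid, hit["sseqid"], hit["evalue"], hit["stitle"])
```

To run this on real output, open yourqueryresults.blout and pass the file handle to read_hits in place of the io.StringIO sample. The column list has to be kept in step with the -outfmt string if that string is changed.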
  
-Both shells using NCBI nt database (/db1/nr-nt-jan-2019/nt.nal), but the formats for specifying DB are different for BLAST and PLAST.
+Both shells use NCBI nt database, but PLAST doesn't support new v5 NCBI nr and nt databases and can cause Segmentation fault error.
 
-**Guide for BLAST usage**
+<Last updated by Dandan Zhao on Jun 11, 2024>
  
-  - blastp:search protein database(e.g., SwissProt db, NCBI-nr) using protein sequence query 
-  - blastn:search nucleotide database(e.g., NCBI-nt, MMETSP_DB_clean.v2018.fa)using nucleotide sequence query 
-  - blastx:search protein database with translated nucleotide sequence query 
-  - tblastn:search translated nucleotide database with protein sequence query 
-  - tblastx:search translated nucleotide database with translated nucleotide sequence query 
  
-//Note: blastp and blastx can usually provide better hit alignments than blastn, especially for distantly related species.This is because amino acids sequences are more conserved than nucleotides (Koonin and Galperin, 2002).//  
- 
-**General bugs**  
- 
-when mistakenly use blast options(e.g., blastn or blastp) or query sequence (amino acids or nucleotides sequences): 
- 
-<code> 
-Error 1: 
-FASTA-Reader: Ignoring invalid residues at position(s): On line 7: 4, 8, 10, 13, 27-29, 32, 42, 45, 51, 53, 56, 63, 66-67, 70, 78 
-FASTA-Reader: Ignoring invalid residues at position(s): On line 8: 6, 9, 15, 19-20, 22, 28, 34-39, 45-48, 52 
-</code> 
- 
-Solve : 
-This is due to mistakenly using the blast options.  
- 
-<code> 
-Error 2: 
-BLAST Database error: No alias or index file found for protein database [XXX.fa] in search path [/misc/scratch2/XXX:] 
-</code> 
- 
-Solve 2: 
-This is due to mistakenly treating nucleotide database as protein database.  
- 
-**Parsing Blast results** 
- 
-Using BLASTP search option to blast the amino acid sequences against uniport_db database. 
-<code> 
-> ./blastp -query XXX.fasta -db uniprot_db -out BLASTP_XXX_uniprot.xml -evalue 1e-5 -outfmt 5 
-</code> 
- 
- 
-The **BLAST XML file** (-outfmt 5) can include useful information comparing to the BLAST Tabular file (-outfmt 6), such as the aligned sequence, the sequence of the hit, and the description of hits into the database. However, the XML format is not human-readable. 
- 
-Users will need to employ a commonly used parser (//Blastxml_to_tabular.py//) from the link(https://github.com/peterjc/galaxy_blast/blob/master/tools/ncbi_blast_plus/blastxml_to_tabular.py)(Cock et al.,2015), which is a custom python script, to convert a BLAST XML file to a desired tabular output (tab-delimited file). 
- 
-<code> 
- 
-python blastxml_to_tabular.py -c qseqid,qlen,salltitles,sseqid,slen,bitscore,qframe,pident,evalue,qstart,qend,sstart,send,length BLASTP_XXX_uniprot.xml > BLASTP_XXX_uniprot.tsv 
- 
-</code> 
- 
-"-c" option is to infer the displayed columns, there are 25 columns in total. User can infer the front 12 columns via "-c std"; infer 25 columns via "-c ext";infer the personalized via using selected column name delimited by comma "-c qseqid,sseqid"  
- 
-<code> 
-     1 qseqid    Query Seq-id (ID of your sequence) 
-     2 sseqid    Subject Seq-id (ID of the database hit) 
-     3 pident    Percentage of identical matches 
-     4 length    Alignment length 
-     5 mismatch  Number of mismatches 
-     6 gapopen   Number of gap openings 
-     7 qstart    Start of alignment in query 
-     8 qend      End of alignment in query 
-     9 sstart    Start of alignment in subject (database hit) 
-    10 send      End of alignment in subject (database hit) 
-    11 evalue    Expectation value (E-value) 
-    12 bitscore  Bit score 
-    13 sallseqid     All subject Seq-id(s), separated by ';' 
-    14 score         Raw score 
-    15 nident        Number of identical matches 
-    16 positive      Number of positive-scoring matches 
-    17 gaps          Total number of gaps 
-    18 ppos          Percentage of positive-scoring matches 
-    19 qframe        Query frame 
-    20 sframe        Subject frame 
-    21 qseq          Aligned part of query sequence 
-    22 sseq          Aligned part of subject sequence 
-    23 qlen          Query sequence length 
-    24 slen          Subject sequence length 
-    25 salltitles    All subject titles, separated by '<>'
-</code>   
-   
-<code>   
-    $ python blastxml_to_tabular.py -o output.tabular -c std input.xml 
-    $ python blastxml_to_tabular.py -o output.tabular -c ext input.xml 
-    $ python blastxml_to_tabular.py -o output.tabular -c qseqid,qlen,salltitles,sseqid,slen,bitscore,qframe,pident,evalue,qstart,qend,sstart,send,length input.xml 
-</code> 
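
The removed text describes three ways to fill the "-c" option: "std" for the first 12 columns, "ext" for all 25, or a comma-separated list of column names. The sketch below shows one way that rule could be expressed in Python. It is an illustration only: resolve_columns is a hypothetical helper written for this example, not the code inside blastxml_to_tabular.py, and the column names are taken from the numbered list in the removed text.

```python
# The 12 standard columns, in the order listed in the removed text.
STD_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

# The 13 additional columns that make up the 25-column extended set.
EXT_COLUMNS = STD_COLUMNS + [
    "sallseqid", "score", "nident", "positive", "gaps", "ppos",
    "qframe", "sframe", "qseq", "sseq", "qlen", "slen", "salltitles",
]


def resolve_columns(spec):
    """Turn a "-c" style value into a list of column names (hypothetical)."""
    if spec == "std":
        return list(STD_COLUMNS)
    if spec == "ext":
        return list(EXT_COLUMNS)
    names = [name.strip() for name in spec.split(",") if name.strip()]
    unknown = [name for name in names if name not in EXT_COLUMNS]
    if unknown:
        raise ValueError("unknown column name(s): " + ", ".join(unknown))
    return names


if __name__ == "__main__":
    print(len(resolve_columns("std")))
    print(len(resolve_columns("ext")))
    print(resolve_columns("qseqid,qlen,salltitles"))
```

Rejecting unknown names early is a design choice of this sketch; how the real script reacts to a misspelled column name is not stated in the page.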
blast_and_plast.1630609192.txt.gz · Last modified: by 38.20.199.40