blast_protocol

Differences

This shows you the differences between two versions of the page.

blast_protocol [2021/09/02 16:30] 38.20.199.40blast_protocol [2022/09/06 14:49] (current) 134.190.232.106
Line 7: Line 7:
   - __tblastx__: search translated nucleotide database with translated nucleotide sequence query
  
-//Note: blastp and blastx can usually provide better hit alignments than blastn, especially for distantly related species.This is because amino acids sequences are more conserved than nucleotides (Koonin and Galperin, 2002).// +{{:blast.png?400|}} 
 + 
 +//**blastp** can usually provide better hit alignments than blastn, especially for distantly related species. This is partly because amino acid sequences are more conserved than nucleotide sequences (Koonin and Galperin, 2002).//
 + 
 +//**blastx** translates the query sequence in all six reading frames and provides combined significance statistics for hits to different frames. It is particularly useful __when the reading frame of the query sequence is unknown or it contains errors that may lead to frame shifts or other coding errors__. Thus blastx is often the first analysis performed with a newly determined nucleotide sequence.//
 + 
 +//**tblastn** is useful for __finding homologous protein coding regions in unannotated nucleotide sequences such as expressed sequence tags (ESTs)__ and draft genome records. ESTs are short, single-read cDNA sequences. They comprise the largest pool of sequence data for many organisms and contain portions of transcripts from many uncharacterized genes. __Since ESTs have no annotated coding sequences, there are no corresponding protein translations in the BLAST protein databases.__ Hence a tblastn search is the only way to search for these potential coding regions at the protein level.//
 +Courtesy of the web source: https://guides.lib.berkeley.edu/ncbi/blast 
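
The program descriptions above boil down to matching the query type against the database type. A minimal sketch of that lookup, where pick_blast_program is a hypothetical helper (not part of BLAST+) and "nucl"/"prot" are labels chosen here for illustration:

```shell
#!/bin/bash
# Hypothetical helper (not part of BLAST+): choose a BLAST program from the
# query type and the database type, each given as "nucl" or "prot".
pick_blast_program() {
    local query_type=$1 db_type=$2
    case "$query_type:$db_type" in
        nucl:nucl) echo "blastn" ;;   # tblastx is the translated alternative
        prot:prot) echo "blastp" ;;
        nucl:prot) echo "blastx" ;;   # query is translated in six frames
        prot:nucl) echo "tblastn" ;;  # database is translated in six frames
        *) echo "unknown combination: $query_type vs $db_type" >&2; return 1 ;;
    esac
}

pick_blast_program nucl prot   # prints blastx
```

This only encodes the default choice; for nucleotide against nucleotide, tblastx remains an option when a comparison at the protein level is wanted.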
  
 **General bugs**
Line 34: Line 41:
 Use the BLASTP search option to search the amino acid sequences against the uniprot_db database.
 <code> <code>
-./blastp -query XXX.fasta -db uniprot_db -out BLASTP_XXX_uniprot.xml -evalue 1e-5 -outfmt 5
+./blastp -query XXX.fasta -db uniprot_db -out BLASTP_XXX_uniprot.xml -evalue 1e-5 -outfmt 5
 </code> </code>
  
Line 77: Line 84:
     25 salltitles    All subject titles, separated by '<>'
  
-    python blastxml_to_tabular.py -o output.tabular -c std input.xml +    python blastxml_to_tabular.py -o output.tabular -c std input.xml 
-    python blastxml_to_tabular.py -o output.tabular -c ext input.xml +    python blastxml_to_tabular.py -o output.tabular -c ext input.xml 
-    python blastxml_to_tabular.py -o output.tabular -c qseqid,qlen,salltitles,sseqid,slen,bitscore,qframe,pident,evalue,qstart,qend,sstart,send,length input.xml+    python blastxml_to_tabular.py -o output.tabular -c qseqid,qlen,salltitles,sseqid,slen,bitscore,qframe,pident,evalue,qstart,qend,sstart,send,length input.xml
 </code> </code>
 +
 +#Another way to get parsed BLAST output is to request tabular format directly with -outfmt '6 qseqid sseqid ...'
 +
 +<code>
 +#!/bin/bash
 +#$ -S /bin/bash
 +. /etc/profile
 +#$ -pe threaded 2
 +#$ -cwd
 +source activate blast
 +export BLASTDB=/misc/scratch3/rogerlab_databases/other_dbs/nr_010621
 +DB=nr
 +query=ATCG00670.1.fasta
 +blastp -db $DB -query $query -out /scratch2/xizhang/BLASTP_nr.tsv -num_threads 2 -outfmt '6 qseqid sseqid evalue pident qcovs length slen qlen qstart qend sstart send stitle'
 +source deactivate
 +</code>
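
The tabular output from the script above can be filtered with standard tools. A minimal sketch, assuming the column order given to -outfmt in that script (3 = evalue, 4 = pident, 5 = qcovs); filter_hits is a hypothetical helper and the default thresholds are illustrative, not recommendations:

```shell
#!/bin/bash
# Sketch: keep only hits that pass e-value, identity and query-coverage cutoffs.
# Column positions follow the -outfmt list used above:
#   3 = evalue, 4 = pident, 5 = qcovs
# Default thresholds are illustrative assumptions.
filter_hits() {
    local max_evalue=${1:-1e-10} min_pident=${2:-30} min_qcovs=${3:-50}
    # "+ 0" forces numeric comparison, including values such as 1e-50
    awk -F'\t' -v e="$max_evalue" -v p="$min_pident" -v c="$min_qcovs" \
        '($3 + 0) <= (e + 0) && ($4 + 0) >= (p + 0) && ($5 + 0) >= (c + 0)'
}

# Usage: filter_hits 1e-20 40 70 < BLASTP_nr.tsv > BLASTP_nr.filtered.tsv
```

The function reads from standard input and writes to standard output, so it can sit in a pipeline after blastp or be applied to a saved table.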
 +
 +Sep 6th, 2022: Since DIAMOND is faster than BLAST for blastp and blastx searches, the same search can also be run with DIAMOND.
 +
 +<code>
 +#!/bin/bash
 +#$ -S /bin/bash
 +. /etc/profile
 +#$ -pe threaded 40
 +#$ -cwd
 +source activate /scratch2/software/anaconda/envs/diamond-2.0.7
 +#DB=nr
 +# $1 is a text file listing one query FASTA file per line
 +while read -r line
 +do
 +
 +diamond blastp -p 40 -k 5 -e 1e-10 -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore stitle salltitles --header -d /misc/scratch3/rogerlab_databases/other_dbs/nr_02032022/diamond_nr.dmnd -q "$line" -o "BLASTP_nr.$line.tsv" --sensitive
 +
 +done < "$1"
 +
 +conda deactivate
 +
 +</code>
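
The loop above builds each output name from the query line as-is, which breaks if a line contains a directory. A minimal sketch of one way around that, where output_name is a hypothetical helper and the example path is illustrative only:

```shell
#!/bin/bash
# Sketch: derive an output file name from one line of the query list, so that
# a query given with a directory (and a .fasta suffix) still yields a clean name.
# output_name is a hypothetical helper, not part of DIAMOND.
output_name() {
    local query=$1
    local base
    base=$(basename "$query")          # strip any leading directories
    echo "BLASTP_nr.${base%.fasta}.tsv" # drop a trailing .fasta if present
}

output_name /scratch2/xizhang/ATCG00670.1.fasta   # prints BLASTP_nr.ATCG00670.1.tsv
```

Inside the loop this would be used as `-o "$(output_name "$line")"`. Note that it changes the naming slightly compared with the script above, which keeps the .fasta suffix in the output name.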
 +
  
 **V5 NCBI database**
Line 89: Line 134:
 The V5 NCBI database can be found at https://ftp.ncbi.nlm.nih.gov/blast/db/v5. The advantage of the V5 database over V4 is not only that it is faster, but also that searches can be limited to the taxa you are interested in.
  
 +To limit a BLAST+ search by taxonomy, you need the taxid(s) for your organism(s). Two options can then be used: "taxids" or "taxidlist".
 +
 +First look up the taxid for the group you are interested in, e.g., bacteria:
 <code> <code>
 +./get_species_taxids.sh -n bacteria
 +</code> 
 +The get_species_taxids.sh script comes with the BLAST+ package, in its bin directory.
 +The taxid for bacteria is 2. Next, retrieve the list of taxids for the bacterial species:
  
 +<code>
 +./get_species_taxids.sh -t 2 > 2.txids
 </code> </code>
  
 +Restricting the search to the taxids in 2.txids makes a search against the NCBI V5 database far more efficient.
  
 +<code>
 +./blastp -db nr -query QUERY -taxidlist 2.txids -outfmt 5 -out OUTPUT.tab
 +./blastp -db nr -query QUERY -taxids 1117,1118,1119,1121 -outfmt 5 -out OUTPUT.tab
 +</code>
 +
 +With the "taxids" option, separate the taxids with commas, e.g., for several cyanobacteria: 1117,1118,1119,1121
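
For a short list of taxids kept in a file, the comma-separated form can be built rather than typed by hand. A minimal sketch, assuming a plain text file with one taxid per line; taxids_to_csv is a hypothetical helper, not part of BLAST+:

```shell
#!/bin/bash
# Sketch: turn a file with one taxid per line into the comma-separated string
# expected by the "taxids" option. taxids_to_csv is a hypothetical helper.
taxids_to_csv() {
    # drop blank lines, then join the remaining lines with commas
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Usage (cyano.txids is an illustrative file name):
#   ./blastp -db nr -query QUERY -taxids "$(taxids_to_csv cyano.txids)" -outfmt 5 -out OUTPUT.tab
```

For long lists, such as the full 2.txids file, passing the file itself with "taxidlist" is likely the more practical choice.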
  
  
 Note: Please refer to the guide for the most up-to-date information: https://ftp.ncbi.nlm.nih.gov/blast/db/v5/v5/blastdbv5.pdf
  
-{{:22-you-got-this-meme-5.jpg?nolink&400|}}+{{:22-you-got-this-meme-5.jpg?nolink&200|}} 
 + 
 +<Last updated by Xi Zhang on Sep 3rd,2021>