blast_protocol [2022/09/06 14:49] (current) – 134.190.232.106
- __tblastx__: search translated nucleotide database with translated nucleotide sequence query

{{:blast.png?400|}}

//**blastp** is usually more sensitive than blastn and gives better hit alignments, especially for distantly related species. This is partly because amino acid sequences are more conserved than nucleotide sequences (Koonin and Galperin, 2002).//
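As a quick reference, the choice among the five programs depends only on the query type, the database type and, for nucleotide against nucleotide, whether both sides are translated. A minimal Python sketch of that decision; choose_blast_program is a hypothetical helper, not part of BLAST+.

<code python>
# Hypothetical helper (not part of BLAST+): choose the program from the
# molecule type of the query and of the database, "nucl" or "prot".
def choose_blast_program(query_type: str, db_type: str, translate_both: bool = False) -> str:
    pair = (query_type, db_type)
    if pair == ("nucl", "nucl"):
        # blastn compares nucleotides directly; tblastx compares six-frame translations of both
        return "tblastx" if translate_both else "blastn"
    if translate_both:
        raise ValueError("translate_both only applies to nucleotide vs nucleotide searches")
    programs = {
        ("prot", "prot"): "blastp",   # protein query vs protein database
        ("nucl", "prot"): "blastx",   # translated nucleotide query vs protein database
        ("prot", "nucl"): "tblastn",  # protein query vs translated nucleotide database
    }
    if pair not in programs:
        raise ValueError(f"unknown sequence types: {pair}")
    return programs[pair]


print(choose_blast_program("nucl", "prot"))                       # blastx
print(choose_blast_program("nucl", "nucl", translate_both=True))  # tblastx
</code>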
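A toy illustration of why protein comparisons tolerate more divergence: synonymous substitutions change the nucleotides but not the encoded amino acids. The two sequences below are made up, and plain ungapped identity is used only for illustration; BLAST itself scores local alignments with substitution matrices.

<code python>
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate in frame 1; unknown codons become X, a trailing partial codon is ignored."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


# Two made-up coding sequences that differ only by synonymous substitutions
seq1 = "ATGCTGTCTAAAGGT"
seq2 = "ATGTTATCAAAGGGC"
print(round(percent_identity(seq1, seq2), 1))             # 66.7
print(translate(seq1), translate(seq2))                    # MLSKG MLSKG
print(percent_identity(translate(seq1), translate(seq2)))  # 100.0
</code>

At the nucleotide level the pair is only about two-thirds identical, while the proteins are identical, which is the kind of signal blastp and blastx can exploit.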

//**blastx** translates the query sequence in all six reading frames and provides combined significance statistics for hits to different frames. It is particularly useful __when the reading frame of the query sequence is unknown or it contains errors that may lead to frame shifts or other coding errors__. Thus blastx is often the first analysis performed with a newly determined nucleotide sequence.//

//**tblastn** is useful for __finding homologous protein coding regions in unannotated nucleotide sequences such as expressed sequence tags (ESTs)__ and draft genome records. ESTs are short, single-read cDNA sequences. They comprise the largest pool of sequence data for many organisms and contain portions of transcripts from many uncharacterized genes. __Since ESTs have no annotated coding sequences, there are no corresponding protein translations in the BLAST protein databases.__ Hence a tblastn search is the only way to search for these potential coding regions at the protein level.//

Source: https://guides.lib.berkeley.edu/ncbi/blast
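To make the translated searches concrete, the sketch below produces the six-frame translation that blastx applies to a nucleotide query, and maps residue positions in a translated frame back to nucleotide coordinates, which is the coordinate system tblastn uses for subject positions. It assumes the standard genetic code and the usual BLAST frame numbering (frame -1 starts at the last base of the sequence); BLAST+ can use other codes via -query_gencode (blastx) and -db_gencode (tblastn). The function names are hypothetical.

<code python>
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    # Codons with ambiguous bases become X; a trailing partial codon is ignored
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq: str) -> dict:
    """Translations in frames +1..+3 (forward strand) and -1..-3 (reverse complement)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rc[offset:])
    return frames


def protein_to_nucleotide(aa_start: int, aa_end: int, frame: int, nt_length: int) -> tuple:
    """Map 1-based residue positions in a translated frame to 1-based nucleotide
    coordinates on the original strand (start > end for minus-strand frames)."""
    offset = abs(frame) - 1
    start = offset + 3 * (aa_start - 1) + 1  # first base of the first codon
    end = offset + 3 * aa_end                # last base of the last codon
    if frame > 0:
        return start, end
    # Position r on the reverse complement is position nt_length - r + 1 on the original
    return nt_length - start + 1, nt_length - end + 1


frames = six_frame_translation("ATGAAACCCGGGTTTTAG")
print(frames[1], frames[-1])                 # MKPGF* LKPGFH
print(protein_to_nucleotide(1, 2, -1, 30))   # (30, 25)
</code>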

**General bugs**
> python blastxml_to_tabular.py -o output.tabular -c qseqid,qlen,salltitles,sseqid,slen,bitscore,qframe,pident,evalue,qstart,qend,sstart,send,length input.xml
</code>
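For reference, here is a minimal sketch of reading BLAST XML (-outfmt 5) with only the Python standard library. It is not the code of blastxml_to_tabular.py; it assumes the classic NCBI BLAST XML element names (Iteration, Hit, Hsp and their children), extracts only six of the columns listed above, and computes pident as 100 × Hsp_identity / Hsp_align-len. The XML snippet is a made-up example and blast_xml_to_rows is a hypothetical name.

<code python>
import xml.etree.ElementTree as ET


def blast_xml_to_rows(xml_text):
    """Yield (qseqid, sseqid, pident, length, evalue, bitscore) for every HSP."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        # Use the first word of the query definition line as the query ID
        words = (iteration.findtext("Iteration_query-def") or "").split()
        qseqid = words[0] if words else ""
        for hit in iteration.iter("Hit"):
            sseqid = hit.findtext("Hit_id")
            for hsp in hit.iter("Hsp"):
                identity = int(hsp.findtext("Hsp_identity"))
                length = int(hsp.findtext("Hsp_align-len"))
                pident = round(100.0 * identity / length, 2)
                yield (qseqid, sseqid, pident, length,
                       float(hsp.findtext("Hsp_evalue")),
                       float(hsp.findtext("Hsp_bit-score")))


SAMPLE_XML = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_query-def>query1 hypothetical protein</Iteration_query-def>
      <Iteration_hits>
        <Hit>
          <Hit_id>sp|P00001|EXAMPLE</Hit_id>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>85.5</Hsp_bit-score>
              <Hsp_evalue>1e-20</Hsp_evalue>
              <Hsp_identity>40</Hsp_identity>
              <Hsp_align-len>50</Hsp_align-len>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""

for row in blast_xml_to_rows(SAMPLE_XML):
    print("\t".join(map(str, row)))
</code>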

Another way to get tabular output is to have BLAST write it directly with -outfmt '6 qseqid sseqid ...', which skips the XML conversion step:

<code>
#!/bin/bash
# Grid Engine (qsub) options: Bash shell, 2 slots in the "threaded" PE, run in the submit directory
#$ -S /bin/bash
. /etc/profile
#$ -pe threaded 2
#$ -cwd
source activate blast
# No space after "=": "BLASTDB= /path" leaves BLASTDB empty and fails on the path
export BLASTDB=/misc/scratch3/rogerlab_databases/other_dbs/nr_010621
DB=nr
query=ATCG00670.1.fasta
# -outfmt 6: one tab-separated line per HSP with the listed columns
# (qcovs = query coverage per subject, stitle = subject title)
blastp -db "$DB" -query "$query" -out /scratch2/xizhang/BLASTP_nr.tsv -num_threads 2 \
    -outfmt '6 qseqid sseqid evalue pident qcovs length slen qlen qstart qend sstart send stitle'
source deactivate
</code>
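The tabular file can then be filtered directly. A minimal sketch that reads it using the same column order as the -outfmt string above and keeps the lowest e-value hit per query; the two sample lines are invented for illustration, and read_outfmt6 and best_hits are hypothetical names.

<code python>
import io

# Column order copied from the -outfmt '6 ...' string of the blastp job above
COLUMNS = ("qseqid sseqid evalue pident qcovs length slen qlen "
           "qstart qend sstart send stitle").split()
FLOAT_COLUMNS = {"evalue", "pident", "qcovs"}
TEXT_COLUMNS = {"qseqid", "sseqid", "stitle"}


def read_outfmt6(handle):
    """Yield one dict per tab-separated BLAST line, converting numeric columns."""
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        record = {}
        for name, value in zip(COLUMNS, line.split("\t")):
            if name in FLOAT_COLUMNS:
                record[name] = float(value)
            elif name in TEXT_COLUMNS:
                record[name] = value
            else:
                record[name] = int(value)
        yield record


def best_hits(records):
    """Keep the hit with the lowest e-value for each query."""
    best = {}
    for rec in records:
        current = best.get(rec["qseqid"])
        if current is None or rec["evalue"] < current["evalue"]:
            best[rec["qseqid"]] = rec
    return best


SAMPLE = (
    "query1\tsp|P1|A\t1e-50\t98.5\t100\t200\t210\t200\t1\t200\t5\t204\tprotein A\n"
    "query1\tsp|P2|B\t3e-10\t45.0\t80\t160\t300\t200\t20\t180\t40\t199\tprotein B\n"
)
best = best_hits(read_outfmt6(io.StringIO(SAMPLE)))
print(best["query1"]["sseqid"], best["query1"]["evalue"])  # sp|P1|A 1e-50
</code>

With the real output, pass an open file handle (for example open("BLASTP_nr.tsv")) instead of the StringIO object.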

Sep 6th, 2022: since DIAMOND is faster than BLAST for protein (blastp) and translated (blastx) searches, here is an alternative using DIAMOND. The script reads the query FASTA file names, one per line, from the file given as its first argument:

<code>
#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -pe threaded 40
#$ -cwd
source activate /scratch2/software/anaconda/envs/diamond-2.0.7
DB=/misc/scratch3/rogerlab_databases/other_dbs/nr_02032022/diamond_nr.dmnd
# $1: text file with one query FASTA path per line; each query gets its own output table
while read -r line
do
    # -k 5: report at most 5 targets per query; -e 1e-10: e-value cutoff; -f 6: tabular columns
    diamond blastp -p 40 -k 5 -e 1e-10 \
        -f 6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore stitle salltitles --header \
        -d "$DB" -q "$line" -o "BLASTP_nr.$line.tsv" --sensitive
done < "$1"
conda deactivate
</code>
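If you prefer to drive the searches from Python, this sketch rebuilds the same loop: one diamond blastp call per query path, with the options copied from the script above. Unlike the Bash loop it skips blank lines and passes arguments as a list, so paths need no shell quoting. diamond_commands and run_all are hypothetical names, and the sketch assumes the diamond executable is on PATH.

<code python>
import shlex
import subprocess

# Database and options copied from the DIAMOND job script above
DB = "/misc/scratch3/rogerlab_databases/other_dbs/nr_02032022/diamond_nr.dmnd"
FIELDS = ("qseqid sseqid pident length mismatch gapopen qstart qend sstart send "
          "evalue bitscore stitle salltitles").split()


def diamond_commands(lines, db=DB, threads=40):
    """Build one 'diamond blastp' argument list per query path; blank lines are skipped."""
    commands = []
    for line in lines:
        query = line.strip()
        if not query:
            continue
        commands.append([
            "diamond", "blastp", "-p", str(threads), "-k", "5", "-e", "1e-10",
            "-f", "6", *FIELDS, "--header",
            "-d", db, "-q", query, "-o", f"BLASTP_nr.{query}.tsv", "--sensitive",
        ])
    return commands


def run_all(list_file):
    """Run the searches one after another; stop at the first failing command."""
    with open(list_file) as handle:
        for cmd in diamond_commands(handle):
            print(shlex.join(cmd))
            subprocess.run(cmd, check=True)
</code>

Usage, with a hypothetical list file: run_all("queries.txt").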

**V5 NCBI database**