handy_custom_functions

Differences

This shows you the differences between two versions of the page.

handy_custom_functions [2021/06/10 19:49] 134.190.232.18
handy_custom_functions [2023/07/25 12:08] (current) 134.190.232.186
Line 11:
 I will discuss here some more custom functions that I find very useful in my daily workflow. To add these functions to your system, simply add them to your ''.bashrc''.
  
-===Selecting or removing sequences from a FASTA file=== 
- 
-<code> 
-# fish out one or more sequences from a FASTA file
-function grabseq {
-    local to_grab=$2 fasta=$3
-    case "$1" in
-        # -s: keep sequences whose header matches a single pattern
-        -s) seqtk subseq "$fasta" <(grep '^>' "$fasta" | grep -e "$to_grab" | sed -e 's/^>//' -e 's/ .*//');;
-        # -l: keep sequences whose header matches any pattern listed in a file
-        -l) seqtk subseq "$fasta" <(grep '^>' "$fasta" | grep -f "$to_grab" | sed -e 's/^>//' -e 's/ .*//');;
-        *) echo -e "grabseq -s <seqname> <fasta> to grab a single sequence\ngrabseq -l <seqlist> <fasta> to grab a list of sequences";;
-    esac
-}
-</code> 
- 
-This is essentially a wrapper for the [[https://github.com/lh3/seqtk|seqtk]] tool. 
- 
-<code> 
-# select every sequence that has <pattern> in its header
-# useful if you want to find a single sequence
-$ grabseq -s <pattern> <FASTA>
-
-# select every sequence whose header matches one of the patterns listed in the <pattern_list> file
-# useful if you want to grab multiple sequences
-$ grabseq -l <pattern_list> <FASTA>
-</code> 
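
To make the header-based selection concrete, here is a minimal sketch of the same idea in plain ''awk'', for cases where seqtk is not installed. It is not the author's method: ''grab_by_pattern'' is a hypothetical name, and the pattern is treated as an awk regular expression matched against the whole header line (including the description), which may differ slightly from grep's pattern syntax.

```shell
# Hypothetical helper (not part of seqtk or the original page):
# print every FASTA record whose header line matches <pattern>.
# Usage: grab_by_pattern <pattern> <fasta>
grab_by_pattern() {
    awk -v pat="$1" '
        /^>/ { keep = ($0 ~ pat) }   # on each header line, decide whether to keep the record
        keep                         # print the header and sequence lines of kept records
    ' "$2"
}
```

Unlike the seqtk wrapper, this keeps the sequence lines exactly as they are wrapped in the input file.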
- 
-The next function does the opposite of grabseq. It will remove particular sequences from a FASTA file. 
- 
-<code> 
-# remove one or more entries from a FASTA file
-function rmseq {
-    local to_rmv=$2 fasta=$3
-    case "$1" in
-        # -s: drop sequences whose header matches a single (extended regex) pattern
-        -s) seqtk subseq "$fasta" <(grep '^>' "$fasta" | grep -E -v -e "$to_rmv" | sed -e 's/^>//' -e 's/ .*//');;
-        # -l: drop sequences whose header matches any pattern listed in a file
-        -l) seqtk subseq "$fasta" <(grep '^>' "$fasta" | grep -v -f "$to_rmv" | sed -e 's/^>//' -e 's/ .*//');;
-        *) echo -e "rmseq -s <seqname> <fasta> to remove a single sequence\nrmseq -l <seqlist> <fasta> to remove a list of sequences";;
-    esac
-}
-</code> 
- 
-<code> 
-# remove every sequence that has <pattern> in its header
-$ rmseq -s <pattern> <FASTA>
-
-# remove every sequence whose header matches one of the patterns listed in the <pattern_list> file
-$ rmseq -l <pattern_list> <FASTA>
-</code> 
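
Because ''rmseq'' matches patterns rather than exact names, a pattern such as ''seq1'' also removes ''seq10''. The sketch below shows one way to drop records by exact ID instead, again in plain ''awk''. The function name ''drop_by_id_list'' is hypothetical, and the list file is assumed to hold one exact ID per line with no trailing whitespace.

```shell
# Hypothetical helper (not part of seqtk or the original page):
# drop every record whose ID (the text between ">" and the first
# whitespace) appears, one per line, in <id_list>.
# Usage: drop_by_id_list <id_list> <fasta>
drop_by_id_list() {
    awk -v list="$1" '
        BEGIN { while ((getline id < list) > 0) drop[id] = 1 }   # load the IDs to drop
        /^>/  { id = substr($1, 2); keep = !(id in drop) }       # compare the exact ID only
        keep                                                      # print lines of kept records
    ' "$2"
}
```

With a list containing only ''seq1'', a record named ''seq10'' is kept, since the comparison is on the whole ID rather than on a substring.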
- 
-NOTE: Dayana pointed out to me another tool similar to seqtk called [[https://github.com/shenwei356/seqkit|SeqKit]] which has a lot more functionality, essentially making the above custom functions obsolete. 
- 
-Selecting sequences: 
-<code> 
-# select the sequence whose FASTA header ID is exactly <SeqID>
-$ seqkit grep -p <SeqID> <FASTA>
-
-# select every sequence whose FASTA header ID matches <regex_pattern>
-$ seqkit grep -rp <regex_pattern> <FASTA>
-
-# select the sequences whose exact IDs are listed in <SeqID.exact.list>
-$ seqkit grep -f <SeqID.exact.list> <FASTA>
-
-# select the sequences whose IDs match any of the regex patterns listed in <SeqID.regex.list>
-$ seqkit grep -rf <SeqID.regex.list> <FASTA>
-</code> 
- 
-Removing sequences:
-<code> 
-# remove the sequence whose FASTA header ID is exactly <SeqID>
-$ seqkit grep -vp <SeqID> <FASTA>
-
-# remove every sequence whose FASTA header ID matches <regex_pattern>
-$ seqkit grep -vrp <regex_pattern> <FASTA>
-
-# remove the sequences whose exact IDs are listed in <SeqID.exact.list>
-$ seqkit grep -vf <SeqID.exact.list> <FASTA>
-
-# remove the sequences whose IDs match any of the regex patterns listed in <SeqID.regex.list>
-$ seqkit grep -vrf <SeqID.regex.list> <FASTA>
-</code> 
- 
-A FASTA header consists of two parts, the header ID and the header DESCRIPTION. The ID is essentially anything between ''>'' and the first space, and the DESCRIPTION is anything after that. 
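
The ID/DESCRIPTION split described above can be sketched in a few lines of ''awk''. This is only an illustration under the definition given here (split at the first space). ''fasta_headers'' is a hypothetical helper name, and the tab-separated output format is an arbitrary choice.

```shell
# Hypothetical helper (not from the original page):
# print "ID<TAB>DESCRIPTION" for every header line in <fasta>.
# Usage: fasta_headers <fasta>
fasta_headers() {
    awk '/^>/ {
        header = substr($0, 2)          # drop the leading ">"
        sp = index(header, " ")         # position of the first space, 0 if there is none
        if (sp == 0)
            print header "\t"           # no description: the whole header is the ID
        else
            print substr(header, 1, sp - 1) "\t" substr(header, sp + 1)
    }' "$1"
}
```

Piping the result through ''cut -f1'' gives a plain list of IDs, which is the kind of file ''seqkit grep -f'' expects.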
  
 ===Reformatting FASTA files downloaded from NCBI===
handy_custom_functions.1623365342.txt.gz · Last modified: by 134.190.232.18