handy_custom_functions
Differences
This shows you the differences between two versions of the page.
| handy_custom_functions [2021/05/12 14:47] – 168.91.18.151 | handy_custom_functions [2023/07/25 12:08] (current) – 134.190.232.186 | ||
|---|---|---|---|
| Line 11: | Line 11: | ||
| Here I discuss some more custom functions that I find very useful in my daily workflow. To add these functions to your system, simply add them to your '' | Here I discuss some more custom functions that I find very useful in my daily workflow. To add these functions to your system, simply add them to your '' | ||
| - | ===Selecting or removing sequences from a FASTA file=== | ||
| - | < | + | ===Reformatting FASTA files downloaded |
| - | # fish out a sequence | + | |
| - | function grabseq { | + | |
| - | fasta=$3 | + | |
| - | to_grab=$2 | + | |
| - | case " | + | |
| - | -s) seqtk subseq $fasta < | + | |
| - | -l) seqtk subseq $fasta <(grep -f " | + | |
| - | *) echo -e " | + | |
| - | esac | + | |
| - | } | + | |
| - | </ | + | |
| - | This is essentially a wrapper for the [[https:// | + | For most of my analyses, |
| < | < | ||
| - | # select any sequence that has < | + | # format NCBI headers |
| - | # useful if you want to find a single sequence | + | function |
| - | $ grabseq -s < | + | |
| - | + | | |
| - | # select a particular set of sequences that have < | + | |
| - | # useful if you want to select multiple sequences | + | |
| - | $ grabseq -l < | + | |
| - | </ | + | |
| - | + | ||
| - | The next function | + | |
| - | + | ||
| - | < | + | |
| - | # remove a particular entry from a fasta file | + | |
| - | function rmseq { | + | |
| - | | + | |
| - | | + | |
| - | case "$1" in | + | |
| - | -s) seqtk subseq $fasta <( grep ">" | + | |
| - | | + | |
| - | *) echo -e " | + | |
| - | esac | + | |
| } | } | ||
| </ | </ | ||
| - | < | + | ===Replacing work names with final names for publication=== |
| - | # remove a particular sequence that has < | + | |
| - | $ rmseq -s < | + | |
| - | # remove | + | In my experience, when I analyse new genomes, transcriptomes, etc., I work with ' |
| - | $ rmseq -l < | + | |
| - | </ | + | |
| - | NOTE: Dayana pointed out to me another tool similar to seqtk called [[https:// | ||
| - | |||
| - | Selecting sequences: | ||
| < | < | ||
| - | # select a sequence with the exact < | + | # replace taxon names in trees, |
| - | $ seqkit grep -p < | + | function replace_names { |
| - | + | input=$1 | |
| - | # select a sequence of which the fasta header ID matches a < | + | |
| - | $ seqkit grep -rp < | + | |
| - | + | | |
| - | # select a set of sequences of which the exact IDs are listed in <SeqID.exact.list> | + | sed -i -r " |
| - | $ seqkit grep -f <SeqID.exact.list> | + | done |
| - | + | } | |
| - | # select a set of sequences of which the IDs match regex patterns listed in < | + | |
| - | $ seqkit grep -rf < | + | |
| </ | </ | ||
| - | Removing sequences | + | ===Some other functions=== |
| - | < | + | |
| - | # remove a sequence with the exact < | + | |
| - | $ seqkit grep -vp < | + | |
| - | + | ||
| - | # remove a sequence of which the fasta header ID matches a < | + | |
| - | $ seqkit grep -vrp < | + | |
| - | + | ||
| - | # remove a set of sequences of which the exact IDs are listed in < | + | |
| - | $ seqkit grep -vf < | + | |
| - | + | ||
| - | # remove a set of sequences of which the IDs match regex patterns listed in < | + | |
| - | $ seqkit grep -vrf < | + | |
| - | </ | + | |
| - | + | ||
| - | A FASTA header consists of two parts, the header ID and the header DESCRIPTION. The ID is essentially anything between ''>'' | + | |
| < | < | ||
| - | </ | + | # fasta to phylip |
| + | # depends on trimal | ||
| + | function fa2phy { | ||
| + | trimal -in $1 -out ${1%.*}.phylip -phylip | ||
| + | } | ||
| + | # reverse complement function | ||
| + | function revcomp { | ||
| + | tr " | ||
| + | } | ||
| - | < | + | # sum up all numbers in a list |
| - | </ | + | function total { |
| - | + | tr ' | |
| - | + | } | |
| - | < | + | |
| - | </code> | + | |
| - | + | ||
| - | + | ||
| - | < | + | |
| </ | </ | ||
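
The bodies of ''revcomp'' and ''total'' are cut off in the diff above, so the following is only a minimal sketch of what such helpers could look like, not the page author's implementation. The names ''revcomp_sketch'' and ''total_sketch'' are hypothetical, chosen so they do not shadow the originals. The sketch assumes input arrives on stdin, one plain sequence (no FASTA header lines) or one number per line, and that ''tr'', ''rev'' and ''awk'' are available.

```shell
# Hypothetical sketch: reverse-complement DNA sequences read from stdin,
# one sequence per line. Assumes plain sequence lines without FASTA headers.
revcomp_sketch() {
    # complement each base (upper and lower case), then reverse every line
    tr "ACGTacgt" "TGCAtgca" | rev
}

# Hypothetical sketch: sum a list of numbers read from stdin, one per line.
total_sketch() {
    # "sum + 0" forces numeric output, so empty input prints 0
    awk '{ sum += $1 } END { print sum + 0 }'
}
```

For example, ''echo AACG | revcomp_sketch'' prints ''CGTT'', and ''printf '1\n2\n3.5\n' | total_sketch'' prints ''6.5''. Header lines starting with ''>'' would be mangled by ''revcomp_sketch'', so a FASTA file would need its headers filtered out or handled separately first.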
handy_custom_functions.1620841671.txt.gz · Last modified: by 168.91.18.151
