seqkit
By Joran Martijn
Selecting sequences
# select the sequence whose FASTA header ID is exactly <SeqID>
$ seqkit grep -p <SeqID> <FASTA>

# select all sequences whose FASTA header ID matches <regex_pattern>
$ seqkit grep -rp <regex_pattern> <FASTA>

# select all sequences whose IDs are listed exactly in <SeqID.exact.list> (one ID per line)
$ seqkit grep -f <SeqID.exact.list> <FASTA>

# select all sequences whose IDs match any of the regex patterns listed in <SeqID.regex.list>
$ seqkit grep -rf <SeqID.regex.list> <FASTA>
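To make "exact ID match" concrete, here is a plain-awk sketch that mimics what `seqkit grep -f ids.txt toy.fa` should do on a tiny toy file. It is not seqkit's implementation, just an illustration of the matching rule: a record is kept when the text between `>` and the first space of its header appears verbatim as a line in the list. The file names `toy.fa`, `ids.txt` and `selected.fa` are hypothetical.

```shell
#!/bin/sh
# Plain-awk illustration of exact-ID selection (NOT seqkit itself).
# Hypothetical files: toy.fa (input), ids.txt (ID list), selected.fa (output).

# toy input: three records, two of them with a description after the ID
printf '>seq1 first record\nACGT\n>seq2\nGGCC\nTTAA\n>seq3 third\nAAAA\n' > toy.fa
# exact ID list: one ID per line
printf 'seq1\nseq3\n' > ids.txt

awk 'NR == FNR { want[$1]; next }                     # first file: load wanted IDs
     /^>/ { id = substr($1, 2); keep = (id in want) }  # header: ID = first field minus ">"
     keep' ids.txt toy.fa > selected.fa               # print lines of kept records

cat selected.fa
```

The description text after the ID (`first record`, `third`) plays no part in the match, which is also why `seqkit grep -f` expects bare IDs in the list file rather than full headers.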
Removing sequences
# remove the sequence whose FASTA header ID is exactly <SeqID>
$ seqkit grep -vp <SeqID> <FASTA>

# remove all sequences whose FASTA header ID matches <regex_pattern>
$ seqkit grep -vrp <regex_pattern> <FASTA>

# remove all sequences whose IDs are listed exactly in <SeqID.exact.list> (one ID per line)
$ seqkit grep -vf <SeqID.exact.list> <FASTA>

# remove all sequences whose IDs match any of the regex patterns listed in <SeqID.regex.list>
$ seqkit grep -vrf <SeqID.regex.list> <FASTA>
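The difference between exact (`-vp`) and regex (`-vrp`) removal matters most with numbered IDs: as a regex, `seq1` matches anywhere in the ID, so it also hits `seq10`, `seq11` and so on. Anchoring the pattern as `^seq1$` makes a regex behave like an exact match. The plain-awk sketch below (again not seqkit itself, with a hypothetical `toy.fa`) reproduces both behaviours so the pitfall is visible.

```shell
#!/bin/sh
# Plain-awk illustration of exact vs. regex removal (NOT seqkit itself):
#   exact.fa ~ what "seqkit grep -v -p seq1 toy.fa" should keep
#   regex.fa ~ what "seqkit grep -v -r -p seq1 toy.fa" should keep
printf '>seq1 a\nACGT\n>seq10 b\nGGCC\n>seq2 c\nTTAA\n' > toy.fa

# exact: drop only the record whose ID is exactly "seq1"
awk -v pat=seq1 '/^>/ { keep = (substr($1, 2) != pat) } keep' toy.fa > exact.fa

# regex: drop every record whose ID contains a match for "seq1",
# which also removes seq10
awk -v pat=seq1 '/^>/ { keep = (substr($1, 2) !~ pat) } keep' toy.fa > regex.fa

grep '^>' exact.fa   # headers left after exact removal
grep '^>' regex.fa   # headers left after regex removal
```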
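A FASTA header consists of two parts: the header ID and the header DESCRIPTION. The ID is everything between the leading > and the first whitespace character (usually a space); the DESCRIPTION is everything after that. By default, the seqkit grep commands above match patterns against the ID only.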
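seqkit derives the ID with its `--id-regexp` option, whose default is `^(\S+)\s?`. The shell sketch below is not seqkit code; it performs the same split on a single hypothetical header line using POSIX parameter expansion, and for simplicity it splits on the first space only (not tabs). If you need to match against the whole header, ID plus DESCRIPTION, `seqkit grep` offers the `-n`/`--by-name` flag.

```shell
#!/bin/sh
# Split one FASTA header into ID and DESCRIPTION (illustration only).
header='>seqA some free-text description'   # hypothetical header line

body=${header#\>}            # drop the leading '>'
id=${body%% *}               # everything before the first space
case $body in
  *' '*) desc=${body#* } ;;  # everything after the first space
  *)     desc='' ;;          # header without a description
esac

printf 'ID: %s\nDESCRIPTION: %s\n' "$id" "$desc"
```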
seqkit.txt · Last modified: by 134.190.232.186
