seqkit

By Joran Martijn

Selecting sequences

# select the sequence whose FASTA header ID is exactly <SeqID>
$ seqkit grep -p <SeqID> <FASTA>

# select the sequences whose FASTA header ID matches <regex_pattern>
$ seqkit grep -rp <regex_pattern> <FASTA>

# select the sequences whose exact IDs are listed in <SeqID.exact.list> (one ID per line)
$ seqkit grep -f <SeqID.exact.list> <FASTA>

# select the sequences whose IDs match any of the regex patterns listed in <SeqID.regex.list> (one pattern per line)
$ seqkit grep -rf <SeqID.regex.list> <FASTA>
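The list files used with -f are plain text with one ID (or pattern) per line. As a rough sketch of how such a list could be built from an existing FASTA file, the snippet below pulls out every header ID with standard shell tools. The file names, IDs and sequences are made up for illustration, and it assumes the ID ends at the first whitespace character.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
workdir="$(mktemp -d)"
fasta="$workdir/example.fasta"          # hypothetical input file
list="$workdir/SeqID.exact.list"        # hypothetical output list

# A tiny made-up FASTA file.
cat > "$fasta" <<'EOF'
>seq1 first example record
ACGTACGT
>seq2 second example record
GGCCTTAA
>seq3 third example record
TTTTAAAA
EOF

# Keep only header lines, then drop the leading '>' and
# everything from the first whitespace character onwards.
grep '^>' "$fasta" | sed -e 's/^>//' -e 's/[[:space:]].*$//' > "$list"

# Show the resulting list, one ID per line.
cat "$list"
```

The resulting list can be trimmed by hand to the IDs of interest and then passed to seqkit grep -f.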

Removing sequences

# remove the sequence whose FASTA header ID is exactly <SeqID>
$ seqkit grep -vp <SeqID> <FASTA>

# remove the sequences whose FASTA header ID matches <regex_pattern>
$ seqkit grep -vrp <regex_pattern> <FASTA>

# remove the sequences whose exact IDs are listed in <SeqID.exact.list> (one ID per line)
$ seqkit grep -vf <SeqID.exact.list> <FASTA>

# remove the sequences whose IDs match any of the regex patterns listed in <SeqID.regex.list> (one pattern per line)
$ seqkit grep -vrf <SeqID.regex.list> <FASTA>

A FASTA header consists of two parts: the header ID and the header DESCRIPTION. The ID is everything between > and the first whitespace character (usually a space), and the DESCRIPTION is everything after that. The commands above match against the ID only, not the DESCRIPTION.
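To make the ID / DESCRIPTION split concrete, here is a minimal sketch that takes a header apart using plain shell parameter expansion. The header itself is a made-up example, and the sketch assumes the two parts are separated by a single space.

```shell
# Hypothetical example header, made up for illustration.
header='>seq1 Escherichia coli 16S rRNA'

# Drop the leading '>'.
trimmed="${header#>}"

# ID: everything before the first space.
id="${trimmed%% *}"

# DESCRIPTION: everything after the first space.
# (If the header has no space at all, this falls back to the whole header.)
description="${trimmed#* }"

printf 'ID:          %s\n' "$id"
printf 'DESCRIPTION: %s\n' "$description"
```

With this header, seqkit grep -p seq1 would select the record, whereas a pattern that only occurs in the DESCRIPTION (such as Escherichia) would not. seqkit grep also has a -n (--by-name) option for matching against the full header instead of the ID alone.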
