FASTX-toolkit http://hannonlab.cshl.edu/fastx_toolkit/
- used mainly to manipulate short sequence reads from Illumina sequencers
- most of the tools handle both FASTA and FASTQ files
- command line
  - on perun, in a conda environment:
    source activate fastx_toolkit   (newer conda versions: conda activate fastx_toolkit)
    fastx_clipper -h                (prints the help of a tool)
- online graphical interface (Galaxy)
  http://hannonlab.cshl.edu/fastx_toolkit/galaxy.html
  https://usegalaxy.org/
- local installation
  http://hannonlab.cshl.edu/fastx_toolkit/download.html
- limitations
  - DOES NOT WORK with lowercase nucleotides
  - DOES NOT WORK with interleaved (multi-line) sequences; convert them first with fasta_formatter
  - DOES NOT WORK with amino acids
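A minimal workaround for the lowercase limitation, sketched with awk under the assumption that header lines start with ">" and every other line is sequence. uppercase_fasta is a hypothetical helper, not a FASTX tool; it reads the files given as arguments, or standard input if there are none.

```shell
# Hypothetical helper (not part of FASTX): upper-case the sequence lines of a
# FASTA file so FASTX tools accept them. Header lines (starting with ">") are
# printed unchanged, so names and descriptions keep their original case.
uppercase_fasta() {
    awk '/^>/ { print; next } { print toupper($0) }' "$@"
}
```

Usage: uppercase_fasta lower.fasta > upper.fasta (file names are examples).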
- common syntax
  - -i input file (default: standard input)
  - -o output file (default: standard output)
  - -h help
fastx_clipper
- clips adapter sequences from reads; here it is used only for its length filter
- -l N discards sequences shorter than N nucleotides (default: 5)
more p2.fasta
>sequence1 blahblah
ACGTACGTACGTACGTACGTACGTT
>sequence47 bluhbluhbluh
TTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCC
>myfavourite
AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG
>myfav2
AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG
>contig23 acanth
AGTAGTGACTGAGTAATAGACGTAG
fastx_clipper -i p2.fasta -o blah -l 50
more blah
>sequence47 bluhbluhbluh
TTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCC
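The length filter on its own can be sketched with awk, assuming single-line FASTA (one sequence line per header). This only models -l N (keep records with at least N nucleotides); the adapter clipping that fastx_clipper also performs is not modelled, and min_length_filter is a hypothetical name.

```shell
# Hypothetical sketch of the "-l N" filter (not the FASTX code):
# keep only records whose single sequence line has at least N nucleotides.
# $1 = N; remaining arguments are input files (standard input if none).
min_length_filter() {
    min_len=$1
    shift
    awk -v min="$min_len" '
        /^>/ { header = $0; next }                         # remember the header line
        length($0) >= min + 0 { print header; print $0 }   # emit the record if long enough
    ' "$@"
}
```

Usage: min_length_filter 50 p2.fasta > p2.min50.fasta (output name is an example).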
fastx_renamer
- renames the sequences
- -n TYPE
  - SEQ (default): uses the sequence itself as the name
  - COUNT: uses a counter (1, 2, 3, ...) as the name
- the extra information on the header line (the description after the name) is lost
more p2.fasta
>sequence1 blahblah
ACGTACGTACGTACGTACGTACGTT
>sequence47 bluhbluhbluh
TTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCC
>myfavourite
AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG
>contig23 acanth
AGTAGTGACTGAGTAATAGACGTAG
fastx_renamer -i p2.fasta -o p2renamed.fasta -n SEQ
>ACGTACGTACGTACGTACGTACGTT
ACGTACGTACGTACGTACGTACGTT
>TTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCC
TTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCC
>AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG
AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG
>AGTAGTGACTGAGTAATAGACGTAG
AGTAGTGACTGAGTAATAGACGTAG
fastx_renamer -i p2.fasta -o p2renamed.fasta -n COUNT
>1
ACGTACGTACGTACGTACGTACGTT
>2
TTTTTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACCCCC
>3
AGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAG
>4
AGTAGTGACTGAGTAATAGACGTAG
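What the two renaming modes do to the headers can be sketched with awk, assuming single-line FASTA. rename_fasta is a hypothetical helper, not fastx_renamer itself; like the tool, it throws away the whole old header, description included.

```shell
# Hypothetical sketch of "fastx_renamer -n SEQ" / "-n COUNT" (not the FASTX code).
# $1 = SEQ or COUNT; remaining arguments are input files (standard input if none).
rename_fasta() {
    mode=$1
    shift
    awk -v mode="$mode" '
        /^>/ { next }                            # drop the old header, including its description
        {
            n++                                  # record counter, starting at 1
            name = (mode == "COUNT") ? n : $0    # COUNT: number, otherwise the sequence
            print ">" name
            print $0
        }
    ' "$@"
}
```

Usage: rename_fasta COUNT p2.fasta > p2renamed.fasta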
fasta_formatter
- converts multi-line (interleaved) sequences to single-line (sequential) sequences; this is the default output
- -w N re-wraps the sequence lines to N nucleotides per line (default 0 = no wrapping)
more p5.fasta
>M86863.1
GTAACATGACGTTGACCGTGCGGGGCTACATGTAGCAGCTGGGTGTGCTAACTACGGATACATGCCTACA
ACCCCCACAAGTCAAGACCATTGCGACGCGGAAACAGGAGCCCGCAAAAGAGGAGAAAAACAACGGCGAG
>seq2
ACTCGGGGGCGGAGTGGGTCACGTGACTTTCCTTTTTCCCCTCACCTGGCCCGCTCCGTCCATATCTCTG
TCGTACAAGACAATATTGTCGCAACGCAAAAGGTCCATAAATTACTGGGTAGACGCAACTCTATTTGAAG
GCAACCTACCGTTTGCTTTTAGTGTTTTGGTTTTGTTACCATATCCAAAAAAAAACCATATATCCAAAAA
TTCCGCTGCACCATCTCTTCTTCTCTCCATCAACTACCCCTGCGGAGAAATTCACACCACAGTTACAATG
fasta_formatter -i p5.fasta -o p5oneline
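>M86863.1
GTAACATGACGTTGACCGTGCGGGGCTACATGTAGCAGCTGGGTGTGCTAACTACGGATACATGCCTACAACCCCCACAAGTCAAGACCATTGCGACGCGGAAACAGGAGCCCGCAAAAGAGGAGAAAAACAACGGCGAG
>seq2
ACTCGGGGGCGGAGTGGGTCACGTGACTTTCCTTTTTCCCCTCACCTGGCCCGCTCCGTCCATATCTCTGTCGTACAAGACAATATTGTCGCAACGCAAAAGGTCCATAAATTACTGGGTAGACGCAACTCTATTTGAAGGCAACCTACCGTTTGCTTTTAGTGTTTTGGTTTTGTTACCATATCCAAAAAAAAACCATATATCCAAAAATTCCGCTGCACCATCTCTTCTTCTCTCCATCAACTACCCCTGCGGAGAAATTCACACCACAGTTACAATG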
fasta_formatter -i p5.fasta -o p5.20 -w 20
>M86863.1
GTAACATGACGTTGACCGTG
CGGGGCTACATGTAGCAGCT
GGGTGTGCTAACTACGGATA
CATGCCTACAACCCCCACAA
GTCAAGACCATTGCGACGCG
GAAACAGGAGCCCGCAAAAG
AGGAGAAAAACAACGGCGAG
>seq2
ACTCGGGGGCGGAGTGGGTC
ACGTGACTTTCCTTTTTCCC
CTCACCTGGCCCGCTCCGTC
CATATCTCTGTCGTACAAGA
CAATATTGTCGCAACGCAAA
AGGTCCATAAATTACTGGGT
AGACGCAACTCTATTTGAAG
GCAACCTACCGTTTGCTTTT
AGTGTTTTGGTTTTGTTACC
ATATCCAAAAAAAAACCATA
TATCCAAAAATTCCGCTGCA
CCATCTCTTCTTCTCTCCAT
CAACTACCCCTGCGGAGAAA
TTCACACCACAGTTACAATG
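Both behaviours can be sketched with awk for illustration. linearize_fasta and wrap_fasta are hypothetical helpers, not fasta_formatter itself: the first joins the wrapped lines of each record into one line, the second re-wraps them to at most W nucleotides, with W = 0 meaning no wrapping.

```shell
# Hypothetical sketch (not the FASTX code): join each record's sequence lines.
linearize_fasta() {
    awk '
        /^>/ { if (seq != "") print seq; print; seq = ""; next }   # flush the previous record
        { seq = seq $0 }                                           # accumulate sequence lines
        END { if (seq != "") print seq }                           # flush the last record
    ' "$@"
}

# Hypothetical sketch: wrap_fasta W [files] prints at most W nucleotides per line.
wrap_fasta() {
    width=$1
    shift
    linearize_fasta "$@" | awk -v w="$width" '
        /^>/ { print; next }                 # headers are never wrapped
        w + 0 <= 0 { print; next }           # width 0: keep one line per sequence
        {
            for (i = 1; i <= length($0); i += w)
                print substr($0, i, w)       # emit w nucleotides at a time
        }
    '
}
```

Usage: linearize_fasta p5.fasta > p5oneline and wrap_fasta 20 p5.fasta > p5.20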
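fastx_trimmer
- keeps only the bases from a first to a last position (default: the entire read)
  - -f N: first base to keep; 1 = first base (default)
  - -l N: last base to keep; default is the end of the read
  - -l is counted from the start of the original read, not from the -f position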
more p6.fasta
>seq1
AAAAAAAAAATTTTTTTTTTGGGGGGGGGG
>seq2
CCCAAATTTGGGCCCAAATTTGGG
fastx_trimmer -i p6.fasta -o p6.trimmed -f 10
>seq1
ATTTTTTTTTTGGGGGGGGGG
>seq2
GGGCCCAAATTTGGG
fastx_trimmer -i p6.fasta -o p6.trimmed -f 10 -l 15
>seq1
ATTTTT
>seq2
GGGCCC
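The -f/-l arithmetic can be checked with a small awk sketch, assuming single-line FASTA. trim_fasta is a hypothetical helper, not fastx_trimmer; it uses awk substr, whose positions are 1-based and counted from the start of the original sequence, the same convention as -f and -l.

```shell
# Hypothetical sketch of fastx_trimmer -f/-l (not the FASTX code).
# $1 = first base to keep (1 = first base)
# $2 = last base to keep, or an empty string to keep up to the end of the read
# remaining arguments are input files (standard input if none)
trim_fasta() {
    first=$1
    last=$2
    shift 2
    awk -v f="$first" -v l="$last" '
        /^>/ { print; next }                       # headers are kept unchanged
        {
            if (l == "") print substr($0, f)       # no last base: keep to the end
            else print substr($0, f, l - f + 1)    # keep bases f..l, inclusive
        }
    ' "$@"
}
```

Usage: trim_fasta 10 15 p6.fasta reproduces the -f 10 -l 15 output above. On a bare sequence line, cut -c10-15 follows the same 1-based, inclusive convention.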
fastx_reverse_complement
- reverse-complements the sequences: it reverses each sequence and complements every base in one step
more p7.fasta
>seq1
AAAAAAAAAAGGGGGGGGGG
>seq2
AGAGAGTT
fastx_reverse_complement -i p7.fasta -o p7rc
>seq1
CCCCCCCCCCTTTTTTTTTT
>seq2
AACTCTCT
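The two steps (reverse, then complement) can be sketched with awk, assuming single-line, upper-case DNA FASTA. revcomp_fasta is a hypothetical helper, not fastx_reverse_complement; copying characters other than A, C, G, T and N through unchanged is an assumption of this sketch.

```shell
# Hypothetical sketch of reverse-complementing FASTA records (not the FASTX code).
# Header lines pass through unchanged; unknown characters are copied as they are.
revcomp_fasta() {
    awk '
        BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"; comp["N"] = "N" }
        /^>/ { print; next }
        {
            out = ""
            for (i = length($0); i >= 1; i--) {       # walk the sequence backwards
                b = substr($0, i, 1)
                out = out ((b in comp) ? comp[b] : b)  # and complement each base
            }
            print out
        }
    ' "$@"
}
```

Usage: revcomp_fasta p7.fasta reproduces the p7rc output above.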
