Gene prediction with the Funannotate pipeline
Joran Martijn (December 2022)
Funannotate is a genome prediction, annotation, and comparison software package. It was originally written to annotate fungal genomes (small eukaryotic genomes of about 30 Mb), but has evolved over time to accommodate larger genomes.
In my experience it predicts gene models for Ergobibamus cyprinoides considerably better than the Braker2 pipeline does. In addition to gene prediction, it can also facilitate functional annotation (hence the name FUNctional - ANNOTATE).
An additional advantage is that it can prepare all the files necessary for an NCBI GenBank submission.
Official documentation can be found on their ReadTheDocs page.
Clean, sort, mask and train
If you have a genome assembly in plain FASTA format ready, as well as some RNAseq data, you can follow along with the Funannotate tutorial described here.
Here I've adapted those commands so they work on our cluster, Perun. Note that all the code snippets below are also available in the Gospel Of Andrew.
Clean
The first step is funannotate clean. It attempts to find and delete short contigs that are 'repetitive', that is, contigs that are already fully represented in a larger contig (≥ 95% sequence similarity and ≥ 95% sequence coverage).
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -m bea
#$ -M joran.martijn@dal.ca
#$ -pe threaded 1
# activate the funannotate environment
source activate funannotate
# input
GENOME='ergo_cyp_genome.fasta'
# run funannotate; the output is written to ergo_cyp_genome.clean.fasta
funannotate clean \
    --input "$GENOME" \
    --out "${GENOME/fasta/clean.fasta}"
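
The output name in the script above comes from a bash pattern substitution, and it can be useful to check how many contigs the clean step actually removed. The following is a minimal sketch, not part of Funannotate itself: the helper count_contigs and the variable CLEANED_ALT are hypothetical names introduced here for illustration, and the comparison only runs if both FASTA files are present in the working directory.

```shell
#!/bin/bash
# Hypothetical helper: count FASTA records (header lines start with '>')
count_contigs() {
    grep -c '^>' "$1"
}

# Same naming scheme as the job script:
# replace the first occurrence of 'fasta' with 'clean.fasta'
GENOME='ergo_cyp_genome.fasta'
CLEANED="${GENOME/fasta/clean.fasta}"

# Alternative that only touches the trailing '.fasta' extension
CLEANED_ALT="${GENOME%.fasta}.clean.fasta"

echo "cleaned assembly: $CLEANED"

# Compare contig counts once both files exist
if [[ -f "$GENOME" && -f "$CLEANED" ]]; then
    before=$(count_contigs "$GENOME")
    after=$(count_contigs "$CLEANED")
    echo "contigs before: $before, after: $after, removed: $((before - after))"
fi
```

Both expansions give the same name here. The suffix-anchored form may be the safer choice if the path to the assembly contains the string 'fasta' anywhere else, for example in a directory name, because the pattern substitution replaces only the first match.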
