Gene prediction with the Funannotate pipeline

Joran Martijn (December 2022)

Funannotate is a genome prediction, annotation, and comparison software package. It was originally written to annotate fungal genomes (small eukaryotic genomes of ~30 Mb), but has evolved over time to accommodate larger genomes.

In my experience, it predicts gene models considerably better than the Braker2 pipeline does for Ergobibamus cyprinoides. In addition to gene prediction, it can also facilitate functional annotation (hence the name: FUNctional ANNOTATE).

An additional advantage is that it can prepare all the files necessary for an NCBI GenBank submission.

Official documentation can be found on the Funannotate ReadTheDocs site.

Clean, sort, mask and train

If you have a genome assembly in plain FASTA format ready, as well as some RNAseq data, you can follow along with the Funannotate tutorial described here.

Here I've adapted those commands so they work on our cluster, Perun. Note that all the code snippets below are also included in the Gospel Of Andrew.

Clean

The first step is funannotate clean. It attempts to find and delete short contigs that are 'repetitive', meaning they are already fully represented in a larger contig (≥ 95% sequence similarity and ≥ 95% sequence coverage overlap).

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -m bea
#$ -M joran.martijn@dal.ca
#$ -pe threaded 1

source activate funannotate

# input
GENOME='ergo_cyp_genome.fasta'

# run funannotate
# strip the .fasta suffix so the output is named ergo_cyp_genome.clean.fasta
funannotate clean \
    --input "$GENOME" \
    --out "${GENOME%.fasta}.clean.fasta"
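
Since funannotate clean deletes contigs, it can be useful to check how many were actually removed. The snippet below is a minimal sketch of one way to do that, not part of the Funannotate tutorial: it counts FASTA header lines before and after cleaning. The helper name count_contigs is hypothetical, and the file names assume the input and output naming used in the script above.

```shell
#!/bin/bash

# Hypothetical helper: count the sequences in a FASTA file
# by counting header lines (lines starting with '>').
count_contigs() {
    grep -c '^>' "$1"
}

# Assumed file names, matching the clean script above
GENOME='ergo_cyp_genome.fasta'
CLEANED="${GENOME%.fasta}.clean.fasta"

# Only compare if both files are present
if [ -f "$GENOME" ] && [ -f "$CLEANED" ]; then
    before=$(count_contigs "$GENOME")
    after=$(count_contigs "$CLEANED")
    echo "contigs before: $before, after: $after, removed: $((before - after))"
else
    echo "Could not find $GENOME and/or $CLEANED; run funannotate clean first." >&2
fi
```

If the number of removed contigs is unexpectedly large, it may be worth inspecting which contigs were dropped before moving on to the next step.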
