Functional annotation with the Funannotate pipeline
Joran Martijn (April 2023)
Funannotate is a software package for gene prediction, functional annotation, and genome comparison. It was originally written to annotate fungal genomes (small eukaryotic genomes of roughly 30 Mb), but has evolved over time to accommodate larger genomes.
For gene prediction with Funannotate, see the other Wiki entry I wrote.
After finishing your gene predictions, you may want to figure out what these genes actually encode. This is where functional annotation comes into play. Essentially, functional annotation entails a number of similarity searches (BLAST, HMMER) against various databases (SwissProt, InterPro, Pfam, EggNOG, NCBI RefSeq Genomes / NR, etc.), protein sequence analyses that predict particular properties (e.g. SignalP, TMHMM, Phobius, antiSMASH), and other kinds of analyses (Gene Ontology terms, etc.), all to make a best guess at how each predicted protein functions in the cell.
Funannotate wraps up lots of these sequence analyses neatly in a few commands. It also produces GenBank, Sequin and other files that allow you to easily submit your annotations to GenBank.
In my experience it was easiest to run a separate InterProScan search, EggNOG mapping and SignalP prediction, and then point to each of the resulting output files in the final funannotate annotate step.
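As a rough sketch of that final step, the precomputed results can be handed to funannotate annotate through its --iprscan, --eggnog and --signalp options. The SignalP path below is a hypothetical placeholder, the EggNOG file name assumes the mapper's usual <prefix>.emapper.annotations naming, and run_or_print is a small illustrative helper (not part of Funannotate) that only prints the command when the program is not on the PATH.

```shell
#!/bin/bash
# Sketch only: pass precomputed results to the final annotation step.

# Hypothetical helper: run the command if its program is installed,
# otherwise just print what would have been run.
run_or_print() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

PREDICT_DIR='test_funannotate_out'   # folder produced by funannotate predict
IPRSCAN_XML='results/36_interproscan-5.52-86.0/iprscan.xml'
EGGNOG_ANNOT='emapper_out/ergo_emapper.emapper.annotations'
SIGNALP_OUT='signalp_out/signalp_results.txt'   # hypothetical path
THREADS=40

run_or_print funannotate annotate \
    --input "$PREDICT_DIR" \
    --iprscan "$IPRSCAN_XML" \
    --eggnog "$EGGNOG_ANNOT" \
    --signalp "$SIGNALP_OUT" \
    --cpus "$THREADS"
```

Printing the command first makes it easy to check that every path points at the intended file before a long job is submitted.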
InterProScan
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q 256G-batch
#$ -m bea
#$ -M joran.martijn@dal.ca
#$ -pe threaded 40
source activate funannotate
IPR_PATH='/scratch2/software/interproscan-5.52-86.0/interproscan.sh'
THREADS=40
OUTXML='results/36_interproscan-5.52-86.0/iprscan.xml'
# run funannotate
## ensure that you point to an interproscan installation
## outside of any conda environment
funannotate iprscan \
    --input test_funannotate_out \
    --method local \
    --iprscan_path "$IPR_PATH" \
    --cpus "$THREADS" \
    --out "$OUTXML"
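Before submitting the job, it may help to confirm that the InterProScan launcher is executable and that the directory for the XML file exists. This is a minimal pre-flight sketch, assuming that funannotate iprscan does not create missing parent directories itself; check_iprscan_setup is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the InterProScan job above.
check_iprscan_setup() {
    ipr_path=$1
    out_xml=$2
    # the launcher script must be a regular file and executable
    if [ ! -f "$ipr_path" ] || [ ! -x "$ipr_path" ]; then
        echo "ERROR: cannot execute $ipr_path" >&2
        return 1
    fi
    # create the directory that will hold the XML file (no-op if it exists)
    mkdir -p "$(dirname "$out_xml")"
}

# Example call with the variables from the job script:
# check_iprscan_setup "$IPR_PATH" "$OUTXML" || exit 1
```

Placed before the funannotate iprscan call, the check makes the job fail within seconds rather than after it has waited in the queue.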
EggNOG mapping
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -q 256G-batch
#$ -m bea
#$ -M joran.martijn@dal.ca
#$ -pe threaded 40
source activate eggnog-mapper-2.1.4
# input
EGGNOG_DB='/scratch4/db/eggnog-mapper-2.1.4'
PROTEINS='Ergobibamus_cyprinoides_CL.proteins.fa'
THREADS=40
OUTPUT_DIR='emapper_out'
PREFIX='ergo_emapper'
# emapper.py may not create the output directory itself,
# so create it here (mkdir -p is a no-op if it already exists)
mkdir -p "$OUTPUT_DIR"
# run eggnog mapper
emapper.py \
    -i "$PROTEINS" \
    --data_dir "$EGGNOG_DB" \
    --output_dir "$OUTPUT_DIR" \
    --output "$PREFIX" \
    --cpu "$THREADS" \
    -m diamond \
    --dbmem
Instead of -m diamond you may want to consider -m hmmer, which should be a bit more sensitive but is likely to be slower.
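Once the mapper has finished, a quick sanity check is to compare the number of annotated proteins with the number of input sequences. The sketch below assumes that the annotations are written to <output_dir>/<prefix>.emapper.annotations and that header and comment lines in that file start with #; count_annotated and count_proteins are hypothetical helper names.

```shell
#!/bin/bash
# Count data rows (annotated queries) in an emapper annotations file,
# skipping every line that starts with '#'.
count_annotated() {
    grep -vc '^#' "$1"
}

# Count sequences in a FASTA file by counting header lines.
count_proteins() {
    grep -c '^>' "$1"
}

ANNOT='emapper_out/ergo_emapper.emapper.annotations'
PROTEINS='Ergobibamus_cyprinoides_CL.proteins.fa'

if [ -s "$ANNOT" ] && [ -s "$PROTEINS" ]; then
    echo "annotated: $(count_annotated "$ANNOT") of $(count_proteins "$PROTEINS") proteins"
else
    echo "annotations or protein file missing or empty" >&2
fi
```

A very low fraction of annotated proteins may point to a problem with the database path or the input file rather than to genuinely novel genes.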
