functional_annotation_with_the_funannotate_pipeline

Differences

This shows you the differences between two versions of the page.

functional_annotation_with_the_funannotate_pipeline [2023/04/27 13:45] – [InterProScan] 134.190.232.186
functional_annotation_with_the_funannotate_pipeline [2025/12/09 13:02] (current) – [EggNOG mapping] 134.190.190.181
Line 1: Line 1:
 ====== Functional annotation with the Funannotate pipeline ======
  
-Joran Martijn (April 2023)
+Joran Martijn (April 2023) modified December 2024 by Kathy Dunn
  
 Funannotate is a gene prediction, functional annotation, and genome comparison software package. It was originally written to annotate fungal genomes (small eukaryotes ~ 30 Mb genomes), but has evolved over time to accommodate larger genomes.
Line 12: Line 12:
  
 In my experience it was easiest to do a separate InterProScan search, EggNOG mapping and SignalP prediction, and then point to each of the resulting output files in the final ''funannotate annotate'' step.
 +
 +**Important note!!!**
 +
 +Before proceeding, check your gff3 file for errors by running ''validate_gene_models_in_gff3.py''. (Activate the gffutils environment to run this script; you may also have to comment out the ''import regex as re'' line if you're not checking Blastocystis genomes.)
 + 
 +This script locates errors in your gff3 file that often result from manual editing (premature stop codons, incorrect exon numbering, missing start or stop codons, start or stop locations that do not match exon locations, etc.). An incorrect phase designation for an exon can lead to premature stop codons, so keep that in mind when looking for the cause of a premature stop.
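The phase problem mentioned above can be illustrated with a small stand-alone check. This is only a minimal sketch and not a replacement for ''validate_gene_models_in_gff3.py'': the function name ''check_cds_phase'' is hypothetical, and it assumes every transcript is complete (first CDS segment has phase 0) and that each CDS line has a single ''Parent='' attribute.

```shell
#!/bin/bash
# check_cds_phase GFF3 (hypothetical helper, not part of funannotate)
# Prints every CDS segment whose phase differs from the phase expected
# from the preceding CDS segments of the same transcript.
# Assumptions: complete gene models (first CDS has phase 0) and a single
# Parent= value per CDS line.
check_cds_phase() {
    local gff="$1"
    # Extract parent ID, strand, start, end and phase of every CDS feature
    awk -F'\t' '$3 == "CDS" {
            parent = $9
            sub(/.*Parent=/, "", parent)
            sub(/;.*/, "", parent)
            print parent "\t" $7 "\t" $4 "\t" $5 "\t" $8
        }' "$gff" |
    # Group by transcript, then order segments by start coordinate
    sort -t$'\t' -k1,1 -k3,3n |
    awk -F'\t' '
        function report(   i, k, expected, len) {
            expected = 0
            for (k = 1; k <= n; k++) {
                # minus-strand transcripts are read from the highest start downwards
                i = (strand == "+") ? k : n - k + 1
                if (phase[i] != expected)
                    print id "\t" s[i] "\t" e[i] "\tphase=" phase[i] "\texpected=" expected
                len = e[i] - s[i] + 1
                # bases needed at the start of the next segment to complete a codon
                expected = (3 - ((len - expected) % 3)) % 3
            }
            n = 0
        }
        $1 != id { report(); id = $1; strand = $2 }
        { n++; s[n] = $3; e[n] = $4; phase[n] = $5 }
        END { report() }'
}
```

Running ''check_cds_phase current.gff3'' prints one line per suspicious CDS segment and nothing if all phases are consistent. Partial gene models will be reported as well, because their first CDS may legitimately start with a non-zero phase.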
 +
 +
 +==== Pre-step to prepare files ====
 +
 +For the results from InterProScan, EggNOG mapper and SignalP to be integrated into the ''funannotate annotate'' results, you need to prepare the gff3 file and protein data using the two funannotate utility commands below:
 +
 +
 +
 +<code>
 +#!/bin/bash
 +#$ -S /bin/bash
 +#$ -cwd
 +
 +source activate funannotate
 +
 +funannotate util gff-rename \
 +            --gff3 {current.gff3} \
 +            --fasta {genome_file_masked.fasta} \
 +            --locus_tag {NCBI assigned locus tag or similar} \
 +            --out {renamed.gff3}
 +            
 +            
 +funannotate util gff2prot \
 +            --gff3 {renamed.gff3} \
 +            --fasta {genome_file_masked.fasta} \
 +            --no_stop \
 +            > {protein_file.faa}
 +</code>           
 +
 +Use the protein_file.faa generated here as input to the InterProScan (''--input''), EggNOG mapping (''-i'') and SignalP (''--fastafile'') scripts below, and use the renamed.gff3 file in the ''funannotate annotate'' script (see the special note there).
 +
 +If you skip the above steps, your results from InterProScan, EggNOG mapping and SignalP will not appear in the final output from funannotate annotate!!
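Before feeding the protein file to the three tools, it can be worth a quick sanity check. The sketch below is an optional extra, not part of funannotate: the function name ''summarize_proteins'' is hypothetical, and it assumes that any ''*'' left in a sequence written with ''--no_stop'' marks a stop codon that should not be there.

```shell
#!/bin/bash
# summarize_proteins FASTA (hypothetical helper, not part of funannotate)
# Prints the number of sequences and the ID of every sequence that
# still contains a '*' character.
summarize_proteins() {
    local faa="$1"
    echo "sequences: $(grep -c '^>' "$faa")"
    awk '/^>/ { id = substr($1, 2); next }          # remember the current sequence ID
         /\*/ { if (!(id in seen)) {                 # report each ID only once
                    seen[id] = 1
                    print "stop codon in: " id
                } }' "$faa"
}
```

For example, ''summarize_proteins protein_file.faa'' should report the same number of sequences as there are protein-coding transcripts in renamed.gff3, and ideally no ''stop codon in'' lines.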
  
 ==== InterProScan ====
  
 InterProScan will check which InterPro and/or Pfam domains are present in each of your proteins.
 +
 +NOTE: if your genome required edits, such that you did not simply run funannotate to get your gene models, you can substitute the funannotate folder (''--input test_funannotate_out'') below with the protein fasta file generated above (''--input protein_file.faa'').
  
 Funannotate includes a wrapper for executing the ''interproscan.sh'' script:
Line 57: Line 97:
 #$ -cwd
 #$ -q 256G-batch
-#$ -m bea 
-#$ -M joran.martijn@dal.ca 
 #$ -pe threaded 40
  
Line 121: Line 159:
  
 If you have used funannotate to predict your genes, you should have a funannotate directory, where all related output files are stored. It is called upon here as well:
 +
 +Special note: If your gene models were not generated solely by funannotate, then instead of pointing at the funannotate directory with ''--input'', supply the gff (''--gff'') and fasta (''--fasta'') files, as in the second example script.
 +
 +
  
 <code>
Line 156: Line 198:
  
 </code>
 +
 +If your gene models have not been called solely by funannotate, you can use your gff3 and genome.fasta files instead:
 +
 +<code>
 +#!/bin/bash
 +#$ -S /bin/bash
 +#$ -cwd
 +#$ -q 256G-batch
 +#$ -pe threaded 40
 +
 +source activate funannotate
 +
 +## --eggnog asks for the '.annotations' file
 +EGGNOG_RESULTS='functional_annotation/emapper_out/BlastoST2_emapper.emapper.annotations'
 +SIGNALP_RESULTS='functional_annotation/signalp_out/prediction_results.txt'
 +IPRSCAN_RESULTS='functional_annotation/interpro_results/iprscan_results.xml'
 +THREADS=40
 +
 +funannotate annotate \
 +    --gff renamed.gff3 \
 +    --fasta genome_masked.fasta \
 +    --species Blastocystis_ST2 \
 +    --out functional_annotation \
 +    --eggnog $EGGNOG_RESULTS \
 +    --signalp $SIGNALP_RESULTS \
 +    --iprscan $IPRSCAN_RESULTS \
 +    --busco_db eukaryota \
 +    --cpus $THREADS 
 +</code>
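Because a missing or empty results file means that part of the annotation is lost, a small pre-flight check before the ''funannotate annotate'' call can save a wasted cluster job. The following is a minimal sketch; the helper name ''require_files'' is hypothetical and not part of funannotate.

```shell
#!/bin/bash
# require_files FILE... (hypothetical helper, not part of funannotate)
# Returns non-zero and names every file that is missing or empty.
require_files() {
    local status=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Example use near the top of the job script, with the variables defined above:
# require_files "$EGGNOG_RESULTS" "$SIGNALP_RESULTS" "$IPRSCAN_RESULTS" \
#     renamed.gff3 genome_masked.fasta || exit 1
```

Placing the call after the variable definitions and before ''funannotate annotate'' makes the job stop immediately with a list of the problem files.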
 +