functional_annotation_with_the_funannotate_pipeline
| Next revision | Previous revision | ||
| functional_annotation_with_the_funannotate_pipeline [2023/04/07 12:46] – created 134.190.232.186 | functional_annotation_with_the_funannotate_pipeline [2025/12/09 13:02] (current) – [EggNOG mapping] 134.190.190.181 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Functional annotation with the Funannotate pipeline ====== | ====== Functional annotation with the Funannotate pipeline ====== | ||
| - | Joran Martijn (April 2023) | + | Joran Martijn (April 2023) modified December 2024 by Kathy Dunn |
| Funannotate is a gene prediction, functional annotation, and genome comparison software package. It was originally written to annotate fungal genomes (small eukaryotes ~ 30 Mb genomes), but has evolved over time to accommodate larger genomes. | Funannotate is a gene prediction, functional annotation, and genome comparison software package. It was originally written to annotate fungal genomes (small eukaryotes ~ 30 Mb genomes), but has evolved over time to accommodate larger genomes. | ||
| Line 12: | Line 12: | ||
| In my experience it was easiest to run the InterProScan search, EggNOG mapping and SignalP prediction separately, and then point to each of the resulting outfiles in the final '' | In my experience it was easiest to run the InterProScan search, EggNOG mapping and SignalP prediction separately, and then point to each of the resulting outfiles in the final '' | ||
| + | |||
| + | **Important note!!!** | ||
| + | |||
| + | Before proceeding you need to check your gff3 file for errors by running '' | ||
| + | |||
| + | This script will locate errors in your gff3 file that often occur due to manual editing (premature stop codons, incorrect exon numbering, missing start and stop codons, start or stop location not matching exon location, etc.). An incorrect phase designation for an exon can lead to premature stop codons, so keep that in mind when looking for causes of premature stops. | ||
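As a quick extra look at the phase column, here is a minimal sketch. It assumes a standard nine-column, tab-separated GFF3 in which every CDS row carries a phase of 0, 1 or 2 in column 8. `check_cds_phase` is a hypothetical helper name, not part of funannotate, and it does not replace the validation script mentioned above.

```shell
#!/bin/bash
# Sketch only: list CDS features whose phase (GFF3 column 8) is not 0, 1 or 2.
# check_cds_phase is a hypothetical helper, not a funannotate command.
check_cds_phase() {
    local gff3="$1"
    awk -F'\t' '
        /^#/ { next }                        # skip comments and directives
        $3 == "CDS" && $8 !~ /^[012]$/ {     # CDS rows need an explicit phase
            printf "line %d: invalid CDS phase \"%s\"\n", NR, $8
        }
    ' "$gff3"
}

# Usage: check_cds_phase current.gff3
```

This only catches a missing or malformed phase. A phase that is valid but wrong for that exon will still pass, so premature stops need to be checked separately.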
| + | |||
| + | |||
| + | ==== Pre-step to prepare files ==== | ||
| + | |||
| + | To get the results from InterProScan, | ||
| + | |||
| + | |||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | #$ -cwd | ||
| + | |||
| + | source activate funannotate | ||
| + | |||
| + | funannotate util gff-rename \ | ||
| + | --gff3 {current.gff3} \ | ||
| + | --fasta {genome_file_masked.fasta} \ | ||
| + | --locus_tag {NCBI assigned locus tag or similar} \ | ||
| + | --out {renamed.gff3} | ||
| + | | ||
| + | | ||
| + | funannotate util gff2prot \ | ||
| + | --gff3 {renamed.gff3} \ | ||
| + | --fasta {genome_file_masked.fasta} \ | ||
| + | --no_stop \ | ||
| + | > {protein_file.faa} | ||
| + | </ | ||
| + | |||
| + | You will use the protein_file.faa generated in the below InterProScan '' | ||
| + | |||
| + | If you skip the above steps your results from InterProScan, | ||
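Before starting the searches, it can help to take a quick look at the protein file from the pre-step. The sketch below counts the sequences and lists any that still contain a `*` (the usual symbol for a stop codon). The assumptions are that the file is plain FASTA and that stops are written as `*`. `summarize_proteins` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: report the sequence count and the IDs of proteins containing '*'.
# summarize_proteins is a hypothetical helper name.
summarize_proteins() {
    local faa="$1"
    awk '
        /^>/ { n++; id = substr($1, 2); next }                   # header line: remember the sequence ID
        /\*/ && !(id in seen) { seen[id] = 1; bad[++b] = id }    # sequence line with a stop symbol
        END {
            printf "sequences: %d\n", n
            printf "with_stop_symbol: %d\n", b
            for (i = 1; i <= b; i++) print bad[i]
        }
    ' "$faa"
}

# Usage: summarize_proteins protein_file.faa
```

Since `--no_stop` was used in the pre-step, any IDs listed here probably point at internal stops, which are worth tracing back to the gff3.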
| ==== InterProScan ==== | ==== InterProScan ==== | ||
| + | |||
| + | InterProScan will check which InterPro and/or Pfam domains are present in each of your proteins. | ||
| + | |||
| + | NOTE: If your genome required edits, such that you did not simply run funannotate to get your gene models, you can substitute the funannotate folder (--input test_funannotate_out) | ||
| + | |||
| + | Funannotate includes a wrapper for executing the '' | ||
| < | < | ||
| Line 41: | Line 85: | ||
| </ | </ | ||
| + | NOTE: Your '' | ||
| + | |||
| + | NOTE: When you specify your '' | ||
| ==== EggNOG mapping ==== | ==== EggNOG mapping ==== | ||
| + | |||
| + | The eggnog mapping algorithm ('' | ||
| < | < | ||
| Line 48: | Line 97: | ||
| #$ -cwd | #$ -cwd | ||
| #$ -q 256G-batch | #$ -q 256G-batch | ||
| - | #$ -m bea | ||
| - | #$ -M joran.martijn@dal.ca | ||
| #$ -pe threaded 40 | #$ -pe threaded 40 | ||
| Line 77: | Line 124: | ||
| Instead of '' | Instead of '' | ||
| + | |||
| + | ==== SignalP ==== | ||
| + | |||
| + | The code below runs SignalP v6.0, which predicts which proteins have N-terminal signal peptides and are thus destined to be localized to the endoplasmic reticulum or secreted outside the cell. | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | #$ -cwd | ||
| + | #$ -q 256G-batch | ||
| + | #$ -m bea | ||
| + | #$ -M joran.martijn@dal.ca | ||
| + | #$ -pe threaded 8 | ||
| + | |||
| + | source activate signalp6-fast | ||
| + | |||
| + | PROTEINS=' | ||
| + | OUTDIR=' | ||
| + | THREADS=8 | ||
| + | |||
| + | # run signalp6 | ||
| + | signalp6 \ | ||
| + | --fastafile $PROTEINS \ | ||
| + | --output_dir $OUTDIR \ | ||
| + | --format all \ | ||
| + | --organism eukarya \ | ||
| + | --mode fast \ | ||
| + | --write_procs $THREADS | ||
| + | </ | ||
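Once the SignalP job has finished, a quick tally of the predictions shows whether the run produced sensible output. This is a minimal sketch that assumes the results table is tab-separated, has header lines starting with `#`, and holds the predicted class in the second column. The file name `prediction_results.txt` in the usage line is also an assumption, so check what your output directory actually contains. `count_signalp_predictions` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: tally SignalP predictions per class.
# Assumes a tab-separated table with '#' header lines and the class in column 2.
count_signalp_predictions() {
    local results="$1"
    awk -F'\t' '
        /^#/ || NF < 2 { next }      # skip header lines and blank lines
        { count[$2]++ }              # column 2: predicted class (assumption)
        END { for (k in count) printf "%s\t%d\n", k, count[k] }
    ' "$results" | sort
}

# Usage (file name is an assumption):
# count_signalp_predictions "$OUTDIR/prediction_results.txt"
```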
| + | |||
| + | ==== Funannotate annotate ==== | ||
| + | |||
| + | Finally, we can integrate the analyses above with funannotate's other annotation steps using the single command below: | ||
| + | |||
| + | If you have used funannotate to predict your genes, you should have a funannotate directory, where all related output files are stored. It is called upon here as well: | ||
| + | |||
| + | Special note: If you did not generate your gene models solely with funannotate, then rather than pointing at the funannotate directory you supply the gff (--gff) and fasta (--fasta) files in place of --input (see the second example script). | ||
| + | |||
| + | |||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | #$ -cwd | ||
| + | #$ -q 256G-batch | ||
| + | #$ -m bea | ||
| + | #$ -M joran.martijn@dal.ca | ||
| + | #$ -pe threaded 40 | ||
| + | |||
| + | source activate funannotate | ||
| + | |||
| + | FUN_DIR=' | ||
| + | |||
| + | ## --eggnog asks for the ' | ||
| + | EGGNOG_RESULTS=' | ||
| + | |||
| + | SIGNALP_RESULTS=' | ||
| + | IPRSCAN_RESULTS=' | ||
| + | SBT_FILE=' | ||
| + | |||
| + | THREADS=40 | ||
| + | |||
| + | # run funannotate | ||
| + | |||
| + | funannotate annotate \ | ||
| + | --input $FUN_DIR \ | ||
| + | --eggnog $EGGNOG_RESULTS \ | ||
| + | --signalp $SIGNALP_RESULTS \ | ||
| + | --iprscan $IPRSCAN_RESULTS \ | ||
| + | --busco_db eukaryota \ | ||
| + | --cpus $THREADS \ | ||
| + | --sbt $SBT_FILE | ||
| + | |||
| + | </ | ||
| + | |||
| + | If your gene models have not been called solely by funannotate, you can use your gff3 and genome.fasta files instead: | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | #$ -cwd | ||
| + | #$ -q 256G-batch | ||
| + | #$ -pe threaded 40 | ||
| + | |||
| + | source activate funannotate | ||
| + | |||
| + | ## --eggnog asks for the ' | ||
| + | EGGNOG_RESULTS=' | ||
| + | SIGNALP_RESULTS=' | ||
| + | IPRSCAN_RESULTS=' | ||
| + | THREADS=40 | ||
| + | |||
| + | funannotate annotate \ | ||
| + | --gff renamed.gff3 \ | ||
| + | --fasta genome_masked.fasta \ | ||
| + | --species Blastocystis_ST2 \ | ||
| + | --out functional_annotation \ | ||
| + | --eggnog $EGGNOG_RESULTS \ | ||
| + | --signalp $SIGNALP_RESULTS \ | ||
| + | --iprscan $IPRSCAN_RESULTS \ | ||
| + | --busco_db eukaryota \ | ||
| + | --cpus $THREADS | ||
| + | </ | ||
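Because the annotate job depends on three separate result files, a small pre-flight check can save a wasted queue slot. The sketch below only tests that each file exists and is non-empty. `require_files` is a hypothetical helper name, and the variables in the usage line are the ones defined in the scripts above.

```shell
#!/bin/bash
# Sketch only: confirm that every result file exists and is non-empty.
# require_files is a hypothetical helper name.
require_files() {
    local missing=0 f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage inside the job script, before calling funannotate annotate:
# require_files "$EGGNOG_RESULTS" "$SIGNALP_RESULTS" "$IPRSCAN_RESULTS" || exit 1
```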
| + | |||
