gene_prediction_with_funannotate
Differences
This shows you the differences between two versions of the page.
| gene_prediction_with_funannotate [2022/12/20 15:48] – [Predict] 134.190.232.140 | gene_prediction_with_funannotate [2026/02/26 12:11] (current) – 129.173.242.70 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Gene prediction with the Funannotate pipeline ====== | ====== Gene prediction with the Funannotate pipeline ====== | ||
| - | Joran Martijn | + | Created by Joran Martijn |
| + | |||
| + | Updated by Jason Shao on February 26th, 2026 | ||
| Funannotate is a genome prediction, annotation, and comparison software package. It was originally written to annotate fungal genomes (small eukaryotes ~ 30 Mb genomes), but has evolved over time to accommodate larger genomes. | Funannotate is a genome prediction, annotation, and comparison software package. It was originally written to annotate fungal genomes (small eukaryotes ~ 30 Mb genomes), but has evolved over time to accommodate larger genomes. | ||
| - | In my experience it seems to do quite a lot better in predicting gene models than the Braker2 pipeline with // | + | In my experience it seems to do quite a lot better in predicting gene models than the Braker2 pipeline with // |
| An additional advantage is that it has the capacity to prepare all the files necessary for an NCBI GenBank submission. | An additional advantage is that it has the capacity to prepare all the files necessary for an NCBI GenBank submission. | ||
| Line 135: | Line 137: | ||
| NOTE also that this step generates the `funannotate_out` output directory, which can be used as an input argument in future funannotate jobs. | NOTE also that this step generates the `funannotate_out` output directory, which can be used as an input argument in future funannotate jobs. | ||
| + | With funannotate 1.8.17, an obscure error can occur at the PASA step. If it does, check: | ||
| + | |||
| + | '' | ||
| + | < | ||
| + | ... | ||
| + | CMD: cdna_alignment_orf_to_genome_orf.pl Blastocystis_ST2_pasa.assemblies.fasta.transdecoder.gff3 Blastocystis_ST2_pasa.pasa_assemblies.gff3 Blastocystis_ST2_pasa.assemblies.fasta > Blastocystis_ST2_pasa.assemblies.fasta.transdecoder. | ||
| + | sh: 1: cdna_alignment_orf_to_genome_orf.pl: | ||
| + | Error, cmd: cdna_alignment_orf_to_genome_orf.pl Blastocystis_ST2_pasa.assemblies.fasta.transdecoder.gff3 | ||
| + | </ | ||
| + | |||
| + | However, '' | ||
| + | |||
| + | A simple fix would be to include this '' | ||
| + | < | ||
| + | export PATH=" | ||
| + | </ | ||
| ==== Predict ==== | ==== Predict ==== | ||
| Line 165: | Line 183: | ||
| </ | </ | ||
| - | Many of the required inputs do not have to be explicitly specified, since they have been generated in the previous | + | NOTE: If you are running '' |
| + | |||
| + | To verify the versions of the databases: | ||
| + | < | ||
| + | funannotate database | ||
| + | </ | ||
| + | |||
| + | If for some reason you need to re-install the databases from scratch, you can do so with: | ||
| + | < | ||
| + | funannotate setup -d < | ||
| + | </ | ||
| + | |||
| + | And if you do this on a shared system, you might receive this error: | ||
| + | < | ||
| + | urllib.error.HTTPError: | ||
| + | </ | ||
| + | |||
| + | This is a known issue with GO, or possibly other database hosts, which deny institutional proxies as " | ||
| + | The fix is to modify the following lines so that the request appears to come from a regular browser: | ||
| + | |||
| + | '' | ||
| + | < | ||
| + | 9 from urllib.request import urlopen, Request | ||
| + | ... | ||
| + | 75 req = Request(url, | ||
| + | 76 u = urlopen(req) | ||
| + | </ | ||
| + | Make sure to indent the modified lines with spaces, not tabs. | ||
| + | |||
| + | Many of the required inputs do not have to be explicitly specified, since they have been generated in the previous | ||
| Funannotate (according to log files) uses **AUGUSTUS**, | Funannotate (according to log files) uses **AUGUSTUS**, | ||
| Line 192: | Line 239: | ||
| </ | </ | ||
| - | In some of the final steps, funannotate calls upon ``tRNAscan`` (I'm guessing the SE version) to predict tRNA genes. | + | In some of the final steps, funannotate calls upon '' |
| + | |||
| + | If you intend to curate the gene predictions, | ||
| + | |||
| + | ==== Update ==== | ||
| + | |||
| + | < | ||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | #$ -cwd | ||
| + | #$ -m bea | ||
| + | #$ -M joran.martijn@dal.ca | ||
| + | #$ -pe threaded 40 | ||
| + | |||
| + | source activate funannotate | ||
| + | |||
| + | # input | ||
| + | FUNDIR=' | ||
| + | THREADS=40 | ||
| + | LOCUS_TAG=' | ||
| + | SBT=' | ||
| + | ACCESSION=' | ||
| + | MEMORY=' | ||
| + | |||
| + | # run funannotate | ||
| + | funannotate update \ | ||
| + | --input $FUNDIR \ | ||
| + | --cpus $THREADS \ | ||
| + | --name $LOCUS_TAG \ | ||
| + | --sbt $SBT \ | ||
| + | --SeqCenter RogerLab \ | ||
| + | --SeqAccession $ACCESSION \ | ||
| + | --no_trimmomatic \ | ||
| + | --memory $MEMORY | ||
| + | </ | ||
| + | |||
| + | '' | ||
| + | |||
| + | It will also attempt to fix certain gene models if they are in strong disagreement with the RNAseq data. | ||
| + | |||
| + | You can optionally provide a **locus tag**, an **SBT file**, and a **WGS accession number** to make your final files ready for GenBank submission. To get these, you need to start a genome submission at the [[https:// | ||
| + | |||
| + | When you are in the Submission Portal, you can start a new submission. You will be asked to fill in several forms, and perhaps create a new BioProject and/or BioSample along the way. It's kind of annoying, but if you intend to publish your genome you'll need to put it on GenBank and this is the most straightforward way to do it. | ||
| + | |||
| + | Shortly after your initial submission you'll receive a **locus tag** and a **WGS accession number**. You can create the **SBT file** by going to // | ||
| + | |||
| + | Now you should be able to run funannotate update! | ||
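
As a footnote to the HTTP 403 fix described in the Predict section: the idea is to wrap the URL in a ''Request'' that carries a browser-like ''User-Agent'' header before handing it to ''urlopen''. The sketch below is only an illustration of that pattern using the Python standard library. The helper names, the header value ''Mozilla/5.0'' and the example URL are assumptions made for this example, not the exact text of the patched funannotate file.

```python
from urllib.request import Request, urlopen

# Assumption: any browser-like string should do; this value is illustrative only.
BROWSER_USER_AGENT = "Mozilla/5.0"


def build_browser_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request for `url` that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=30):
    """Download `url` using the browser-like request and return the raw bytes."""
    with urlopen(build_browser_request(url), timeout=timeout) as response:
        return response.read()


# Hypothetical URL, used only to inspect the header; nothing is downloaded here.
req = build_browser_request("https://example.org/some_database.obo")
print(req.get_header("User-agent"))
```

Note that ''urllib'' stores header names in capitalized form, which is why the lookup uses ''User-agent''. Printing the header like this is a quick way to confirm the request is set up as intended before re-running ''funannotate setup''.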
