By Jason Shao & Joran Martijn (Last Edited: October 27th 2024)
Intro
Augustus is an ab initio gene predictor that uses Hidden Markov Models (HMMs) pre-trained on existing datasets. You can train a custom HMM on your own data, but this basic tutorial only covers the pre-existing models.
Example Usage
```shell
source activate augustus-3.5.0
# only required for species=generic
export AUGUSTUS_CONFIG_PATH="/misc/scratch3/jasons/protist_gene_prediction/software/custom_augustus_config"
augustus \
--species=generic \
<your genome> \
--gff3=on \
--outfile=<outfile name>.gff3
conda deactivate
```
Note that the species here is set to generic to minimize bias when annotating a divergent organism. If your organism is closely related to one of the pre-trained species below, specify that identifier instead, as it will likely yield a better prediction.
If you decide to use a pre-trained species, you can leave out the export line. It is only needed for generic, because Augustus stopped shipping the probability files for the generic species around version 3.3.3, so AUGUSTUS_CONFIG_PATH has to point to a config directory that still contains them. Why they stopped shipping them, you might ask? Unfortunately, that remains a mystery.
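
The rule above (set AUGUSTUS_CONFIG_PATH only when the species is generic) can be sketched as a small wrapper function. This is only an illustration, not part of Augustus: the names `run_augustus`, `CUSTOM_AUGUSTUS_CONFIG` and `DRY_RUN` are made up for this example, and the default path is simply the one from the example usage above. It assumes the augustus conda environment is already active.

```shell
# Hypothetical wrapper (not part of Augustus).
# CUSTOM_AUGUSTUS_CONFIG defaults to the custom config directory from the example usage.
CUSTOM_AUGUSTUS_CONFIG="${CUSTOM_AUGUSTUS_CONFIG:-/misc/scratch3/jasons/protist_gene_prediction/software/custom_augustus_config}"

# The body runs in a subshell ( ... ), so the export does not leak
# into later runs that use a pre-trained species.
run_augustus() (
    species="$1"   # e.g. generic, tetrahymena, toxoplasma
    genome="$2"    # genome FASTA file
    outfile="$3"   # output GFF3 file

    prefix=""
    if [ "$species" = "generic" ]; then
        # generic needs a config directory that contains its probability files
        export AUGUSTUS_CONFIG_PATH="$CUSTOM_AUGUSTUS_CONFIG"
        prefix="AUGUSTUS_CONFIG_PATH=$AUGUSTUS_CONFIG_PATH "
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command that would be run
        echo "${prefix}augustus --species=$species $genome --gff3=on --outfile=$outfile"
    else
        augustus --species="$species" "$genome" --gff3=on --outfile="$outfile"
    fi
)
```

Calling it as `DRY_RUN=1 run_augustus generic genome.fasta out.gff3` (file names are placeholders) only prints the command, which is a cheap way to check what would be run before starting a long prediction.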
Pre-trained Species
| Identifier | Species | Major Lineage |
|---|---|---|
| human | Homo sapiens | Opisthokonta (Metazoa) |
| fly | Drosophila melanogaster | Opisthokonta (Metazoa) |
| arabidopsis | Arabidopsis thaliana | Archaeplastida (Plantae) |
| brugia | Brugia malayi | Opisthokonta (Metazoa) |
| aedes | Aedes aegypti | Opisthokonta (Metazoa) |
| tribolium | Tribolium castaneum | Opisthokonta (Metazoa) |
| schistosoma | Schistosoma mansoni | Opisthokonta (Metazoa) |
| tetrahymena | Tetrahymena thermophila | SAR (Alveolata) |
| galdieria | Galdieria sulphuraria | Archaeplastida (Plantae) |
| maize | Zea mays | Archaeplastida (Plantae) |
| toxoplasma | Toxoplasma gondii | SAR (Alveolata) |
| caenorhabditis | Caenorhabditis elegans | Opisthokonta (Metazoa) |
| aspergillus_fumigatus | Aspergillus fumigatus | Opisthokonta (Fungi) |
| aspergillus_nidulans | Aspergillus nidulans | Opisthokonta (Fungi) |
| aspergillus_oryzae | Aspergillus oryzae | Opisthokonta (Fungi) |
| aspergillus_terreus | Aspergillus terreus | Opisthokonta (Fungi) |
| botrytis_cinerea | Botrytis cinerea | Opisthokonta (Fungi) |
| candida_albicans | Candida albicans | Opisthokonta (Fungi) |
| candida_guilliermondii | Candida guilliermondii | Opisthokonta (Fungi) |
| candida_tropicalis | Candida tropicalis | Opisthokonta (Fungi) |
| chaetomium_globosum | Chaetomium globosum | Opisthokonta (Fungi) |
| coccidioides_immitis | Coccidioides immitis | Opisthokonta (Fungi) |
| coprinus | Coprinus cinereus | Opisthokonta (Fungi) |
| coyote_tobacco | Nicotiana attenuata | Archaeplastida (Plantae) |
| cryptococcus_neoformans_gattii | Cryptococcus neoformans gattii | Opisthokonta (Fungi) |
| cryptococcus_neoformans_neoformans_B | Cryptococcus neoformans | Opisthokonta (Fungi) |
| debaryomyces_hansenii | Debaryomyces hansenii | Opisthokonta (Fungi) |
| encephalitozoon_cuniculi_GB | Encephalitozoon cuniculi | Opisthokonta (Fungi) |
| eremothecium_gossypii | Eremothecium gossypii | Opisthokonta (Fungi) |
| fusarium_graminearum | Fusarium graminearum | Opisthokonta (Fungi) |
| histoplasma_capsulatum | Histoplasma capsulatum | Opisthokonta (Fungi) |
| kluyveromyces_lactis | Kluyveromyces lactis | Opisthokonta (Fungi) |
| laccaria_bicolor | Laccaria bicolor | Opisthokonta (Fungi) |
| lamprey | Petromyzon marinus | Opisthokonta (Metazoa) |
| leishmania_tarentolae | Leishmania tarentolae | Excavata |
| lodderomyces_elongisporus | Lodderomyces elongisporus | Opisthokonta (Fungi) |
| magnaporthe_grisea | Magnaporthe grisea | Opisthokonta (Fungi) |
| neurospora_crassa | Neurospora crassa | Opisthokonta (Fungi) |
| phanerochaete_chrysosporium | Phanerochaete chrysosporium | Opisthokonta (Fungi) |
| pichia_stipitis | Pichia stipitis | Opisthokonta (Fungi) |
| rhizopus_oryzae | Rhizopus oryzae | Opisthokonta (Fungi) |
| saccharomyces_cerevisiae_S288C | Saccharomyces cerevisiae | Opisthokonta (Fungi) |
| schizosaccharomyces_pombe | Schizosaccharomyces pombe | Opisthokonta (Fungi) |
| thermoanaerobacter_tengcongensis | Thermoanaerobacter tengcongensis | Bacteria |
| trichinella | Trichinella spiralis | Opisthokonta (Metazoa) |
| ustilago_maydis | Ustilago maydis | Opisthokonta (Fungi) |
| yarrowia_lipolytica | Yarrowia lipolytica | Opisthokonta (Fungi) |
| nasonia | Nasonia vitripennis | Opisthokonta (Metazoa) |
| tomato | Solanum lycopersicum | Archaeplastida (Plantae) |
| chlamydomonas | Chlamydomonas reinhardtii | Archaeplastida |
| amphimedon | Amphimedon queenslandica | Opisthokonta (Metazoa) |
| pneumocystis | Pneumocystis jirovecii | Opisthokonta (Fungi) |
| wheat | Triticum aestivum | Archaeplastida (Plantae) |
| chicken | Gallus gallus | Opisthokonta (Metazoa) |
| zebrafish | Danio rerio | Opisthokonta (Metazoa) |
| E_coli_K12 | Escherichia coli | Bacteria |
| s_aureus | Staphylococcus aureus | Bacteria |
| volvox | Volvox carteri | Archaeplastida |
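
Before starting a run, it can be worth checking that an identifier from the table is actually present in the config directory you are using. The sketch below assumes the usual Augustus layout, in which each species has a folder `species/<identifier>/` containing `<identifier>_parameters.cfg` under AUGUSTUS_CONFIG_PATH; the function name `species_available` is hypothetical. As far as we know, `augustus --species=help` also prints the list of valid identifiers.

```shell
# Hypothetical helper: succeed if the parameter file for a species identifier
# exists under $AUGUSTUS_CONFIG_PATH (assumed layout: species/<id>/<id>_parameters.cfg).
species_available() {
    id="$1"
    if [ -z "${AUGUSTUS_CONFIG_PATH:-}" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set" >&2
        return 2
    fi
    [ -f "$AUGUSTUS_CONFIG_PATH/species/$id/${id}_parameters.cfg" ]
}
```

For example, `species_available tetrahymena && echo "found"` prints `found` only if the parameter file for that identifier exists.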
