gene_prediction_just_genemark
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| gene_prediction_just_genemark [2023/01/09 16:09] – 134.190.232.140 | gene_prediction_just_genemark [2026/02/26 11:53] (current) – 129.173.242.70 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Gene prediction with just GeneMark ====== | ====== Gene prediction with just GeneMark ====== | ||
| - | Joran Martijn | + | Created by Joran Martijn |
| - | **GeneMark** is one of oldest gene prediction tools still in development, | + | Updated by Jason Shao on February 26th, 2026. |
| + | |||
| + | **GeneMark** is one of the oldest gene prediction tools still in development, | ||
| + | |||
| + | GeneMark was originally developed for prokaryotes but has since been extended to work with eukaryotes as well. The current release is the fourth eukaryotic version. The first two versions (Lukashin and Borodovsky, unpublished; Tarasenko and Borodovsky, unpublished) are cited by the third only as unpublished data. The third version is published [[https:// | ||
| GeneMark is maintained and developed by Mark Borodovsky' | GeneMark is maintained and developed by Mark Borodovsky' | ||
| Line 10: | Line 14: | ||
| Unfortunately the GeneMark tools are not distributed in CONDA repositories, | Unfortunately the GeneMark tools are not distributed in CONDA repositories, | ||
| + | |||
| + | < | ||
| + | # use your browser to download the license key relevant to your system | ||
| + | |||
| + | # unpack and rename the file | ||
| + | gunzip gm_key_64.gz | ||
| + | mv gm_key_64 .gm_key | ||
| + | </ | ||
| The gene prediction algorithms relevant to eukaryotes are collected in the GeneMark-ES suite, where E stands for Eukaryotic and S for Self-training. The suite contains GeneMark.hmm, | The gene prediction algorithms relevant to eukaryotes are collected in the GeneMark-ES suite, where E stands for Eukaryotic and S for Self-training. The suite contains GeneMark.hmm, | ||
| Line 29: | Line 41: | ||
| ==== GeneMark-ES ==== | ==== GeneMark-ES ==== | ||
| - | This is perhaps the most straightforward and pure //ab initio// gene prediction tool. Only the genome FASTA file is provided, and the algorithm will do its best without any external sources of evidence, to predict the gene start and end locations, including possible introns. | + | This is perhaps the most straightforward and pure //ab initio// gene prediction tool. Only the genome FASTA file is provided, and the algorithm will do its best without any external sources of evidence |
| + | |||
| + | Create a conda environment for GeneMark-ES: | ||
| + | |||
| + | < | ||
| + | conda create -n genemark-es perl perl-mce perl-yaml perl-hash-merge perl-parallel-forkmanager | ||
| + | </ | ||
| + | |||
| + | Running GeneMark-ES: | ||
| < | < | ||
| + | source activate genemark-es | ||
| gmes_petap.pl --sequence < | gmes_petap.pl --sequence < | ||
| </ | </ | ||
| Line 41: | Line 62: | ||
| </ | </ | ||
| + | ==== GeneMark-ET ==== | ||
| + | This algorithm takes " | ||
| + | [[https:// | ||
| + | |||
| + | < | ||
| + | # get hints from rnaseq alignment bam file | ||
| + | bam2hints --intronsonly --minintronlen 20 --in=rnaseq_vs_genome.sort.bam --out=intron_hints.gff | ||
| + | |||
| + | # process hints | ||
| + | cat intron_hints.gff | sort -n -k4,4 | sort -s -n -k5,5 | sort -s -n -k3,3 | sort -s -k1,1 > intron_hints.sort.gff | ||
| + | join_multiple_hints.pl < intron_hints.sort.gff > hintsfile.tmp.gff | ||
| + | filterIntronsFindStrand.pl < | ||
| + | |||
| + | # run GeneMark-ET | ||
| + | gmes_petap.pl --verbose --sequence=< | ||
| + | </ | ||
| + | |||
| + | NOTE that '' | ||
| + | |||
| + | ==== Running GeneMark on perun ==== | ||
| + | |||
| + | As far as I know, there is no working environment on perun dedicated to GeneMark. However, braker2 calls GeneMark, so the braker2 environment has all the dependencies needed to run GeneMark as well. | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | #$ -cwd | ||
| + | #$ -m bea | ||
| + | #$ -pe threaded 20 | ||
| + | |||
| + | source activate braker2 | ||
| + | |||
| + | # add gmes_petap.pl installation location to the $PATH | ||
| + | export PATH="/ | ||
| + | |||
| + | # input | ||
| + | ORIGINAL_GENOME=' | ||
| + | RNASEQ=' | ||
| + | THREADS=20 | ||
| + | |||
| + | # if you have no transcriptome data and you just want to do ab initio gene prediction | ||
| + | gmes_petap.pl --sequence $ORIGINAL_GENOME --ES --cores=$THREADS | ||
| + | |||
| + | # if you have a fungal-like genome, use GeneMark-ES with --fungus | ||
| + | gmes_petap.pl --sequence $ORIGINAL_GENOME --ES --cores=$THREADS --fungus | ||
| + | |||
| + | # if you have transcriptome data, use GeneMark-ET | ||
| + | ## get hints from rnaseq alignment bam file | ||
| + | bam2hints --intronsonly --minintronlen 20 --in=$RNASEQ --out=intron_hints.gff | ||
| + | ## process hints | ||
| + | cat intron_hints.gff | sort -n -k4,4 | sort -s -n -k5,5 | sort -s -n -k3,3 | sort -s -k1,1 > intron_hints.sort.gff | ||
| + | join_multiple_hints.pl < intron_hints.sort.gff > hintsfile.tmp.gff | ||
| + | filterIntronsFindStrand.pl < | ||
| + | ## run GeneMark-ET | ||
| + | gmes_petap.pl --verbose --sequence=$ORIGINAL_GENOME --ET=hintsfile.gff --et_score 10 --cores=2 | ||
| + | |||
| + | </ | ||
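
The "process hints" step pipes the intron hints through four `sort` calls. The following is a minimal sketch of what that chain does, assuming tab-separated GFF-style lines. Because every sort after the first is stable (`-s`), the last key applied wins: records end up grouped by sequence name (column 1), then ordered by end (column 5), then by start (column 4). The numeric sort on column 3 leaves the order untouched when every feature is the same non-numeric word such as `intron`. The names `sort_hints` and `demo_hints` and the toy records are made up for illustration, and `LC_ALL=C` is an addition here so that the ordering is the same on every machine.

```shell
#!/bin/bash

# Same chain of sorts as in the hint-processing step, wrapped in a function.
# The body runs in a subshell so the LC_ALL=C setting (an assumption added
# for reproducible ordering) does not leak into the calling shell.
sort_hints() (
    export LC_ALL=C
    sort -n -k4,4 | sort -s -n -k5,5 | sort -s -n -k3,3 | sort -s -k1,1
)

# Hypothetical toy records in GFF column order:
# seqname, source, feature, start, end, score, strand, frame, attributes
demo_hints() {
    printf 'chr2\tdemo\tintron\t100\t200\t0\t+\t.\t.\n'
    printf 'chr1\tdemo\tintron\t500\t600\t0\t-\t.\t.\n'
    printf 'chr1\tdemo\tintron\t100\t300\t0\t+\t.\t.\n'
    printf 'chr1\tdemo\tintron\t100\t200\t0\t+\t.\t.\n'
}

# Show only seqname, start and end of the sorted records
demo_hints | sort_hints | cut -f1,4,5
```

With the toy input, the three `chr1` introns come out first (100-200, 100-300, 500-600), followed by the single `chr2` intron. Identical introns from different reads therefore end up on adjacent lines, which is what a line-by-line merging step needs.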
