gene_prediction_just_genemark

Differences

This shows you the differences between two versions of the page.

gene_prediction_just_genemark [2023/02/23 12:57] 134.190.232.186
gene_prediction_just_genemark [2026/02/26 11:53] (current) 129.173.242.70
Line 1: Line 1:
 ====== Gene prediction with just GeneMark ======
  
-Joran Martijn (January 2023)
+Created by Joran Martijn in 2023
 + 
 +Updated by Jason Shao on February 26th, 2026.
  
 **GeneMark** is one of the oldest gene prediction tools still in development, with papers describing the first algorithms dating from [[http://exon.gatech.edu/Genemark/PDF/Statistical_Patterns_in_Primary___Article.pdf|1986]], [[https://www.sciencedirect.com/science/article/pii/030326479390068N|1993 (1)]], [[https://www.sciencedirect.com/science/article/pii/009784859385004V|1993 (2)]] and [[https://academic.oup.com/nar/article/26/4/1107/2902172?login=true|1998]]. The latest update (as of January 2023, GeneMark-EP+) was published in [[https://academic.oup.com/nargab/article/2/2/lqaa026/5836691?login=true|2020]].
Line 40: Line 42:
  
 This is perhaps the most straightforward and pure //ab initio// gene prediction tool. Only the genome FASTA file is provided; the algorithm trains itself on that sequence (hence "self-training") and predicts gene start and end locations, including possible introns, without any external evidence or training input.
 +
 +Create a conda environment for GeneMark-ES:
 +
 +<code>
 +conda create -n genemark-es perl perl-mce perl-yaml perl-hash-merge perl-parallel-forkmanager
 +</code>
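Before running GeneMark-ES, it can help to confirm that the Perl modules installed above actually load inside the environment. The following is a minimal sketch, not part of GeneMark: the helper names ''check_perl_module'' and ''report_missing_modules'' are hypothetical, and the module names (MCE, YAML, Hash::Merge, Parallel::ForkManager) are assumed to be the ones provided by the conda packages listed above.

```shell
#!/bin/bash
# Hypothetical helper (not part of GeneMark): succeeds if the given Perl module loads.
check_perl_module() {
    perl -M"$1" -e 1 2>/dev/null
}

# Print every module that fails to load; return non-zero if any is missing.
report_missing_modules() {
    local module missing=0
    for module in "$@"; do
        if ! check_perl_module "$module"; then
            echo "missing Perl module: $module" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

With the environment activated, ''report_missing_modules MCE YAML Hash::Merge Parallel::ForkManager'' should print nothing if all four modules are available.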
 +
 +Running GeneMark-ES:
  
 <code>
 +source activate genemark-es
 gmes_petap.pl --sequence <genome.fasta> --ES
 </code>
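GeneMark-ES writes its predictions to ''genemark.gtf'' in the working directory. As a quick sanity check after a run, the number of predicted genes can be counted from the ''gene_id'' attributes. This is a minimal sketch, assuming a standard GTF attribute column; the function name ''count_genes'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count distinct gene_id values in a GTF file.
count_genes() {
    # Extract every gene_id "..." attribute, deduplicate, and count.
    grep -o 'gene_id "[^"]*"' "$1" | sort -u | wc -l | tr -d ' '
}

# Only run the check if a prediction file is present in the current directory.
if [ -f genemark.gtf ]; then
    echo "predicted genes: $(count_genes genemark.gtf)"
fi
```

A count of zero, or one far below what is expected for the organism, suggests the self-training step did not have enough sequence to work with.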
Line 72: Line 83:
 NOTE that ''join_multiple_hints.pl'' doesn't really do anything if you only provide it with a single hints file.
  
 +==== Running GeneMark on perun ====
  
 +There is no dedicated GeneMark environment on perun as far as I know, but braker2 calls GeneMark, so the braker2 environment has all the dependencies needed to run GeneMark as well.
 +
 +<code>
 +#!/bin/bash
 +#$ -S /bin/bash
 +#$ -cwd
 +#$ -m bea
 +#$ -pe threaded 20
 +
 +source activate braker2
 +
 +# add gmes_petap.pl installation location to the $PATH
 +export PATH="/scratch2/software/gmes_linux_64-aug-2020/:$PATH"
 +
 +# input
 +ORIGINAL_GENOME='ergo_cyp_genome.fasta.masked'
 +RNASEQ='rnaseq_vs_masked_ergo_cyp_genome.sort.bam'
 +THREADS=20
 +
 +# if you have no transcriptome data and you just want to do ab initio gene prediction
 +gmes_petap.pl --sequence $ORIGINAL_GENOME --ES --cores=$THREADS
 +
 +# if you have a fungal-like genome, run GeneMark-ES with --fungus instead
 +gmes_petap.pl --sequence $ORIGINAL_GENOME --ES --cores=$THREADS --fungus
 +
 +# if you have transcriptome data, use genemark-ET
 +## get hints from rnaseq alignment bam file
 +bam2hints --intronsonly --minintronlen 20 --in=$RNASEQ --out=intron_hints.gff
 +## process hints
 +cat intron_hints.gff | sort -n -k4,4 | sort -s -n -k5,5 | sort -s -n -k3,3 | sort -s -k1,1 > intron_hints.sort.gff
 +join_multiple_hints.pl < intron_hints.sort.gff > hintsfile.tmp.gff
 +filterIntronsFindStrand.pl "$ORIGINAL_GENOME" hintsfile.tmp.gff --score > hintsfile.gff
 +## run GeneMark-ET
 +gmes_petap.pl --verbose --sequence=$ORIGINAL_GENOME --ET=hintsfile.gff --et_score 10 --cores=2
 +
 +</code>
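Before starting GeneMark-ET, it may be worth checking that the filtered hints file actually contains intron hints with an assigned strand. The sketch below assumes ''hintsfile.gff'' is tab-separated GFF with the feature type in column 3 and the strand in column 7; the function name ''hints_per_strand'' is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count intron hints per strand in a GFF hints file.
hints_per_strand() {
    # Column 3 is the feature type, column 7 the strand.
    awk -F'\t' '$3 == "intron" { n[$7]++ } END { for (s in n) print s, n[s] }' "$1" \
        | LC_ALL=C sort
}

# Only run the check if the hints file from the script above is present.
if [ -f hintsfile.gff ]; then
    hints_per_strand hintsfile.gff
fi
```

An empty result means no intron hints survived filtering, in which case the GeneMark-ET run has no transcriptome evidence to work from.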