gene_prediction_with_braker2_pipeline

Differences

This shows you the differences between two versions of the page.

gene_prediction_with_braker2_pipeline [2025/11/18 12:04] – [Braker2] 134.190.191.148
gene_prediction_with_braker2_pipeline [2025/11/18 14:23] (current) – [Genome-guided transcriptome assembly] 134.190.191.148
Line 130: Line 130:
 #$ -cwd
 #$ -pe threaded 10
 +
 cd $PWD
 +
 source activate trinity-2.11-with-workaround
-Trinity --CPU 10 --max_memory 100G --genome_guided_bam yourgenome.fasta.sambamsorted.bam --genome_guided_max_intron 1000 --SS_lib_type RF
 +Trinity \
 +    --CPU 10 \
 +    --max_memory 100G \
 +    --genome_guided_bam yourgenome.fasta.sambamsorted.bam \
 +    --genome_guided_max_intron 1000 \
 +    --SS_lib_type RF
 conda deactivate
 </code>
Line 147: Line 156:
 ===== Braker2 =====
  
-[[https://github.com/Gaius-Augustus/BRAKER|Braker]] is a fully automated pipeline that makes use of the ab initio gene predictor GeneMark, RNAseq data mapping, and using the data of those two to train the machine learning algorithm of AUGUSTUS, which then promptly does a final round of gene prediction. Or something like that..
 +[[https://github.com/Gaius-Augustus/BRAKER|Braker]] is a fully automated pipeline in which:
 + 
 +  - Intron start and end coordinates (//intron hints//) are extracted from the RNAseq BAM file
 +  - These hints are used, together with the genome FASTA file, to train GeneMark-ET
 +  - The trained GeneMark-ET performs an "//ab initio//" gene prediction
 +  - Predicted gene structures in which every intron is supported by the RNAseq data (//anchored introns//) are selected to train AUGUSTUS
 +  - The trained AUGUSTUS then predicts gene structures, again using the intron hints as "extrinsic evidence"
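
The selection of gene structures with anchored introns can be illustrated with a small shell sketch. This is not BRAKER's own code. The function name ''select_anchored_genes'' and both tab-separated input layouts are assumptions made for illustration only. It prints the IDs of genes for which every predicted intron also appears among the RNAseq intron hints.

```shell
#!/usr/bin/env bash
# select_anchored_genes HINTS PREDICTED   (hypothetical helper, not part of BRAKER)
#   HINTS:     tab-separated  seqid  start  end           (RNAseq-supported introns)
#   PREDICTED: tab-separated  gene_id  seqid  start  end  (introns of predicted genes)
# Prints, sorted, the IDs of genes whose introns are ALL present in HINTS.
select_anchored_genes() {
    awk -F'\t' '
        # First file: remember every supported intron by its coordinates.
        FILENAME == ARGV[1] { supported[$1 FS $2 FS $3] = 1; next }
        # Second file: count introns per gene and how many are supported.
        {
            total[$1]++
            if (($2 FS $3 FS $4) in supported) hit[$1]++
        }
        # Keep a gene only if all of its introns were supported.
        END {
            for (g in total) if (hit[g] == total[g]) print g
        }
    ' "$1" "$2" | sort
}

# Example call (placeholder file names):
#   select_anchored_genes hints_introns.tsv predicted_introns.tsv
```

Genes without introns never appear in the second file, so this sketch does not select them.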
  
 {{::braker1_pipeline.png|}}
 +
 +The intron hints are extracted using the ''bam2hints'' tool with the ''--intronsonly'' flag. ''bam2hints'' comes with the AUGUSTUS and BRAKER tools.
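
A minimal sketch of that extraction step, wrapped in a shell function so it can be reused in a job script. The function name and the file names are placeholders. The ''--in'' and ''--out'' options are assumed to match your installed version, so check ''bam2hints --help'' before relying on them.

```shell
#!/usr/bin/env bash
# extract_intron_hints BAM OUT_GFF   (hypothetical wrapper name)
# Writes intron hints derived from the RNAseq alignments in BAM to OUT_GFF.
extract_intron_hints() {
    local bam=$1 out=$2
    # --intronsonly restricts the output to intron hints
    bam2hints --intronsonly --in="$bam" --out="$out"
}

# Example call (placeholder output name):
#   extract_intron_hints yourgenome.fasta.sambamsorted.bam introns.hints.gff
```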
 +
 +If you use only RNAseq as extrinsic evidence, you can essentially use only //donor splice site// and //acceptor splice site// hints. If you also have protein homology information, you can additionally infer and use //start//, //stop//, //exonpart// and //exon// hints (Stanke et al. 2006).
 +
 +The intron hints contain explicit location information and influence the optimal path through the GHMM (generalized hidden Markov model; AUGUSTUS+ paper, Stanke et al. 2006). Because this is a probabilistic model, **hints can sometimes be ignored if the intrinsic information is strong enough!**
  
 Predict genes using Genemark-ET and Augustus through braker2:
gene_prediction_with_braker2_pipeline.1763481859.txt.gz · Last modified: by 134.190.191.148