gene_prediction_with_braker2_pipeline
Differences
This shows you the differences between two versions of the page.
| gene_prediction_with_braker2_pipeline [2024/10/28 10:05] – 134.190.221.230 | gene_prediction_with_braker2_pipeline [2025/11/18 14:23] (current) – [Genome-guided transcriptome assembly] 134.190.191.148 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | ====== Gene prediction with Braker2 ====== | + | ====== Gene prediction with the Braker2 |
| GP using machine learning and extrinsic hints by **DE Salas-Leiva** (last updated Oct-21-2020)\\ | GP using machine learning and extrinsic hints by **DE Salas-Leiva** (last updated Oct-21-2020)\\ | ||
| Line 25: | Line 25: | ||
| ===== Repeat masking ===== | ===== Repeat masking ===== | ||
| + | |||
| + | From the BRAKER1 paper: | ||
| + | |||
| + | " | ||
| Some repetitive elements in your genome may by chance look like ORFs or even protein-coding genes. The main purpose of masking these repeats is to keep your gene predictor from looking at these regions at all, so that it does not predict false-positive genes there. | Some repetitive elements in your genome may by chance look like ORFs or even protein-coding genes. The main purpose of masking these repeats is to keep your gene predictor from looking at these regions at all, so that it does not predict false-positive genes there. | ||
| Line 126: | Line 130: | ||
| #$ -cwd | #$ -cwd | ||
| #$ -pe threaded 10 | #$ -pe threaded 10 | ||
| + | |||
| cd $PWD | cd $PWD | ||
| + | |||
| source activate trinity-2.11-with-workaround | source activate trinity-2.11-with-workaround | ||
| - | Trinity --CPU 10 --max_memory 100G --genome_guided_bam yourgenome.fasta.sambamsorted.bam --genome_guided_max_intron 1000 --SS_lib_type RF | + | |
| + | Trinity | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| conda deactivate | conda deactivate | ||
| </ | </ | ||
| Line 143: | Line 156: | ||
| ===== Braker2 ===== | ===== Braker2 ===== | ||
| - | [[https:// | + | [[https:// |
| + | |||
| + | - Intron start and end coordinates (//intron hints//) are extracted from the RNAseq BAM file | ||
| + | - These are then used along with the genome FASTA file to train GeneMark-ET | ||
| + | - The trained GeneMark-ET performs an "//ab initio//" | ||
| + | - Those predicted gene structures for which all introns are supported by the RNAseq data (//anchored introns//) are selected to train AUGUSTUS | ||
| + | - The trained AUGUSTUS then predicts gene structures, again using the intron hints as " | ||
| + | |||
| + | {{:: | ||
| + | |||
| + | The intron hints are extracted using the '' | ||
| + | |||
| + | If you use only RNAseq as extrinsic evidence, you can essentially use only //donor splice site// and //acceptor splice site// hints. If you also have protein homology information, | ||
| + | |||
| + | The intron hints contain explicit location information and influence | ||
| Predict genes using GeneMark-ET and AUGUSTUS through Braker2: | Predict genes using GeneMark-ET and AUGUSTUS through Braker2: | ||
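
The command that follows this sentence on the wiki page is not part of this diff. As a rough illustration only, a BRAKER2 run that uses RNAseq alignments as the sole extrinsic evidence might be assembled as below. This is a minimal sketch, not the page's actual command. It assumes the genome was soft-masked in the repeat masking step (hence ''--softmasking''), and the names ''yourgenome.fasta.masked'' and ''yourspecies'' are hypothetical placeholders. The BAM name is borrowed from the Trinity step, and the core count mirrors ''#$ -pe threaded 10''. The helper ''braker_cmd'' is made up for this example and only prints the command line, so it can be checked before anything is submitted.

```shell
#!/bin/sh
# Dry-run sketch of a BRAKER2 call in RNAseq-only mode.
# File and species names are placeholders, not values taken from this page.

genome="yourgenome.fasta.masked"          # hypothetical: soft-masked genome FASTA
bam="yourgenome.fasta.sambamsorted.bam"   # sorted RNAseq alignments (name as in the Trinity step)
species="yourspecies"                     # hypothetical: label for the trained AUGUSTUS parameters
cores=10                                  # same count as requested from the scheduler

# Hypothetical helper: assemble the command line as text so it can be inspected first.
# Arguments: genome FASTA, BAM file, species label, number of cores.
braker_cmd() {
    printf 'braker.pl --genome=%s --bam=%s --species=%s --cores=%s --softmasking\n' \
        "$1" "$2" "$3" "$4"
}

# Print the command only; nothing is executed here.
braker_cmd "$genome" "$bam" "$species" "$cores"
```

Once the printed line looks right, it can be pasted into the job script after the ''source activate'' line of whichever environment provides ''braker.pl''. Newer BRAKER releases rename ''--cores'' to ''--threads'', so check ''braker.pl --help'' for the installed version.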
