--- gene_prediction_framework [2022/12/06 10:48] – 134.190.232.140
+++ gene_prediction_framework [2022/12/13 15:49] (current) – jason
@@ Line 1: / Line 1: @@
+**This page is now deprecated; see** [[gene_prediction_with_braker2_pipeline|Gene prediction with Braker2 pipeline]]
+===== Gene prediction with Braker2 pipeline =====
 GP using machine learning and extrinsic hints by **DE Salas-Leiva** (last updated Oct-21-2020)\\
 Updated by Joran Martijn in July 2022
@@ Line 22: / Line 26: @@
    mv gm_key_64 .gm_key
-===== Repeat masking =====
+==== Repeat masking ====
 Mask the repetitive regions in your assembly using the following shell script. BuildDatabase and RepeatModeler will create a species-specific library of repeats from your genome, and then RepeatMasker will use that library to mask repetitive regions in your assembly.
@@ Line 69: / Line 73: @@
 </code>
-===== RNAseq mapping =====
+==== RNAseq mapping ====
@@ Line 95: / Line 99: @@
 </code>
-===== Genome-guided transcriptome assembly =====
+==== Genome-guided transcriptome assembly ====
@@ Line 120: / Line 124: @@
-===== Gene prediction with Braker2 =====
+==== Braker2 ====
@@ Line 150: / Line 154: @@
-===== Predicting gene models with PASA =====
+==== Predicting gene models with PASA ====
+PASA will use the genome-guided transcriptome assembly to estimate where gene models are located. It does this by aligning the assembled transcripts to the reference genome.
+PASA needs a config file. Below is an example:
+<code>
+## templated variables to be replaced exist as <__var_name__>
+# database settings
+## pasa will create an sqlite database in the location desired below
+DATABASE=/scratch3/jmartijn/ergo-genome/results/24_pasa-2.5.2/ergobibamus.sqlite
+#######################################################
+# Parameters to specify to specific scripts in pipeline
+# create a key = "script_name" + ":" + "parameter"
+# assign a value as done above.
+#script validate_alignments_in_db.dbi
+validate_alignments_in_db.dbi:--MIN_PERCENT_ALIGNED=75
+validate_alignments_in_db.dbi:--MIN_AVG_PER_ID=95
+validate_alignments_in_db.dbi:--NUM_BP_PERFECT_SPLICE_BOUNDARY=0
+#script subcluster_builder.dbi
+subcluster_builder.dbi:-m=50
+</code>
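+The ''script_name:parameter=value'' convention used in the config above can be checked from the shell before a run. The following is only a minimal sketch: ''list_pasa_overrides'' is a hypothetical helper name, not something PASA ships, and it assumes one setting per line as in the example config.

```shell
#!/bin/bash
# Hypothetical helper (not part of PASA): print the per-script parameter
# overrides found in a PASA config as "script<TAB>parameter<TAB>value".
# Plain KEY=value settings such as DATABASE=... are skipped.
list_pasa_overrides() {
    local config="$1"
    awk '
        # skip comment lines and blank lines
        /^[[:space:]]*#/ || /^[[:space:]]*$/ { next }
        {
            colon = index($0, ":")
            eq    = index($0, "=")
            # keep only lines where a colon precedes the first "="
            if (colon > 0 && eq > colon) {
                printf "%s\t%s\t%s\n",
                    substr($0, 1, colon - 1),
                    substr($0, colon + 1, eq - colon - 1),
                    substr($0, eq + 1)
            }
        }
    ' "$config"
}
```

+Running ''list_pasa_overrides pasa.config'' on the example above would be expected to list the three ''validate_alignments_in_db.dbi'' settings and the one ''subcluster_builder.dbi'' setting, which makes typos in script or parameter names easier to spot.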
+With the config in place, PASA can be launched with a job script such as:
+<code>
+#!/bin/bash
+#$ -S /bin/bash
+#$ -cwd
+#$ -pe threaded 20
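+# input files (relative to the submission directory, see -cwd above)
+CONFIG='pasa.config'
+GENOME='ergo_cyp_genome.fasta.masked'
+TRANSCRIPTOME='Trinity-GG.fasta'
+# keep in sync with the slot count requested via -pe threaded
+THREADS=20
+source activate pasa-2.5.2
+# run pasa: --create sets up the database, --run starts the alignment/assembly pipeline
+Launch_PASA_pipeline.pl \
+        --create --run \
+        -c "$CONFIG" \
+        -g "$GENOME" \
+        -t "$TRANSCRIPTOME" \
+        --transcribed_is_aligned_orient \
+        --ALIGNERS blat,gmap,minimap2 \
+        --CPU "$THREADS"
+conda deactivate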
+</code>
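+Before submitting the job script above, it can help to confirm that the config, the masked genome and the transcriptome assembly exist and are not empty. This is a minimal sketch only; ''check_pasa_inputs'' is a hypothetical helper name and not part of PASA.

```shell
#!/bin/bash
# Hypothetical helper (not part of PASA): return non-zero if any of the
# given input files is missing or empty, reporting each problem on stderr.
check_pasa_inputs() {
    local status=0 f
    for f in "$@"; do
        # -s is true only for files that exist and have a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

+One way to use it is to call ''check_pasa_inputs "$CONFIG" "$GENOME" "$TRANSCRIPTOME" || exit 1'' just before ''Launch_PASA_pipeline.pl'', so that the job stops early instead of failing partway through the alignment step.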
+==== Compiling the final gene models with EvidenceModeler ====