gene_prediction_framework

Last modified: 2022/12/13 15:49 by jason
**This page is now deprecated; see** [[gene_prediction_with_braker2_pipeline|Gene prediction with Braker2 pipeline]]

===== Gene prediction with Braker2 pipeline =====

Gene prediction (GP) using machine learning and extrinsic hints by **DE Salas-Leiva** (last updated Oct-21-2020)\\
Updated by Joran Martijn in July 2022
  mv gm_key_64 .gm_key
  
==== Repeat masking ====
  
Mask the repetitive regions in your assembly with the following shell script. BuildDatabase and RepeatModeler build a species-specific repeat library from your genome, and RepeatMasker then uses that library to mask the repeats in your assembly.
</code>
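After masking, it can be worth checking how much of the assembly was actually masked before moving on. Below is a minimal sketch, not part of RepeatMasker. `masked_fraction` is a hypothetical helper, and it assumes you point it at the `.masked` FASTA that RepeatMasker writes. N/n bases count as hard-masked, which is RepeatMasker's default. Lowercase bases count as soft-masked, as produced for example with `-xsmall`. N runs from scaffold gaps are counted too, so treat the number as an upper bound.

```shell
#!/bin/bash
# Hypothetical sanity check (not part of RepeatMasker): report how much of a
# masked FASTA is masked. N/n count as hard-masked (RepeatMasker's default);
# lowercase a/c/g/t count as soft-masked (e.g. RepeatMasker -xsmall).
# N runs from scaffold gaps are also counted as hard-masked.
set -euo pipefail

masked_fraction() {
    awk '
        /^>/ { next }                        # skip FASTA header lines
        {
            total += length($0)
            hard  += gsub(/[Nn]/, "&")       # gsub returns the number of matches
            soft  += gsub(/[acgt]/, "&")     # "&" replaces each match with itself
        }
        END {
            if (total == 0) { print "no sequence found" > "/dev/stderr"; exit 1 }
            printf "total=%d hard=%d soft=%d masked=%.2f%%\n",
                   total, hard, soft, 100 * (hard + soft) / total
        }
    ' "$1"
}

# example usage (file name taken from the PASA job script on this page):
# masked_fraction ergo_cyp_genome.fasta.masked
```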
  
==== RNAseq mapping ====
  
  
</code>
          
==== Genome-guided transcriptome assembly ====
  
  
  
  
==== Braker2 ====
  
  
  
  
==== Predicting gene models with PASA ====

PASA uses the genome-guided transcriptome assembly to estimate where gene models are located, by aligning the assembled transcripts to the reference genome.

You need to provide a PASA configuration file, for example:

<code>
## templated variables to be replaced exist as <__var_name__>

# database settings
## pasa will create an sqlite database in the location desired below
## (replace with a path you can write to)
DATABASE=/scratch3/jmartijn/ergo-genome/results/24_pasa-2.5.2/ergobibamus.sqlite

#######################################################
# Parameters to specify to specific scripts in pipeline
# create a key = "script_name" + ":" + "parameter"
# assign a value as done above.

#script validate_alignments_in_db.dbi
## minimum percent of each transcript's length that must align to the genome
validate_alignments_in_db.dbi:--MIN_PERCENT_ALIGNED=75
## minimum average percent identity of the alignment
validate_alignments_in_db.dbi:--MIN_AVG_PER_ID=95
## bases at each splice boundary that must align perfectly (0 = no requirement)
validate_alignments_in_db.dbi:--NUM_BP_PERFECT_SPLICE_BOUNDARY=0

#script subcluster_builder.dbi
subcluster_builder.dbi:-m=50
</code>
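Because the `DATABASE` line is an absolute, user-specific path, you may prefer to generate the config rather than edit it by hand. Below is a minimal sketch under that assumption. `write_pasa_config` is a hypothetical helper, not part of PASA. The parameter values are copied from the example config on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of PASA): write a PASA config file whose
# DATABASE line points at a path you choose. Parameter values are copied
# from the example config on this page.
set -euo pipefail

write_pasa_config() {
    local db_path="$1"                 # absolute path of the sqlite database PASA will create
    local config="${2:-pasa.config}"   # output file name (default: pasa.config)

    # unquoted EOF so ${db_path} is expanded; no other line in the body uses $
    cat > "$config" <<EOF
# database settings
## pasa will create an sqlite database in the location desired below
DATABASE=${db_path}

#script validate_alignments_in_db.dbi
validate_alignments_in_db.dbi:--MIN_PERCENT_ALIGNED=75
validate_alignments_in_db.dbi:--MIN_AVG_PER_ID=95
validate_alignments_in_db.dbi:--NUM_BP_PERFECT_SPLICE_BOUNDARY=0

#script subcluster_builder.dbi
subcluster_builder.dbi:-m=50
EOF
}

# example usage (path is illustrative):
# write_pasa_config "$PWD/results/pasa/my_genome.sqlite"
```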

<code>
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe threaded 20

# input
CONFIG='pasa.config'
GENOME='ergo_cyp_genome.fasta.masked'
TRANSCRIPTOME='Trinity-GG.fasta'
THREADS=20   # keep in sync with the -pe request above

source activate pasa-2.5.2

# run pasa: --create creates the database, --run runs the alignment/assembly pipeline
# use --transcribed_is_aligned_orient only if your RNA-seq library is strand-specific
Launch_PASA_pipeline.pl \
        --create --run \
        -c "$CONFIG" \
        -g "$GENOME" \
        -t "$TRANSCRIPTOME" \
        --transcribed_is_aligned_orient \
        --ALIGNERS blat,gmap,minimap2 \
        --CPU "$THREADS"

conda deactivate
</code>
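Because the job runs unattended on the cluster, it can help to stop immediately with a clear message when an input is missing. Below is a minimal sketch. `check_inputs` is a hypothetical helper, not part of PASA, and the variable names in the usage line are those of the job script above.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of PASA): return non-zero if any
# of the given input files is missing or empty, listing each one on stderr.
set -euo pipefail

check_inputs() {
    local missing=0
    local f
    for f in "$@"; do
        if [[ ! -s "$f" ]]; then       # -s: file exists and is non-empty
            echo "missing or empty input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# example usage inside the PASA job script, before Launch_PASA_pipeline.pl:
# check_inputs "$CONFIG" "$GENOME" "$TRANSCRIPTOME" || exit 1
```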
  
==== Compiling the final gene models with EvidenceModeler ====
  