gene_prediction_framework
Differences
This shows you the differences between two versions of the page.
| gene_prediction_framework [2022/12/06 10:48] – 134.190.232.140 | gene_prediction_framework [2022/12/13 15:49] (current) – jason | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| + | **This page is now deprecated; see** [[gene_prediction_with_braker2_pipeline|Gene prediction with Braker2 pipeline]] | ||
| + | |||
| + | ===== Gene prediction with Braker2 pipeline ===== | ||
| + | |||
| GP using machine learning and extrinsic hints by **DE Salas-Leiva** (last updated Oct-21-2020)\\ | GP using machine learning and extrinsic hints by **DE Salas-Leiva** (last updated Oct-21-2020)\\ | ||
| Updated by Joran Martijn in July 2022 | Updated by Joran Martijn in July 2022 | ||
| Line 22: | Line 26: | ||
| mv gm_key_64 .gm_key | mv gm_key_64 .gm_key | ||
| - | ===== Repeat masking | + | ==== Repeat masking ==== |
| Mask the repetitive regions in your assembly using the following shell script. BuildDatabase and RepeatModeler will create a species-specific library of repeats from your genome, and then RepeatMasker will use that library to mask repetitive regions in your assembly. | Mask the repetitive regions in your assembly using the following shell script. BuildDatabase and RepeatModeler will create a species-specific library of repeats from your genome, and then RepeatMasker will use that library to mask repetitive regions in your assembly. | ||
| Line 69: | Line 73: | ||
| </ | </ | ||
| - | ===== RNAseq mapping | + | ==== RNAseq mapping ==== |
| Line 95: | Line 99: | ||
| </ | </ | ||
| | | ||
| - | ===== Genome-guided transcriptome assembly | + | ==== Genome-guided transcriptome assembly ==== |
| Line 120: | Line 124: | ||
| - | ===== Gene prediction with Braker2 | + | ==== Braker2 ==== |
| Line 150: | Line 154: | ||
| - | ===== Predicting gene models with PASA ===== | + | ==== Predicting gene models with PASA ==== |
| + | |||
| + | PASA uses the genome-guided transcriptome assembly to infer where gene models are located. It does this by aligning the assembled transcripts to the reference genome. | ||
| + | |||
| + | You need to specify your PASA config file. Below is an example: | ||
| + | |||
| + | < | ||
| + | ## templated variables to be replaced exist as < | ||
| + | |||
| + | # database settings | ||
| + | ## PASA will create an SQLite database at the location specified below | ||
| + | DATABASE=/ | ||
| + | |||
| + | ####################################################### | ||
| + | # Parameters to specify to specific scripts in pipeline | ||
| + | # create a key = " | ||
| + | # assign a value as done above. | ||
| + | |||
| + | #script validate_alignments_in_db.dbi | ||
| + | validate_alignments_in_db.dbi: | ||
| + | validate_alignments_in_db.dbi: | ||
| + | validate_alignments_in_db.dbi: | ||
| + | |||
| + | #script subcluster_builder.dbi | ||
| + | subcluster_builder.dbi: | ||
| + | </ | ||
| + | |||
| + | < | ||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 20 | ||
| + | |||
| + | # input | ||
| + | CONFIG=' | ||
| + | GENOME=' | ||
| + | TRANSCRIPTOME=' | ||
| + | THREADS=20 | ||
| + | |||
| + | source activate pasa-2.5.2 | ||
| + | |||
| + | # run pasa | ||
| + | Launch_PASA_pipeline.pl \ | ||
| + | --create --run \ | ||
| + | -c $CONFIG \ | ||
| + | -g $GENOME \ | ||
| + | -t $TRANSCRIPTOME \ | ||
| + | --transcribed_is_aligned_orient \ | ||
| + | --ALIGNERS blat, | ||
| + | --CPU $THREADS | ||
| + | |||
| + | conda deactivate | ||
| + | </ | ||
| + | ==== Compiling the final gene models with EvidenceModeler ==== | ||
