gene_prediction_curation_with_igv
Differences
This shows you the differences between two versions of the page.
| gene_prediction_curation_with_igv [2023/10/18 09:59] – [Validating curated gene models] 134.190.232.191 | gene_prediction_curation_with_igv [2025/08/07 16:01] (current) – 134.190.145.228 | ||
|---|---|---|---|
| Line 26: | Line 26: | ||
| Note that here two distinct RNAseq tracks are loaded: one for all the reads stemming from transcripts transcribed from the positive strand of the genome, and one for all those transcribed from the negative strand of the genome. This kind of strand-specificity information is only available when an RNAseq library protocol was used that retains this information, | Note that here two distinct RNAseq tracks are loaded: one for all the reads stemming from transcripts transcribed from the positive strand of the genome, and one for all those transcribed from the negative strand of the genome. This kind of strand-specificity information is only available when an RNAseq library protocol was used that retains this information, | ||
| - | If you used HISAT2 to map your RNAseq reads to your genome, all " | + | If you used HISAT2 |
| < | < | ||
| Line 35: | Line 35: | ||
| samtools index < | samtools index < | ||
| </ | </ | ||
| + | |||
| + | ==== GC content ==== | ||
| + | |||
| + | It may be nice to have a track in IGV that shows %GC per sliding window. Here I generate a so-called '' | ||
| + | |||
| + | < | ||
| + | cat Ergobibamus_cyprinoides_CL.scaffolds.fa \ | ||
| + | | seqkit sliding -W 50 -s 25 \ | ||
| + | | seqkit fx2tab -n -g \ | ||
| + | | sed -e " | ||
| + | > ergo_gc_content.bedgraph | ||
| + | </ | ||
| + | |||
| + | You can play with the window size '' | ||
| ==== Loading into IGV ==== | ==== Loading into IGV ==== | ||
| Line 156: | Line 170: | ||
| " | " | ||
| " | " | ||
| + | }, | ||
| + | { | ||
| + | " | ||
| + | " | ||
| + | " | ||
| + | " | ||
| + | " | ||
| + | " | ||
| + | " | ||
| + | " | ||
| } | } | ||
| + | |||
| ] | ] | ||
| Line 177: | Line 202: | ||
| " | " | ||
| " | " | ||
| - | " | + | " |
| " | " | ||
| " | " | ||
| Line 204: | Line 229: | ||
| For some frustratingly unknown reason, IGV only displays either the three forward frames or the three reverse frames. It is, as far as I know, not possible to display all six frames simultaneously. To swap between reverse and forward frames, click on the small arrow of the // | For some frustratingly unknown reason, IGV only displays either the three forward frames or the three reverse frames. It is, as far as I know, not possible to display all six frames simultaneously. To swap between reverse and forward frames, click on the small arrow of the // | ||
| + | |||
| ==== Curating gene models ==== | ==== Curating gene models ==== | ||
| Line 330: | Line 356: | ||
| * Does the translated CDS start with an M and end with a * (STOP)? | * Does the translated CDS start with an M and end with a * (STOP)? | ||
| * Are there any premature * (STOP) in the CDS? | * Are there any premature * (STOP) in the CDS? | ||
| + | * If this is a Blastocystis genome and there is no STOP, does it end with a TGTTTGTT motif? | ||
| < | < | ||
| Line 345: | Line 372: | ||
| </ | </ | ||
| - | NOTE: Currently, the checks performed in the script assume that you haven' | + | <del>NOTE: Currently, the checks performed in the script assume that you haven' |
| + | |||
| + | UPDATE (Jan 2025): The script should now account for UTRs. | ||
| + | |||
| + | ==== Re-streamlining gene names ==== | ||
| + | |||
| + | During your curation, you may be forced to make new gene names that stray from the original format. For example, if you found that gene '' | ||
| + | |||
| + | To re-streamline gene names, you can make use of the '' | ||
| + | |||
| + | < | ||
| + | funannotate util gff-rename \ | ||
| + | --gff3 Ergobibamus_curated.gff3 \ | ||
| + | --fasta Ergobibamus_contigs.fasta \ | ||
| + | --locus_tag PYV62 \ | ||
| + | --out Ergobibamus_curated_renamed.gff3 | ||
| + | </ | ||
| - | TODO: also make it so that it collects all errors and reports them at the end, rather than quitting at the first error. Make use of assert statements? | + | Note that I have also added a new locus tag here, '' | ||
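
The basic checks listed under the validation step (translated CDS starts with M, ends with a STOP, has no premature STOP) can be sketched roughly as below. This is not the validation script referred to on this page; it is a minimal stand-in that assumes you already have the translated CDSs as a protein FASTA with ''*'' written for STOP. The function name ''check_translated_cds'' and the problem labels are made up for illustration. It reports every problem it finds instead of quitting at the first one. The Blastocystis TGTTTGTT check is left out, since it needs the nucleotide sequence.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the script from this page): reads a protein FASTA
# on stdin and prints "<record name><TAB><problem>" for every failed check.
check_translated_cds() {
  awk '
    # Run the three checks on the record collected so far.
    function report(   body) {
      if (name == "") return
      if (substr(seq, 1, 1) != "M")            print name "\tno_start_M"
      if (substr(seq, length(seq), 1) != "*")  print name "\tno_terminal_stop"
      # Everything except the last residue must be free of STOPs.
      body = substr(seq, 1, length(seq) - 1)
      if (index(body, "*") > 0)                print name "\tpremature_stop"
    }
    # New header: check the previous record, then start a new one.
    /^>/ { report(); name = substr($1, 2); seq = ""; next }
    # Sequence line: strip whitespace and append (handles wrapped FASTA).
         { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
    END  { report() }
  '
}
```

Assuming a file called ''translated_cds.faa'' (a placeholder name), usage would look like ''check_translated_cds < translated_cds.faa''. No output means all records passed these three checks.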
