Differences

This shows you the differences between two versions of the page.

--- gene_prediction_curation_with_igv [2023/10/31 11:24] – [Loading into IGV] 134.190.232.191
+++ gene_prediction_curation_with_igv [2025/08/07 16:01] (current) – 134.190.145.228
@@ Line 26: / Line 26: @@
 Note that here two distinct RNAseq tracks are loaded: one for all the reads stemming from transcripts transcribed from the positive strand of the genome, and one for all those transcribed from the negative strand. This kind of strand-specificity information is only available when an RNAseq library protocol that retains this information was used, for example the dUTP method. I believe this is the default nowadays, so it should be available to you.
-If you used HISAT2 to map your RNAseq reads to your genome, all "positive" reads will have the tag ''XS:A:+'', while all "negative" reads will have the ''XS:A:-'' tag. One can separate the two types of reads from the HISAT2-generated BAM file using the following command:
+If you used HISAT2 with ''--rna-strandness RF'' to map your RNAseq reads to your genome, all "positive" reads will have the tag ''XS:A:+'', while all "negative" reads will have the ''XS:A:-'' tag. One can separate the two types of reads from the HISAT2-generated BAM file using the following command:
 <code>
@@ Line 49: / Line 49: @@
 You can play with the window size ''-W'' and overlap ''-s'' parameters to fine-tune it to your liking. You can load this track either via ''File -> Load From File ..'' or through the JSON file (see below).
 ==== Loading into IGV ====
@@ Line 228: / Line 229: @@
 For some frustratingly unknown reason, IGV only displays the three forward frames OR the three reverse frames. It is, as far as I know, not possible to display all six frames simultaneously. To swap between reverse and forward frames, click on the small arrow of the //Sequence// track. To see the sequence track, you need to be sufficiently zoomed in.
 ==== Curating gene models ====
@@ Line 354: / Line 356: @@
   * Does the translated CDS start with an M and end with a * (STOP)?
   * Are there any premature * (STOP) in the CDS?
+  * If this is a Blastocystis genome and there is no STOP, does it end with a TGTTTGTT motif?
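The checklist above can be sketched in a few lines of Python. This is only an illustration, not the script referred to on this page: the function name ''check_cds'', its arguments, and the wording of the messages are hypothetical. How "ends with the motif" is measured (here: the last 8 nt of the annotated CDS nucleotide sequence) is also my assumption. It collects every problem it finds rather than stopping at the first one.

```python
# Hypothetical helper, not part of the actual checking script.
MOTIF = "TGTTTGTT"  # Blastocystis motif from the checklist


def check_cds(protein, cds_nt="", blastocystis=False):
    """Return a list of problems for one translated CDS (empty list = OK).

    protein      -- translated CDS, with '*' marking STOP codons
    cds_nt       -- CDS nucleotide sequence, only used for the motif check
    blastocystis -- allow a missing STOP if the CDS ends with MOTIF
                    (assumption: the motif is the last 8 nt of cds_nt)
    """
    problems = []

    # Check 1: the translation should start with a methionine.
    if not protein.startswith("M"):
        problems.append("does not start with M")

    # Check 2: no STOP anywhere except the final position.
    if "*" in protein[:-1]:
        problems.append("premature STOP")

    # Check 3: the translation should end with a STOP, unless this is a
    # Blastocystis gene whose CDS ends with the TGTTTGTT motif.
    if not protein.endswith("*"):
        motif_ok = blastocystis and cds_nt.upper().endswith(MOTIF)
        if not motif_ok:
            problems.append("no terminal STOP")

    return problems
```

For example, ''check_cds("MK*T*")'' flags only the premature STOP, while a STOP-less model such as ''check_cds("MKT", "ATGAAATGTTTGTT", blastocystis=True)'' passes because the motif rule applies.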
 <code>
@@ Line 369: / Line 372: @@
 </code>
-NOTE: Currently, the checks performed in the script assume that you haven't annotated any UTRs in your GFF3. If you have UTR (''three_prime_UTR'' or ''five_prime_UTR'') features, for example the start coordinate of the first CDS will not match the start coordinate of the gene. It will thus throw an error, even though your GFF3 is perfectly fine. Something I'll have to fix in the future.
+<del>NOTE: Currently, the checks performed in the script assume that you haven't annotated any UTRs in your GFF3. If you have UTR (''three_prime_UTR'' or ''five_prime_UTR'') features, for example the start coordinate of the first CDS will not match the start coordinate of the gene. It will thus throw an error, even though your GFF3 is perfectly fine. Something I'll have to fix in the future.</del>
-TODO: also make it so that it collects all errors and reports them at the end, rather than quitting at the first error.
+UPDATE (Jan 2025): It should now account for UTRs
 ==== Re-streamlining gene names ====