blastocystis_orf160
Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| blastocystis_orf160 [2025/07/10 11:03] – [The search for alternative initiatior tRNAs] 134.190.145.228 | blastocystis_orf160 [2025/11/10 15:20] (current) – [Little to no expression of orf160 in regular and riboZeroPlus RNAseq data of ST7C] 134.190.191.148 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ===== orf160 in Blastocystis mitochondrial genomes ===== | ===== orf160 in Blastocystis mitochondrial genomes ===== | ||
| - | ==== The Literature: Jacob et al, 2016, GBE ==== | + | Knowledge from the Literature: Jacob et al, 2016, GBE |
| === Basic properties === | === Basic properties === | ||
| Line 25: | Line 25: | ||
| === orf160 (negative strand) overlaps with its upstream neighbor, nad7 (negative strand) === | === orf160 (negative strand) overlaps with its upstream neighbor, nad7 (negative strand) === | ||
| - | Jacob //et al// identified a 55 or 56 bp overlap on the N-terminal | + | Jacob //et al// identified a 55 or 56 bp overlap on the 5 prime end of //orf160//. This means that the first 19 codons / aa’s of //orf160// overlap with the 3 prime end of //nad7//. We checked with the // |
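A quick check of the codon arithmetic above, as a minimal sketch. The helper name is made up for illustration, and it assumes the overlap starts at the first base of the //orf160// reading frame. An overlap of 55 or 56 bp covers 18 complete codons plus part of a 19th, so 19 codons are affected either way.

```python
def codons_touched(overlap_bp):
    """Number of codons at least partly covered by an overlap that starts
    at the first base of the reading frame (assumption, see lead-in)."""
    return (overlap_bp + 2) // 3  # integer ceiling of overlap_bp / 3


for bp in (55, 56):
    print(bp, "bp ->", codons_touched(bp), "codons touched,", bp // 3, "complete")
```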
| === No evidence as of yet for transcription of this gene === | === No evidence as of yet for transcription of this gene === | ||
| Line 35: | Line 35: | ||
| If this is true, all mitochondrial genes should end with '' | If this is true, all mitochondrial genes should end with '' | ||
| - | === Hypothesis 2: TAG in position 9 is RNA edited to TAA === | + | === Hypothesis 2: TAG in position 9 is RNA edited to a sense codon === |
| Unable to check this hypothesis because no RNA data available | Unable to check this hypothesis because no RNA data available | ||
| Line 190: | Line 190: | ||
| ==== FoldMason alignment of orf160 suggests ATG is not used as a START codon ==== | ==== FoldMason alignment of orf160 suggests ATG is not used as a START codon ==== | ||
| - | //orf160// sequences within // | + | //orf160// sequences within // |
| I ran AlphaFold3 on all Jacob //et al// //orf160// predicted amino acid sequences (replacing ‘*’ with ‘X’) of many different subtypes, and used FoldMason MSA to align these predicted structures with predicted structures of the best Foldseek hits (using ST4 homolog as query). | I ran AlphaFold3 on all Jacob //et al// //orf160// predicted amino acid sequences (replacing ‘*’ with ‘X’) of many different subtypes, and used FoldMason MSA to align these predicted structures with predicted structures of the best Foldseek hits (using ST4 homolog as query). | ||
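The stop-masking step mentioned above could look roughly like this. It is only a sketch: the function name and the toy sequence are invented for illustration, and whether a terminal '*' should be dropped rather than replaced is not stated here, so this version simply replaces every '*'.

```python
def mask_stops(protein):
    """Replace every '*' (stop) in a translated sequence with 'X' (unknown residue)."""
    return protein.replace("*", "X")


# Toy sequence, not a real orf160 translation.
print(mask_stops("MKL*FNQ*"))
```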
| Line 306: | Line 306: | ||
| I collected RPS4 homologs from Jacob //et al// MRO genomes and // | I collected RPS4 homologs from Jacob //et al// MRO genomes and // | ||
| - | RPS4 of //Blasto// is indeed about 2x the size of that of // | + | The AF3 predicted structures, even between closely related |
| - | I compared the RPS4 codon alignment with that of the RPL10 / orf160 | + | RPS4 of //Blasto// is indeed about 2x the size of that of // |
| - | Also here there is no codon site that is exclusively '' | + | I compared the RPS4 codon alignment with that of the RPL10 / orf160 alignment. |
| - | - I already had the RPL10 FoldMason alignment (see above). | + | However, since the sequences don't even align that well, I'm not even sure where the true RPS4 gene starts and ends in Blastocystis and Proteromonas mtDNA. |
| + | ==== Little to no expression of orf160 in regular and riboZeroPlus RNAseq data of ST7C ==== | ||
| + | |||
| + | To see whether the mysterious //orf160// is actually expressed, I inspected the regular RNAseq, that is, mRNA sequencing via oligo-dT primers/ | ||
| + | |||
| + | Here is the visual overview: | ||
| + | |||
| + | {{: | ||
| + | |||
| + | You can see some RNAseq reads overlapping with //orf160// at its 5' end, and in the proper direction too. However, these may be reads coming from the immediately upstream gene //nad7// (in the IGV figure wrongly annotated as //ndhH//). | ||
| + | |||
| + | Since a regular, polyA capturing RNAseq does by its nature not capture mitochondrial transcripts, | ||
| + | |||
| + | I therefore tried to resequence the same RNA sample using a different kit, the riboZeroPlus kit. This kit uses a set of custom designed probes to remove rRNA transcripts from the total RNA extract prior to library prep and sequencing. | ||
| + | |||
| + | I used the following probes: | ||
| + | |||
| + | < | ||
| + | - 18S rRNA probes (one per line): | ||
| + | |||
| + | TTTCATAAACAAACCAAAAAATCGACTATGAAAGCCAATCTTATTATTCC | ||
| + | CAAACACTTTCAATAAATTATCTAAACTTCAACTACGAGCTTTTTAACTG | ||
| + | TTATCCATATAGAAACTATTCCAAATAAACTATAACTGATATAATGAGCC | ||
| + | CTAACAAGCATGCGATAAAGTCAACAATTATTATTACTCACAATTCAATT | ||
| + | TAGCTTTCGTTCTTGATTAATGAAAACATCCTTGGTAAATGCTTTCGCAC | ||
| + | CAGATACTCGTTGAATAGTTCAGTGTCGCGCGCGTGCAGCCCAGAACATC | ||
| + | CTAAAACTATTTAGACTTACACATGCATGGCTTAATCTTTGAGACGAGCG | ||
| + | CCATGGTAGTCCAATACACTACCATCGAAAGCTGATAGGGCAGAAACTTG | ||
| + | GAAAAATTACAAGCATCAATCCCCATCACGAACTATTTTCAAAAGATTTC | ||
| + | AAATCATAGAATTTCACCTCTAGCTATTGAATATGAATACCCCCAACTGT | ||
| + | TCACCTTCCTCTAGATGATAAGATTTACACGACTTCTCTTCAACTATCTA | ||
| + | ATAAGTACTTCTTTAATGGTTGCCCATCAAAGAAAACACATGTATTAGCC | ||
| + | ACTAACTCCTAGTCGGTATCGTTTATAGCTAAGACTACGAGGGTATCTAA | ||
| + | CTATCAATCTGTCAATCCTTCCTATGTCTGGACCTGGTAAGTTTCCCCGT | ||
| + | TCCTTGCGGAACCATGGCACCCACCTGGATGTCGATAACTTACATAAAAG | ||
| + | GATTTATTGTCACTACCTCCCTGTGTCAGGATTGGGTAATTTACGCGCCT | ||
| + | ATAATTAAAAATCCAAAGTGTTCACCGGATCATCCAATCGGTAGGTGCGA | ||
| + | AAGGGCAGGGACGTAATCAACGCAAGTTGATGACTTGCATTTACTAGGAA | ||
| + | CCTGTTATTGCTTCCAGCTTCCCCGTACTCAAACGCACAGTGTCCCTCTA | ||
| + | ACAATGGGGCATTACTAAAATCCCATTTCATCCAACTAATAGGCGGAAGT | ||
| + | AACTGAACAGTCCGCTTTAAACACTCTAATTTTCTCACAGTAAATGACCA | ||
| + | TGTGGTAGCCATCTCTCAGGCTCCCTCTCCGAAATCGAACCCAAATTCTT | ||
| + | ACTCCCCCCGGAACCCAAAGACTTTGATTTCTCATAAGGTACTAATAGAC | ||
| + | TTGTTTATCGATAACGATTGTACATTGTTCTCAATTCAATTACAAAACCA | ||
| + | |||
| + | - 28S rRNA probes (one per line) | ||
| + | |||
| + | CTAACAATGTCTCCCACGTGGGTTGCAACTCGAGAGAGAAGCTTACACAT | ||
| + | AGCCTTTGATGGAGTTTACCACCAACTTCGAGCTGCAATCCCAAACAACT | ||
| + | AAGCCATCACCCCATATTATGGAATAAGTAAAACAACATTAGAGGTAGTG | ||
| + | TCCATGCATCATTCAACCACTCCTACGCTTAACCCCTCCACGATTTCAAG | ||
| + | ATTCAAAATATTGAATTCCTTTACCAATAACAAAACCTTTTCGCGGATTC | ||
| + | GTCGTCTACAAAGGATCTTTGTTCATTGACCATTAAAAATGCTATCAGGG | ||
| + | AGTCCAGCTTACCCGGAATGGCCCACTAGCAACTACTATTCAAAATTACA | ||
| + | AGGCTGTTCGCTTAAGCGCCATCCATTTTCAGGGCTACTTCATTCGGCAG | ||
| + | TTTTCAAAGTGCTTTTCATCTTTCCCTCACGGTACTTGTTCGCTATCGGT | ||
| + | AGCACTGGGCAGAATTCACATTGTGTCAATATATCTTTCACACTATCACA | ||
| + | TTTATCAGAGATGCAAGACCGGTAGTTGTTGCTAGCTCTCTTTAGACAAA | ||
| + | TTTTCTATCCAACTGAGCGAACAATTAGGCGCCGTACCATATCGTTCGGT | ||
| + | AGGTTGACAAATTGCAGAAATAGTTAATAGGGCCGTCCACCTCCCCAGGG | ||
| + | GTTTCAAGACGGGACGGAGAAGCAGTTATTAGGAAAGAGGAAATTCAGTA | ||
| + | AAGCAACTATAATATCTTACCCATTCAAAGTTTGAGAATAGGTCCAGGAT | ||
| + | AAATGTGTTCCCAAAGGGAGGGAAATAATATTACTTTTCAAGGACCCATT | ||
| + | AAGCCGTATCTACTCAAATAGGCTTCTTTATATAGGTCACATCCTTTGGT | ||
| + | CTGCTTCACAAGTACAATACACTATGCAAATACAGGGTTTTCACCTTCTA | ||
| + | GCTACTTCCACCAAGATCTGCACTAATGGACATTCCATATAAGTTTACAC | ||
| + | CATTATTCTATTAACTAGAGGCTATTCACCTTGGAGACCTGATGCGGTTA | ||
| + | GAGAAGAGGTAATAAGGGAAAGGGAATTAATTGATATTTACCAATTTAAC | ||
| + | TACATATTTTAGGAGGGCTTCATGATTAGAGGCTTTCATCACTACGACCC | ||
| + | CGTTCAAAGATTCAATGACTCACAGACTTCTGCAGTTCGCATTACGTATC | ||
| + | TCTCACATTTTACCCAGTCTGCAAGGTATTGGTAGGAAGAGCCGACATCG | ||
| + | AGTTCAACACGATTCCTATGGAACCTTTCTCCACTTCAGTCTTCAAAGAT | ||
| + | CGAGAACCACTGTATTCATATCACTAACCTAGTCAATTGAACTGTTGTCG | ||
| + | TAGTAGACAGACATCCAAGTCAAATCACACTCCAACAAGCATACTCCCAA | ||
| + | AGAGAGTCATAGTTACTCCCGCCGTTTACCCGCGCTTGGTTGAATTCCTT | ||
| + | CATCAATCATCTCATTCATTTGATAACCAAGAACTGACGATCCTATCATT | ||
| + | TCTGTTACCATTCAATTCCATTTCATTGGTTCAGGAATATTAACCTGATT | ||
| + | ACCTTCATTACGCATTTTAGTTTAACACTAAACTACTCGCAAATATGATA | ||
| + | GTTCTAAAAATTCAAAAGAACTTTTTCAACGGATTTCACCTATCTCTTAG | ||
| + | TTTTCCTCTGCTTAGTTAGATGCTTCAATTCAGCAGGTCTTCTTGCTTGA | ||
| + | ATCCAATTCTCATAGTATACTGTTACTAAACAATACTTCTACACTCCACA | ||
| + | CCTAGCCCTCAGAGCCAATCCTTATCCCGAAGTTACGGATCTAATTTGCC | ||
| + | ATTCTATTTCAATGGAGGAAACTCTTAGTCAATCCACCATCAATCATCGT | ||
| + | TTCGTCCTATTCAGGCATAGTTCACCATCTTTCGGGTCCCACCATCTTTG | ||
| + | CCCTTAAAAAGAGTCTCCCACCTATTCTACACCCTCTAAGTCATTTCACA | ||
| + | CATACTGAAAATCAAAATCAAATGAGCTTTTACCCTTTTATTCTACGTAA | ||
| + | TGAGCTCATCTTAGGACACCTGTGTTATTCTTTAACAGATGTGCCGCCCC | ||
| + | GATAAGTCTCAATTTCTCGTTGAACTAAGTCAACTCGAAAACTTACAACC | ||
| + | CCTCTAATCATTCGCTTTACCTCATAAAACTAGACACAGTTGCAGCTATC | ||
| + | GTGTTAATTCGGATTGGGCTTTTCCCACTTCACTCGCCGTTACTAAGGGA | ||
| + | TCCATCACGCCTTCCTACTTGTCACCCCATAATATAACCATCTACTTGAG | ||
| + | CTAGCTTTAAACTCGAAATTCAAATATCTAAAGGATCGATAGGCCATATT | ||
| + | TAAACAGTCGGATTCCCCTTGTCCGTACCAGTTCTGAGTCAGCTATTCAT | ||
| + | CCCAAATTTAAAGATCAATTTGCACGTTAGAATCCACTCGAACCTCCACC | ||
| + | TTTATTATTGTTAACAAGAAAAGAAAACTCTTCCCAGGAGAGTAACCGAT | ||
| + | TACCACCACTAAACAACCACTCCTTTGCATACATTCTTATCATCACAAAC | ||
| + | CAAGCTCAACAGGGTCTTCTTTCCCCGCTGATATTTCCAAGCCCATTCCC | ||
| + | |||
| + | </ | ||
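Before ordering or reusing a probe list like the one above, it may be worth sanity-checking it. The sketch below parses a block in the same layout (a "- ... probes" header followed by one probe per line) and flags probes with an unexpected length or non-ACGT characters. The function names are made up for illustration, and the expected length of 50 nt is an assumption based on the probes listed here. Only the first probe of each group is included in the example.

```python
def parse_probe_block(text):
    """Group probe sequences under the '- <name> probes' header that precedes them."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("-"):
            # "- 18S rRNA probes (one per line):" -> "18S rRNA"
            current = line.lstrip("- ").split(" probes")[0]
            groups[current] = []
        elif current is not None:
            groups[current].append(line.upper())
    return groups


def check_probes(groups, expected_len=50):
    """Return (group, probe) pairs with the wrong length or non-ACGT characters."""
    bad = []
    for name, probes in groups.items():
        for probe in probes:
            if len(probe) != expected_len or set(probe) - set("ACGT"):
                bad.append((name, probe))
    return bad


example = """
- 18S rRNA probes (one per line):

TTTCATAAACAAACCAAAAAATCGACTATGAAAGCCAATCTTATTATTCC

- 28S rRNA probes (one per line)

CTAACAATGTCTCCCACGTGGGTTGCAACTCGAGAGAGAAGCTTACACAT
"""

groups = parse_probe_block(example)
print({name: len(probes) for name, probes in groups.items()})
print("problem probes:", check_probes(groups))
```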
| + | |||
| + | Sequencing was done at the Genomics CORE Lab in the LSRI, with Mat as contact person. | ||
| + | |||
| + | The sequencing run was excellent: it yielded a lot of data, and the quality was very good. | ||
| + | |||
| + | After quality trimming the data, I mapped it to the latest version of the ST7C genome with HISAT2. | ||
| + | |||
| + | A lot of reads still mapped to the rRNA genes, but all other regions still had more than sufficient coverage as well. | ||
| + | |||
| + | Importantly, | ||
| + | |||
| + | {{: | ||
| + | |||
| + | **This is not just the result of a higher throughput**. The throughput of the riboZeroPlus run (179 506 076 mapped reads) was about twice that of the original RNAseq run (85 875 153 reads), but far more than twice as many reads now mapped to the MRO genome. (NOTE that for both IGV figures I used the same coverage scale of 2000 in the Coverage track). | ||
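The throughput comparison above can be made explicit with a little arithmetic. This is a sketch only: the two run totals are taken from the text, but the number of reads mapping to the MRO genome in each run is not given on this page, so the second function takes those counts as parameters (they would have to be counted from the two BAM files). Function names are made up for illustration.

```python
ORIGINAL_TOTAL = 85_875_153    # reads, regular RNAseq run (from the text)
RIBOZERO_TOTAL = 179_506_076   # mapped reads, riboZeroPlus run (from the text)


def throughput_ratio(new_total, old_total):
    """How many times more reads the new run produced overall."""
    return new_total / old_total


def fold_change_beyond_throughput(new_region, old_region, new_total, old_total):
    """Fold change in reads mapped to a region after dividing each count
    by its run's total, i.e. the part not explained by throughput."""
    return (new_region / new_total) / (old_region / old_total)


print(round(throughput_ratio(RIBOZERO_TOTAL, ORIGINAL_TOTAL), 2))  # 2.09
```

A value well above 1 from `fold_change_beyond_throughput` for the MRO genome would support the statement that the increase is not just a throughput effect.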
| + | |||
| + | What is striking is that the mitochondrial rRNA genes still had an enormous amount of coverage. Perhaps next time you want to sequence the mitochondrial transcriptome of some organism, also include probes targeting the mitochondrial rRNA genes! | ||
| + | |||
| + | Another striking thing is that it seems that **RNAseq coverage of mitochondrial ribosomal genes is much lower** than that of the //nad// genes! Exceptions seem to be //rps12// and //rpl16//. | ||
| + | |||
| + | Zooming in to //orf160// / //rpl10//: | ||
| + | |||
| + | {{: | ||
| + | |||
| + | Unfortunately, we again see no significant evidence of expression of this gene. It may be that the read depth at this gene in particular was not high enough to detect any real expression, so we can't rule it out. Any reads that overlap with //orf160// may be 3' UTR reads from the //nad7// gene upstream (here annotated as NdhH).
| ====== Ideas to explore ====== | ====== Ideas to explore ====== | ||
| * Check RNA expression levels | * Check RNA expression levels | ||
| - | * What makes a start codon a start codon? | + | * Try to sequence mitochondrial RNA or specifically orf160 RNA (RT-PCR plus sequencing - Jacob et al designed primers for orf160 and rps4, but were unsuccessful). As of July 2025, we still have (I think) total RNA extracts from ST7C, E and H in the -80 in the main lab (in one of Greg's boxes). |
| + | * | ||
| * check for shine dalgarno sequences, possibly after the in-frame start codon? | * check for shine dalgarno sequences, possibly after the in-frame start codon? | ||
| * Andrew: I guess to know if this is ‘significant’ you’d have to look at the density of these kinds of codons throughout the whole sequence. Since mtDNAs like this are A+T-rich, codons that are ‘close’ to ‘ATG’ might not be so rare. | * Andrew: I guess to know if this is ‘significant’ you’d have to look at the density of these kinds of codons throughout the whole sequence. Since mtDNAs like this are A+T-rich, codons that are ‘close’ to ‘ATG’ might not be so rare. | ||
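Andrew's suggestion could be tested with something like the sketch below: count how often in-frame codons differ from ATG at exactly one position (often called near-cognate codons) and compare the density near the start of //orf160// with the density across the whole mtDNA. The function names and the toy sequence are made up for illustration, and "exactly one mismatch" is just one possible definition of 'close'.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def near_atg_density(seq, frame=0):
    """Fraction of in-frame codons that differ from ATG at exactly one position."""
    seq = seq.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not codons:
        return 0.0
    near = sum(1 for codon in codons if hamming(codon, "ATG") == 1)
    return near / len(codons)


# Toy sequence: codons ATG, ATA, AAA, TTG -> ATA and TTG are one mismatch from ATG.
print(near_atg_density("ATGATAAAATTG"))
```

Running this over all three frames of the full mitochondrial genome would give the background density against which a candidate start site can be judged.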
| Line 343: | Line 466: | ||
| QYPTIKIQFFKKSNRNIYLIFLLPYLTNSLILLGCNELNVFFKLCECVSKNILFIKVQNT | QYPTIKIQFFKKSNRNIYLIFLLPYLTNSLILLGCNELNVFFKLCECVSKNILFIKVQNT | ||
| IYSINQFMDCSSNQIMFGQTLNSLYYNLIKVFYSFSLLHK* | IYSINQFMDCSSNQIMFGQTLNSLYYNLIKVFYSFSLLHK* | ||
| + | |||
| + | $$$, &&&, | ||
| > | > | ||
