--- evaluating_and_comparing_transcriptome_assemblies_with_rnaquast [2021/01/11 14:44] – created cgeb2001
+++ evaluating_and_comparing_transcriptome_assemblies_with_rnaquast [2021/01/11 15:01] (current) – cgeb2001
@@ Line 1: / Line 1: @@
+===== Evaluating and comparing transcriptome assemblies with rnaQUAST =====
+Documentation by Joran Martijn (11 January 2021)
 Once you have assembled your RNA-seq reads into transcriptomes, you'll want some sense of assembly quality, or just a general summary of your transcriptome (number of transcripts, number of predicted genes, estimated completeness, longest transcript, etc.).
@@ Line 29: / Line 33: @@
 Quick explanation on the options:
 <code>
--o            output_directory
+-o                  output_directory
--c            transcriptome1.fasta transcriptome2.fasta ... transcriptomeN.fasta
+-c                  transcriptome1.fasta transcriptome2.fasta ... transcriptomeN.fasta
--t            number_of_threads
+-t                  number_of_threads
--l            transcriptome_label1 transcriptome_label2 ... transcriptome_labelN
+-l                  transcriptome_label1 transcriptome_label2 ... transcriptome_labelN
--ss           invoke if you used a strand specific mRNA library
+-ss                 invoke if you used a strand-specific mRNA library
---gene_mark   to activate gene prediction with GeneMark
+--gene_mark         to activate gene prediction with GeneMark
---busco       to activate completeness evaluation with BUSCO
+--busco <busco_db>  to activate completeness evaluation with BUSCO
 </code>
@@ Line 41: / Line 45: @@
 Note that the submission script specifically asks to run on ''144G-batch'', ''256G-batch'' or ''768G-batch'' nodes. For reasons that are unclear, only these nodes run the software reliably. It may work on some specific ''16G-batch'' nodes as well, but it's a gamble.
+In this case I ran rnaQUAST without a reference genome, but it is also possible to supply one. That should give you a lot more interesting metrics to look at.
+=== Output ===
+Here is the output that I got, as an example. As you can see, I got about twice as many transcripts with rnaSPAdes as with Trinity, but Trinity yielded more transcripts above both the 500 bp and 1000 bp thresholds, and more predicted genes. Completeness estimates are about the same.
+''short_report.txt''
+<code>
+SHORT SUMMARY REPORT
+METRICS/TRANSCRIPTS                                    trinity                  rnaspades
+ == BASIC TRANSCRIPTS METRICS ==
+Transcripts                                            16142                    33100
+Transcripts > 500 bp                                   7208                     4977
+Transcripts > 1000 bp                                  4584                     3447
+ == BUSCO METRICS ==
+Complete                                               39.216                   37.647
+Partial                                                8.235                    8.627
+ == GeneMarkS-T METRICS ==
+Predicted genes                                        6936                     5098
+</code>
+rnaQUAST will also output a PDF with a cumulative transcript/isoform length plot.
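If you want to compare the numbers in ''short_report.txt'' programmatically rather than by eye, the following is a minimal Python sketch. It assumes the report has the layout shown above (a ''METRICS/TRANSCRIPTS'' header row with the assembly labels, columns separated by runs of two or more spaces, and section titles wrapped in ''==''). The function name ''parse_short_report'' and the ''REPORT'' variable are made up for this illustration and are not part of rnaQUAST.

```python
import re

# Example report text, copied from the output shown above.
REPORT = """SHORT SUMMARY REPORT
METRICS/TRANSCRIPTS                                    trinity                  rnaspades
 == BASIC TRANSCRIPTS METRICS ==
Transcripts                                            16142                    33100
Transcripts > 500 bp                                   7208                     4977
Transcripts > 1000 bp                                  4584                     3447
 == BUSCO METRICS ==
Complete                                               39.216                   37.647
Partial                                                8.235                    8.627
 == GeneMarkS-T METRICS ==
Predicted genes                                        6936                     5098
"""


def parse_short_report(text):
    """Return {metric_name: {assembly_label: value}} from report text.

    Assumption: columns are separated by two or more whitespace characters,
    as in the example above. Lines that do not fit this layout are skipped.
    """
    labels = []
    metrics = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, the title line and "== SECTION ==" headings
        if not line or line.startswith("==") or line == "SHORT SUMMARY REPORT":
            continue
        fields = re.split(r"\s{2,}", line)
        if fields[0] == "METRICS/TRANSCRIPTS":
            labels = fields[1:]  # assembly labels, in column order
            continue
        if not labels or len(fields) != len(labels) + 1:
            continue  # not a metric row we understand
        try:
            values = [float(v) for v in fields[1:]]
        except ValueError:
            continue  # non-numeric cell; ignore this row
        metrics[fields[0]] = dict(zip(labels, values))
    return metrics


if __name__ == "__main__":
    report = parse_short_report(REPORT)
    for name, per_assembly in report.items():
        print(name, per_assembly)
```

To use it on a real run, read the file from your output directory (for example ''open("output_directory/short_report.txt").read()'') and pass the text to ''parse_short_report''. If your version of rnaQUAST lays the file out differently, the splitting rule will need adjusting.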