evaluating_and_comparing_transcriptome_assemblies_with_rnaquast

Differences

This shows you the differences between two versions of the page.

evaluating_and_comparing_transcriptome_assemblies_with_rnaquast [2021/01/11 14:44] – created cgeb2001
evaluating_and_comparing_transcriptome_assemblies_with_rnaquast [2021/01/11 15:01] (current) cgeb2001
Line 1: Line 1:
 +===== Evaluating and comparing transcriptome assemblies with rnaQUAST =====
 +
 +Documentation by Joran Martijn (11 January 2021)
 +
 Once you have assembled your RNA-seq reads into transcriptomes, you'll want to have some sense of assembly quality, or just a summary of your transcriptome in general (number of transcripts, number of predicted genes, estimated completeness, longest transcript, etc.).
  
Line 29: Line 33:
 Quick explanation on the options:
 <code>
--o            output_directory
--c            transcriptome1.fasta transcriptome2.fasta ... transcriptomeN.fasta
--t            number_of_threads
--l            transcriptome_label1 transcriptome_label2 ... transcriptome_labelN
--ss           invoke if you used a strand specific mRNA library
---gene_mark   to activate gene prediction with GeneMark
---busco       to activate completeness evaluation with BUSCO
+-o                  output_directory
+-c                  transcriptome1.fasta transcriptome2.fasta ... transcriptomeN.fasta
+-t                  number_of_threads
+-l                  transcriptome_label1 transcriptome_label2 ... transcriptome_labelN
+-ss                 invoke if you used a strand specific mRNA library
+--gene_mark         to activate gene prediction with GeneMark
+--busco <busco_db>  to activate completeness evaluation with BUSCO
 </code>
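
As a rough illustration of how the options above combine, here is a minimal Python sketch that assembles an rnaQUAST command line as an argument list. The helper name ''build_rnaquast_command'', the file names, the output directory and the BUSCO database path are all placeholders invented for this example, not part of the original page; only the flags themselves come from the list above.

```python
import shlex


def build_rnaquast_command(transcriptomes, labels, output_dir, threads=1,
                           strand_specific=False, gene_mark=False,
                           busco_db=None):
    """Assemble an rnaQUAST argument list from the options described above.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    if len(transcriptomes) != len(labels):
        raise ValueError("need exactly one label per transcriptome")

    cmd = ["rnaQUAST.py", "-o", output_dir, "-t", str(threads)]
    cmd += ["-c", *transcriptomes]   # one or more assemblies to compare
    cmd += ["-l", *labels]           # labels, in the same order as -c
    if strand_specific:
        cmd.append("-ss")            # strand specific mRNA library
    if gene_mark:
        cmd.append("--gene_mark")    # gene prediction with GeneMark
    if busco_db is not None:
        cmd += ["--busco", busco_db]  # completeness evaluation with BUSCO
    return cmd


# Placeholder file names and paths, for illustration only.
cmd = build_rnaquast_command(
    ["trinity.fasta", "rnaspades.fasta"],
    ["trinity", "rnaspades"],
    "rnaquast_out",
    threads=8,
    gene_mark=True,
    busco_db="path/to/busco_db",
)
print(shlex.join(cmd))
```

The printed string can be pasted into a submission script; keeping the command as a list also makes it straightforward to pass to ''subprocess.run'' if you prefer to launch it from Python.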
  
Line 41: Line 45:
  
 Note that the submission script specifically asks to run on ''144G-batch'', ''256G-batch'' or ''768G-batch'' nodes. For reasons that are unclear, only these nodes run the software reliably. It may work on some specific ''16G-batch'' nodes as well, but it's a gamble.
 +
 +In this case I ran rnaQUAST without reference genomes, but it is also possible to supply them. Doing so should give you many more informative metrics to look at.
 +
 +=== Output ===
 +
 +Here is the output that I got, as an example. As you can see, I got about twice as many transcripts with rnaSPAdes as with Trinity, but Trinity generally yielded longer transcripts and more predicted genes. Completeness estimates are about the same.
 +
 +''short_report.txt''
 +
 +<code>
 +SHORT SUMMARY REPORT
 +
 +METRICS/TRANSCRIPTS                                    trinity                  rnaspades
 +
 + == BASIC TRANSCRIPTS METRICS ==
 +Transcripts                                            16142                    33100
 +Transcripts > 500 bp                                   7208                     4977
 +Transcripts > 1000 bp                                  4584                     3447
 +
 + == BUSCO METRICS ==
 +Complete                                               39.216                   37.647
 +Partial                                                8.235                    8.627
 +
 + == GeneMarkS-T METRICS ==
 +Predicted genes                                        6936                     5098
 +</code>
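
To compare the assemblies numerically rather than by eye, a report like the one above can be read into a small table. The following Python sketch is an assumption-laden example: the function name ''parse_short_report'' is made up here, and it assumes columns are separated by runs of two or more spaces, as in the listing above (the real file may be laid out differently, for instance with tabs).

```python
import re


def parse_short_report(text):
    """Parse rnaQUAST short-report text into (labels, metrics).

    metrics maps a metric name to {label: value}. Assumes columns are
    separated by two or more spaces, as in the listing above.
    """
    labels = []
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "== BUSCO METRICS ==".
        if not line or line.startswith("=="):
            continue
        fields = re.split(r"\s{2,}", line)
        if fields[0] == "METRICS/TRANSCRIPTS":
            labels = fields[1:]          # assembly labels from the header row
        elif labels and len(fields) == len(labels) + 1:
            values = [float(v) for v in fields[1:]]
            metrics[fields[0]] = dict(zip(labels, values))
    return labels, metrics


# Excerpt of the report shown above.
report = """\
SHORT SUMMARY REPORT

METRICS/TRANSCRIPTS                                    trinity                  rnaspades

 == BASIC TRANSCRIPTS METRICS ==
Transcripts                                            16142                    33100
Transcripts > 500 bp                                   7208                     4977
"""

labels, metrics = parse_short_report(report)
ratio = metrics["Transcripts"]["rnaspades"] / metrics["Transcripts"]["trinity"]
print(f"rnaspades / trinity transcript count: {ratio:.2f}")
```

On this excerpt the ratio comes out at roughly 2, matching the "about twice as many transcripts" observation.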
 +
 +rnaQUAST will also output a PDF with a cumulative transcript/isoform length plot.