Trinity - Assembly of transcriptome reads

Documentation by Shelby Williams (last updated by D. Salas-Leiva, 01-07-2020 and J. Martijn 24-04-2023 and K.Dunn 07-11-2025)

Trinity assembles RNA-seq reads after they have been trimmed. It uses three programs (Inchworm, Chrysalis, and Butterfly) to assemble large volumes of transcriptome reads. The output of Trinity is the Trinity.fasta file found in the trinity_out_dir/ folder.
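
As a quick sanity check after a run, the number of assembled transcripts can be counted from the FASTA headers. This is only a minimal sketch: count_transcripts is a hypothetical helper (not part of Trinity), and the path assumes the output location described above, which may differ between Trinity versions.

```shell
#!/bin/bash
# count_transcripts is a hypothetical helper, not a Trinity tool.
# It counts FASTA records by counting header lines (lines starting with '>').
count_transcripts() {
    grep -c '^>' "$1"
}

# Assumed output path; adjust it if your Trinity version writes elsewhere.
fasta="trinity_out_dir/Trinity.fasta"
if [ -f "$fasta" ]; then
    echo "Transcripts in $fasta: $(count_transcripts "$fasta")"
else
    echo "No assembly found at $fasta" >&2
fi
```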

A simple Trinity shell script, using the new conda environments:

For strand-specific data (de novo, not genome-guided):

The library type can be RF or FR. We usually get RF (typical of the dUTP/UDG sequencing method). If unsure, draw violin plots to determine the strand specificity.

#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -cwd
#$ -pe threaded 10

## The trinity-2.11-with-workaround environment (a special build of version 2.11)
## is no longer needed; use the newest version instead:
## source activate trinity-2.11-with-workaround

source activate trinity
## this is version 2.15.2

Trinity \
    --seqType fq \
    --SS_lib_type RF \
    --left Reads_R1_PairNtrim.fastq \
    --right Reads_R2_PairNtrim.fastq \
    --CPU 10 \
    --max_memory 20G

conda deactivate
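
Before submitting the job, it can help to confirm that the left and right read files contain the same number of reads, since a trimming step that dropped reads from only one file would leave the pairs out of sync. The following is a sketch under stated assumptions: count_reads and check_pairs are hypothetical helpers, the files are uncompressed FASTQ with exactly four lines per record, and the file names are placeholders.

```shell
#!/bin/bash
# count_reads (hypothetical): assumes four lines per FASTQ record,
# so the number of reads is the line count divided by 4.
count_reads() {
    echo $(( $(wc -l < "$1") / 4 ))
}

# check_pairs (hypothetical): succeeds only if both files hold the same
# number of reads.
check_pairs() {
    [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Example call with placeholder file names; skipped if the files are absent.
left="Reads_R1_PairNtrim.fastq"
right="Reads_R2_PairNtrim.fastq"
if [ -f "$left" ] && [ -f "$right" ]; then
    if check_pairs "$left" "$right"; then
        echo "OK: both files contain $(count_reads "$left") reads"
    else
        echo "Mismatch: $(count_reads "$left") vs $(count_reads "$right") reads" >&2
    fi
fi
```

Equal counts do not prove that the reads are in the same order, so this only catches the most common pairing problem.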

If your genome is compact, that is, genes are very close together with minimal intergenic space, it may be beneficial to run Trinity with the --jaccard_clip option. This can prevent Trinity from falsely assembling transcripts from several consecutive genes into a single transcript.

From the --help output:

#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
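
In practice the option is appended to the de novo command shown earlier. The sketch below only assembles and prints the command so it can be reviewed before being pasted into a job script. USE_JACCARD_CLIP is a hypothetical switch made up for this example (not a Trinity setting), and the read file names are placeholders.

```shell
#!/bin/bash
# Base options for a strand-specific de novo run (placeholder file names).
trinity_opts="--seqType fq --SS_lib_type RF"
trinity_opts="$trinity_opts --left Reads_R1_PairNtrim.fastq"
trinity_opts="$trinity_opts --right Reads_R2_PairNtrim.fastq"
trinity_opts="$trinity_opts --CPU 10 --max_memory 20G"

# USE_JACCARD_CLIP is a hypothetical switch for this sketch; it defaults to "yes".
USE_JACCARD_CLIP="${USE_JACCARD_CLIP:-yes}"
if [ "$USE_JACCARD_CLIP" = "yes" ]; then
    # Requires paired reads in FASTQ format, as stated in the help text.
    trinity_opts="$trinity_opts --jaccard_clip"
fi

# Print the full command for review instead of running it.
echo "Trinity $trinity_opts"
```

Because the help text warns that the operation is expensive, a run without the option is a reasonable first attempt; add it only if fused transcripts turn out to be a problem.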

For strand-specific data (genome-guided):

The library type can be RF or FR. We usually get RF (typical of the dUTP/UDG sequencing method). If unsure, draw violin plots to determine the strand specificity.

#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -cwd
#$ -pe threaded 10

source activate trinity-2.11-with-workaround

# this special build is version 2.11
Trinity \
    --CPU 10 \
    --max_memory 100G \
    --genome_guided_bam yourgenome.fasta.sambamsorted.bam \
    --genome_guided_max_intron 1000 \
    --SS_lib_type RF

conda deactivate

For non-strand-specific data:

#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -cwd
#$ -pe threaded 10

source activate trinity-2.11-with-workaround

# this special build is version 2.11
Trinity \
    --seqType fq \
    --left Reads_R1_PairNtrim.fastq \
    --right Reads_R2_PairNtrim.fastq \
    --CPU 10 \
    --max_memory 20G

conda deactivate

ATTENTION!

Some Trinity versions, such as 2.4.0, are not compatible with Bowtie2. In that case, skip the Bowtie step by adding the flag --no_bowtie.
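
If several Trinity environments are in use, the decision can be scripted. The sketch below is an assumption-heavy illustration: INCOMPATIBLE_VERSIONS is a hand-maintained list (only 2.4.0 is named in this document), extra_flags_for_version is a hypothetical helper, and the version number is passed in by hand rather than parsed from Trinity.

```shell
#!/bin/bash
# Hand-maintained, space-separated list of versions known to clash with Bowtie2.
# Only 2.4.0 comes from this document; extend it from your own experience.
INCOMPATIBLE_VERSIONS="2.4.0"

# extra_flags_for_version (hypothetical): prints --no_bowtie if the given
# version is on the list, and an empty line otherwise.
extra_flags_for_version() {
    for v in $INCOMPATIBLE_VERSIONS; do
        if [ "$1" = "$v" ]; then
            echo "--no_bowtie"
            return 0
        fi
    done
    echo ""
}

# Example: show the extra flag that would be added for version 2.4.0.
echo "Extra flags for 2.4.0: $(extra_flags_for_version 2.4.0)"
```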