cgeb2001's DokuWiki!

This is an old revision of the document!

For various reasons, at some point, you will want to map your RNAseq data onto your assembled genome.

Unless you have a prokaryotic genome or a eukaryotic genome with intronless genes you need a mapper that can properly deal with spliced alignments. Tophat2 is widely used and cited https://ccb.jhu.edu/software/tophat/index.shtml but has now been superseded by HISAT2 http://ccb.jhu.edu/software/hisat2/index.shtml

Before you can map your RNAseq reads to your genome you need to build an index of the genome assembly.

In a shell script that you can qsub include the following line

/scratch2/software/hisat2-2.1.0/hisat2-build -f name_of_genome_assembly.fasta root_for_index

This will create a series (depending on the size of the genome) of files with the root name eg

pce_p3x.1.ht2

pce_p3x.2.ht2

pce_p3x.3.ht2

Once the indices have been built you can proceed with the actual mapping. There are many options available http://ccb.jhu.edu/software/hisat2/manual.shtml but for our purposes we will restrict them to the bare essentials.

A typical shell script to qsub would look like this #!/bin/sh #$ -o logo_p3x_hisat #$ -cwd #$ -pe threaded 6

/scratch2/software/hisat2-2.1.0/hisat2 -q -p 6 –phred33 –max-intronlen 1000 -k 2 -x pce_p3x -1 PCE_RNA_1_PairNtrim.fastq -2 PCE_RNA_2_PairNtrim.fastq -S pce_3x.sam