mapping_rnaseq_data_to_your_genome

Differences

This shows you the differences between two versions of the page.

mapping_rnaseq_data_to_your_genome [2024/02/23 09:48] – [The perun shell script] 134.190.232.164
mapping_rnaseq_data_to_your_genome [2024/10/25 15:20] (current) – [What the script does] 134.190.144.194
Line 1: Line 1:
 ====== Mapping RNAseq data to your genome ======
  
-Initially written by I don't know who. Updated by Joran Martijn, December 2023
+Initially written by I don't know who. Updated by Joran Martijn, December 2023. (Minor update, KW, Oct 2024)
  
  
Line 113: Line 113:
 # indicate one or more files containing unpaired reads
 -U forward_unpaired.fastq,reverse_unpaired.fastq
 +
 +# set the maximum (MX) and minimum (MN) mismatch penalties. Default: MX = 6, MN = 2.
 +# NOTE: you may have to increase these penalties to get the optimal mapping of a read across an intron!
 +# I have noticed that the default penalties do not always give the optimal spliced mapping,
 +# so it is good to check. -Kelsey Williamson
 +--mp MX,MN
 +
 +# set the penalty for non-canonical splice sites (non-GT/AG) - default is 12
 +# so if your genome uses non-canonical splice sites, you want to set this to 0
 +--pen-noncansplice <int>
 </code>
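As a hedged illustration of how the two added options fit into a full call, the sketch below assembles a HISAT2 command line and prints it rather than running it. The index name, the output file name and the choice to map only unpaired reads are assumptions for the example, not values taken from this page; ''6,2'' is simply the default for ''--mp'', written out so that it is easy to change and compare.

```shell
#!/bin/sh
# Sketch only: index name, read files and output file are hypothetical placeholders.
mismatch_penalties="6,2"   # --mp MX,MN (6,2 is the default; adjust and compare mappings)
noncanonical_penalty=0     # --pen-noncansplice (default 12; 0 = no penalty for non-GT/AG)

hisat2_cmd="hisat2 -x genome_index \
-U forward_unpaired.fastq,reverse_unpaired.fastq \
--mp ${mismatch_penalties} \
--pen-noncansplice ${noncanonical_penalty} \
-S rnaseq_vs_genome.sam"

# Print the command for inspection; once the paths are real, run it with: eval "$hisat2_cmd"
echo "$hisat2_cmd"
```

Printing the command first makes it easy to try a few penalty settings and compare the resulting intron-spanning alignments in a viewer before settling on one.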
  
Line 185: Line 195:
 Minimap2 has a rich set of parameters that you can finely tune (see also its [[https://lh3.github.io/minimap2/minimap2.html|man]]) but honestly, most of it is beyond me. Thankfully Heng Li has created some presets, which are essentially a preconfigured set of parameters fit for a particular commonly used task. For example, the ''map-ont'' preset (for mapping DNAseq Oxford Nanopore data) uses the default values for all parameters, whereas the ''map-hifi'' preset (for mapping DNAseq PacBio HiFi data) uses ''-k19 -w19 -U50,500 -g10k -A1 -B4 -O6,26 -E2,1 -s200''.
  
-For mapping long read RNAseq data, we can use the preset ''splice'', which is short for the following parameter settings: ''-k15 -w5 --splice -g2k -G200k -A1 -B2 -O2,32 -E1,0 -b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0 --splice-flank=yes''.
+For mapping long read RNAseq data, we can use the preset ''splice'', which is short for the following parameter settings: ''-k15 -w5 --splice -g2k -G200k -A1 -B2 -O2,32 -E1,0 -b0 -C9 -z200 -ub --junc-bonus=9 --cap-sw-mem=0 --splice-flank=yes''. The ''splice'' preset also ensures that introns in read mappings are denoted in the CIGAR string with the ''N'' symbol (skipped region of the reference) rather than the ''D'' (deletion) symbol. For example, ''200N'' in the CIGAR string indicates that the aligned read skips a 200 bp intron in the reference. This also makes RNAseq alignments nicer to view in IGV or Tablet.
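To make the ''N'' notation concrete, here is a small sketch that pulls the intron lengths out of a CIGAR string. The CIGAR string is a made-up example (50 aligned bases, a 200 bp intron, 50 more aligned bases), not output from a real mapping.

```shell
#!/bin/sh
# Hypothetical CIGAR string: 50 aligned bases, a 200 bp intron, 50 aligned bases
cigar="50M200N50M"

# Keep only the N operations (skipped reference regions) and strip the trailing N
intron_lengths=$(printf '%s\n' "$cigar" | grep -oE '[0-9]+N' | tr -d 'N')

echo "$intron_lengths"
```

On real data the CIGAR string is the sixth column of each SAM record, so something like ''samtools view alignments.bam | cut -f 6'' (file name hypothetical) would supply the input.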
  
 I think these parameters are particularly tuned for Human / Mouse genomes because ''-G200k'', as far as I understand, indicates that you can expect introns up to 200 kb in length, and the ''--splice-flank=yes'' option assumes that the base immediately following a **GT donor site** is an A or G, and the base immediately preceding the **AG acceptor site** is a C or T. So you may want to override these settings by specifying your own values after the preset:
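A minimal sketch of such an override, under stated assumptions: the maximum intron length of ''5k'' and all file names are hypothetical and should be replaced with values that suit your organism. The options are placed after ''-x splice'' because minimap2 lets later options overwrite the values set by a preset. The command is printed rather than run.

```shell
#!/bin/sh
# Sketch only: intron size limit and file names are hypothetical placeholders.
max_intron="5k"   # -G: maximum intron (reference gap) length, replacing the preset's 200k

# Options given after the preset (-x splice) take precedence over the preset's values
minimap2_cmd="minimap2 -ax splice -G ${max_intron} --splice-flank=no genome.fasta long_rnaseq_reads.fastq"

# Print the full command, including the redirect to a SAM file, for inspection
echo "$minimap2_cmd > long_rnaseq_vs_genome.sam"
```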