cleaning_of_illumina_paired_end_reads

Differences

This shows you the differences between two versions of the page.

cleaning_of_illumina_paired_end_reads [2019/09/20 00:19] 170.10.235.116
cleaning_of_illumina_paired_end_reads [2021/12/16 15:07] (current) 170.10.250.122
Line 1: Line 1:
-Documentation by Dandan Zhao+Updated by Dandan Zhao and D. Salas-Leiva (2020-12-15)
  
 **1. Quality control: FastQC**\\ **1. Quality control: FastQC**\\
 FastQC reads a fastq file and produces a quality control report made up of the modules below, which are useful when deciding how to clean the reads:\\ FastQC reads a fastq file and produces a quality control report made up of the modules below, which are useful when deciding how to clean the reads:\\
-• Basic Statistics: Total Sequences, Sequences flagged as poor quality, Sequence length,%GC +• Basic Statistics: Total Sequences, Sequences flagged as poor quality, Sequence length,%GC\\ 
-• Per base sequence quality  +• Per base sequence quality \\ 
-• Per sequence quality scores +• Per sequence quality scores\\ 
-• Per base sequence content +• Per base sequence content\\ 
-• Per sequence GC content +• Per sequence GC content\\ 
-• Per base N content +• Per base N content\\ 
-• Sequence Length Distribution +• Sequence Length Distribution\\ 
-• Overrepresented sequences +• Overrepresented sequences\\ 
-• Adapter Content+• Adapter Content\\
  
 Input can be a fastq or bam/sam file; output is an HTML report and a compressed report file.\\ Input can be a fastq or bam/sam file; output is an HTML report and a compressed report file.\\
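
A minimal sketch of a FastQC run on a pair of read files, assuming ''fastqc'' is on the PATH. The file names, the output directory ''fastqc_reports'' and the thread count are placeholders chosen for this example, not values from the page. If FastQC or the input files are missing, the script only prints the command it would run.

```shell
#!/bin/bash
# Placeholder inputs (assumptions for illustration); adjust to your data.
reads=(READS_R1.fastq READS_R2.fastq)
outdir=fastqc_reports
threads=2

# Build the command as an array so paths containing spaces stay intact.
cmd=(fastqc --outdir "$outdir" --threads "$threads" "${reads[@]}")

if command -v fastqc >/dev/null 2>&1 && [ -e "${reads[0]}" ]; then
    mkdir -p "$outdir"   # FastQC does not create the output directory itself
    "${cmd[@]}"
else
    # Dry run: show what would be executed.
    echo "${cmd[*]}"
fi
```

For each input file, FastQC should leave an ''_fastqc.html'' report and an ''_fastqc.zip'' archive in the output directory.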
Line 29: Line 29:
 always check if there is a newer update of Trimmomatic available in perun.// always check if there is a newer update of Trimmomatic available in perun.//
  
-== shell script: do no leave spaces between lines\\+== shell script:
 <code> <code>
 #!/bin/bash #!/bin/bash
-#$ -S /bin/sh+#$ -S /bin/bash
 . /etc/profile . /etc/profile
 #$ -cwd #$ -cwd
-#$ -pe threaded 10 +#$ -pe threaded 20 
-cd /path/to/userdir + 
-java -jar /opt/perun/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 10 -phred33 -trimlog LOG_NAME.out READS_R1.fastq READS_R2.fastq READS_R1_PairNtrim.fq READS_R1_unPairNtrim.fq READS_R2_PairNtrim.fq READS_R2_unPairNtrim.fq HEADCROP:20 LEADING:10 TRAILING:10 SLIDINGWINDOW:10:25 MINLEN:40+cd $PWD 
 +R1=/abspath/to/reads.R1.fastq.gz 
 +R2=/abspath/to/reads.R2.fastq.gz 
 +basename=my_bug 
 +# use either your own adapters or the comprehensive adapter list in the path below: 
 +adap_path=/home/dsalas/anvioWrap/TrueSeq2_NexteraSE-PE.fa 
 +# trimmomatic version is 0.39 (latest release) 
 +source activate trimmomatic 
 +trimmomatic PE -threads 20 -phred33 -trimlog $basename\.log $R1 $R2 $basename\.1.PT.fq $basename\.1.unPT.fq $basename\.2.PT.fq $basename\.2.unPT.fq ILLUMINACLIP:$adap_path:2:30:10 HEADCROP:15 LEADING:20 TRAILING:20 SLIDINGWINDOW:40:25 MINLEN:40 
 +conda deactivate 
 </code> </code>
  
 This will perform the following:\\ This will perform the following:\\
-• HEADCROP: Cut the specified number (20) of bases from the start of the read.\\ +• HEADCROP: Cut the specified number (15) of bases from the start of the read. (HEADCROP:15)\\ 
-• LEADING: Remove bases in the start of a read if the quality is below quality 10. (LEADING:10)\\ +• LEADING: Remove bases at the start of a read if the quality is below quality 20. (LEADING:20)\\ 
-• TRAILING: Remove bases in the end of a read if the quality is below quality 10. (TRAILING:10)\\+• TRAILING: Remove bases at the end of a read if the quality is below quality 20. (TRAILING:20)\\
 • SLIDINGWINDOW: Perform sliding window trimming, cutting once the average quality within the window falls below a threshold. The read is scanned with a window W bases wide and cut when the average quality per base drops below Q (SLIDINGWINDOW:W:Q; for example, SLIDINGWINDOW:10:25 uses a 10-base window and a threshold of 25)\\ • SLIDINGWINDOW: Perform sliding window trimming, cutting once the average quality within the window falls below a threshold. The read is scanned with a window W bases wide and cut when the average quality per base drops below Q (SLIDINGWINDOW:W:Q; for example, SLIDINGWINDOW:10:25 uses a 10-base window and a threshold of 25)\\
 • MINLEN: Drop the read if it is below a specified length (below 40) (MINLEN:40).\\ • MINLEN: Drop the read if it is below a specified length (below 40) (MINLEN:40).\\
 • -phred33: Quality scores in Phred33 format. [https://en.wikipedia.org/wiki/FASTQ_format]\\ • -phred33: Quality scores in Phred33 format. [https://en.wikipedia.org/wiki/FASTQ_format]\\
-• ILLUMINACLIP: Remove adapters <ILLUMINACLIP:/path/to/userdir/Adapters.fas:2:30:10> (/path/to/userdir/Adapters.fas, adapter file). \\+• ILLUMINACLIP: Remove adapters <ILLUMINACLIP:/path/to/userdir/Adapters.fas:2:30:10> (/path/to/userdir/Adapters.fas, adapter file).\\ 
 +  ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold> 
 +  *fastaWithAdaptersEtc: specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used. See below. 
 +  *seedMismatches: specifies the maximum mismatch count which will still allow a full match to be performed 
 +  *palindromeClipThreshold: specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment. 
 +  *simpleClipThreshold: specifies how accurate the match between any adapter etc. sequence must be against a read. 
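
To make -phred33, LEADING and TRAILING more concrete, the bash sketch below decodes Phred+33 quality characters and drops low-quality bases from both ends of a quality string. The helper names ''phred33'' and ''trim_ends'' are invented for this example. It only illustrates the idea and is not Trimmomatic's own code.

```shell
#!/bin/bash
# Illustration only; phred33 and trim_ends are hypothetical helper names.

# phred33 CHAR: print the quality score encoded by one Phred+33 character
# (ASCII code of the character minus 33).
phred33() {
    local code
    printf -v code '%d' "'$1"
    echo $(( code - 33 ))
}

# trim_ends QUALS MIN: print "START LENGTH" of the region that remains after
# dropping bases with quality below MIN from both ends of the read.
trim_ends() {
    local quals=$1 min=$2
    local start=0 end=${#quals}
    while (( start < end )) && (( $(phred33 "${quals:start:1}") < min )); do
        start=$(( start + 1 ))
    done
    while (( end > start )) && (( $(phred33 "${quals:end-1:1}") < min )); do
        end=$(( end - 1 ))
    done
    echo "$start $(( end - start ))"
}

phred33 I                # prints 40
trim_ends '##IIII5#' 20  # prints "2 5"
```

In the second call the two leading ''#'' characters (quality 2) and the final ''#'' are dropped, while ''5'' (quality 20) is kept because it is not below the threshold.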
  
  
-Parameters listed here are a little strict. You can adjust them to suit your data. To check the quality of the reads, run fastqc first to get a quality control report.\\+Parameters listed here are a little strict. You can adjust them to suit your data. To check the quality of the reads, run fastqc first to get a quality control report. (More details about Trimmomatic: http://www.usadellab.org/cms/?page=trimmomatic)\\
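
One way to judge how strict a parameter set is: count the reads before and after trimming. The sketch below assumes standard four-line FASTQ records (no wrapped sequence lines). ''count_reads'' is a helper name made up for this example.

```shell
#!/bin/bash
# count_reads FILE: print the number of FASTQ records in FILE.
# Assumes 4 lines per record; handles plain and gzip-compressed files.
count_reads() {
    local f=$1 lines
    if [[ $f == *.gz ]]; then
        lines=$(gzip -dc "$f" | wc -l)
    else
        lines=$(wc -l < "$f")
    fi
    echo $(( lines / 4 ))
}

# Small self-contained demo with two made-up records.
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n' > "$tmp"
count_reads "$tmp"   # prints 2
rm -f "$tmp"
```

With the script shown earlier, comparing ''count_reads'' on the raw R1 file and on the paired output (for example ''my_bug.1.PT.fq'') gives the fraction of pairs that survived.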
  
 **3. Remove host reads: bowtie2**\\ **3. Remove host reads: bowtie2**\\
Line 70: Line 85:
 Step3. Mapping the metagenome to reference genome (host) and write paired-end reads that fail to align to output file by Bowtie 2 (for options see http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).\\ Step3. Mapping the metagenome to reference genome (host) and write paired-end reads that fail to align to output file by Bowtie 2 (for options see http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml).\\
  
-Input needs to be decompressed fastq files. Output file+Input needs to be decompressed fastq files. Output files
 SRR1747060.sam (File with SAM alignment information) SRR1747060.sam (File with SAM alignment information)
 SRR1747060_bowtie2.1.fastq SRR1747060_bowtie2.1.fastq
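
A hedged sketch of a Bowtie 2 call that would produce output files with the names listed above. The index prefix ''host_index'', the input file names and the thread count are assumptions for illustration. Note that ''--un-conc'' writes pairs that fail to align concordantly, which may differ slightly from "fail to align" as worded above. If Bowtie 2 or the inputs are missing, the script only prints the command.

```shell
#!/bin/bash
# Placeholder values (assumptions): index prefix, input names, thread count.
index=host_index
r1=SRR1747060_1.fastq
r2=SRR1747060_2.fastq
sam=SRR1747060.sam
unconc=SRR1747060_bowtie2.fastq
threads=8

cmd=(bowtie2 -p "$threads" -x "$index" -1 "$r1" -2 "$r2" \
     --un-conc "$unconc" -S "$sam")

# bowtie2 adds .1 / .2 before the final dot of the --un-conc path.
un1="${unconc%.*}.1.${unconc##*.}"
un2="${unconc%.*}.2.${unconc##*.}"

if command -v bowtie2 >/dev/null 2>&1 && [ -e "$r1" ] && [ -e "$r2" ]; then
    "${cmd[@]}"
else
    echo "dry run: ${cmd[*]}"
fi
echo "unaligned pairs go to: $un1 $un2"
```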
cleaning_of_illumina_paired_end_reads.1568949571.txt.gz · Last modified: by 170.10.235.116