cleaning_of_illumina_paired_end_reads
Differences
This shows you the differences between two versions of the page.
| cleaning_of_illumina_paired_end_reads [2019/09/20 00:12] – 170.10.235.116 | cleaning_of_illumina_paired_end_reads [2021/12/16 15:07] (current) – 170.10.250.122 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | Documentation | + | Updated |
| - | + | ||
| - | 1. Quality control: FastQC | + | |
| - | 2. Remove adapters: Trimmomatic | + | |
| - | 3. Remove host reads: bowtie2 | + | |
| **1. Quality control: FastQC**\\ | **1. Quality control: FastQC**\\ | ||
| FastQC reads a fastq file and produces a quality control report made up of several modules that are useful when cleaning the reads:\\ | FastQC reads a fastq file and produces a quality control report made up of several modules that are useful when cleaning the reads:\\ | ||
| - | | + | • Basic Statistics: Total Sequences, Sequences flagged as poor quality, Sequence length, |
| - | • Per base sequence quality \\ | + | • Per base sequence quality \\ |
| - | • Per sequence quality scores\\ | + | • Per sequence quality scores\\ |
| - | • Per base sequence content\\ | + | • Per base sequence content\\ |
| - | • Per sequence GC content\\ | + | • Per sequence GC content\\ |
| - | • Per base N content\\ | + | • Per base N content\\ |
| - | • Sequence Length Distribution\\ | + | • Sequence Length Distribution\\ |
| - | • Overrepresented sequences\\ | + | • Overrepresented sequences\\ |
| - | • Adapter Content\\ | + | • Adapter Content\\ |
| + | |||
| + | Input can be a fastq or bam/sam file; output is an HTML report and a compressed (zip) report file.\\ | ||
| + | |||
| + | < | ||
| + | | ||
| + | or | ||
| + | | ||
| + | </ | ||
| **2. Remove adapters: Trimmomatic**\\ | **2. Remove adapters: Trimmomatic**\\ | ||
| Line 25: | Line 29: | ||
| always check if there is a newer update of Trimmomatic available in perun.// | always check if there is a newer update of Trimmomatic available in perun.// | ||
| - | == shell script: | + | == shell script: |
| < | < | ||
| #!/bin/bash | #!/bin/bash | ||
| - | #$ -S /bin/sh | + | #$ -S /bin/bash |
| . / | . / | ||
| #$ -cwd | #$ -cwd | ||
| - | #$ -pe threaded | + | #$ -pe threaded |
| - | cd /path/to/userdir | + | |
| - | java -jar /opt/perun/Trimmomatic-0.36/trimmomatic-0.36.jar | + | cd $PWD |
| + | R1=/abspath/to/reads.R1.fastq.gz | ||
| + | R2=/abspath/to/reads.R2.fastq.gz | ||
| + | basename=my_bug | ||
| + | # use either your own adapters or the comprehensive adapter list in the path below: | ||
| + | adap_path=/home/ | ||
| + | # trimmomatic version is 0.39 (latest release) | ||
| + | source activate trimmomatic | ||
| + | trimmomatic | ||
| + | conda deactivate | ||
| </ | </ | ||
| This will perform the following: | This will perform the following: | ||
| - | • HEADCROP: Cut the specified number (20) of bases from the start of the read.\\ | + | • HEADCROP: Cut the specified number (20) of bases from the start of the read. (HEADCROP: |
| - | • LEADING: Remove bases in the start of a read if the quality is below quality | + | • LEADING: Remove bases at the start of a read if the quality is below quality |
| - | • TRAILING: Remove bases in the end of a read if the quality is below quality | + | • TRAILING: Remove bases at the end of a read if the quality is below quality |
| • SLIDINGWINDOW: | • SLIDINGWINDOW: | ||
| • MINLEN: Drop the read if it is below a specified length (below 40) (MINLEN: | • MINLEN: Drop the read if it is below a specified length (below 40) (MINLEN: | ||
| • -phred33: Quality scores in Phred33 format. [https:// | • -phred33: Quality scores in Phred33 format. [https:// | ||
| - | • ILLUMINACLIP: | + | • ILLUMINACLIP: |
| + | ILLUMINACLIP:< | ||
| + | *fastaWithAdaptersEtc: | ||
| + | *seedMismatches: | ||
| + | *palindromeClipThreshold: | ||
| + | *simpleClipThreshold: | ||
| - | The parameters listed here are fairly strict; adjust them to suit your data. To assess the quality of the reads, run FastQC first to get a quality control report.\\ | + | The parameters listed here are fairly strict; adjust them to suit your data. To assess the quality of the reads, run FastQC first to get a quality control report. |
| **3. Remove host reads: bowtie2**\\ | **3. Remove host reads: bowtie2**\\ | ||
| Line 66: | Line 85: | ||
| Step 3. Map the metagenome reads to the reference (host) genome with Bowtie 2 and write the paired-end reads that fail to align to an output file (for options see http:// | Step 3. Map the metagenome reads to the reference (host) genome with Bowtie 2 and write the paired-end reads that fail to align to an output file (for options see http:// | ||
| - | Input must be decompressed fastq files. Output | + | Input must be decompressed fastq files. Output | ||
| SRR1747060.sam (File with SAM alignment information) | SRR1747060.sam (File with SAM alignment information) | ||
| SRR1747060_bowtie2.1.fastq | SRR1747060_bowtie2.1.fastq | ||
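
The host-removal step above can be sketched as a small dry-run script. It only assembles and prints a Bowtie 2 command rather than running it. The index prefix `host_index`, the input names `SRR1747060_1.fastq` / `SRR1747060_2.fastq`, and the thread count are placeholder assumptions, not values taken from this page. `-p`, `-x`, `-1`, `-2`, `-S` and `--un-conc` are standard Bowtie 2 options; given `--un-conc SRR1747060_bowtie2.fastq`, Bowtie 2 writes the pairs that fail to align concordantly to `SRR1747060_bowtie2.1.fastq` and `SRR1747060_bowtie2.2.fastq`.

```shell
#!/bin/bash
# Dry-run sketch: build the Bowtie 2 host-removal command and print it.
# All values below are hypothetical placeholders; replace them with your own.
sample=SRR1747060        # sample name used to derive input and output names
host_index=host_index    # prefix of an index built with bowtie2-build (assumed to exist)
threads=4                # assumed thread count

build_bowtie2_cmd() {
    # Print the full command line as a single string.
    printf 'bowtie2 -p %s -x %s -1 %s -2 %s --un-conc %s -S %s\n' \
        "$threads" "$host_index" \
        "${sample}_1.fastq" "${sample}_2.fastq" \
        "${sample}_bowtie2.fastq" "${sample}.sam"
}

cmd="$(build_bowtie2_cmd)"
echo "$cmd"

# To actually run it (requires bowtie2 on PATH and the index files):
# eval "$cmd"
```

Printing the command first makes it easy to check the file names before submitting the job; the `.sam` file can be discarded afterwards if only the unaligned (non-host) reads are needed.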
cleaning_of_illumina_paired_end_reads.1568949170.txt.gz · Last modified: by 170.10.235.116
