cleaning_of_illumina_paired_end_reads
Revision of 2021/12/16 15:07 by 170.10.250.122 (previous revision: 2019/09/19 23:37 by 170.10.235.116)
Updated by Dandan Zhao and D. Salas-Leiva (2020-12-15)

**1. Quality control: FastQC**\\
FastQC reads a FASTQ file and produces a quality control report consisting of the following modules:\\
• Basic Statistics: Total Sequences, Sequences flagged as poor quality, Sequence length, %GC\\
• Per base sequence quality\\
• Per sequence quality scores\\
• Per base sequence content\\
• Per sequence GC content\\
• Per base N content\\
• Sequence Length Distribution\\
• Overrepresented sequences\\
• Adapter Content\\

The input can be a FASTQ or BAM/SAM file; the output is an HTML report plus a compressed (.zip) archive of the report.\\
<code>

or

</code>
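As a minimal sketch (an assumption, not necessarily the command used on perun), FastQC can be run on a read pair with its `--outdir` and `--threads` options. The helper `fastqc_report_name` and the directory `qc_reports` are hypothetical; the helper mimics how FastQC names its report for common suffixes (it strips `.gz`, `.fastq`, `.fq`, `.bam`, etc. and appends `_fastqc.html`).

```shell
#!/bin/bash
# Sketch only: run FastQC on one or more read files (assumed usage).
set -euo pipefail

# Predict FastQC's HTML report name: strip common suffixes in the order
# FastQC checks them, then append "_fastqc.html" (a ".zip" is also written).
fastqc_report_name() {
  local name=${1##*/} suffix
  for suffix in .gz .bz2 .txt .fastq .fq .sam .bam; do
    name=${name%"$suffix"}
  done
  echo "${name}_fastqc.html"
}

# Run FastQC; the output directory must already exist, so create it first.
run_fastqc() {
  local outdir=$1
  shift
  mkdir -p "$outdir"
  # --threads sets how many files FastQC processes simultaneously.
  fastqc --outdir "$outdir" --threads "$#" "$@"
}

# Usage (hypothetical paths): script.sh qc_reports reads.R1.fastq.gz reads.R2.fastq.gz
if [ "$#" -ge 2 ]; then
  run_fastqc "$@"
fi
```

For `reads.R1.fastq.gz`, the report would be `qc_reports/reads.R1_fastqc.html` next to `reads.R1_fastqc.zip`.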

**2. Remove adapters: Trimmomatic**\\
The shell script below will trim/clean up Illumina paired reads (for options see http://
If you have your own set of adapters, let's say in the following path (///
//The input can be FASTQ (uncompressed or gzip-compressed); the output is uncompressed by default (Trimmomatic compresses it only if the output file name ends in .gz).\\
Always check whether a newer version of Trimmomatic is available on perun.//
== shell script:
<code>
#!/bin/bash
#$ -S /bin/bash
. /
#$ -cwd
#$ -pe threaded

cd $PWD
R1=/abspath/to/reads.R1.fastq.gz
R2=/abspath/to/reads.R2.fastq.gz
basename=my_bug
# use either your own adapters or the comprehensive adapter list in the path below:
adap_path=/home/
# Trimmomatic version is 0.39 (the latest release at the time of writing)
source activate trimmomatic
trimmomatic
conda deactivate
</code>
This will perform the following:\\
• HEADCROP: Cut the specified number (20) of bases from the start of the read. (HEADCROP:20)\\
• LEADING: Remove bases from the start of a read as long as their quality is below 20. (LEADING:20)\\
• TRAILING: Remove bases from the end of a read as long as their quality is below 20. (TRAILING:20)\\
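• SLIDINGWINDOW: Perform sliding-window trimming, cutting once the average quality within the window falls below a threshold.\\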
• MINLEN: Drop the read if it is shorter than the specified length (40 bases). (MINLEN:40)\\
• -phred33: Quality scores are in Phred+33 format. [https://en.wikipedia.org/wiki/FASTQ_format]\\
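• ILLUMINACLIP: Cut adapter and other Illumina-specific sequences from the read.\\
ILLUMINACLIP:<fastaWithAdaptersEtc>:<seedMismatches>:<palindromeClipThreshold>:<simpleClipThreshold>\\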
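*fastaWithAdaptersEtc: path to a FASTA file containing all the adapters, PCR sequences, etc. to be clipped\\
*seedMismatches: maximum mismatch count that still allows a full match to be performed\\
*palindromeClipThreshold: how accurate the match between the two "adapter-ligated" reads must be for paired-end palindrome read alignment\\
*simpleClipThreshold: how accurate the match between any adapter (etc.) sequence and a read must be\\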
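To make `-phred33` and the LEADING/TRAILING thresholds concrete, here is a small bash illustration (not Trimmomatic's own code): each quality character encodes Q = ASCII code - 33, and LEADING/TRAILING drop bases from either end while Q is below the threshold. The function names are hypothetical, and sliding-window and adapter clipping are deliberately left out.

```shell
#!/bin/bash
# Illustration only (not Trimmomatic's implementation): Phred+33 decoding
# and LEADING/TRAILING-style trimming of a single read's quality string.
set -euo pipefail

# Print the Phred score of each quality character: Q = ASCII code - 33.
phred33_scores() {
  local qual=$1 i code
  local -a scores=()
  for (( i = 0; i < ${#qual}; i++ )); do
    printf -v code '%d' "'${qual:i:1}"
    scores+=("$(( code - 33 ))")
  done
  echo "${scores[*]}"
}

# Print the 0-based start and end (exclusive) of the part of the read that
# remains after dropping leading and trailing bases with Q below $2.
leading_trailing_bounds() {
  local -a q
  read -r -a q <<< "$(phred33_scores "$1")"
  local min=$2 start=0 end=${#q[@]}
  while (( start < end && q[start] < min )); do start=$(( start + 1 )); done
  while (( end > start && q[end - 1] < min )); do end=$(( end - 1 )); done
  echo "$start $end"
}

# e.g. phred33_scores 'IIII#' prints "40 40 40 40 2"
```

With a threshold of 20, the quality string `#5III5#` (scores 2 20 40 40 40 20 2) keeps positions 1 to 5, so `leading_trailing_bounds '#5III5#' 20` prints `1 6`.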

The parameters listed here are fairly strict; adjust them to your needs. To check read quality, run FastQC first to get a quality control report. (More details about Trimmomatic:
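The sketch below shows one way to assemble a complete paired-end call from the options just described. Assumptions, not taken from this page: SLIDINGWINDOW:4:20, the ILLUMINACLIP thresholds 2:30:10 (as in the Trimmomatic manual's example), placing ILLUMINACLIP first, and the output names, which simply mirror the `READS_*_PairNtrim.fq` / `READS_*_unPairNtrim.fq` files used in the bowtie2 step. `build_trimmomatic_pe` and `TRIM_CMD` are hypothetical names.

```shell
#!/bin/bash
# Sketch: assemble a Trimmomatic paired-end command from the options above.
# Hypothetical values (not from the page): SLIDINGWINDOW:4:20 and the
# ILLUMINACLIP thresholds 2:30:10; the output names are illustrative.
set -euo pipefail

build_trimmomatic_pe() {
  local r1=$1 r2=$2 base=$3 adapters=$4 threads=${5:-10}
  # PE mode takes 2 inputs and 4 outputs: paired/unpaired for R1, then for R2.
  # Adapter clipping is placed first here (an assumption about step order).
  TRIM_CMD=(trimmomatic PE -threads "$threads" -phred33
    "$r1" "$r2"
    "${base}_R1_PairNtrim.fq" "${base}_R1_unPairNtrim.fq"
    "${base}_R2_PairNtrim.fq" "${base}_R2_unPairNtrim.fq"
    "ILLUMINACLIP:${adapters}:2:30:10"
    HEADCROP:20 LEADING:20 TRAILING:20 SLIDINGWINDOW:4:20 MINLEN:40)
}

# Usage: script.sh R1.fastq.gz R2.fastq.gz READS adapters.fa [threads]
if [ "$#" -ge 4 ]; then
  build_trimmomatic_pe "$@"
  "${TRIM_CMD[@]}"
fi
```

Run it inside the activated `trimmomatic` conda environment; keeping the arguments in an array preserves paths that contain spaces.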

**3. Remove host reads: bowtie2**\\

Step 1. Download the host reference genome (https://

Step 2. Use //

== shell script:\\
<code>
#!/bin/bash
. /
#$ -cwd
cd $PWD
/opt/perun/bin/bowtie2-build -f Host_ref_genome.fna Host_ref_genome
</code>
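Building the index can take a while for large host genomes, so a job script may want to skip it when the index already exists. A minimal sketch, assuming the standard bowtie2 index layout (six files: `PREFIX.1.bt2` to `PREFIX.4.bt2` plus `PREFIX.rev.1.bt2` and `PREFIX.rev.2.bt2`, or `.bt2l` for very large genomes) and that `bowtie2-build` is on the PATH; the helper names are hypothetical.

```shell
#!/bin/bash
# Sketch: build the bowtie2 host index only when it is missing.
set -euo pipefail

# True if all six index files exist (non-empty) with .bt2 or .bt2l suffix.
bt2_index_exists() {
  local prefix=$1 ext part complete
  for ext in bt2 bt2l; do
    complete=1
    for part in 1 2 3 4 rev.1 rev.2; do
      if [ ! -s "${prefix}.${part}.${ext}" ]; then
        complete=0
        break
      fi
    done
    if [ "$complete" -eq 1 ]; then
      return 0
    fi
  done
  return 1
}

# Build the index from a FASTA file unless it is already there.
build_host_index() {
  local fasta=$1 prefix=$2 threads=${3:-1}
  if bt2_index_exists "$prefix"; then
    echo "index $prefix already present, skipping"
  else
    bowtie2-build --threads "$threads" -f "$fasta" "$prefix"
  fi
}

# Usage (names from the page): script.sh Host_ref_genome.fna Host_ref_genome 10
if [ "$#" -ge 2 ]; then
  build_host_index "$@"
fi
```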
Step 3. Mapping
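The input must be uncompressed FASTQ files. Output files:\\
SRR1747060.sam (SAM file with the alignment information)\\
READS_bowtie2.1.fastq (mate 1 of the pairs that did not align concordantly to the host)\\
READS_bowtie2.2.fastq (mate 2 of the pairs that did not align concordantly to the host)\\
READS_bowtie2_un.fastq (unpaired reads that did not align to the host)\\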
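The per-mate names come from how bowtie2 expands `--un-conc PATH`: according to the bowtie2 manual, `.1` and `.2` are inserted before the final dot of PATH (a `%` in PATH would instead be replaced by the mate number, which this sketch does not handle). A hypothetical helper that predicts the names, for instance to check that the expected files exist after the job:

```shell
#!/bin/bash
# Sketch: predict the per-mate file names bowtie2 writes for --un-conc PATH.
set -euo pipefail

un_conc_names() {
  local path=$1 mate
  for mate in 1 2; do
    if [[ $path == *.* ]]; then
      # ".1"/".2" go before the final dot in PATH (bowtie2 manual).
      echo "${path%.*}.${mate}.${path##*.}"
    else
      # No dot at all: appending ".1"/".2" is an assumption of this sketch.
      echo "${path}.${mate}"
    fi
  done
}

# e.g. un_conc_names READS_bowtie2.fastq prints
#   READS_bowtie2.1.fastq
#   READS_bowtie2.2.fastq
```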
== shell script:\\
<code>
#!/bin/sh
#$ -S /bin/sh
. /
#$ -o bowtie2.log
#$ -cwd
#$ -pe threaded 10
# -1/-2: trimmed read pairs; -U: trimmed reads whose mate was dropped
# --un-conc: pairs that do not align concordantly to the host (.1/.2 files)
# --un: unpaired reads that do not align to the host
bowtie2 --threads 10 -x Host_ref_genome \
  -1 READS_R1_PairNtrim.fq -2 READS_R2_PairNtrim.fq \
  -U READS_R1_unPairNtrim.fq,READS_R2_unPairNtrim.fq \
  -S SRR1747060.sam \
  --un-conc READS_bowtie2.fastq --un READS_bowtie2_un.fastq
</code>
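To see how many read pairs were removed as host reads, the record counts of the input and output FASTQ files can be compared. A minimal sketch, assuming four-line FASTQ records (no wrapped sequence lines); `fastq_count` and `percent_kept` are hypothetical helpers. Note that `--un-conc` also keeps pairs that aligned to the host only discordantly or with a single mate.

```shell
#!/bin/bash
# Sketch: compare read counts before and after host-read removal.
set -euo pipefail

# Number of records in a FASTQ file (plain or .gz), assuming 4 lines per record.
fastq_count() {
  local lines
  if [[ $1 == *.gz ]]; then
    lines=$(gzip -dc -- "$1" | wc -l)
  else
    lines=$(wc -l < "$1")
  fi
  echo $(( lines / 4 ))
}

# Integer percentage of records kept; prints 0 if there was no input.
percent_kept() {
  local before=$1 after=$2
  if (( before == 0 )); then
    echo 0
  else
    echo $(( 100 * after / before ))
  fi
}

# Usage with the file names from the page:
#   before=$(fastq_count READS_R1_PairNtrim.fq)
#   after=$(fastq_count READS_bowtie2.1.fastq)
#   echo "kept $(percent_kept "$before" "$after")% of read pairs"
```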
cleaning_of_illumina_paired_end_reads.1568947040.txt.gz · Last modified: by 170.10.235.116
