cleaning_of_illumina_paired_end_reads
Differences
This shows you the differences between two versions of the page.
| cleaning_of_illumina_paired_end_reads [2017/08/09 13:26] – 129.173.94.20 | cleaning_of_illumina_paired_end_reads [2021/12/16 15:07] (current) – 170.10.250.122 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | **Basic Usage of Trimmomatic**\\ | + | Updated by Dandan Zhao and D. Salas-Leiva (2020-12-15) |
| - | The shell script below will trim/clean up Illumina paired reads (for options see http:// | + | **1. Quality control: FastQC**\\ |
| + | FastQC reads a FASTQ file and produces a quality control report made up of the following modules, which help decide how to clean the reads:\\ | ||
| + | • Basic Statistics: Total Sequences, Sequences flagged as poor quality, Sequence length, | ||
| + | • Per base sequence quality \\ | ||
| + | • Per sequence quality scores\\ | ||
| + | • Per base sequence content\\ | ||
| + | • Per sequence GC content\\ | ||
| + | • Per base N content\\ | ||
| + | • Sequence Length Distribution\\ | ||
| + | • Overrepresented sequences\\ | ||
| + | • Adapter Content\\ | ||
| + | |||
| + | Input can be a FASTQ or BAM/SAM file; output is an HTML report and a compressed report file.\\ | ||
| + | |||
| + | < | ||
| + | | ||
| + | or | ||
| + | | ||
| + | </ | ||
| + | |||
| + | **2. Remove adapters: Trimmomatic**\\ | ||
| + | |||
| + | The shell script below will trim/clean up Illumina paired reads (for options see http:// | ||
| + | If you have your own set of adapters, | ||
| //Input can be FASTQ (uncompressed or gzip-compressed); output is uncompressed by default.\\ | //Input can be FASTQ (uncompressed or gzip-compressed); output is uncompressed by default.\\ | ||
| Always check whether a newer version of Trimmomatic is available on perun.// | Always check whether a newer version of Trimmomatic is available on perun.// | ||
| - | == shell script: | + | == shell script: |
| < | < | ||
| #!/bin/bash | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 20 | ||
| + | |||
| + | cd $PWD | ||
| + | R1=/ | ||
| + | R2=/ | ||
| + | basename=my_bug | ||
| + | # use either your own adapters or the comprehensive adapter list in the path below: | ||
| + | adap_path=/ | ||
| + | # trimmomatic version is 0.39 (latest release) | ||
| + | source activate trimmomatic | ||
| + | trimmomatic PE -threads 20 -phred33 -trimlog $basename\.log $R1 $R2 $basename\.1.PT.fq $basename\.1.unPT.fq $basename\.2.PT.fq $basename\.2.unPT.fq ILLUMINACLIP: | ||
| + | conda deactivate | ||
| + | |||
| + | </ | ||
| + | |||
| + | This will perform the following: | ||
| + | • HEADCROP: Cut the specified number (20) of bases from the start of the read. (HEADCROP: | ||
| + | • LEADING: Remove bases from the start of a read while their quality is below 20. (LEADING: | ||
| + | • TRAILING: Remove bases from the end of a read while their quality is below 20. (TRAILING: | ||
| + | • SLIDINGWINDOW: | ||
| + | • MINLEN: Drop the read if it is shorter than the specified length (40). (MINLEN: | ||
| + | • -phred33: Quality scores in Phred33 format. [https:// | ||
| + | • ILLUMINACLIP: | ||
| + | ILLUMINACLIP:< | ||
| + | *fastaWithAdaptersEtc: | ||
| + | *seedMismatches: | ||
| + | *palindromeClipThreshold: | ||
| + | *simpleClipThreshold: | ||
| + | |||
| + | |||
| + | The parameters listed here are fairly strict; adjust them to suit your data. To assess read quality, run FastQC first to get a quality control report. (More details about Trimmomatic: | ||
| + | |||
| + | **3. Remove host reads: bowtie2**\\ | ||
| + | |||
| + | Step1. Download the host reference genome (https:// | ||
| + | |||
| + | Step2. Use // | ||
| + | |||
| + | == shell script:\\ | ||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/sh | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | cd $PWD | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | Step3. Use Bowtie 2 to map the metagenome reads to the host reference genome and write the paired-end reads that fail to align to the output files (for options see http:// | ||
| + | |||
| + | Input must be decompressed fastq files. Output files: | ||
| + | SRR1747060.sam (file with SAM alignment information) | ||
| + | SRR1747060_bowtie2.1.fastq | ||
| + | SRR1747060_bowtie2.2.fastq | ||
| + | SRR1747060_bowtie2_un.fastq | ||
| + | |||
| + | == shell script:\\ | ||
| + | < | ||
| + | #!/bin/sh | ||
| #$ -S /bin/sh | #$ -S /bin/sh | ||
| . / | . / | ||
| + | #$ -o bowtie2.log | ||
| #$ -cwd | #$ -cwd | ||
| #$ -pe threaded 10 | #$ -pe threaded 10 | ||
| - | cd / | + | bowtie2 |
| - | java -jar / | + | |
| </ | </ | ||
| + | |||
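
Trimmomatic in PE mode and Bowtie 2 with paired input both expect the two mate files to stay in sync, so it can be worth checking that R1 and R2 hold the same number of reads before and after each step. The following is a minimal sketch of such a check, not part of the original page: the function names `count_reads` and `check_pairs` are hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```shell
#!/bin/bash
# Sketch only: count_reads and check_pairs are hypothetical helper names.
# Assumes standard 4-line FASTQ records (sequence lines are not wrapped).

# count_reads FILE: print the number of reads in a plain or .gz FASTQ file.
count_reads() {
    case "$1" in
        *.gz) _lines=$(gzip -dc "$1" | wc -l) ;;
        *)    _lines=$(wc -l < "$1") ;;
    esac
    # One read = 4 lines (header, sequence, '+', qualities).
    echo $(( $_lines / 4 ))
}

# check_pairs R1 R2: succeed if both files hold the same number of reads.
check_pairs() {
    _n1=$(count_reads "$1")
    _n2=$(count_reads "$2")
    if [ "$_n1" -eq "$_n2" ]; then
        echo "OK: $_n1 read pairs"
    else
        echo "MISMATCH: $1 has $_n1 reads, $2 has $_n2 reads" >&2
        return 1
    fi
}
```

With the variables from the Trimmomatic script, this could be called as `check_pairs "$R1" "$R2"` before trimming and as `check_pairs "$basename.1.PT.fq" "$basename.2.PT.fq"` afterwards. It only compares counts; it does not verify that read names match between the two files.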
cleaning_of_illumina_paired_end_reads.1502295997.txt.gz · Last modified: by 129.173.94.20
