nanopore_tools_for_polishing
Differences
This shows you the differences between two versions of the page.
| nanopore_tools_for_polishing [2018/07/17 12:54] – 129.173.91.14 | nanopore_tools_for_polishing [2024/08/07 13:01] (current) – 134.190.232.164 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| ====== Polishing your MinION assembly ====== | ====== Polishing your MinION assembly ====== | ||
| - | Documentation by Jon Jerlstrom | + | Documentation by Jon Jerlström |
| + | (updates by Joran Martijn) | ||
| + | |||
| + | **Be aware that some scripts and commands might not be working any longer on Perun due to the switch to the new conda-environment system. Sections will be progressively updated to reflect this.** | ||
| The following are tools for polishing Oxford Nanopore assemblies, to help eliminate some of the errors associated with long-read data. Each polisher uses a mapping step before correcting the assembly. BWA MEM is used here as the mapping tool, but it can be replaced with another mapper. | The following are tools for polishing Oxford Nanopore assemblies, to help eliminate some of the errors associated with long-read data. Each polisher uses a mapping step before correcting the assembly. BWA MEM is used here as the mapping tool, but it can be replaced with another mapper. | ||
| Line 17: | Line 20: | ||
| {{ : | {{ : | ||
| - | Formatting for this script: | + | The following is a script |
| < | < | ||
| minimap -t8 assembly.fasta / | minimap -t8 assembly.fasta / | ||
| </ | </ | ||
| - | Note: for each polish performed, the script must be changed to reflect the new file names. | + | **Note 1**: for each polish performed, the script must be changed to reflect the new file names. |
| + | For genomes with lots of stubborn indels that persist after Nanopolishing and Unicycling, experiment with the Racon flags -e (error threshold), -w (window length), and -q (quality threshold). I found that increasing the window length and decreasing the quality threshold works well. The following script contains one round of minimap2 mapping (read **Note 2** on interleaved short reads), Racon polishing, then Bowtie2 mapping (for viewing bam files): | ||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 8 | ||
| + | |||
| + | cd / | ||
| + | input=your_input_genome.fasta | ||
| + | output=your_output_racon_round1.fasta | ||
| + | minimap2 -t 8 $input interleavedshortreads.fq > temporary.paf | ||
| + | echo " | ||
| + | racon -u -e 0.1 -w 500 -q 1 -t 8 interleavedshortreads.fq temporary.paf $input >$output | ||
| + | echo "racon done" | ||
| + | rm temporary.paf | ||
| + | source activate bowtie2 | ||
| + | bowtie2-build $output $output | ||
| + | bowtie2 -k 2 --threads 8 -x $output --very-sensitive \ | ||
| + | -1 your_forward_short_reads.fq \ | ||
| + | -2 your_reverse_short_reads.fq \ | ||
| + | -S $output.sam | ||
| + | echo " | ||
| + | conda deactivate | ||
| + | samtools view -F 4 -bS $output.sam |samtools sort > $output.sorted.bam | ||
| + | samtools index $output.sorted.bam | ||
| + | rm $output.sam | ||
| + | rm $output.*.bt2 | ||
| + | </ | ||
| + | You can modify the above script to loop over multiple rounds, but if you are looking for the best error-correction settings, always check the bam files by eye after each round. | ||
| + | |||
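The multi-round variant mentioned above could be sketched as follows. This is a minimal sketch, not a tested Perun script: the function name `polish_rounds`, the `DRY_RUN` switch, and the `racon_roundN.fasta` / `roundN.paf` file names are assumptions made for illustration, while the minimap2 and racon options are copied from the script above. With `DRY_RUN=1` (the default here) the commands are only printed, so the loop can be inspected before anything runs.

```shell
#!/bin/bash
# Sketch: chain several Racon rounds, feeding each round's output into the next.
# DRY_RUN=1 (default) prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

# Print the command line in dry-run mode, otherwise hand it to a shell.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# polish_rounds <genome.fasta> <interleaved_short_reads.fq> <number_of_rounds>
# The parentheses make the body a subshell, so its variables stay local.
polish_rounds() (
    input=$1
    reads=$2
    rounds=$3
    i=1
    while [ "$i" -le "$rounds" ]; do
        output="racon_round${i}.fasta"   # hypothetical naming scheme
        run "minimap2 -t 8 $input $reads > round${i}.paf" || exit 1
        run "racon -u -e 0.1 -w 500 -q 1 -t 8 $reads round${i}.paf $input > $output" || exit 1
        run "rm round${i}.paf"
        input=$output                    # the next round polishes this round's output
        i=$((i + 1))
    done
)

# Example: print the commands for three rounds.
polish_rounds your_input_genome.fasta interleavedshortreads.fq 3
```

Set `DRY_RUN=0` only once the printed commands look right, and still inspect the bam files after each round as recommended above. File names containing spaces are not handled by this sketch.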
| + | **Note 2**: The " | ||
| + | < | ||
| + | source activate fastx_toolkit | ||
| + | fastq-interleave read1.fq read2.fq > interleaved.fq | ||
| + | </ | ||
| + | **WARNING**: | ||
| ---- | ---- | ||
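To make concrete what "interleaved" short reads look like, here is a minimal sketch that interleaves two FASTQ files with awk. It is an illustration rather than a replacement for fastq-interleave: the function name `interleave_fastq` is made up, and the sketch assumes both files hold the same reads in the same order with exactly four lines per record.

```shell
#!/bin/bash
# interleave_fastq <read1.fq> <read2.fq>
# Writes record 1 of read1, record 1 of read2, record 2 of read1, ... to stdout.
# Assumes 4-line FASTQ records and identical read order in both files.
interleave_fastq() {
    awk -v mates="$2" '
        { print }                              # copy the current line of read1
        NR % 4 == 0 {                          # a read1 record is complete
            for (i = 0; i < 4; i++) {          # append the matching read2 record
                if ((getline line < mates) > 0) print line
            }
        }
    ' "$1"
}

# Example (commented out so that nothing runs by accident):
# interleave_fastq read1.fq read2.fq > interleaved.fq
```

Each forward read is followed directly by its mate, which is the layout the Racon script above expects in interleavedshortreads.fq.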
| Line 42: | Line 82: | ||
| First, make a BWA index of the assembly you wish to map onto by using the following command: | First, make a BWA index of the assembly you wish to map onto by using the following command: | ||
| + | |||
| < | < | ||
| bwa index assembly_to_polish.fasta | bwa index assembly_to_polish.fasta | ||
| </ | </ | ||
| + | |||
| Next, use the meteora_bwa.sh script to map the short reads onto your assembly. This will create a sorted.bam file. In this example, two paired-end read files will be mapped: | Next, use the meteora_bwa.sh script to map the short reads onto your assembly. This will create a sorted.bam file. In this example, two paired-end read files will be mapped: | ||
| + | |||
| < | < | ||
| - | bwa mem -t 16 assembly_to_polish.fasta / | + | bwa mem \ |
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| </ | </ | ||
| + | |||
| + | UPDATE: You can now run bwa-mem2, an optimized version of bwa mem that needs its own index (built with bwa-mem2 index). It produces the same alignments, but is roughly 2-4x faster: | ||
| + | |||
| + | < | ||
| + | bwa-mem2 mem \ | ||
| + | -t 16 \ | ||
| + | assembly_to_polish.fasta \ | ||
| + | / | ||
| + | / | ||
| + | samtools sort --threads 16 -o piloninput.sorted.bam | ||
| + | </ | ||
| + | |||
| Once this is finished, use Pilon.sh to make changes in the assembly and generate a new consensus sequence. Pilon.sh can be formatted like so: | Once this is finished, use Pilon.sh to make changes in the assembly and generate a new consensus sequence. Pilon.sh can be formatted like so: | ||
| + | |||
| < | < | ||
| - | java -Xmx16G -jar / | + | java -Xmx16G -jar / |
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| + | | ||
| </ | </ | ||
| + | |||
| + | UPDATE: As of v1.24, the --threads option is no longer maintained. It seems Pilon doesn' | ||
| + | |||
| You may run into an error where Pilon does not recognize the bam file created from the previous step as being indexed. To fix this, run: | You may run into an error where Pilon does not recognize the bam file created from the previous step as being indexed. To fix this, run: | ||
| + | |||
| < | < | ||
| samtools index / | samtools index / | ||
| </ | </ | ||
| + | |||
| This will return a .bam.bai file. This file needs to be in the same folder as Pilon.sh, but does not need to be placed in the script. | This will return a .bam.bai file. This file needs to be in the same folder as Pilon.sh, but does not need to be placed in the script. | ||
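The indexing fix above can be folded into a small check that runs before Pilon. A minimal sketch, assuming samtools writes the index next to the BAM file as `<name>.bam.bai`; the function name `needs_index` is hypothetical.

```shell
#!/bin/bash
# needs_index <file.bam>
# Succeeds (exit status 0) when no <file.bam>.bai exists next to the BAM file.
needs_index() {
    [ ! -f "$1.bai" ]
}

# Example (commented out; requires samtools and an existing BAM file):
# bam=piloninput.sorted.bam
# if needs_index "$bam"; then
#     samtools index "$bam"
# fi
```

The check only looks for the presence of the index; it does not verify that the index is newer than the BAM file.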
| Line 63: | Line 134: | ||
| Shell script: | Shell script: | ||
| {{ : | {{ : | ||
| + | |||
| + | < | ||
| + | #!/bin/bash | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 16 | ||
| + | |||
| + | #cd / | ||
| + | |||
| + | echo " | ||
| + | |||
| + | unset PYTHONPATH | ||
| + | export PATH=/ | ||
| + | export LD_LIBRARY_PATH=/ | ||
| + | |||
| + | / | ||
| + | |||
| + | |||
| + | echo " | ||
| + | |||
| + | </ | ||
| Formatting: | Formatting: | ||
| Line 100: | Line 193: | ||
| If Illumina reads are available, it might be possible to skip nanopolish altogether and go directly to Pilon polishing after Racon. This was done in the Solanum pennellii pre-print, where nanopolish simply was not feasible. | If Illumina reads are available, it might be possible to skip nanopolish altogether and go directly to Pilon polishing after Racon. This was done in the Solanum pennellii pre-print, where nanopolish simply was not feasible. | ||
| - | Location: / | + | Location: / |
| Scripts: | Scripts: | ||
| Line 119: | Line 212: | ||
| nanopolish merge - merges the pieces into a new consensus. | nanopolish merge - merges the pieces into a new consensus. | ||
| - | **Updated Nanopolish protocol (as of Dec 1st 2017):** | + | **Updated Nanopolish protocol (as of March 8 2020):** |
| First, index your unchopped, raw reads file. | First, index your unchopped, raw reads file. | ||
| Use the sequencing_summary.txt produced by albacore during basecalling to speed up this step significantly. If you have several sequencing_summary.txt files, list their paths in a file of filenames (fof file), one per line, and pass it with -f. This also works for a single file: | Use the sequencing_summary.txt produced by albacore during basecalling to speed up this step significantly. If you have several sequencing_summary.txt files, list their paths in a file of filenames (fof file), one per line, and pass it with -f. This also works for a single file: | ||
| + | For the following step, **DO NOT use more than 1 thread**: the program is not threaded! | ||
| < | < | ||
| #!/bin/bash | #!/bin/bash | ||
| Line 128: | Line 222: | ||
| . / | . / | ||
| #$ -cwd | #$ -cwd | ||
| - | #$ -pe threaded | + | #$ -pe threaded |
| cd $PWD | cd $PWD | ||
| - | export PATH=/scratch2/software/anaconda/bin:$PATH | + | fast5path=/scratch2/path2/fast5/ |
| - | source activate nanopolish-python3 | + | fastq=/ |
| - | + | seqsummary=/ | |
| - | / | + | source activate nanopolish-0.13.2 |
| - | -d / | + | export PATH=/ |
| - | -f summary_files.fof \ | + | nanopolish index -d $fast5path |
| - | / | + | conda deactivate |
| - | source deactivate | ||
| </ | </ | ||
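If the run produced several summary files, the file of filenames passed with -f can be generated rather than typed by hand. A minimal sketch, assuming the summaries are named `sequencing_summary*.txt` somewhere below one directory; the function name `make_summary_fof` and the example path are placeholders.

```shell
#!/bin/bash
# make_summary_fof <basecalling_dir> <output.fof>
# Writes the path of every sequencing_summary*.txt below <basecalling_dir>
# to <output.fof>, one path per line, in sorted order.
make_summary_fof() {
    find "$1" -type f -name 'sequencing_summary*.txt' | sort > "$2"
}

# Example (commented out; the directory is a placeholder):
# make_summary_fof /path/to/albacore_output summary_files.fof
# nanopolish index -d "$fast5path" -f summary_files.fof "$fastq"
```

Check the resulting file before indexing; an empty fof file means the name pattern did not match anything.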
| Line 165: | Line 257: | ||
| samtools index reads.sorted.bam | samtools index reads.sorted.bam | ||
| - | source | + | conda deactivate |
| </ | </ | ||
| Line 185: | Line 277: | ||
| cd $PWD | cd $PWD | ||
| - | export PATH=/ | + | export PATH=/ |
| - | source activate nanopolish-python3 | + | |
| + | source activate nanopolish-0.12 | ||
| - | python / | + | nanopolish_makerange.py reference.fasta | parallel --results nanopolish.results -P 20 \ |
| - | reference.fasta | parallel --results nanopolish.results -P 20 \ | + | nanopolish variants --consensus |
| - | / | + | -r / |
| - | -r / | + | |
| - | -b reads.sorted.bam -g reference.fasta -t 10 --min-candidate-frequency 0.1 | + | |
| - | source | + | conda deactivate |
| </ | </ | ||
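When nanopolish variants is run through parallel as above, the number of threads in use is roughly the number of parallel jobs (-P) times the threads per job (-t). A minimal sketch for deriving -P from the slots requested with `-pe threaded`, assuming Grid Engine exports them as `$NSLOTS`; the function name `parallel_jobs` is hypothetical.

```shell
#!/bin/bash
# parallel_jobs <total_slots> <threads_per_job>
# Prints how many jobs fit into the slots (integer division), never less than 1.
parallel_jobs() (
    n=$(( $1 / $2 ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
)

# Example: 20 slots with 10 threads per nanopolish job gives -P 2.
parallel_jobs 20 10
```

Inside a job script this could be used as `-P "$(parallel_jobs "$NSLOTS" 10)"` together with `-t 10`, so that the job does not use more threads than it requested.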
nanopore_tools_for_polishing.1531842878.txt.gz · Last modified: by 129.173.91.14
