--- nanopore_tools_for_polishing [2019/01/18 08:21] – 36.2.110.248
+++ nanopore_tools_for_polishing [2024/08/07 13:01] (current) – 134.190.232.164
@@ Line 1: / Line 1: @@
 ====== Polishing your MinION assembly ======
 Documentation by Jon Jerlström Hultqvist and Shelby Williams
+(updates by Joran Martijn)
 **Be aware that some scripts and commands might not be working any longer on Perun due to the switch to the new conda-environment system. Sections will be progressively updated to reflect this.**
@@ Line 38: / Line 39: @@
 minimap2 -t 8 $input interleavedshortreads.fq > temporary.paf
 echo "minimap2 done"
-racon -u -e 0.1 -w 5000 -q 1 -t 8 interleavedshortreads.fq temporary.paf $input >$output
+racon -u -e 0.1 -w 500 -q 1 -t 8 interleavedshortreads.fq temporary.paf $input >$output
 echo "racon done"
 rm temporary.paf
@@ Line 48: / Line 49: @@
 -S $output.sam
 echo "Bowtie done"
-source deactivate
+conda deactivate
 samtools view -F 4 -bS $output.sam |samtools sort > $output.sorted.bam
 samtools index $output.sorted.bam > $output.sorted.bam.bai
@@ Line 81: / Line 82: @@
 First, make a BWA index of the assembly you wish to map onto by using the following command:
 <code>
 bwa index assembly_to_polish.fasta
 </code>
 Next, use the meteora_bwa.sh script to map the short reads onto your assembly. This will create a sorted.bam file. In this example, two paired-end read files will be mapped:
 <code>
-bwa mem -t 16 assembly_to_polish.fasta /scratch2/user/path/to/trimmedreads_1_PairNtrim.fastq.gz /scratch2/user/path/to/trimmedreads_2_PairNtrim.fastq.gz | samtools view -Sb - | samtools sort >  piloninput.sorted.bam
+bwa mem \
+    -t 16 \
+    assembly_to_polish.fasta \
+    /scratch2/user/path/to/trimmedreads_1_PairNtrim.fastq.gz \
+    /scratch2/user/path/to/trimmedreads_2_PairNtrim.fastq.gz | \
+        samtools sort --threads 16 -o piloninput.sorted.bam
 </code>
+UPDATE: You can now run bwa-mem2, which is an optimized version of bwa mem. It generates the exact same output, but is 2-4x faster:
+<code>
+bwa-mem2 mem \
+    -t 16 \
+    assembly_to_polish.fasta \
+    /scratch2/user/path/to/trimmedreads_1_PairNtrim.fastq.gz \
+    /scratch2/user/path/to/trimmedreads_2_PairNtrim.fastq.gz | \
+        samtools sort --threads 16 -o piloninput.sorted.bam
+</code>
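One practical point when switching between the two mappers: bwa-mem2 reads its own index files, so an index built with `bwa index` has to be rebuilt with `bwa-mem2 index`. The sketch below shows one way to keep the index and mapping steps consistent. The function names `pick_aligner` and `map_and_sort` are hypothetical and are not part of meteora_bwa.sh; the file names are the placeholders from the example above.

```shell
#!/bin/bash
# Hypothetical helper (not part of meteora_bwa.sh): print the name of the
# aligner to use, preferring bwa-mem2 when it is found on PATH.
pick_aligner() {
    if command -v bwa-mem2 >/dev/null 2>&1; then
        echo "bwa-mem2"
    else
        echo "bwa"
    fi
}

# Hypothetical wrapper: build the index with the SAME program that does the
# mapping, then map the paired reads and write a sorted BAM.
map_and_sort() {
    local assembly="$1" reads1="$2" reads2="$3" out_bam="$4" threads="${5:-16}"
    local aligner
    aligner=$(pick_aligner)
    "$aligner" index "$assembly"
    "$aligner" mem -t "$threads" "$assembly" "$reads1" "$reads2" |
        samtools sort --threads "$threads" -o "$out_bam"
}
```

Assuming the tools are on your PATH, a call could look like `map_and_sort assembly_to_polish.fasta reads_1.fastq.gz reads_2.fastq.gz piloninput.sorted.bam 16`.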
 Once this is finished, use Pilon.sh to make changes in the assembly and generate a new consensus sequence. Pilon.sh can be formatted like so:
 <code>
-java -Xmx16G -jar /scratch2/software/pilon/pilon-1.22.jar --genome assembly_to_polish.fasta --frags piloninput.sorted.bam --output P2x --outdir Pilon2x --threads 16
+java -Xmx16G -jar /scratch2/software/pilon/pilon-1.22.jar \
+    --genome assembly_to_polish.fasta \
+    --frags piloninput.sorted.bam \
+    --output P2x \
+    --outdir Pilon2x \
+    --threads 16
 </code>
+UPDATE: As of v1.24, the --threads option is no longer maintained. Pilon seems to use at most 200-300% CPU (i.e. about 3 threads), so setting --threads to 4 or so should be sufficient.
 You may run into an error where Pilon does not recognize the bam file created from the previous step as being indexed. To fix this, run:
 <code>
 samtools index /path/to_bam_file
 </code>
 This will return a .bam.bai file. This file needs to be in the same folder as Pilon.sh, but does not need to be placed in the script.
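To avoid hitting the indexing error in the first place, the check can be done before Pilon starts. This is a minimal sketch, assuming the index sits next to the BAM file under one of the two common names (`file.bam.bai` or `file.bai`). The functions `bam_has_index` and `ensure_bam_index` are hypothetical helpers and are not part of Pilon.sh.

```shell
#!/bin/bash
# Hypothetical helper: succeed if a non-empty index file exists next to the BAM,
# under either <file>.bam.bai or <file>.bai.
bam_has_index() {
    local bam="$1"
    [ -s "${bam}.bai" ] || [ -s "${bam%.bam}.bai" ]
}

# Hypothetical wrapper: run samtools index only when no index is present.
ensure_bam_index() {
    local bam="$1"
    if bam_has_index "$bam"; then
        echo "index found for $bam"
    else
        echo "no index for $bam, running samtools index"
        samtools index "$bam"
    fi
}
```

Calling `ensure_bam_index piloninput.sorted.bam` just before the java command would then only index when needed.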
@@ Line 102: / Line 134: @@
 Shell script:
 {{ :unicyclersh.docx |}}
+<code>
+#!/bin/bash
+#$ -S /bin/bash
+. /etc/profile
+#$ -cwd
+#$ -pe threaded 16
+#cd /scratch2/jon/MinION/BMAN/assemblies/Unicycler_polish/
+echo "Starting"
+unset PYTHONPATH
+export PATH=/scratch2/software/gcc-6.3.0/bin:/scratch2/software/Python-3.6.0/bin:$PATH
+export LD_LIBRARY_PATH=/scratch2/software/gcc-6.3.0/lib64:/scratch2/software/Python-3.6.0/lib:$LD_LIBRARY_PATH
+/scratch2/software/Python-3.6.0/bin/unicycler_polish -1 /scratch2/shelbyw/RCL_Unicycler/RCL_1_PairNtrim.fq -2 /scratch2/shelbyw/RCL_Unicycler/RCL_2_PairNtrim.fq --long_reads RCL_MinION.CutAdapt75.3000.chop.fastq.gz -a RCL_unclean_AB_assembly_fix_Racon2_Pilon3.fasta --pilon=/scratch2/software/pilon/pilon-1.22.jar --samtools=/opt/perun/bin/samtools --threads 16
+echo "Done!"
+</code>
 Formatting:
@@ Line 139: / Line 193: @@
 If Illumina reads are available, it might be possible to skip nanopolish altogether and go directly to Pilon polishing after Racon. This is exemplified in the Solanum pennellii pre-print, where nanopolish simply was not feasible.
-Location: /scratch2/software/nanopolish
+Location: /scratch2/software/anaconda/envs/nanopolish-0.12/bin
 Scripts:
@@ Line 158: / Line 212: @@
 nanopolish merge - merges the pieces into a new consensus.
-**Updated Nanopolish protocol (as of July 17 2018):**
+**Updated Nanopolish protocol (as of March 8 2020):**
 First, index your unchopped, raw reads file.
 Use the sequencing_summary.txt produced by albacore during basecalling to speed up this step significantly. If you have several sequencing_summary.txt files, list their paths in a fof-file (a file of filenames, one path per line) and pass it with -f. This also works for a single file:
+For the following step, **DO NOT use more than 1 thread**, because the program is not threaded!
 <code>
 #!/bin/bash
@@ Line 167: / Line 222: @@
 . /etc/profile
 #$ -cwd
-#$ -pe threaded 4
+#$ -pe threaded 1
 cd $PWD
-export PATH=/scratch2/software/anaconda/bin:$PATH
+fast5path=/scratch2/path2/fast5/
-source activate nanopolish-python3
+fastq=/path2fastqlongreads.fastq
+seqsummary=/path2tosequencing_summary.txt
-/scratch2/software/anaconda/envs/nanopolish-python3/bin/nanopolish index \
+source activate nanopolish-0.13.2
--d /path/to/fast5/directory/ \
+export PATH=/scratch2/software/anaconda/envs/nanopolish-0.13.2/bin:$PATH
--f summary_files.fof \
+nanopolish index -d $fast5path -s $seqsummary $fastq
-/path/to/reads.fastq
+conda deactivate
-source deactivate
 </code>
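When a run produced several sequencing_summary.txt files, the fof-file mentioned above can be generated instead of written by hand. This is a minimal sketch: the function name `make_summary_fof`, the file name pattern `sequencing_summary*.txt` and the output name `summaries.fof` are assumptions chosen for illustration, so adjust them to how your basecaller named its output.

```shell
#!/bin/bash
# Hypothetical helper: collect every sequencing_summary*.txt below a run
# directory into a file of filenames, one path per line, sorted by name.
# Pass an absolute run directory to get absolute paths in the fof-file.
make_summary_fof() {
    local rundir="$1" fof="$2"
    find "$rundir" -type f -name 'sequencing_summary*.txt' | sort > "$fof"
}
```

After `make_summary_fof /scratch2/path2/basecalled summaries.fof`, the resulting file could be passed to nanopolish index with `-f summaries.fof` in place of `-s $seqsummary`.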
@@ Line 204: / Line 257: @@
 samtools index reads.sorted.bam
-source deactivate
+conda deactivate
 </code>
@@ Line 224: / Line 277: @@
 cd $PWD
-export PATH=/scratch2/software/anaconda/bin:$PATH
+export PATH=/scratch2/software/anaconda/envs/nanopolish-0.12/bin:$PATH
-source activate nanopolish-python3
+source activate nanopolish-0.12
-python /scratch2/software/anaconda/envs/nanopolish-python3/bin/nanopolish_makerange.py \
+nanopolish_makerange.py reference.fasta | parallel --results nanopolish.results -P 20 \
-reference.fasta | parallel --results nanopolish.results -P 20 \
+nanopolish variants --consensus -o polished.{1}.fa -w {1} \
-/scratch2/software/anaconda/envs/nanopolish-python3/bin/nanopolish variants --consensus polished.{1}.fa -w {1} \
+-r /path/to/reads.fastq -b reads.sorted.bam -g reference.fasta -t 10 --min-candidate-frequency 0.1
--r /path/to/reads.fastq \
--b reads.sorted.bam -g reference.fasta -t 10 --min-candidate-frequency 0.1
-source deactivate
+conda deactivate
 </code>