assembling_long_read_data [2017/11/09 11:34] 129.173.88.84
assembling_long_read_data [2018/01/08 14:51] (current) 129.173.88.84
Line 1: Line 1:
 ====== ASSEMBLING LONG READ DATA ======
 +
 +Documentation by Sarah Shah
  
 When you have your porechopped reads in fastq and fasta formats, try out the following assemblers:
  
-Programs: ABruijn ([[https://github.com/fenderglass/ABruijn]]), Canu ([[http://canu.readthedocs.io/en/latest/quick-start.html]]), smartdenovo ([[https://github.com/ruanjue/smartdenovo]]), miniasm ([[https://github.com/lh3/miniasm]])
+Programs: ABruijn ([[https://github.com/fenderglass/ABruijn]]), Flye ([[https://github.com/fenderglass/Flye]]), Canu ([[http://canu.readthedocs.io/en/latest/quick-start.html]]), smartdenovo ([[https://github.com/ruanjue/smartdenovo]]), miniasm ([[https://github.com/lh3/miniasm]])
  
 **ABruijn**
Line 26: Line 28:
  
 /scratch2/software/ABruijn-1.0/bin/abruijn /path/to_your_fasta /path/to_an_output_directory <estimated coverage> --platform nano --threads 10
 +</code>
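
The ''<estimated coverage>'' argument above can be approximated as total read bases divided by the expected genome size. The following is a minimal sketch of that arithmetic, assuming standard 4-line FASTQ records; the file name ''toy_reads.fastq'' and the genome size of 4 bases are made up so the toy numbers work out, and should be replaced with your own reads and genome size estimate.

```shell
# Toy FASTQ with two reads (10 bases and 6 bases); replace with your own file.
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGTAC\n+\nIIIIII\n' > toy_reads.fastq

# Hypothetical genome size in bases; use your own estimate (e.g. 45000000).
genome_size=4

# Sum the length of the sequence line (line 2) of every 4-line FASTQ record.
total_bases=$(awk 'NR % 4 == 2 { total += length($0) } END { print total + 0 }' toy_reads.fastq)

# Integer coverage estimate: total bases divided by genome size.
coverage=$(( total_bases / genome_size ))
echo "total bases: ${total_bases}, estimated coverage: ${coverage}x"
```

The result is only a rough figure, since it counts every base in the file, including contaminant reads.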
 +
 +ABruijn was replaced by **Flye** as of January 2018. Example usage:
 +<code>
 +#!/bin/bash
 +#$ -S /bin/bash
 +. /etc/profile
 +#$ -cwd
 +#$ -pe threaded 16
 +#$ -o leg
 +
 +source /scratch2/software/python-2.7-env/bin/activate
 +
 +unset PYTHONPATH
 +
 +flye --nano-raw Acas_merged_pc_fl.fastq --genome-size 45m --out-dir Acas_filtlongFlye --threads 16 --iterations 3 --min-overlap 3000
 </code>
 **Canu**
Line 58: Line 76:
 **smartdenovo**
  
 +Download smartdenovo to your account on Perun.
 +<code>
 +/path/to/smartdenovo/smartdenovo.pl reads.fa > reads.mak
 +make -f reads.mak
 +</code>
 +The **.utg** file is the important output.
  
 **miniasm**
  
-The simplest and the fastest of all the assemblers here. First,
+The simplest and fastest of the assemblers here. First, map the reads against themselves (all-vs-all) using minimap2:
 +<code> 
 +minimap2 -x ava-ont reads.fq reads.fq | gzip -1 > reads.paf.gz 
 +</code> 
 + 
 +Then, use miniasm: 
 +<code> 
 +miniasm -f reads.fq reads.paf.gz > reads.gfa 
 +</code> 
 + 
 +View the .gfa file using **Bandage**. You can convert a .gfa file (such as the reads.gfa produced above) to a fasta file with:
 +<code> 
 +awk '/^S/{print">"$2"\n"$3}' in.gfa | fold > out.fa 
 +</code> 
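
As a quick sanity check on the conversion above, the sketch below runs the same awk command on a toy GFA and confirms that every segment (S line) became one fasta record. The file names ''toy.gfa'' and ''toy.fa'' and the two toy segments are invented for illustration.

```shell
# Toy GFA with two segments and one link; names and sequences are invented.
printf 'S\tutg000001l\tACGTACGT\nS\tutg000002l\tGGCC\nL\tutg000001l\t+\tutg000002l\t+\t0M\n' > toy.gfa

# Same conversion as above: S lines become FASTA records, folded at 80 columns.
awk '/^S/{print">"$2"\n"$3}' toy.gfa | fold > toy.fa

# The number of S lines in the graph should equal the number of FASTA headers.
segments=$(grep -c '^S' toy.gfa)
records=$(grep -c '^>' toy.fa)
echo "segments: ${segments}, FASTA records: ${records}"
```

If the two counts differ on a real assembly, check whether the graph contains lines other than segments that start with S.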
 + 
 +---- 
 + 
 +The Unicycler GitHub page ([[https://github.com/rrwick/Unicycler]]) has nice examples of what good, acceptable, and terrible assembly graphs look like.
 + 
 +Do a quick BLAST search of your contigs and separate out the eukaryotic and bacterial contigs. Compare your assemblies using QUAST ([[http://quast.bioinf.spbau.ru/]]) and continue to **[[nanopore_tools_for_polishing|polishing and correcting]]** your chosen assembly. 
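
For a rough first look before a full QUAST run, contig count, total length, and N50 can be computed directly from a fasta file. This is a minimal sketch, not a replacement for QUAST; the file names ''toy_assembly.fa'' and ''lengths.txt'' and the three toy contigs are hypothetical.

```shell
# Toy assembly with contigs of 10, 6 and 4 bases; replace with your own fasta.
printf '>c1\nAAAAAAAAAA\n>c2\nCCCCCC\n>c3\nGGGG\n' > toy_assembly.fa

# One length per contig (handles multi-line sequences), sorted longest first.
awk '/^>/ { if (len) print len; len = 0; next }
     { len += length($0) }
     END { if (len) print len }' toy_assembly.fa | sort -rn > lengths.txt

contigs=$(grep -c '^>' toy_assembly.fa)
total=$(awk '{ s += $1 } END { print s + 0 }' lengths.txt)

# N50: length of the contig at which the running sum reaches half the total.
n50=$(awk -v total="$total" '{ run += $1; if (run * 2 >= total) { print $1; exit } }' lengths.txt)

echo "contigs: ${contigs}, total length: ${total}, N50: ${n50}"
```

Running this on each candidate assembly gives a quick side-by-side comparison of contiguity.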
  