from_nanopore_to_gene_prediction

Differences

This shows you the differences between two versions of the page.

from_nanopore_to_gene_prediction [2019/07/24 11:10] – [Canu] 134.190.235.39
from_nanopore_to_gene_prediction [2019/07/24 11:47] (current) 134.190.235.39
Line 1: Line 1:
 ======From Nanopore to Gene Prediction: a pathway======
+By Greg and Jon
  
  
Line 19: Line 19:
 There are a number of additional flags that can help refine the program’s behavior to your needs.
 
-Importantly, the fast5 reads will be physically transfered from the input folder to the new respective folders, so don't be alarmed when the inputs 'disappear'!
+Importantly, the fast5 reads will be physically transferred from the input folder to the new respective folders, so don't be alarmed when the inputs 'disappear'!
 
 **Model presets:**
Line 63: Line 63:
 The first flag will allow the program to classify a read based on the barcode call of either the start or end of the read, so long as they do not disagree.
 
-The second flag will classify a read based on a start barcode, and having an end barcode is opitional. **This is the default behaviour**
+The second flag will classify a read based on a start barcode, and having an end barcode is optional. **This is the default behaviour**
 
 The third flag requires the same barcode on both ends of the read in order for it to be classified.
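
The three classification rules in this hunk can be sketched as a small shell function. This is only an illustration of the decision logic, not the demultiplexer's own code: the function name `classify_read`, the mode labels `either`, `start` and `both`, and the `none` placeholder for a missing barcode call are all hypothetical. It also assumes that, in the default mode, an end barcode that conflicts with the start barcode leaves the read unclassified.

```shell
#!/bin/bash
# Illustrative sketch of the three barcode-agreement rules described above.
# All names here (classify_read, either/start/both, "none") are hypothetical.
classify_read() {
    local start="$1" end="$2" mode="${3:-start}"   # "start" mirrors the stated default
    local ends_disagree=no

    # The two ends disagree only when both were called and the calls differ.
    if [ "$start" != "none" ] && [ "$end" != "none" ] && [ "$start" != "$end" ]; then
        ends_disagree=yes
    fi

    case "$mode" in
        either)
            # A call from either end is enough, as long as the ends do not disagree.
            if [ "$ends_disagree" = "yes" ]; then
                echo "unclassified"
            elif [ "$start" != "none" ]; then
                echo "$start"
            elif [ "$end" != "none" ]; then
                echo "$end"
            else
                echo "unclassified"
            fi
            ;;
        start)
            # A start barcode is required; an end barcode is optional.
            # Assumption: a conflicting end barcode leaves the read unclassified.
            if [ "$start" = "none" ] || [ "$ends_disagree" = "yes" ]; then
                echo "unclassified"
            else
                echo "$start"
            fi
            ;;
        both)
            # The same barcode must be called on both ends.
            if [ "$start" != "none" ] && [ "$start" = "$end" ]; then
                echo "$start"
            else
                echo "unclassified"
            fi
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

For example, `classify_read barcode03 none either` prints `barcode03`, while `classify_read barcode03 none both` prints `unclassified`.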
Line 112: Line 112:
  
  
-Nanopore technology generates a truly absurd amount of data files, which can be unwieldy to use, both in the sense of day to day work, as well as for programs like Terminal to handle. Therefore, we will use another script to combine fast5 files into multi fast5 files. This is the default method currently, however programs like Deepbinner work with the individual files, and most of our older datasets are still in single file format, which is important to keep in mind.
+Nanopore technology generates a truly absurd amount of data files, which can be unwieldy to use, both in the sense of day to day work, as well as for programs like Terminal to handle. Therefore, we will use another script to combine fast5 files into multi fast5 files. This is the default method currently, however programs like Deepbinner work with the individual files, and most of our older data sets are still in single file format, which is important to keep in mind.
 
    <code>#!/bin/bash
Line 134: Line 134:
 <code>--recursive</code> will run the shell on both the files in the directory you specify, as well as in any directories that are inside the directory you specified. IE /scratch3/yourname/MINION/deepbinner/barcode03/another_file_level
 
-Additionally, there is another command, multi_to_single_fast5 that can be run using the ont-fast5-api program. As the name implies, it does the reverse process, breaking apart a single multi fast5 file into individiual fast5 files. Deepbinner will do this for you if it detects a multi-fast5 file, however it is always good to know how to do it by hand as well.
+Additionally, there is another command, multi_to_single_fast5 that can be run using the ont-fast5-api program. As the name implies, it does the reverse process, breaking apart a single multi fast5 file into individual fast5 files. Deepbinner will do this for you if it detects a multi-fast5 file, however it is always good to know how to do it by hand as well.
 
 I have also created a script in /home/gseaton/public_scripts that combines both deepbinner and single-to-multi-fast5 called deepbinner-combopack. This will launch both deepbinner and combine the files into single fast5 files.
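
Running the reverse conversion by hand might look like the sketch below. `multi_to_single_fast5` and its `--input_path`, `--save_path` and `--recursive` options come from ont-fast5-api; the wrapper name `split_multi_fast5` and any directory names passed to it are hypothetical and only there for illustration.

```shell
#!/bin/bash
# Sketch: break multi fast5 files back into single-read fast5 files.
# split_multi_fast5 is a hypothetical wrapper around the ont-fast5-api command.
split_multi_fast5() {
    local input_dir="$1" output_dir="$2"

    # Fail early with a readable message if the input directory is missing.
    if [ ! -d "$input_dir" ]; then
        echo "input directory not found: $input_dir" >&2
        return 1
    fi

    # --recursive also searches directories nested inside the input directory.
    multi_to_single_fast5 \
        --input_path "$input_dir" \
        --save_path "$output_dir" \
        --recursive
}
```

It would be called as `split_multi_fast5 <directory of multi fast5 files> <output directory>`.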
Line 181: Line 181:
 Before this step, however, you should merge all the files together into one single .fastq file for each barcode directory created by guppy, using the 'cat' command.
 
-On the commandline, write:
+On the command line, write:
 <code> cat *.fastq > merged_reads.fq</code>
 This will take all files ending in the fastq suffix and merge them into a single file. You can name the output however you like. This is the file you will use for porechop.
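
When there are many barcode directories, the same `cat` step can be looped rather than typed once per directory. The sketch below assumes a layout of `<basecall directory>/barcodeXX/*.fastq`; that layout, the function name `merge_barcode_fastq` and the output name `merged_reads.fq` are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: merge every .fastq file in each barcode directory into one file.
# Assumes <basecall_dir>/barcodeXX/*.fastq; all names here are illustrative.
merge_barcode_fastq() {
    local basecall_dir="$1"
    local dir

    for dir in "$basecall_dir"/barcode*/; do
        [ -d "$dir" ] || continue        # the glob matched no directory
        set -- "$dir"*.fastq             # collect this directory's fastq files
        [ -e "$1" ] || continue          # no fastq files here, nothing to merge
        cat "$@" > "${dir}merged_reads.fq"
    done
}
```

Writing the output with a `.fq` suffix keeps it out of the `*.fastq` glob, so running the function a second time does not fold an earlier merged file back into itself.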
Line 206: Line 206:
 ===== Filtlong =====
 
-This step is important if you have lots and lots of data. Here, filtlong attempts to take the 'best' of the given data set, and creates a file containing that for later use. The 'best'ness can be determined by the researcher with different flags. For example, if one wanted to take only the most accurate reads, this program would do that for you.
+This step is important if you have lots and lots of data. Here, filtlong attempts to take the 'best' of the given data set, and creates a file containing that for later use. The 'bestness' can be determined by the researcher with different flags. For example, if one wanted to take only the most accurate reads, this program would do that for you.
 
 In our lab, we typically use filtlong to obtain the longest reads.
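
A length-focused filtlong call might look like the sketch below. The `--min_length`, `--keep_percent` and `--length_weight` options are filtlong's own; the wrapper name `keep_longest_reads` and the specific values (1000 bp, 90 percent, weight 10) are assumptions for illustration and not necessarily the settings used in the lab.

```shell
#!/bin/bash
# Sketch: keep the longer reads from a trimmed fastq file with filtlong.
# keep_longest_reads and its default thresholds are hypothetical choices.
keep_longest_reads() {
    local input="$1" output="$2"
    local min_length="${3:-1000}" keep_percent="${4:-90}"

    # --min_length drops reads shorter than the threshold outright;
    # --keep_percent then keeps roughly the best-scoring share of what is left;
    # --length_weight 10 makes read length count for more in that score.
    filtlong --min_length "$min_length" \
             --keep_percent "$keep_percent" \
             --length_weight 10 \
             "$input" > "$output"
}
```

It would be called as `keep_longest_reads <trimmed reads> <filtered output>`, optionally followed by a minimum length and a keep percentage.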
Line 249: Line 249:
    <input file. Should be trimmed, and if necessary, filtered using filtlong> \
    --meta \
-   --genome-size <estimate it in the format 20m> --out-dir <out directorary> --threads 30 --iterations 2
+   --genome-size <estimate it in the format 20m> --out-dir <out directory> --threads 30 --iterations 2
 
    conda deactivate</code>
from_nanopore_to_gene_prediction.1563977458.txt.gz · Last modified: by 134.190.235.39