cgal
Differences
This shows you the differences between two versions of the page.
| Next revision | Previous revision | ||
| cgal [2018/11/01 14:42] – created 129.173.88.84 | cgal [2019/01/14 22:13] (current) – 122.223.74.246 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| **CGAL: Computing Genome Assembly Likelihoods** | **CGAL: Computing Genome Assembly Likelihoods** | ||
| + | |||
| Documentation by Sarah Shah | Documentation by Sarah Shah | ||
| Line 6: | Line 7: | ||
| Useful for comparing results of different genome assemblies. CGAL " | Useful for comparing results of different genome assemblies. CGAL " | ||
| - | 1) Map DNA short reads to your genome using bowtie2. | + | 1) Map DNA short reads to your genome using bowtie2: |
| + | < | ||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 10 | ||
| + | cd / | ||
| + | source activate bowtie2 | ||
| + | input=HiGC_canumeta.fasta | ||
| + | bowtie2 -a --local --no-mixed --phred33 -q --threads 10 -x $input \ | ||
| + | -1 / | ||
| + | source deactivate | ||
| + | </ | ||
| + | |||
| + | In the shell script above I used the -a flag, which makes bowtie2 report all valid alignments for each read; if this takes too long or uses too much memory, remove the flag. What matters is that you use the same mapping method for every version of your genome that you compare. Additionally, | ||
| + | < | ||
| + | seqtk sample -s100 readfile1.fq 1000000 > readfile1_s100_1mil.fq | ||
| + | #This randomly subsamples 1 million reads. The -s100 flag sets the sampling seed; use the same seed for the second set of reads so that the read pairs stay matched. | ||
| + | </ | ||
| + | |||
| + | 2) Convert the SAM file from Step 1 into CGAL's input format: | ||
| + | < | ||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 1 | ||
| + | input=HiGC_canumeta.fasta | ||
| + | / | ||
| + | #450 is the maximum expected insert size. | ||
| + | </ | ||
| + | |||
| + | KEEP ALL output files from this step! | ||
| + | |||
| + | 3) Collect information on the unmapped reads: | ||
| + | < | ||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 8 | ||
| + | input=HiGC_canumeta.fasta | ||
| + | / | ||
| + | #10000 is the size of the random subset of reads | ||
| + | </ | ||
| + | |||
| + | KEEP ALL output files from this step! | ||
| + | |||
| + | 4) Run the actual CGAL command: | ||
| + | < | ||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -cwd | ||
| + | #$ -pe threaded 1 | ||
| + | input=HiGC_canumeta.fasta | ||
| + | / | ||
| + | </ | ||
| + | |||
| + | The .sh.o* file (the scheduler's standard-output log for the job above) contains the likelihood value. The out.txt file contains additional information, such as the total number of reads and the numbers of mapped and unmapped reads. | ||
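
For the read subsampling mentioned under Step 1, both files of a paired-end library need the same seed so that mates stay in sync. The following is a minimal sketch of that idea, not part of CGAL or seqtk: the function name `subsample_pair`, its argument order and the output naming scheme are assumptions made for illustration only.

```shell
#!/bin/bash
# Sketch: subsample both mates of a paired-end library with one shared seed.
# subsample_pair and the output file naming are hypothetical helpers,
# not provided by seqtk or CGAL.
subsample_pair() {
    local reads1=$1      # first mate file, e.g. readfile1.fq
    local reads2=$2      # second mate file, e.g. readfile2.fq
    local n=$3           # number of reads to keep
    local seed=${4:-100} # sampling seed, shared by both files

    # Strip the last extension and tag the output with seed and read count.
    local out1="${reads1%.*}_s${seed}_${n}.fq"
    local out2="${reads2%.*}_s${seed}_${n}.fq"

    # The same -s value on both files selects the same read positions,
    # which keeps the pairs matched.
    seqtk sample -s"$seed" "$reads1" "$n" > "$out1" || return 1
    seqtk sample -s"$seed" "$reads2" "$n" > "$out2" || return 1

    # Print the two output file names for use in the mapping step.
    echo "$out1 $out2"
}
```

Calling `subsample_pair readfile1.fq readfile2.fq 1000000 100` would write the two subsampled files to the current directory and print their names. Use the same read subsets for every assembly you compare.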
cgal.1541094165.txt.gz · Last modified: by 129.173.88.84
