Differences

This shows you the differences between two versions of the page.

--- ploidy_analysis_using_ploidyngs [2022/06/24 15:30] – 134.190.232.29
+++ ploidy_analysis_using_ploidyngs [2022/06/24 16:01] (current) – [UPDATE (June 2022)] 134.190.232.29
@@ Line 45: / Line 45: @@
 To see the sorted content of the Third and Fourth lines
-grep Fourth myoutput.tab |cut -f4 |sort -n\\
+<code>
-grep Third myoutput.tab |cut -f4 |sort -n
+$ grep Fourth myoutput.tab | cut -f4 | sort -n
+$ grep Third  myoutput.tab | cut -f4 | sort -n
+</code>
+To remove lines containing "Third" and "Fourth" from the .tab file:
+<code>
+$ grep -vP "Third|Fourth" myoutput.tab > mytabforhistogram.tab
+</code>
-To remove Third and Fourth lines from the tab
+Once satisfied with the tab, you need to make a PDF yourself (the script is supposed to generate it automatically, but it doesn't).
-grep -v Fourth  myoutput.tab |grep -v Third > mytabforhistogram.tab
+On the command line, execute:
+<code>
+$ Rscript --vanilla /scratch2/software/ploidyNGS/ploidyNGS_generateHistogram.R mytabforhistogram.tab
+</code>
-Once satisfied with the tab you need to make a pdf (it is suppose to generate it automatically
+This will generate a PDF called ''NA'', which you can transfer to your home computer to view.
-but it doesn't)
+==== UPDATE (June 2022) ====
+I (Joran) have updated the ''ploidyNGS.py'' and ''ploidyNGS_generateHistogram.R'' scripts. They are available in Perun under the new names ''ploidyNGS_minCov.py'' and ''ploidyNGS_generateHistogram_minCov.R'', respectively.
+The main new feature is the ''--min_cov'' option, which defaults to 0. It makes the script ignore positions whose coverage is lower than the specified value when calculating allele frequencies. This is useful if your histogram otherwise strongly hints at haploidy but shows mysterious 50%/50% or 33%/67% peaks. In my experience, these peaks can stem from low-coverage regions with sequencing errors. For example, if a position is covered by only two reads and one of them carries an error, it will contribute to a 50%/50% peak even though no diploidy is involved.
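The effect of a coverage threshold can be illustrated with a small Python sketch. This is not the code in ''ploidyNGS_minCov.py''; the function name ''allele_percentages'' and the toy base counts are made up for illustration, and the sketch only assumes that a position is skipped when its coverage is below the threshold.

```python
def allele_percentages(base_counts, min_cov=0):
    """Return allele percentages (highest first) for one position.

    base_counts maps a base to the number of reads supporting it.
    Returns None when the position has no reads or its coverage is
    below min_cov (hypothetical stand-in for the --min_cov filter).
    """
    coverage = sum(base_counts.values())
    if coverage == 0 or coverage < min_cov:
        return None
    observed = sorted((n for n in base_counts.values() if n > 0), reverse=True)
    return [round(100.0 * n / coverage, 1) for n in observed]


# A position covered by two reads, one of which carries a sequencing error.
low_cov_error = {"A": 1, "G": 1}
# A well-covered position with a single erroneous read.
well_covered = {"A": 39, "G": 1}

print(allele_percentages(low_cov_error))              # [50.0, 50.0]
print(allele_percentages(low_cov_error, min_cov=10))  # None (position ignored)
print(allele_percentages(well_covered, min_cov=10))   # [97.5, 2.5]
```

With the default threshold of 0, the two-read position looks like a perfect 50%/50% site; with a threshold of 10 it no longer contributes to the histogram, while the well-covered position is unaffected.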
+Other updates:
+  * Positions reported in the .tab file are now 1-indexed (i.e. starting at 1) instead of 0-indexed (i.e. starting at 0)
+  * If you have ''/scratch2/software/ploidyNGS/'' in your ''$PATH'', the updated Python script will properly execute the histogram R script without any issues. The PDF file will have a proper file name ending in .pdf, instead of the mysterious ''NA''
+  * The axis titles in the histogram will be a bit easier to understand
+  * The script will now report the total number of heteromorphic positions per contig and over all contigs
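As a rough cross-check of the reported totals, positions can also be counted directly from a .tab file. The sketch below assumes a tab-separated layout of contig, position, allele rank label and percentage, and that every position listed in the file is heteromorphic; only the rank labels and the fourth (percentage) column are implied by the commands on this page, so verify the layout against your own file. ''count_positions'' is a hypothetical helper, not part of ploidyNGS.

```python
from collections import Counter


def count_positions(lines):
    """Count distinct positions per contig in tab-separated lines.

    Assumes column 1 is the contig name and column 2 the position
    (an assumption about the .tab layout, not confirmed by the tool).
    """
    seen = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        seen.add((fields[0], fields[1]))
    return Counter(contig for contig, _ in seen)


# Made-up example data: two positions on contig1, one on contig2.
example = [
    "contig1\t10\tFirst\t60.0\n",
    "contig1\t10\tSecond\t40.0\n",
    "contig1\t25\tFirst\t70.0\n",
    "contig1\t25\tSecond\t30.0\n",
    "contig2\t7\tFirst\t55.0\n",
    "contig2\t7\tSecond\t45.0\n",
]

per_contig = count_positions(example)
total = sum(per_contig.values())
for contig, n in sorted(per_contig.items()):
    print(contig, n)
print("all contigs", total)
```

For a real file, pass an open file handle instead of the list, for example ''with open("myoutput.tab") as handle: count_positions(handle)''.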
+To call it, use a Perun submission script like this one:
+<code>
+#!/bin/bash
+#$ -S /bin/bash
+. /etc/profile
+#$ -cwd
+#$ -q 256G-batch
+source activate ploidyNGS-dependencies
+export PATH="/scratch2/software/ploidyNGS:$PATH"
+BAMFILE='dnaseq_vs_canu_ergo_assembly.sorted.bam'
+MINCOV=10
+MAXDEPTH=2000
+OUTBASE='dnaseq_vs_canu_ergo_assembly'
+ploidyNGS_minCov.py \
+    --out "$OUTBASE" \
+    --bam "$BAMFILE" \
+    --min_cov "$MINCOV" \
+    --max_depth "$MAXDEPTH"
-On the commandline
+conda deactivate
+</code>
-   Rscript --vanilla /scratch2/software/ploidyNGS/ploidyNGS_generateHistogram.R mytabforhistogram.tab
+A more elaborate script is available on the GitHub repository of the Roger lab: https://github.com/RogerLab/gospel_of_andrew.
-This will generate a pdf called NA which you can transfer to your home computer to look at.
+The updated scripts are also available in the original GitHub repository https://github.com/diriano/ploidyNGS under the original script names ''ploidyNGS.py'' and ''ploidyNGS_generateHistogram.R''.