https://github.com/diriano/ploidyNGS
Note that you will need a sorted BAM file to run ploidyNGS.
Script for running a ploidyNGS analysis on Perun:
#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -cwd
source activate ploidyNGS-preq
/scratch2/software/ploidyNGS/ploidyNGS.py \
-o outputfilename \
-b nameofsortedbamfile.bam
conda deactivate
Perun will immediately generate error messages that look dire, if not fatal:
“fatal: not a git repository (or any parent up to mount point /misc)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).”
This is the result of a Python function defined by ploidyNGS that tries to identify which version of the script you are running, but it typically doesn't work. You can safely ignore these messages; the program continues to run.
One of the options you will be interested in is -d or --max_depth, which is set to 100 by default.
Max number of reads kept at each position in the reference genome (integer, default: 100)
This sets the maximum number of reads ploidyNGS.py will load at an alignment position before it moves on to the next one. For example, if it is set to 100 but 500 reads are actually mapped at position 123, only the base calls of the first 100 reads are loaded to calculate allele frequencies. I (Joran) found that it is pretty safe to increase this value to, say, 2000, to use as much information as possible.
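As a sketch of that change, the call from the job script could pass -d explicitly. The path and file names are the placeholders already used in the script on this page, and 2000 is the value suggested in the text. Building the command in a variable and printing it first is only an illustration for checking the call, not something ploidyNGS requires.

```shell
#!/bin/bash
# Sketch only: build the ploidyNGS call with a higher --max_depth and print it.
PLOIDYNGS=/scratch2/software/ploidyNGS/ploidyNGS.py   # path used in the job script on this page
MAX_DEPTH=2000                                        # value suggested in the text

# Placeholder file names, as in the job script.
cmd="$PLOIDYNGS -o outputfilename -b nameofsortedbamfile.bam -d $MAX_DEPTH"

# Print the command so it can be checked before submitting the job.
echo "$cmd"

# To actually run it (inside the activated ploidyNGS-preq environment), uncomment:
# $cmd
```

In the job script itself, the equivalent change is simply adding a final "-d 2000" line to the existing ploidyNGS.py call.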
Examine the tab output file. In particular, if the lines for Third and Fourth contain almost all 0.00, they should be removed from the tab file; otherwise they mess up the histogram.
To see the sorted content of the Third and Fourth lines:
grep Fourth myoutput.tab | cut -f4 | sort -n
grep Third myoutput.tab | cut -f4 | sort -n
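Instead of reading the sorted values by eye, the share of zero values can be counted. The following is a minimal sketch, assuming (as the cut -f4 commands do) that the file is tab-separated and the value of interest is in column 4. The function name summarise and the sample data are invented for illustration; point the function at your own myoutput.tab.

```shell
#!/bin/bash
# Hypothetical helper: report how many lines matching a label have a zero in column 4.
summarise() {
  # $1 = label to match (e.g. Third), $2 = tab file
  awk -F'\t' -v label="$1" '
    $0 ~ label { n++; if ($4 + 0 == 0) z++ }
    END {
      if (n) printf "%s: %d of %d values are zero (%.1f%%)\n", label, z, n, 100 * z / n
      else   printf "%s: no lines found\n", label
    }' "$2"
}

# Invented sample data standing in for myoutput.tab (values are not real results).
tab=$(mktemp)
printf 'chr1\t10\tFirst\t60.00\n'   >  "$tab"
printf 'chr1\t10\tSecond\t40.00\n'  >> "$tab"
printf 'chr1\t10\tThird\t0.00\n'    >> "$tab"
printf 'chr1\t10\tFourth\t0.00\n'   >> "$tab"
printf 'chr1\t20\tFirst\t55.00\n'   >> "$tab"
printf 'chr1\t20\tSecond\t43.00\n'  >> "$tab"
printf 'chr1\t20\tThird\t2.00\n'    >> "$tab"
printf 'chr1\t20\tFourth\t0.00\n'   >> "$tab"

summarise Third "$tab"
summarise Fourth "$tab"
rm -f "$tab"
```

With the invented sample above this prints "Third: 1 of 2 values are zero (50.0%)" and "Fourth: 2 of 2 values are zero (100.0%)". How high the percentage must be before the lines count as "almost all 0.00" is a judgment call that this page does not fix.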
To remove lines containing “Third” and “Fourth” from the .tab file:
grep -vP "Third|Fourth" myoutput.tab > mytabforhistogram.tab
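A quick check that the filtering did what was intended can be sketched as follows. The function name filter_tab and the sample data are invented for illustration. It uses grep -E rather than -P, because the alternation only needs extended regular expressions and -P is a GNU grep extension; on Perun the original -P command works equally well.

```shell
#!/bin/bash
# Hypothetical helper: drop Third/Fourth lines, then report what is left.
filter_tab() {
  # $1 = input tab file, $2 = filtered output file
  grep -vE "Third|Fourth" "$1" > "$2" || true
  # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
  remaining=$(grep -cE "Third|Fourth" "$2" || true)
  kept=$(wc -l < "$2" | tr -d ' ')
  echo "kept $kept lines, $remaining Third/Fourth lines remaining"
}

# Invented sample data standing in for myoutput.tab (values are not real results).
in_tab=$(mktemp)
out_tab=$(mktemp)
printf 'chr1\t10\tFirst\t60.00\nchr1\t10\tSecond\t40.00\n'  >  "$in_tab"
printf 'chr1\t10\tThird\t0.00\nchr1\t10\tFourth\t0.00\n'    >> "$in_tab"

filter_tab "$in_tab" "$out_tab"
rm -f "$in_tab" "$out_tab"
```

For real data the call would be filter_tab myoutput.tab mytabforhistogram.tab, and the reported number of remaining Third/Fourth lines should be 0.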
Once satisfied with the tab file, you need to make a PDF (ploidyNGS is supposed to generate it automatically, but it doesn't).
On the command line, execute:
$ Rscript --vanilla /scratch2/software/ploidyNGS/ploidyNGS_generateHistogram.R mytabforhistogram.tab
This will generate a PDF called NA, which you can transfer to your home computer to look at.
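Because every run produces a file with the same name, it may help to rename the PDF before transferring it. This is a minimal sketch, assuming the file is literally named NA (no extension) and sits in the directory where Rscript was run. The function name rename_histogram and the new file name are invented for illustration.

```shell
#!/bin/bash
# Hypothetical helper: rename the PDF that the R script leaves behind as "NA".
rename_histogram() {
  dir=$1        # directory where Rscript was run
  new_name=$2   # name to give the PDF
  if [ -f "$dir/NA" ]; then
    mv "$dir/NA" "$dir/$new_name"
    echo "renamed NA to $new_name"
  else
    echo "no file called NA in $dir" >&2
    return 1
  fi
}

# Example call for the current directory; the PDF name is chosen for this example.
rename_histogram . mytabforhistogram.pdf || true
```

Renaming straight after each Rscript run also keeps a later run from overwriting an earlier histogram.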
