ploidy_analysis_using_ploidyngs

Differences

This shows you the differences between two versions of the page.

ploidy_analysis_using_ploidyngs [2022/06/24 15:33] 134.190.232.29
ploidy_analysis_using_ploidyngs [2022/06/24 16:01] (current) – [UPDATE (June 2022)] 134.190.232.29
Line 46: Line 46:
  
 <code>
-grep Fourth myoutput.tab | cut -f4 | sort -n
+grep Fourth myoutput.tab | cut -f4 | sort -n
-grep Third  myoutput.tab | cut -f4 | sort -n
+grep Third  myoutput.tab | cut -f4 | sort -n
 </code>
 
 To remove lines containing "Third" and "Fourth" from the .tab file:
 <code>
-grep -vP "Third|Fourth" myoutput.tab > mytabforhistogram.tab
+grep -vP "Third|Fourth" myoutput.tab > mytabforhistogram.tab
 </code>
 
-Once satisfied with the tab you need to make a pdf (it is supposed to generate it automatically but it doesn't)
+Once satisfied with the tab you need to make a PDF (it is supposed to generate it automatically but it doesn't)
  
Line 65: Line 65:
  
 This will generate a PDF called ''NA'' which you can transfer to your home computer to look at.
 +
 +==== UPDATE (June 2022) ====
 +
 +I (Joran) have updated the ''ploidyNGS.py'' and ''ploidyNGS_generateHistogram.R'' scripts. They are available in Perun under the new names ''ploidyNGS_minCov.py'' and ''ploidyNGS_generateHistogram_minCov.R'', respectively.
 +
 +The main new feature is the ''--min_cov'' option, which is set to 0 by default. It makes the script ignore positions whose coverage is lower than the specified value when calculating allele frequencies. This can be useful if you have mysterious 50%/50% or 33%/67% peaks in a histogram that otherwise strongly hints at haploidy. In my experience these mysterious peaks can stem from low-coverage regions with sequencing errors. For example, if a position is covered by only two reads and one of them carries an error, it will yield a 50%/50% peak even though there is no diploidy going on.
 +
 +Other updates:
 +  * Positions reported in the .tab file are now 1-indexed (i.e. starting at 1) instead of 0-indexed (i.e. starting at 0)
 +  * If you have ''/scratch2/software/ploidyNGS/'' in your ''$PATH'', the updated Python script will properly execute the histogram R script without any issues. The PDF file will have a proper file name ending in .pdf, instead of the mysterious ''NA''
 +  * The axis titles in the histogram will be a bit easier to understand
 +  * The script will now report the total number of heteromorphic positions per contig and over all contigs
 +
 +To call it, use a Perun submission script like this one:
 +
 +<code>
 +#!/bin/bash
 +#$ -S /bin/bash
 +. /etc/profile
 +#$ -cwd
 +#$ -q 256G-batch
 +
 +source activate ploidyNGS-dependencies
 +export PATH="/scratch2/software/ploidyNGS:$PATH"
 +
 +BAMFILE='dnaseq_vs_canu_ergo_assembly.sorted.bam'
 +# Positions with coverage below MINCOV are ignored
 +MINCOV=10
 +MAXDEPTH=2000
 +OUTBASE='dnaseq_vs_canu_ergo_assembly'
 +
 +ploidyNGS_minCov.py \
 +    --out "$OUTBASE" \
 +    --bam "$BAMFILE" \
 +    --min_cov "$MINCOV" \
 +    --max_depth "$MAXDEPTH"
 +
 +conda deactivate
 +</code>
 +
 +A more elaborate script is available on the GitHub repository of the Roger lab: https://github.com/RogerLab/gospel_of_andrew.
 +
 +The updated scripts are also available on the original GitHub repository, https://github.com/diriano/ploidyNGS, under the original script names ''ploidyNGS.py'' and ''ploidyNGS_generateHistogram.R''.
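
The effect of ''--min_cov'' described in the June 2022 update can be illustrated with a small toy sketch. This is not the ploidyNGS implementation, and it does not read the real .tab format: the function names, the toy data, and the four-column input layout (contig, 1-based position, count of the most frequent allele, count of the second most frequent allele) are all assumptions made up for this example. It only shows why a position covered by two reads produces a 50%/50% signal, and how a coverage threshold removes it.

```shell
#!/bin/bash
# Illustrative sketch only: NOT the ploidyNGS code or its file format.
# Hypothetical tab-separated input columns:
#   contig, 1-based position, most frequent allele count, second allele count

# Print depth and allele percentages for positions with depth >= min_cov.
filter_by_min_cov() {
    local min_cov="$1"
    awk -F'\t' -v min_cov="$min_cov" '
        {
            depth = $3 + $4
            # Skip uncovered positions and positions below the threshold.
            if (depth == 0 || depth < min_cov) next
            printf "%s\t%s\t%d\t%.1f\t%.1f\n", $1, $2, depth, 100 * $3 / depth, 100 * $4 / depth
        }'
}

# Toy data: position 5 is covered by 2 reads, one of them erroneous;
# position 9 is covered by 40 reads, one of them erroneous.
toy_counts() {
    printf 'contig1\t5\t1\t1\n'
    printf 'contig1\t9\t39\t1\n'
}

echo "min_cov = 0 (nothing filtered):"
toy_counts | filter_by_min_cov 0

echo "min_cov = 10 (low-coverage position dropped):"
toy_counts | filter_by_min_cov 10
```

With a threshold of 0, the two-read position is reported at 50.0/50.0 and would contribute to a misleading peak; with a threshold of 10 only the well-covered position remains, at 97.5/2.5.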