
ploidy_analysis_using_ploidyngs

Last modified 2022/06/24 16:01 by 134.190.232.29 (previous revision: 2022/06/24 15:30)
To see the sorted values (column 4) of the ''Third'' and ''Fourth'' lines:
  
<code>
grep Fourth myoutput.tab | cut -f4 | sort -n
grep Third  myoutput.tab | cut -f4 | sort -n
</code>
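
If you only want a quick overview rather than the full sorted list, a short awk summary can report how many lines carry a given label and the range of their column-4 values. This is a sketch that assumes a tab-separated file whose 4th column is numeric (as the ''cut -f4 | sort -n'' commands above imply); the helper name ''summarize_rank'' is hypothetical.

<code bash>
# Hypothetical helper: count, min and max of column 4 for lines containing a label.
# Assumes a tab-separated file whose 4th column is numeric (cut -f4 uses tabs by default).
summarize_rank() {
    local label="$1" file="$2"
    grep -- "$label" "$file" | cut -f4 | sort -n | awk -v label="$label" '
        NR == 1 { min = $1 }          # input is sorted, so the first value is the minimum
        { max = $1; n++ }             # the last value seen is the maximum
        END {
            if (n == 0) { print label ": no lines"; exit }
            printf "%s: n=%d min=%s max=%s\n", label, n, min, max
        }'
}

# Usage:
# summarize_rank Third myoutput.tab
# summarize_rank Fourth myoutput.tab
</code>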
  
To remove lines containing "Third" and "Fourth" from the .tab file:
<code>
$ grep -vP "Third|Fourth" myoutput.tab > mytabforhistogram.tab
</code>
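
''grep -P'' (Perl-compatible regular expressions) is a GNU grep feature and may be missing on other systems such as macOS. The pattern ''Third|Fourth'' uses no Perl-specific syntax, so ''grep -E'' should give the same result. The sketch below (the helper name ''drop_minor_ranks'' is hypothetical) filters that way and then checks that no ''Third'' or ''Fourth'' lines remain.

<code bash>
# Hypothetical helper: drop lines containing "Third" or "Fourth" using portable
# extended regular expressions (-E) instead of GNU-only Perl regexes (-P).
drop_minor_ranks() {
    local infile="$1" outfile="$2" left
    grep -vE "Third|Fourth" "$infile" > "$outfile"
    # Sanity check: grep -c prints the number of matching lines (0 if none remain).
    left=$(grep -cE "Third|Fourth" "$outfile")
    echo "lines kept: $(wc -l < "$outfile" | tr -d ' '), Third/Fourth lines left: $left"
}

# Usage:
# drop_minor_ranks myoutput.tab mytabforhistogram.tab
</code>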
  
Once you are satisfied with the .tab file, you need to make the PDF yourself (ploidyNGS is supposed to generate it automatically, but it doesn't).
  
On the command line, run:
  
<code>
$ Rscript --vanilla /scratch2/software/ploidyNGS/ploidyNGS_generateHistogram.R mytabforhistogram.tab
</code>
  
This will generate a PDF called ''NA'', which you can transfer to your home computer to look at.

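
Because the output is literally named ''NA'', the next run will overwrite it and it is easy to lose track of which .tab file it came from. A minimal sketch, assuming the R script writes ''NA'' into the current directory, renames it right after the ''Rscript'' call; the helper name and the naming scheme are hypothetical.

<code bash>
# Hypothetical helper: give the PDF written as "NA" a name derived from the .tab file.
# Assumes the R script writes the file "NA" into the current working directory.
rename_na_pdf() {
    local tabfile="$1"
    local pdfname="${tabfile%.tab}.pdf"   # mytabforhistogram.tab -> mytabforhistogram.pdf
    if [ -f NA ]; then
        mv -- NA "$pdfname" && echo "$pdfname"
    else
        echo "no file named NA in $(pwd)" >&2
        return 1
    fi
}

# Usage, right after the Rscript command:
# rename_na_pdf mytabforhistogram.tab
</code>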
==== UPDATE (June 2022) ====

I (Joran) have updated the ''ploidyNGS.py'' and ''ploidyNGS_generateHistogram.R'' scripts. They are available on Perun under the new names ''ploidyNGS_minCov.py'' and ''ploidyNGS_generateHistogram_minCov.R'', respectively.

The main new feature is the ''--min_cov'' option, which defaults to 0. It makes the script ignore positions whose coverage is below the specified value when calculating allele frequencies. This can be useful if your histogram shows mysterious 50%/50% or 33%/67% peaks while otherwise strongly suggesting haploidy. In my experience, these peaks can come from low-coverage regions with sequencing errors. For example, if a position is covered by only two reads and one of them carries an error, it yields a 50%/50% value even though no diploidy is involved.
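
The effect of such a threshold can be illustrated with a toy calculation. The awk sketch below is not the ploidyNGS implementation; it assumes a hypothetical three-column input (position, total depth, count of the most frequent base) and shows how a position covered by two reads, one of them an error, gives a 50% value that disappears once a minimum coverage is applied.

<code bash>
# Illustration only (not ploidyNGS code): frequency (%) of the most frequent base
# per position, skipping positions whose depth is below min_cov.
# Hypothetical input columns: position <TAB> depth <TAB> count of most frequent base.
top_allele_freq() {
    local min_cov="$1" file="$2"
    awk -v min_cov="$min_cov" 'BEGIN { FS = "\t" }
        $2 == 0 || $2 < min_cov { next }         # no or too few reads: ignore this position
        { printf "%s\t%.0f\n", $1, 100 * $3 / $2 }' "$file"
}

# Usage:
# top_allele_freq 10 toy_counts.tab
</code>

With ''min_cov'' at 0, a position with depth 2 and a top-base count of 1 is reported as 50%; with ''min_cov'' at 10 it is skipped, while well-covered positions are kept.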

Other updates:
  * Positions reported in the .tab file are now 1-indexed (i.e. starting at 1) instead of 0-indexed (i.e. starting at 0).
  * If you have ''/scratch2/software/ploidyNGS/'' in your ''$PATH'', the updated Python script runs the histogram R script correctly, and the PDF file gets a proper name ending in .pdf instead of the mysterious ''NA''.
  * The axis titles in the histogram are easier to understand.
  * The script now reports the total number of heteromorphic positions per contig and over all contigs.
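
Before relying on the automatic histogram step, it may be worth confirming that the updated script is actually found on your ''$PATH''. A small sketch using the shell built-in ''command -v''; the helper name is hypothetical. Note that ''command -v'' only finds files that are executable, so a script without the execute bit would be reported as missing even if its directory is on the ''$PATH''.

<code bash>
# Hypothetical helper: report which of the given programs cannot be found on $PATH.
# Returns 0 if all are found, 1 otherwise.
check_on_path() {
    local missing=0 prog
    for prog in "$@"; do
        if ! command -v "$prog" > /dev/null 2>&1; then
            echo "not on PATH: $prog" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on Perun:
# export PATH="/scratch2/software/ploidyNGS:$PATH"
# check_on_path ploidyNGS_minCov.py
</code>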

To run it, use a Perun submission script like this one:

<code>
#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -cwd
#$ -q 256G-batch

source activate ploidyNGS-dependencies
export PATH="/scratch2/software/ploidyNGS:$PATH"

BAMFILE='dnaseq_vs_canu_ergo_assembly.sorted.bam'
MINCOV=10
MAXDEPTH=2000
OUTBASE='dnaseq_vs_canu_ergo_assembly'

ploidyNGS_minCov.py \
    --out "$OUTBASE" \
    --bam "$BAMFILE" \
    --min_cov "$MINCOV" \
    --max_depth "$MAXDEPTH"

conda deactivate
</code>
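
In this example, ''OUTBASE'' is simply ''BAMFILE'' without the ''.sorted.bam'' suffix, so it can be derived with shell parameter expansion instead of being typed twice. The sketch below assumes that naming convention; the ''print_commands'' helper is hypothetical and only echoes one ''ploidyNGS_minCov.py'' command per BAM file (a dry run), which you could then paste into a submission script.

<code bash>
# Hypothetical dry-run helper: print one ploidyNGS_minCov.py command per BAM file,
# deriving the output prefix by stripping the ".sorted.bam" suffix.
MINCOV=10
MAXDEPTH=2000

print_commands() {
    local bam outbase
    for bam in "$@"; do
        outbase="${bam%.sorted.bam}"   # e.g. x.sorted.bam -> x
        echo "ploidyNGS_minCov.py --out $outbase --bam $bam --min_cov $MINCOV --max_depth $MAXDEPTH"
    done
}

# Usage, e.g. for every sorted BAM in the current directory:
# print_commands *.sorted.bam
</code>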
  
A more elaborate script is available on the GitHub repository of the Roger lab: https://github.com/RogerLab/gospel_of_andrew.
  
The updated scripts are also available on the original GitHub repository, https://github.com/diriano/ploidyNGS, under the original script names ''ploidyNGS.py'' and ''ploidyNGS_generateHistogram.R''.