depth_and_breadth_of_coverage
Differences
This shows you the differences between two versions of the page.
| Both sides previous revisionPrevious revisionNext revision | Previous revision | ||
| depth_and_breadth_of_coverage [2025/08/07 16:02] – 134.190.145.228 | depth_and_breadth_of_coverage [2025/08/08 17:18] (current) – 134.190.145.228 | ||
|---|---|---|---|
| Line 51: | Line 51: | ||
| conda deactivate | conda deactivate | ||
| - | # convert to bam | + | # indexing |
| for BAM in *.bam; do | for BAM in *.bam; do | ||
| samtools index -@ 40 $BAM $BAM.bai | samtools index -@ 40 $BAM $BAM.bai | ||
| Line 70: | Line 70: | ||
| Above we didn't pass on any special parameters to bwa-mem2 and samtools. This means it is essentially run in default mode. It turns out that if you are interested in assessing read depth along contigs, this is not exactly how we want to execute things. | Above we didn't pass on any special parameters to bwa-mem2 and samtools. This means it is essentially run in default mode. It turns out that if you are interested in assessing read depth along contigs, this is not exactly how we want to execute things. | ||
| - | Whenever a sequencing read has sufficient sequence similarity to multiple areas in the assembly, it will report all of these read alignments in the BAM file. The best mapping one will get the label " | + | Whenever a sequencing read has sufficient sequence similarity to multiple areas in the assembly, it will report all of these read alignments in the BAM file. The best mapping one will get the label " |
| - | To prevent | + | {{ primary_secondary_mapping_coverage_example.png }} |
| + | |||
| + | In the picture above, the top coverage or read depth track is drawn using ALL read mappings. Primary and Secondary read mappings. In the coverage track below that, only primary read mappings are used. Suddenly we can see that at the contig edges, read depth is approximately only half what it is compared to the inner part of the contig! | ||
| + | |||
| + | We currently have two hypotheses to explain such odd patterns: | ||
| + | |||
| + | 1. There is some biological variation in genome structure within the sampled Blastocystis population. The assembler finds multiple variants of the same homologous region in the read data, and represents | ||
| + | |||
| + | Short reads will equally match both areas, and thus their coverage is split between those two, while short reads from unique areas only have one clear mapping location. As a result, the edge regions that occur twice get half the short read coverage relative to the inner regions that only occur once in the assembly! | ||
| + | |||
| + | 2. There is some artificial variation in the assembly, because some long reads were so enriched in errors that the assembler didn't recognize they'd overlap with other long reads that stem from the same chromosome region. And these errors somehow persisted through the polishing process. | ||
| + | |||
| + | In any case, now that we've filtered out the secondary mappings, we can at least see that something fishy is going on here! If we didn' | ||
| + | |||
| + | Thus, we need to prevent secondary mappings to be reported in the BAM file. This can be achieved by editing our '' | ||
| < | < | ||
| ## then align paired reads | ## then align paired reads | ||
| - | bwa-mem2 mem -t 40 ${ASSEMBLY%.fasta} $R1 $R2 | samtools view -b -F 256 | samtools sort --threads 40 -o ${PREFIX}-pe.sort.bam - | + | bwa-mem2 mem -t 40 ${ASSEMBLY%.fasta} $R1 $R2 | samtools view -b -H -F 256 | samtools sort --threads 40 -o ${PREFIX}-pe.sort.bam - |
| </ | </ | ||
| Line 84: | Line 98: | ||
| < | < | ||
| - | samtools view -b -F 256 <BAM> > < | + | samtools view -b -H -F 256 <BAM> > < |
| </ | </ | ||
depth_and_breadth_of_coverage.1754593325.txt.gz · Last modified: by 134.190.145.228
