Differences

This shows you the differences between two versions of the page.

--- depth_and_breadth_of_coverage [2025/08/07 16:02] – 134.190.145.228
+++ depth_and_breadth_of_coverage [2025/08/08 17:18] (current) – 134.190.145.228
@@ Line 51: / Line 51: @@
 conda deactivate
-# convert to bam
+# indexing bam files
 for BAM in *.bam; do
     samtools index -@ 40 $BAM $BAM.bai
@@ Line 70: / Line 70: @@
 Above we didn't pass on any special parameters to bwa-mem2 and samtools. This means it is essentially run in default mode. It turns out that if you are interested in assessing read depth along contigs, this is not exactly how we want to execute things.
-Whenever a sequencing read has sufficient sequence similarity to multiple areas in the assembly, it will report all of these read alignments in the BAM file. The best mapping one will get the label "primary", whereas the others are considered "secondary". If you load your BAM file in some software (for example IGV) to visualize sequence depth, both primary AND secondary read mappings will contribute the read depth plot. This gives us a skewed picture of local sequencing depth levels. //This is because a single read, which is **derived from only one area** in a genome, is now contributing to the read depth plot in multiple areas.//.
+Whenever a sequencing read has sufficient sequence similarity to multiple areas in the assembly, it will report all of these read alignments in the BAM file. The best mapping one will get the label "primary", whereas the others are considered "secondary". If you load your BAM file in some software (for example IGV) to visualize sequence depth, both primary AND secondary read mappings will contribute the read depth plot. This gives us a skewed picture of local sequencing depth levels. //This is because a single read, which is **derived from only one area** in a genome, is now contributing to the read depth plot in multiple areas.//. See below:
-To prevent this from happening, we need to prevent secondary mappings to be reported in the BAM file. This can be achieved by editing our ''bwa-mem2'' / ''samtools sort'' command by adding the ''samtools view -b -F 256'' in the piped command:
+{{ primary_secondary_mapping_coverage_example.png }}
+In the picture above, the top coverage or read depth track is drawn using ALL read mappings. Primary and Secondary read mappings. In the coverage track below that, only primary read mappings are used. Suddenly we can see that at the contig edges, read depth is approximately only half what it is compared to the inner part of the contig!
+We currently have two hypotheses to explain such odd patterns:
+. There is some biological variation in genome structure within the sampled Blastocystis population. The assembler finds multiple variants of the same homologous region in the read data, and represents this area multiple times in different parts of the assembly. Here, the for example 3' edge region of contig Seq_52 is also found on some other contig (not shown).
+Short reads will equally match both areas, and thus their coverage is split between those two, while short reads from unique areas only have one clear mapping location. As a result, the edge regions that occur twice get half the short read coverage relative to the inner regions that only occur once in the assembly!
+. There is some artificial variation in the assembly, because some long reads were so enriched in errors that the assembler didn't recognize they'd overlap with other long reads that stem from the same chromosome region. And these errors somehow persisted through the polishing process.
+In any case, now that we've filtered out the secondary mappings, we can at least see that something fishy is going on here! If we didn't, we'd falsely get the impression that everything is fine and dandy.
+Thus, we need to prevent secondary mappings to be reported in the BAM file. This can be achieved by editing our ''bwa-mem2'' / ''samtools sort'' command by adding the ''samtools view -b -F 256'' in the piped command:
 <code>
 ## then align paired reads
-bwa-mem2 mem -t 40 ${ASSEMBLY%.fasta} $R1 $R2 | samtools view -b -F 256 | samtools sort --threads 40 -o ${PREFIX}-pe.sort.bam -
+bwa-mem2 mem -t 40 ${ASSEMBLY%.fasta} $R1 $R2 | samtools view -b -H -F 256 | samtools sort --threads 40 -o ${PREFIX}-pe.sort.bam -
 </code>
@@ Line 84: / Line 98: @@
 <code>
-samtools view -b -F 256 <BAM> > <NEWBAM>
+samtools view -b -H -F 256 <BAM> > <NEWBAM>
 </code>