curation_of_phylogenomic_datasets

Differences

This shows you the differences between two versions of the page.

curation_of_phylogenomic_datasets [2023/09/19 12:21] – [Identifying xenologs] 134.190.232.90
curation_of_phylogenomic_datasets [2025/03/06 11:50] (current) 134.190.145.228
Line 1: Line 1:
-====== Construction and Curation of phylogenomic datasets ======
+Joran Martijn
+
+====== Curation of phylogenomic datasets ======
  
 Phylogenomic analyses attempt to use genomic data to answer phylogenetic questions. Often we're asking about the shape of a species tree. How did modern-day taxa diverge over their evolutionary history? What is the deepest divergence (i.e. the root) of these taxa?
Line 15: Line 17:
   * if one of the pair has undergone horizontal gene transfer at some point in its evolutionary history since its divergence from the other of the pair, and the pair's common ancestor gene was present in the LCA or one of its descendants, it constitutes an **in-xenolog**. *
  
-Typically when we construct new phylogenomic datasets, we use similarity searches such as BLAST and DIAMOND and HMMER to generate sets of genes.
+Typically when we construct new phylogenomic datasets, we use similarity searches such as BLAST and DIAMOND and HMMER (sometimes in combination with Markov Clustering, or MCL, algorithms) to generate sets of genes.
  
 This is an extremely practical approach, but can be fairly rough. Genes that are truly orthologs relative to genes that were found with BLAST may be missed if similarity searches are too stringent. On the other hand, genes that are NOT true orthologs (i.e. their divergence from the genes found with BLAST //predates// the LCA) may be falsely included if similarity searches are too loose. Such false positives are typically **out-paralogs**, i.e. they diverged by a duplication in an ancestor that //predates// the LCA, or **out-xenologs** *, i.e. they were introduced into the species tree via horizontal gene transfer from some external donor and they diverged from the other genes in a common ancestor that //predates// the LCA.
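
The stringency trade-off described above can be illustrated with a small sketch. It assumes the hits are in the 12-column tabular format that BLAST and DIAMOND write with ''-outfmt 6'', and it uses the E-value as the only cutoff. The function name ''filter_hits'', the gene names, and the two thresholds are hypothetical and chosen purely for illustration; they are not recommendations from this page.

```python
import csv
import io

# Column order of BLAST/DIAMOND tabular output (-outfmt 6)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular_text, max_evalue):
    """Return the set of subject IDs with at least one hit at or below max_evalue."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    kept = set()
    for row in reader:
        # Skip blank lines and comment lines
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= max_evalue:
            kept.add(hit["sseqid"])
    return kept


# Hypothetical hits for one query; only the E-values differ in a meaningful way
EXAMPLE_HITS = "\n".join([
    "query1\tgeneA\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t420",
    "query1\tgeneB\t45.0\t280\t154\t3\t5\t284\t10\t289\t1e-30\t120",
    "query1\tgeneC\t30.0\t250\t175\t6\t20\t269\t15\t264\t1e-5\t48",
    "query1\tgeneD\t25.0\t90\t67\t2\t100\t189\t40\t129\t0.5\t25",
])

strict = filter_hits(EXAMPLE_HITS, max_evalue=1e-50)  # stringent: may miss true orthologs
loose = filter_hits(EXAMPLE_HITS, max_evalue=1e-3)    # loose: may admit out-paralogs or out-xenologs

print(sorted(strict))
print(sorted(loose))
```

Neither threshold tells us which hits are true orthologs. The sketch only shows how the candidate gene set grows or shrinks with the cutoff, which is why the resulting sets still need curation.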