Differences

This shows you the differences between two versions of the page.

--- bioinformatics_tools [2022/01/26 15:59] – 134.190.232.106
+++ bioinformatics_tools [2022/08/21 10:35] (current) – 173.212.112.187
@@ Line 10: / Line 10: @@
   - HSDFinder: a BLAST-based strategy for identifying highly similar duplicated genes in eukaryotic genomes (2021)
   - HSDatabase – Identification and functional annotation of highly similar duplicated genes in eukaryotic genomes (2022)
-  - Comprehensive analysis of gene duplications in eukaryotic genomes unravelling the convergent adaptive and nonadaptive evolution (2022)
+  - An overview of online resources for gene duplication detection within species: Mini review (2022)
+  - HSDicipher: A downstream analysis package for HSDFinder and HSDatabase (2023)
 So far, the first step, designing the HSDFinder tool, has been completed after many trials, and the selected eukaryotic species have been collected into the HSDatabase. The comprehensive analysis is under way.
@@ Line 25: / Line 26: @@
 If you would like a new species to be added to HSDatabase, please use the following form. The new species must meet the following requirements:
-* Peptide sequence (https://github.com/zx0223winner/isoform2one)
+  * Peptide sequence (https://github.com/zx0223winner/isoform2one)
-* Blast all-against-all file
+  * Blast all-against-all file
-* InterProscan file
+  * InterProscan file
-* KEGG file
+  * KEGG file
 The HSDatabase is based on the data provided by the NCBI FTP site. If your species is stored on the FTP site, it would be very helpful to provide us with the FTP links to the peptide database. At a minimum, a link to the species information is required.
-<Last updated by Xi Zhang on Oct 6th,2021> Upcoming
+**How to analyze the data from HSDFinder?** HSDicipher https://github.com/zx0223winner/HSDicipher
+Although there is no golden rule for distinguishing partial duplicates from more complete ones, relatively complete duplicates tend to differ by less than 50% in amino acid length and to have the same number and function of conserved domains.
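The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not the code used by HSDFinder or HSDicipher: the function name `is_relatively_complete`, the input layout (one `(length_aa, pfam_ids)` tuple per gene copy), the choice to measure the length difference relative to the longest copy, and the use of identical Pfam accessions as a stand-in for "same function" are all assumptions made for the example.

```python
def is_relatively_complete(copies):
    """Apply the 50% length / same-domain heuristic to one duplicate group.

    copies: list of (length_aa, pfam_ids) tuples, one per gene copy.
    Hypothetical layout chosen for this sketch only.
    """
    lengths = [length for length, _ in copies]
    # Assumption: "less than 50% length difference" is measured against the
    # longest copy, i.e. the shortest copy is more than half the longest.
    length_ok = min(lengths) > 0.5 * max(lengths)
    # Assumption: same number and function of domains is approximated by
    # every copy carrying the same multiset of Pfam accessions.
    domain_profiles = {tuple(sorted(pfam_ids)) for _, pfam_ids in copies}
    return length_ok and len(domain_profiles) == 1


similar = [(300, ["PF00069"]), (280, ["PF00069"])]
too_short = [(300, ["PF00069"]), (120, ["PF00069"])]
extra_domain = [(300, ["PF00069", "PF07714"]), (290, ["PF00069"])]

print(is_relatively_complete(similar))
print(is_relatively_complete(too_short))
print(is_relatively_complete(extra_domain))
```

Only the first group passes both checks; the second fails on length and the third on domain content.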
+{{::hsd_calculator.png?400|}}
+  * HSD_ Statistics.py (https://github.com/zx0223winner/HSDFinder/tree/master/Tutorial) calculates the number of HSDs, the number of gene copies, the number of non-redundant gene copies, the capturing value, and the performance score.
+     * True HSDs: HSDs in which the shortest gene copy is more than half the length of the longest copy and all copies have the same function and number of Pfam domains.
+     * Incomplete HSDs: HSDs whose gene copies have different numbers of conserved domains (Pfam domains), or whose gene copies encoding hypothetical proteins differ in amino acid (aa) length by more than 50%.
+  * HSD_Categories.py counts the gene copies within each group; for example, a 2-group is an HSD group with only two gene copies.
+  * HSD_add_on.py merges a series of combined thresholds based on the formula: E + (D + (C + (B + A)))
+     * A = 90%_100aa+(90%_70aa+(90%_50aa+(90%_30aa+90%_10aa)))
+     * B = 80%_100aa+(80%_70aa+(80%_50aa+(80%_30aa+80%_10aa)))
+     * C = 70%_100aa+(70%_70aa+(70%_50aa+(70%_30aa+70%_10aa)))
+     * D = 60%_100aa+(60%_70aa+(60%_50aa+(60%_30aa+60%_10aa)))
+     * E = 50%_100aa+(50%_70aa+(50%_50aa+(50%_30aa+50%_10aa)))
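The nested merge and the group-size categories in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual HSD_add_on.py or HSD_Categories.py: it assumes that "X + Y" means "keep every group from the stricter result Y and add only those groups from the more relaxed result X that share no gene copy with what is already kept". The names `add_on`, `merge_nested`, and `group_size_counts`, and the representation of each HSD as a `frozenset` of gene IDs, are hypothetical.

```python
from collections import Counter
from functools import reduce


def add_on(kept, relaxed):
    """Assumed meaning of 'relaxed + kept': keep all groups already
    collected and append relaxed groups that introduce only new genes."""
    seen = set().union(*kept)
    merged = list(kept)
    for group in relaxed:
        if seen.isdisjoint(group):
            merged.append(group)
            seen.update(group)
    return merged


def merge_nested(results):
    """Fold results ordered strictest first, which mirrors the nesting
    last + (... + (second + first)) used in the formula."""
    return reduce(add_on, results, [])


def group_size_counts(groups):
    """Count HSD groups by number of gene copies (2-group, 3-group, ...)."""
    return Counter(len(group) for group in groups)


# Hypothetical results for two thresholds, strictest first.
strict = [frozenset({"g1", "g2"})]
relaxed = [frozenset({"g1", "g2", "g3"}), frozenset({"g4", "g5", "g6"})]

merged = merge_nested([strict, relaxed])
print(merged)
print(group_size_counts(merged))
```

Under this assumption, A would be `merge_nested` applied to the 90% results ordered from 10aa to 100aa, B to E likewise for 80% to 50%, and the final set would be `merge_nested([A, B, C, D, E])`. In the toy data, the relaxed group containing g1 and g2 is skipped because the stricter threshold already covers those genes, leaving one 2-group and one 3-group.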
+<Last updated by Xi Zhang on Oct 6th,2021>
+<Last updated by Xi Zhang on May 1st,2022>