bioinformatics_tools
Differences
This shows you the differences between two versions of the page.
| bioinformatics_tools [2021/11/24 11:33] – 134.190.232.106 | bioinformatics_tools [2022/08/21 10:35] (current) – 173.212.112.187 | ||
|---|---|---|---|
| Line 10: | Line 10: | ||
| - HSDFinder: a BLAST-based strategy for identifying highly similar duplicated genes in eukaryotic genomes (2021) | - HSDFinder: a BLAST-based strategy for identifying highly similar duplicated genes in eukaryotic genomes (2021) | ||
| - HSDatabase – Identification and functional annotation of highly similar duplicated genes in eukaryotic genomes (2022) | - HSDatabase – Identification and functional annotation of highly similar duplicated genes in eukaryotic genomes (2022) | ||
| - | - Comprehensive analysis | + | - An overview |
| + | - HSDicipher: A downstream analysis package for HSDFinder and HSDatabase (2023) | ||
| + | |||
| + | So far, the first step, designing the HSDFinder tool, has been reached after many trials; the selected | ||
| - | So far, the first step | ||
| **How to prepare the species files?** | **How to prepare the species files?** | ||
| + | |||
| + | For HSDatabase itself, we offer a species request option that allows users to submit the species they are interested in. | ||
| **How to document that data in HSDatabase? | **How to document that data in HSDatabase? | ||
| - | <Last updated by Xi Zhang on Oct 6th, | + | Several files are required to document a species in the database. |
| + | Request a new species | ||
| + | If you would like a new species to be added to HSDatabase, please use the following form. The new species must meet the following requirements: | ||
| + | |||
| + | * Peptide sequence (https:// | ||
| + | * BLAST all-against-all file | ||
| + | * InterProScan file | ||
| + | * KEGG file | ||
| + | |||
| + | HSDatabase is based on the data provided by the NCBI FTP site. If your species is available on the FTP site, please send us the FTP links to its peptide database. At a minimum, a link to the species information is required. | ||
| + | |||
| + | |||
| + | **How to analyze the data from HSDFinder? | ||
| + | |||
| + | Although there is no golden rule for distinguishing partial duplicates from more complete ones, relatively complete duplicates are expected to differ in amino acid length by less than 50% and to have the same number and function of conserved domains. | ||
| + | |||
| + | {{:: | ||
| + | |||
| + | * HSD_ Statistics.py (https:// | ||
| + | * True HSDs: HSDs in which the shortest gene copy is more than half the length of the longest copy and all copies have the same function and number of Pfam domains. | ||
| + | * Incomplete HSDs: HSDs whose gene copies have different numbers of conserved domains (Pfam domains), or whose gene copies encoding hypothetical proteins differ in aa length by more than 50%. | ||
| + | |||
| + | * HSD_Categories.py counts the gene copies within each HSD group; e.g., a 2-group is an HSD group with only two gene copies. | ||
| + | * HSD_add_on.py merges a series of combined thresholds according to the formula: E + (D + (C + (B + A))) | ||
| + | * A = 90%_100aa+(90%_70aa+(90%_50aa+(90%_30aa+90%_10aa))) | ||
| + | * B = 80%_100aa+(80%_70aa+(80%_50aa+(80%_30aa+80%_10aa))) | ||
| + | * C = 70%_100aa+(70%_70aa+(70%_50aa+(70%_30aa+70%_10aa))) | ||
| + | * D = 60%_100aa+(60%_70aa+(60%_50aa+(60%_30aa+60%_10aa))) | ||
| + | * E = 50%_100aa+(50%_70aa+(50%_50aa+(50%_30aa+50%_10aa))) | ||
| + | |||
| + | <Last updated by Xi Zhang on Oct 6th, | ||
| + | <Last updated by Xi Zhang on May 1st, | ||
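
The true/incomplete distinction and the group-size categories described in the revision can be illustrated with a minimal Python sketch. This is not the code of HSD_Statistics.py or HSD_Categories.py: the `GeneCopy` record, the function names, and the Pfam accessions in the usage lines are hypothetical, and "same function and number of Pfam domains" is assumed here to mean that every copy carries the same sorted list of Pfam accessions.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class GeneCopy:
    """One gene copy in an HSD group (hypothetical record layout)."""
    gene_id: str
    aa_length: int                 # protein length in amino acids
    pfam_domains: Tuple[str, ...]  # Pfam accessions found on this copy


def classify_hsd(copies: Sequence[GeneCopy]) -> str:
    """Return "true" or "incomplete" for one HSD group.

    Assumed rule, taken from the definitions in the text: a group is
    "true" when the shortest copy is more than half as long as the
    longest copy AND all copies share the same Pfam domains.
    """
    if len(copies) < 2:
        raise ValueError("an HSD group needs at least two gene copies")
    lengths = [c.aa_length for c in copies]
    # Integer form of min > max / 2, avoids floating-point comparison.
    length_ok = 2 * min(lengths) > max(lengths)
    # Sorting makes the comparison independent of domain order.
    domain_sets = {tuple(sorted(c.pfam_domains)) for c in copies}
    domains_ok = len(domain_sets) == 1
    return "true" if length_ok and domains_ok else "incomplete"


def group_category(copies: Sequence[GeneCopy]) -> str:
    """Label a group by its number of gene copies, e.g. "2-group"."""
    return f"{len(copies)}-group"
```

For example, `classify_hsd([GeneCopy("g1", 300, ("PF00069",)), GeneCopy("g2", 200, ("PF00069",))])` returns `"true"` under these assumptions, and `group_category` labels the same pair a `"2-group"`.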
bioinformatics_tools.1637768000.txt.gz · Last modified: by 134.190.232.106
