phylogeny_protocol6
Differences
| phylogeny_protocol6 [2022/02/07 15:15] – 134.190.232.106 | phylogeny_protocol6 [2022/02/24 10:23] (current) – 38.20.199.40 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| - | After reading the tutorial from https:// | + | ** How to use PhyloToL |
| + | **Introduction** | ||
| + | After reading the tutorial from https:// | ||
| - | Python must be python3 | + | To present the PhyloToL pipeline in a clear, concise and easy-to-follow manner, we prepared a protocol for using PhyloToL to explore genome evolution of diverse eukaryotes. We also provide a sample data set and results for each step, which can help users reproduce the results and figures with their own data. |
| - | Errors 1: | + | **PhyloToL Background** |
| + | Explore genome evolution of diverse eukaryotes-PhyloToL (Phylogenomic Tree of Life) | ||
| + | PhyloToL is a phylogenomic pipeline aimed at exploring evolutionary hypotheses at ancient (i.e. >100 million year) time scales. Although PhyloToL was originally designed for analyses of the eukaryotic tree of life, its flexibility allows users to add their taxa of interest and to explore hypotheses at varying taxonomic levels. | ||
| + | |||
| + | PhyloToL consists of four major components: | ||
| + | * Gene family (GF) assessment per taxon (i.e. adding taxa to the database); | ||
| + | * Refinement of homologs and gene tree reconstruction, | ||
| + | * Tree-based contamination removal; | ||
| + | * Building of a supermatrix for species tree reconstruction. | ||
| + | |||
| + | These components can be executed independently. PhyloToL is written primarily in Python 3, but it also incorporates custom Perl, Ruby and Bash scripts. PhyloToL runs only from the command line (i.e. there is no GUI), so basic knowledge of UNIX is required. PhyloToL was designed to run on a cluster, but it can also easily be run on desktop computers with single or multiple threads. | ||
| + | |||
| + | **Troubleshooting** | ||
| + | |||
| + | * Errors 1: | ||
| + | |||
| + | < | ||
| / | / | ||
| python3 1p_RenameProts.py --input_file ../ | python3 1p_RenameProts.py --input_file ../ | ||
| Line 12: | Line 30: | ||
| args.folder = args.input_file.split(' | args.folder = args.input_file.split(' | ||
| # filename must end with .fasta | # filename must end with .fasta | ||
| + | # Python must be python3 | ||
| + | </ | ||
| + | |||
| + | * Errors 2: | ||
| - | Errors 2: | + | < |
| #The content header must be clean: | #The content header must be clean: | ||
| [ok]> | [ok]> | ||
| Line 19: | Line 41: | ||
| GCF_000002595.2_Chlamydomonas_reinhardtii_v5.5_protein.fasta | GCF_000002595.2_Chlamydomonas_reinhardtii_v5.5_protein.fasta | ||
| + | </ | ||
| - | Errors 3: | + | * Errors 3: |
| + | < | ||
| python3 2p_CountOGsUsearch.py --input_file ../ | python3 2p_CountOGsUsearch.py --input_file ../ | ||
| PhyloTol/ | PhyloTol/ | ||
| #OGSout* must be OGSout0, OGSout1 | #OGSout* must be OGSout0, OGSout1 | ||
| + | </ | ||
| - | Errors 4: | + | * Errors 4: |
| + | < | ||
| python3 2p_CountOGsUsearch.py --input_file ../ | python3 2p_CountOGsUsearch.py --input_file ../ | ||
| [379mb]OG5_ | [379mb]OG5_ | ||
| [2.2gb]OG6_ | [2.2gb]OG6_ | ||
| https:// | https:// | ||
| + | </ | ||
| - | Errors 5: | + | * Errors 5: |
| + | < | ||
| # OG file name error. | # OG file name error. | ||
| python3 2p_CountOGsUsearch.py --input_file ../ | python3 2p_CountOGsUsearch.py --input_file ../ | ||
| Line 40: | Line 68: | ||
| [ok]> | [ok]> | ||
| [wrong]> | [wrong]> | ||
| + | </ | ||
| - | Errors 6: | + | * Errors 6: |
| + | |||
| + | < | ||
| python3 3p_RemoveDuplicates.py --file_prefix MyProteins | python3 3p_RemoveDuplicates.py --file_prefix MyProteins | ||
| Line 47: | Line 78: | ||
| ### Update values and names if you have used an alternative Database of Proteins! | ### Update values and names if you have used an alternative Database of Proteins! | ||
| OGLenDB = | OGLenDB = | ||
| + | </ | ||
| - | Errors 7: | + | * Errors 7: |
| + | < | ||
| python3 3p_RemoveDuplicates.py --file_prefix MyProteins | python3 3p_RemoveDuplicates.py --file_prefix MyProteins | ||
| Line 57: | Line 90: | ||
| python 4p_FinalizeName.py --input_file ../ | python 4p_FinalizeName.py --input_file ../ | ||
| + | </ | ||
| + | * Errors 8: | ||
| - | Errors 8: | + | < |
| / | / | ||
| FileNotFoundError: | FileNotFoundError: | ||
| - | |||
| pipeline_parameter_file.txt | pipeline_parameter_file.txt | ||
| + | </ | ||
| - | 8. Explore genome evolution of diverse eukaryotes-PhyloToL (Phylogenetic Tree of Life) | + | **Downloading, trimming, assembly** |
| - | PhyloToL is a phylogenomic pipeline aimed at exploring evolutionary hypotheses at ancient (i.e. >100 million year) time scales. PhyloToL was originally designed for analyses of the eukaryotic tree of life, the flexibility of PhyloToL allows users to add their taxa of interest and to explore hypotheses at varying taxonomic levels. | + | |
| - | PhyloToL consists of four major components: | + | Errors 1: MultiFastQC |
| - | * Gene family | + | |
| - | * Refinement of homologs and gene tree reconstruction, | + | < |
| - | * Tree-based contamination removal; | + | |
| - | * Building of a supermatrix for species tree reconstruction. | + | os.system(' |
| - | These components can be executed independently. PhyloToL is written primarily in the Python 3 programming language but it also incorporates Perl, Ruby and Bash custom scripts. PhyloToL only runs through the command | + | #pathFQC + "/ |
| + | |||
| + | </ | ||
| + | |||
| + | |||
| + | Errors 2: BBmap | ||
| + | |||
| + | < | ||
| + | |||
| + | TransPipe2.py | ||
| + | |||
| + | _R1 to _1 | ||
| + | _R2 to _2 | ||
| + | |||
| + | </ | ||
| + | |||
| + | Errors 3: BBmap2 | ||
| + | |||
| + | < | ||
| + | python2 TransPipe2.py parameter.txt your email (the email does not matter; must be run with python2) | ||
| + | |||
| + | parameter.txt (be careful with line endings: use \n; \r\n is the Windows convention, whereas macOS uses \n) | ||
| + | </ | ||
| + | |||
| + | Errors 4: PLT1 = Post assembly pipeline | ||
| + | |||
| + | < | ||
| + | wrapper.py | ||
| + | ' | ||
| + | |||
| + | |||
| + | ---Fatal error--- | ||
| + | Empty file ../ | ||
| + | |||
| + | |||
| + | / | ||
| + | |||
| + | ---Fatal error--- | ||
| + | Empty file ../ | ||
| + | Traceback (most recent call last): | ||
| + | File " | ||
| + | main() | ||
| + | File " | ||
| + | hunt_for_stops(args) | ||
| + | File " | ||
| + | +' | ||
| + | ZeroDivisionError: | ||
| + | </ | ||
| + | |||
| + | |||
| + | < | ||
| + | wrapper.py fifth step | ||
| + | |||
| + | / | ||
| + | BiopythonWarning) | ||
| + | </ | ||
| + | |||
| + | Errors 4: PLT1 = Post assembly pipeline | ||
| + | |||
| + | < | ||
| + | # sixth step | ||
| + | 6_FilterPartials.py | ||
| + | ### Update values and names if you have used an alternative Database of Proteins! | ||
| + | |||
| + | Error: | ||
| + | 6_FilterPartials.py | ||
| + | |||
| + | |||
| + | This OG group ‘OG5_141800’ cannot be found in the dictionary ‘OGLenDB’ in 6_FilterPartials.py: | ||
| + | |||
| + | |||
| + | Question: | ||
| + | How can a new OG be added to this list? (How can we get the length information for each OG?) | ||
| + | |||
| + | </ | ||
| + | |||
| + | Errors 5: | ||
| + | < | ||
| + | Failed to determine offset! Specify it manually and restart, please! | ||
| + | Try setting --phred-offset 33 or 64 manually and re-run. If you have recent sequence data then the offset will be 33 for sure. | ||
| + | |||
| + | How much RAM are you allocating to this job? It appears to be 250GB. | ||
| + | </ | ||
| + | |||
| + | |||
| + | <Last updated by Xi Zhang on Feb 8th, | ||
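
The note on parameter.txt under "Errors 3: BBmap2" warns about line endings. The following is a minimal sketch, not part of PhyloToL, of one way to rewrite a file so that every line ends with a plain \n. The helper name `normalize_line_endings` is hypothetical, and the sketch assumes the file is small enough to read into memory.

```python
from pathlib import Path


def normalize_line_endings(path):
    """Rewrite the file at `path` so every line ends with a bare \\n.

    Hypothetical helper (not part of PhyloToL). Returns True if the
    file was changed, False if it already used \\n only.
    """
    raw = Path(path).read_bytes()
    # Convert \r\n first, then any remaining lone \r.
    fixed = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    changed = fixed != raw
    if changed:
        Path(path).write_bytes(fixed)
    return changed
```

A call such as `normalize_line_endings("parameter.txt")` before running TransPipe2.py would be one way to apply it. Working on bytes avoids any platform-specific newline translation while reading or writing.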
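
The "Errors 2: BBmap" entry above lists the renaming "_R1 to _1" and "_R2 to _2". As a hedged sketch, assuming this refers to the read-pair tag in the input file names, the renaming could be automated as follows. The function names, the regular expression and the example file names are illustrative assumptions, not part of TransPipe2.py.

```python
import re
from pathlib import Path

# Matches "_R1" or "_R2" only when followed by ".", "_" or the end of the name,
# so that names such as "sample_R10.fastq" are left alone.
_READ_TAG = re.compile(r"_R([12])(?=[._]|$)")


def renamed(name):
    """Return `name` with _R1/_R2 replaced by _1/_2 (unchanged if absent)."""
    return _READ_TAG.sub(r"_\1", name)


def rename_reads(folder, dry_run=True):
    """Collect (old, new) name pairs for files in `folder`.

    Files are only renamed on disk when dry_run is False.
    """
    changes = []
    for path in sorted(Path(folder).iterdir()):
        new_name = renamed(path.name)
        if path.is_file() and new_name != path.name:
            changes.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return changes
```

Running `rename_reads("reads_folder")` with the default `dry_run=True` only reports what would change, which makes it possible to check the proposed names before renaming anything.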
