
phylogeny_protocol6



Introduction

After reading the tutorial at https://perun.biochem.dal.ca/user-wiki/doku.php?id=phylogeny_protocol5, users should be able to set up the PhyloToL running environment. However, to take full advantage of PhyloToL (https://github.com/Katzlab/PhyloTOL), users need a basic knowledge of bioinformatics, in particular the ability to read the scripts and to work in a command-line shell (e.g. Bash or dash) in a Linux/Unix environment.

To present the PhyloToL pipeline in a clear, concise and easy-to-follow manner, we prepared a protocol for using PhyloToL to explore the genome evolution of diverse eukaryotes. We also provide a sample data set and the results of each step, which should help users reproduce the results and figures and then apply the same steps to their own data.

PhyloToL Background

PhyloToL (Phylogenomic Tree of Life) is a phylogenomic pipeline for exploring the genome evolution of diverse eukaryotes, aimed at testing evolutionary hypotheses at ancient (i.e. >100 million years) time scales. Although PhyloToL was originally designed for analyses of the eukaryotic tree of life, its flexibility allows users to add their taxa of interest and to explore hypotheses at varying taxonomic levels.

PhyloToL consists of four major components:
  • Gene family (GF) assessment per taxon (i.e. adding taxa to the database);
  • Refinement of homologs and gene tree reconstruction;
  • Tree-based contamination removal;
  • Building of a supermatrix for species tree reconstruction.

These components can be executed independently. PhyloToL is written primarily in Python 3, but it also incorporates custom Perl, Ruby and Bash scripts. PhyloToL runs only from the command line (i.e. there is no GUI), so a basic knowledge of UNIX is required. PhyloToL was designed to run on a computing cluster, but it can also easily be run on desktop computers with single or multiple threads.

Troubleshooting

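  • Error 1: input file name and Python version
Run from: /PhyloTol/AddTaxa/Proteins/Scripts/
python3 1p_RenameProts.py --input_file ../MyProteins.fasta --source genbank

args.folder = args.input_file.split('.fasta')[0]
# The script derives its output folder name from the input file name (line above), so the file name must end with .fasta
# Run the script with python3 (PhyloToL is written in Python 3)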
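A minimal pre-flight check for the file-name rule could look like the sketch below. It is a hypothetical helper (check_input, expected_folder and the script name check_input.py are not part of PhyloToL); it only reuses the split('.fasta') expression quoted above to show which output folder name 1p_RenameProts.py would derive, and it must itself be run with python3.

```python
import sys
from pathlib import Path


def expected_folder(input_file):
    """Folder name 1p_RenameProts.py would derive (same split as the script)."""
    return input_file.split(".fasta")[0]


def check_input(input_file):
    """Hypothetical pre-flight check; returns a list of problems (empty if OK)."""
    problems = []
    if not Path(input_file).name.endswith(".fasta"):
        problems.append(
            "name must end with .fasta; "
            f"the output folder would be {expected_folder(input_file)!r}"
        )
    return problems


if __name__ == "__main__":
    # Usage: python3 check_input.py ../MyProteins.fasta
    for path in sys.argv[1:]:
        for problem in check_input(path) or ["OK"]:
            print(f"{path}: {problem}")
```

For ../MyProteins.fa the derived "folder" is the file name itself, which is why the .fasta suffix matters.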
  • Error 2: FASTA headers must be clean
# Each header must contain only the accession, without a description:
[ok]>XP_042921699.1
[wrong]>XP_001689413.1 uncharacterized protein CHLRE_01g010350v5 [Chlamydomonas reinhardtii]

Example input file: GCF_000002595.2_Chlamydomonas_reinhardtii_v5.5_protein.fasta
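Headers like the [wrong] example can be trimmed to the accession before running the pipeline. The sketch below is an assumption-based helper (clean_header, clean_lines and the script name clean_headers.py are hypothetical): it keeps only the first whitespace-separated token of each header, also drops a stray duplicated '>', and leaves the sequence lines unchanged.

```python
import sys


def clean_header(line):
    """Reduce a FASTA header to its accession.

    '>XP_001689413.1 uncharacterized protein ...' -> '>XP_001689413.1'
    A stray duplicated '>' (as in '> >XP_...') is dropped as well.
    """
    fields = line.lstrip("> ").split()
    return ">" + fields[0] if fields else line


def clean_lines(lines):
    """Yield lines without line endings, cleaning headers and keeping sequences."""
    for line in lines:
        line = line.rstrip("\r\n")
        yield clean_header(line) if line.startswith(">") else line


if __name__ == "__main__":
    # Usage: python3 clean_headers.py input.fasta > MyProteins.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            for out in clean_lines(handle):
                print(out)
```

For example, `python3 clean_headers.py GCF_000002595.2_Chlamydomonas_reinhardtii_v5.5_protein.fasta > MyProteins.fasta` would write a copy whose headers look like the [ok] example.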
  • Error 3: names of the OG database files
python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta
Database folder: PhyloTol/AddTaxa/Databases/db_OG
# The OGSout* files in this folder must be named OGSout0, OGSout1
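The naming rule can be checked with a short listing of the database folder. This sketch assumes that valid names are exactly OGSout followed by a number (any extension after the first dot is ignored); check_ogsout_names and scan_db_folder are hypothetical helpers, not part of PhyloToL.

```python
import re
from pathlib import Path

# Assumption: valid names are "OGSout" + digits; anything after the first dot is ignored.
VALID_NAME = re.compile(r"OGSout\d+")


def check_ogsout_names(names):
    """Split file names into (valid, invalid) lists, both sorted."""
    valid, invalid = [], []
    for name in names:
        target = valid if VALID_NAME.fullmatch(name.split(".")[0]) else invalid
        target.append(name)
    return sorted(valid), sorted(invalid)


def scan_db_folder(folder="PhyloTol/AddTaxa/Databases/db_OG"):
    """Check every OGSout* entry in the database folder (path relative to the cwd)."""
    return check_ogsout_names(p.name for p in Path(folder).glob("OGSout*"))
```

Calling scan_db_folder() from the directory that contains PhyloTol returns the names that already fit and those that would need renaming.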
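  • Error 4: OG database version
python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta
# The OG database comes in two OrthoMCL releases of very different size:
[379 MB] OG5_
[2.2 GB] OG6_
# Download page: https://orthomcl.org/orthomcl/app/downloads/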
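To tell which release a database file comes from, one option is to count the OG ID prefixes in its headers. The sketch below (og_prefixes and the script name og_release.py are hypothetical) assumes the OG IDs look like OG5_104059 or OG6_104059, as in the [ok] and [wrong] headers of Error 5; it reads the file line by line, so it also works on the 2.2 GB file.

```python
import re
import sys
from collections import Counter

# Matches OrthoMCL group IDs such as OG5_104059 or OG6_104059; captures "OG5"/"OG6".
OG_ID = re.compile(r"\b(OG\d+)_\d+")


def og_prefixes(lines):
    """Count OG ID prefixes found in FASTA header lines."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            counts.update(OG_ID.findall(line))
    return counts


if __name__ == "__main__":
    # Usage: python3 og_release.py <OG database FASTA>
    for path in sys.argv[1:]:
        with open(path) as handle:  # streamed line by line
            print(path, dict(og_prefixes(handle)))
```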
  • Error 5: OG name error
python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta
# Each header in the OG database must contain only the OG identifier:
[ok]>OG5_104059
[wrong]>aaeo|O67855 | organism=Aquifex aeolicus (strain VF5) | Peptidase_S49 domain-containing protein | OG6_104059
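If your database headers look like the [wrong] example, a sketch like the one below can rewrite each header to its trailing OG ID. It assumes the OG ID is always the last |-separated field (true for the example above) and stops with an error on any header where it is not; og_only_header, reformat_lines and og_headers.py are hypothetical names. It keeps OG6_ IDs as they are rather than renaming them to OG5_, since the two come from different OrthoMCL releases; whether OG6_ IDs are accepted later depends on the OGLenDB list in 3p_RemoveDuplicates.py.

```python
import re
import sys

OG_ID = re.compile(r"OG\d+_\d+")


def og_only_header(line):
    """'>aaeo|O67855 | ... | OG6_104059' -> '>OG6_104059' (last |-field kept)."""
    last = line.lstrip(">").split("|")[-1].strip()
    if not OG_ID.fullmatch(last):
        raise ValueError(f"no OG ID at the end of header: {line!r}")
    return ">" + last


def reformat_lines(lines):
    """Yield lines without line endings; headers reduced to the OG ID."""
    for line in lines:
        line = line.rstrip("\r\n")
        yield og_only_header(line) if line.startswith(">") else line


if __name__ == "__main__":
    # Usage: python3 og_headers.py <OG database FASTA> > <new file>
    for path in sys.argv[1:]:
        with open(path) as handle:
            for out in reformat_lines(handle):
                print(out)
```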
  • Error 6: OG prefix missing from OGLenDB
python3 3p_RemoveDuplicates.py --file_prefix MyProteins

# If your OG prefix (e.g. OG5) is not in the OGLenDB list in 3p_RemoveDuplicates.py, edit that list. The script itself notes:
### Update values and names if you have used an alternative Database of Proteins!
	OGLenDB =
  • Error 7: renaming the Original folder
python3 3p_RemoveDuplicates.py --file_prefix MyProteins

Folder: PhyloTol/AddTaxa/Proteins/FinalizeProteins/Chlamy
# The Original folder must be renamed, i.e. to _Original, before running:

python3 4p_FinalizeName.py --input_file ../ToRename/MyProteins_Filtered.Final.AA.fasta --name EE_is_Fake
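  • Error 8: hard-coded paths in pipeline_parameter_file.txt
Running from /misc/scratch2/xizhang/PhyloTol/Scripts fails with:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/katzlab32/Documents/PhyloTOL/DataFiles//duplication'

# The missing path does not exist on your system; edit the paths in this file to match your own PhyloTol location:
pipeline_parameter_file.txt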
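Assuming the stale path is stored as plain text in pipeline_parameter_file.txt, as the note suggests, a find-and-replace over that file is enough. The sketch below is hypothetical (replace_prefix and update_parameter_file are not PhyloToL functions, and the parameter-file format is not documented here); it writes a .bak copy before changing anything.

```python
from pathlib import Path


def replace_prefix(text, old, new):
    """Return (updated_text, number_of_replacements)."""
    return text.replace(old, new), text.count(old)


def update_parameter_file(path, old, new):
    """Rewrite the file in place, keeping a .bak copy; return the replacement count."""
    param_file = Path(path)
    text = param_file.read_text()
    updated, count = replace_prefix(text, old, new)
    if count:
        param_file.with_name(param_file.name + ".bak").write_text(text)
        param_file.write_text(updated)
    return count
```

For example, update_parameter_file("pipeline_parameter_file.txt", "/Users/katzlab32/Documents/PhyloTOL", "/misc/scratch2/xizhang/PhyloTol") would redirect the DataFiles path from the error message to the PhyloTol copy under /misc/scratch2/xizhang, assuming DataFiles sits next to Scripts there; substitute your own prefix.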

<Last updated by Xi Zhang on Feb 8th, 2022>
