phylogeny_protocol6

Differences

This shows you the differences between two versions of the page.

phylogeny_protocol6 [2022/02/07 15:31] 134.190.232.106phylogeny_protocol6 [2022/02/24 10:23] (current) 38.20.199.40
Line 1: Line 1:
 +** How to use PhyloToL to explore genome evolution of diverse eukaryotes?**
  
 **Introduction** **Introduction**
Line 7: Line 8:
  
 **PhyloToL Backgrounds** **PhyloToL Backgrounds**
-8.  Explore genome evolution of diverse eukaryotes-PhyloToL (Phylogenetic Tree of Life)+ 
 +Explore genome evolution of diverse eukaryotes-PhyloToL (Phylogenomic Tree of Life)
 PhyloToL is a phylogenomic pipeline aimed at exploring evolutionary hypotheses at ancient (i.e. >100 million years) time scales. Although PhyloToL was originally designed for analyses of the eukaryotic tree of life, its flexibility allows users to add their taxa of interest and to explore hypotheses at varying taxonomic levels. PhyloToL is a phylogenomic pipeline aimed at exploring evolutionary hypotheses at ancient (i.e. >100 million years) time scales. Although PhyloToL was originally designed for analyses of the eukaryotic tree of life, its flexibility allows users to add their taxa of interest and to explore hypotheses at varying taxonomic levels.
  
Line 15: Line 17:
 * Tree-based contamination removal;  * Tree-based contamination removal; 
 * Building of a supermatrix for species tree reconstruction.  * Building of a supermatrix for species tree reconstruction. 
 +
 These components can be executed independently. PhyloToL is written primarily in the Python 3 programming language, but it also incorporates custom Perl, Ruby and Bash scripts. PhyloToL runs only through the command line (i.e. there is no GUI), so a minimum knowledge of UNIX is required. PhyloToL was designed to run on a cluster, but can also easily be run on desktop computers with single or multiple threads. These components can be executed independently. PhyloToL is written primarily in the Python 3 programming language, but it also incorporates custom Perl, Ruby and Bash scripts. PhyloToL runs only through the command line (i.e. there is no GUI), so a minimum knowledge of UNIX is required. PhyloToL was designed to run on a cluster, but can also easily be run on desktop computers with single or multiple threads.
  
 **Troubleshooting** **Troubleshooting**
  
-Python must be python3 +  * Errors 1:
- +
-Errors 1:+
  
 +<code>
 /PhyloTol/AddTaxa/Proteins/Scripts/ /PhyloTol/AddTaxa/Proteins/Scripts/
 python3 1p_RenameProts.py --input_file ../MyProteins.fasta --source genbank python3 1p_RenameProts.py --input_file ../MyProteins.fasta --source genbank
Line 28: Line 30:
 args.folder = args.input_file.split('.fasta')[0]  args.folder = args.input_file.split('.fasta')[0] 
 # the filename must end with .fasta # the filename must end with .fasta
 +# Python must be python3
 +</code>
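
The note above says the input name must end in `.fasta`, because the script derives its folder name with `split('.fasta')[0]`. A minimal sketch of a pre-flight check follows; `check_fasta_name` is a hypothetical helper for illustration and is not part of PhyloToL.

```python
def check_fasta_name(path):
    """Return the prefix that split('.fasta')[0] would yield, or raise."""
    if not path.endswith(".fasta"):
        # e.g. MyProteins.fa or MyProteins.faa would give an unexpected folder name
        raise ValueError(f"{path!r} must end with .fasta")
    return path.split(".fasta")[0]


print(check_fasta_name("../MyProteins.fasta"))
```

Running a check like this before `1p_RenameProts.py` surfaces a wrong extension early rather than as a confusing path error later.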
  
-Errors 2:+  * Errors 2: 
 + 
 +<code>
 #The content header must be clean: #The content header must be clean:
 [ok]>XP_042921699.1 [ok]>XP_042921699.1
Line 35: Line 41:
  
 GCF_000002595.2_Chlamydomonas_reinhardtii_v5.5_protein.fasta GCF_000002595.2_Chlamydomonas_reinhardtii_v5.5_protein.fasta
 +</code>
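
One way to obtain clean headers is sketched below. It assumes that "clean" means the header holds only the accession, as in the `[ok]` example above; `clean_header` and `clean_fasta` are hypothetical names and are not PhyloToL functions.

```python
def clean_header(line):
    """Keep only the first whitespace-delimited token of a FASTA header line."""
    if line.startswith(">"):
        return line.split()[0]
    return line.rstrip("\n")


def clean_fasta(in_path, out_path):
    """Copy a FASTA file, trimming every header to its accession."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(clean_header(line) + "\n")


print(clean_header(">XP_042921699.1 hypothetical protein description\n"))
```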
  
-Errors 3:+  * Errors 3:
  
 +<code>
 python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta
 PhyloTol/AddTaxa/Databases/db_OG PhyloTol/AddTaxa/Databases/db_OG
 #OGSout* must be OGSout0, OGSout1 #OGSout* must be OGSout0, OGSout1
 +</code>
  
-Errors 4:+  * Errors 4:
  
 +<code>
 python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta
 [379mb]OG5_ [379mb]OG5_
 [2.2gb]OG6_ [2.2gb]OG6_
 https://orthomcl.org/orthomcl/app/downloads/ https://orthomcl.org/orthomcl/app/downloads/
 +</code>
  
-Errors 5:+  * Errors 5:
  
 +<code>
 # OG file name error. # OG file name error.
 python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta python3 2p_CountOGsUsearch.py --input_file ../MyProteins/MyProteins.Prepped.fasta
Line 56: Line 68:
 [ok]>OG5_104059 [ok]>OG5_104059
 [wrong]>aaeo|O67855 | organism=Aquifex aeolicus (strain VF5) | Peptidase_S49 domain-containing protein | OG6_104059 [wrong]>aaeo|O67855 | organism=Aquifex aeolicus (strain VF5) | Peptidase_S49 domain-containing protein | OG6_104059
 +</code>
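
The `[wrong]` header above carries the OG identifier as its last field. A minimal sketch for reducing such a header to the bare OG identifier is given here; the regular expression and the helper name `og_only_header` are assumptions for illustration. The sketch does not change the `OG6_` prefix to `OG5_`, and whether the `OG6_` prefix is accepted downstream is not settled by these notes.

```python
import re

# Assumed identifier pattern: "OG", digits, underscore, digits (e.g. OG5_104059).
OG_ID = re.compile(r"OG\d+_\d+")


def og_only_header(header):
    """Return '>' plus the first OG identifier found in a FASTA header."""
    match = OG_ID.search(header)
    if match is None:
        raise ValueError(f"no OG identifier in header: {header!r}")
    return ">" + match.group(0)


wrong = (">aaeo|O67855 | organism=Aquifex aeolicus (strain VF5) | "
         "Peptidase_S49 domain-containing protein | OG6_104059")
print(og_only_header(wrong))
```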
  
-Errors 6:+  * Errors 6: 
 + 
 +<code>
 python3 3p_RemoveDuplicates.py --file_prefix MyProteins python3 3p_RemoveDuplicates.py --file_prefix MyProteins
  
Line 63: Line 78:
 ### Update values and names if you have used an alternative Database of Proteins! ### Update values and names if you have used an alternative Database of Proteins!
  OGLenDB =  OGLenDB =
 +</code>
  
-Errors 7:+  * Errors 7:
  
 +<code>
 python3 3p_RemoveDuplicates.py --file_prefix MyProteins python3 3p_RemoveDuplicates.py --file_prefix MyProteins
  
Line 73: Line 90:
  
 python 4p_FinalizeName.py --input_file ../ToRename/MyProteins_Filtered.Final.AA.fasta --name EE_is_Fake python 4p_FinalizeName.py --input_file ../ToRename/MyProteins_Filtered.Final.AA.fasta --name EE_is_Fake
 +</code>
  
 +  * Errors 8:
  
-Errors 8:+<code>
 /misc/scratch2/xizhang/PhyloTol/Scripts /misc/scratch2/xizhang/PhyloTol/Scripts
 FileNotFoundError: [Errno 2] No such file or directory: '/Users/katzlab32/Documents/PhyloTOL/DataFiles//duplication' FileNotFoundError: [Errno 2] No such file or directory: '/Users/katzlab32/Documents/PhyloTOL/DataFiles//duplication'
- 
  
 pipeline_parameter_file.txt pipeline_parameter_file.txt
 +
 +</code>
 +
 +**Downloading, trimming, assembly**
 +
 +Errors 1: MultiFastQC
 +
 +<code>
 +   MultiFastQC.py
 +os.system('fastqc ' + file + ' --outdir=' + pathOutdir)
 + #pathFQC + "/Contents/MacOS/fastqc" 
 +
 +</code>
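
The call above assumes `fastqc` is on the `PATH` rather than inside a macOS application bundle. As a hedged alternative to string concatenation with `os.system`, the same command can be built as an argument list; `fastqc_command` and `run_fastqc` are hypothetical helpers and are not part of MultiFastQC.py.

```python
import subprocess


def fastqc_command(fastq_path, out_dir, fastqc="fastqc"):
    """Build the FastQC call as an argument list (no shell quoting needed)."""
    return [fastqc, fastq_path, "--outdir=" + out_dir]


def run_fastqc(fastq_path, out_dir):
    # check=True raises CalledProcessError if fastqc exits with an error
    subprocess.run(fastqc_command(fastq_path, out_dir), check=True)


print(fastqc_command("reads_1.fastq", "fastqc_out"))
```

Passing a list avoids problems with spaces in file or directory names, which the concatenated string form does not handle.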
 +
 +
 +Errors 2: BBmap
 +
 +<code>
 +
 +TransPipe2.py
 +
 +_R1 to _1
 +_R2 to _2
 +
 +</code>
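
If the fix is applied to the read file names rather than to the script (the note above does not say which), the renaming could be sketched as follows. `short_read_tag` is a hypothetical helper, and the pattern assumes the tag is followed by `.` or `_`.

```python
import re

# Assumed naming: the _R1/_R2 read tag is followed by "." or "_".
READ_TAG = re.compile(r"_R([12])(?=[._])")


def short_read_tag(filename):
    """Rewrite _R1/_R2 read tags to _1/_2, leaving other names untouched."""
    return READ_TAG.sub(r"_\1", filename)


print(short_read_tag("sample_R1.fastq.gz"))
```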
 +
 +Errors 3: BBmap2
 +
 +<code>
 +python2 TransPipe2.py parameter.txt your email  (the email does not matter; the script must be run with python2)
 +
 +parameter.txt (be careful with line endings: use Unix-style \n; note that \r\n is the Windows convention, while macOS uses \n)
 +</code>
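
The note above warns about line endings in `parameter.txt`. A minimal sketch that rewrites carriage-return line endings to `\n` is shown below; `normalize_newlines` and `fix_file` are hypothetical helpers and are not part of TransPipe2.py.

```python
def normalize_newlines(data):
    """Convert \\r\\n and bare \\r line endings in a bytes object to \\n."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_file(path):
    """Rewrite a file in place with \\n line endings."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(normalize_newlines(data))


print(normalize_newlines(b"key=value\r\nother=1\r\n"))
```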
 +
 +Errors 4: PLT1 = Post assembly pipeline 
 +
 +<code>
 +wrapper.py
 +'_rnaSPAdes --minLen 200 --spades
 +
 +
 +---Fatal error---
 +Empty file ../Sr_di_Espc/Sr_di_Espc_WTA_NBU.Renamed_tga_ORF.aa.fasta
 +
 +
 +/opt/perun/bin/usearch -ublast ../Sr_di_Espc/Sr_di_Espc_WTA_NBU.Renamed_taa_ORF.aa.fasta -db ../../Databases/db_StopFreq/RepEukProts.udb -evalue 1e-5 -maxaccepts 1 -blast6out ../Sr_di_Espc/Sr_di_Espc_WTA_NBU.Renamed_taa_ORF.RepEukProts.tsv
 +
 +---Fatal error---
 +Empty file ../Sr_di_Espc/Sr_di_Espc_WTA_NBU.Renamed_taa_ORF.aa.fasta
 +Traceback (most recent call last):
 +  File "4_InFrameStopFreq.py", line 745, in <module>
 +    main()
 +  File "4_InFrameStopFreq.py", line 739, in main
 +    hunt_for_stops(args)
 +  File "4_InFrameStopFreq.py", line 686, in hunt_for_stops
 +    +'\t'+"%.2f" % ((float(taa_inframe)*1000)/float(total_codons))+'\n')
 +ZeroDivisionError: float division by zero
 +</code>
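
The traceback above shows the division failing when `total_codons` is zero, which follows from the empty ORF file. A minimal sketch of a guard is given here; `fasta_is_empty` and `stops_per_thousand` are hypothetical helpers and do not reproduce the code in 4_InFrameStopFreq.py.

```python
import os


def fasta_is_empty(path):
    """True if the file is missing or has zero bytes."""
    return (not os.path.exists(path)) or os.path.getsize(path) == 0


def stops_per_thousand(inframe_stops, total_codons):
    """In-frame stops per 1000 codons, or None when no codons were counted."""
    if total_codons == 0:
        return None
    return (float(inframe_stops) * 1000) / float(total_codons)


rate = stops_per_thousand(0, 0)
print("NA" if rate is None else "%.2f" % rate)
```

Checking for an empty input first points to the real cause (no ORFs were written for that stop codon) instead of the later `ZeroDivisionError`.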
 +
 +
 +<code>
 +wrapper.py fifth step
 +
 +/usr/lib/python2.7/dist-packages/Bio/Seq.py:2309: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
 +  BiopythonWarning)
 +</code>
 +
 +Errors 4 (continued): PLT1 = Post assembly pipeline
 +
 +<code>
 +# sixth step
 +6_FilterPartials.py
 + ### Update values and names if you have used an alternative Database of Proteins!
 +
 +Error:
 +6_FilterPartials.py
 +
 + 
 +The OG ‘OG5_141800’ cannot be found in the dictionary ‘OGLenDB’ in 6_FilterPartials.py.
 +
 + 
 +Question:
 +How can a new OG be added to this list? (How can we get the length information for each OG?)
 +
 +</code>
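
One possible approach to the question above is sketched here. It assumes that `OGLenDB` maps each OG name to a typical sequence length and that a FASTA file of reference sequences with an OG identifier in every header is available; neither assumption is confirmed by these notes, and `mean_og_lengths` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed identifier pattern, e.g. OG5_141800.
OG_ID = re.compile(r"OG\d+_\d+")


def mean_og_lengths(fasta_lines):
    """Return {OG id: mean sequence length} from an iterable of FASTA lines."""
    lengths = defaultdict(list)  # OG id -> lengths of its sequences
    current_og, current_len = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if current_og is not None:
                lengths[current_og].append(current_len)
            match = OG_ID.search(line)
            current_og = match.group(0) if match else None
            current_len = 0
        else:
            current_len += len(line)
    if current_og is not None:
        lengths[current_og].append(current_len)
    return {og: sum(vals) / len(vals) for og, vals in lengths.items()}


example = [">seqA | OG5_141800", "MKV", "AA", ">seqB | OG5_141800", "MKVLAA"]
print(mean_og_lengths(example))
```

The resulting values would still need to be compared with how the existing `OGLenDB` entries were derived before being added to the script.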
 +
 +Errors 5:
 +<code>
 +Failed to determine offset! Specify it manually and restart, please!
 +Try setting --phred-offset 33 or 64 manually and re-run. If you have recent sequence data then the offset will be 33 for sure.
 +
 +How much RAM are you allocating to this job? It appears to be 250 GB.
 +</code>
  
  
 +<Last updated by Xi Zhang on Feb 8th, 2022>
phylogeny_protocol6.1644262316.txt.gz · Last modified: by 134.190.232.106