
How to build a YAML file for setting up a Conda environment?

“Users should be able to focus in their science, not in installing and managing software, so make it easy to them is fundamental.” - Advice for Bioinformaticians.

Background

Setting up installation dependencies (e.g., directory layout, hardcoded $PATH entries, and software versions) is often tedious, especially when the developer does not follow good programming practices such as avoiding hardcoded paths to files or scripts.

To make tools and pipelines more user-friendly and reproducible, the installation process should be concise and straightforward, so that most debugging effort can be avoided.

To speed up installation, developers usually provide a Conda environment definition file listing all the dependencies and/or a Dockerfile to build an image (ideally both). The software itself is then installed into the Conda environment or otherwise made findable through the $PATH environment variable.

Build YAML file

Take the PhyloToL pipeline (Ceron-Romero et al., 2019) as an example. It requires the following dependencies:

  • Biopython (https://biopython.org/)
  • DendroPy (https://dendropy.org/)
  • Bioperl (https://bioperl.org/)
  • USEARCH (any version; https://www.drive5.com/usearch/)
  • trimAl (v1.3; http://trimal.cgenomics.org/)
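For comparison, the conda-installable part of these dependencies could also be captured in a short, hand-written environment file that lists only top-level packages and lets conda resolve the rest. This is a minimal sketch, assuming the package names that appear in the exported file further down this page (perl-bioperl, trimal, mafft, raxml, nlopt, bitarray) plus the P4 dependencies scipy and gsl; the file name environment.yml is just a common convention. USEARCH and P4 are left out because they are installed manually. Versions are unpinned here; pinning one (e.g. trimal=1.4.1, the version in the export) is an option if exact reproducibility matters.

```shell
# Sketch: write a minimal, hand-maintained environment file for PhyloToL.
# Only top-level packages are listed; conda resolves the dependencies.
cat > environment.yml <<'EOF'
name: PhyloToL
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - python=3.6
  - biopython
  - dendropy
  - perl-bioperl
  - mafft
  - raxml
  - trimal
  # P4 dependencies (P4 itself and USEARCH are installed manually)
  - scipy
  - gsl
  - nlopt
  - bitarray
EOF

# The environment could then be created with:
#   conda env create -f environment.yml
```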

1. First, create a Conda environment named PhyloToL with Python 3.6:

conda create -n PhyloToL python=3.6

# Most PhyloToL scripts are written in Python 3, so we start by installing the Python dependencies.
# macOS and Linux can have different software distributions. This setup was tested on Perun, which runs Ubuntu Linux, so all the packages below are Linux builds and are not guaranteed to work on macOS.

2. Biopython, DendroPy, Bioperl, MAFFT, Guidance, trimAl, and RAxML are easy to install with conda. For example, for DendroPy (https://anaconda.org/bioconda/dendropy):

conda install -c bioconda dendropy

3. P4 and USEARCH, however, are not as easy to set up.

# USEARCH

Download the Linux binary from https://www.drive5.com/usearch/download.html, make it executable, and make sure its location is on $PATH:

chmod +x /usr/bin/usearch6.0.98_i86linux32
export PATH=/misc/scratch2/xizhang/PhyloTol/Conda:$PATH
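An exported PATH only lasts for the current shell session. One alternative, sketched here under the assumption that the PhyloToL environment is active (so conda has set $CONDA_PREFIX), is to copy the downloaded binary into the environment's own bin directory, which conda puts on $PATH whenever the environment is activated. The helper name install_usearch is hypothetical, and renaming the binary to plain usearch is an assumption about how the pipeline calls it, so check PhyloToL's scripts first.

```shell
# Sketch: copy a downloaded USEARCH binary into a conda environment's bin/.
# Usage: install_usearch <downloaded-binary> [env-prefix]
# The prefix defaults to $CONDA_PREFIX, which conda sets for an active env.
# The target name "usearch" is an assumption about how PhyloToL calls it.
install_usearch() {
  src=$1
  prefix=${2:-${CONDA_PREFIX:-}}
  if [ -z "$prefix" ]; then
    echo "No environment prefix: run 'conda activate PhyloToL' first" >&2
    return 1
  fi
  mkdir -p "$prefix/bin"
  # install(1) copies the file and sets rwxr-xr-x permissions in one step
  install -m 755 "$src" "$prefix/bin/usearch"
}

# Example, with the PhyloToL environment active:
#   install_usearch ./usearch6.0.98_i86linux32
#   command -v usearch
```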
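# P4

Installation instructions: https://p4.nhm.ac.uk/installation.html

Install P4's dependencies:

conda install scipy gsl nlopt bitarray

Download the source from https://github.com/pgfoster/p4-phylogenetics, then make the package and its scripts findable by Python and the shell: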

export PYTHONPATH=$PYTHONPATH:/misc/scratch2/xizhang/PhyloTol/Conda/p4-phylogenetics-master

export PATH=$PATH:/misc/scratch2/xizhang/PhyloTol/Conda/p4-phylogenetics-master/bin
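These two exports also only last for the current shell. Conda sources any *.sh files in $CONDA_PREFIX/etc/conda/activate.d/ when an environment is activated, so one way to make the P4 settings persistent is a small hook script. This is a minimal sketch; the helper name write_p4_hook and the hook file name p4_vars.sh are hypothetical, and the P4 directory defaults to the path used above.

```shell
# Sketch: persist the P4 PYTHONPATH/PATH settings in a conda activation hook.
# Conda sources every *.sh file in $CONDA_PREFIX/etc/conda/activate.d/
# when the environment is activated.
# Usage: write_p4_hook [p4-source-dir] [env-prefix]
write_p4_hook() {
  p4_dir=${1:-/misc/scratch2/xizhang/PhyloTol/Conda/p4-phylogenetics-master}
  prefix=${2:-${CONDA_PREFIX:-}}
  if [ -z "$prefix" ]; then
    echo "No environment prefix: run 'conda activate PhyloToL' first" >&2
    return 1
  fi
  hook_dir="$prefix/etc/conda/activate.d"
  mkdir -p "$hook_dir"
  # The P4 directory is substituted now; $PYTHONPATH and $PATH stay literal
  # so they are expanded each time the environment is activated.
  printf 'export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}%s"\nexport PATH="$PATH:%s/bin"\n' \
    "$p4_dir" "$p4_dir" > "$hook_dir/p4_vars.sh"
}

# Example, with the PhyloToL environment active:
#   write_p4_hook
#   conda deactivate && conda activate PhyloToL
#   p4 -help
```

A matching script in deactivate.d/ would be needed to undo these changes on conda deactivate; it is left out to keep the sketch short.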

The hardcoded paths in setup.py must be changed to point at the environment's include and lib directories:

my_include_dirs = ["/home/xizhang/.conda/envs/PhyloToL/include"]
my_lib_dirs = ["/home/xizhang/.conda/envs/PhyloToL/lib"]
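Instead of editing setup.py by hand, the two assignments can be rewritten with sed so they point at whichever environment is active. This is a minimal sketch, assuming both assignments start at the beginning of a line, as in the two lines above, and that the environment prefix contains no | or & characters (they are special in the sed expression). The helper name fix_setup_prefix is hypothetical.

```shell
# Sketch: point P4's setup.py at a conda environment's include/ and lib/.
# Assumes each assignment starts at the beginning of its line, and that the
# prefix contains no '|' or '&' (special characters in the sed expression).
# Usage: fix_setup_prefix <path/to/setup.py> [env-prefix]
fix_setup_prefix() {
  setup_py=$1
  prefix=${2:-${CONDA_PREFIX:-}}
  if [ -z "$prefix" ]; then
    echo "No environment prefix: run 'conda activate PhyloToL' first" >&2
    return 1
  fi
  # Keep a backup and write the edited copy back, which avoids the
  # GNU/BSD differences in 'sed -i'.
  cp "$setup_py" "$setup_py.bak"
  sed -e "s|^my_include_dirs = .*|my_include_dirs = [\"$prefix/include\"]|" \
      -e "s|^my_lib_dirs = .*|my_lib_dirs = [\"$prefix/lib\"]|" \
      "$setup_py.bak" > "$setup_py"
}

# Example, from the P4 source directory with the environment active:
#   fix_setup_prefix setup.py
```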

conda install -c conda-forge nlopt

conda install -c anaconda bitarray

Build P4's C extensions in place and check that the p4 command runs:

python3 setup.py build_ext -i

p4 -help
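Before exporting the environment, it can help to confirm that every external program the pipeline needs is actually found on $PATH. A small sketch; the default command names (usearch, trimal, mafft, raxmlHPC, p4) are assumptions about how the tools are invoked (raxmlHPC being one of the executables the bioconda raxml package provides), so adjust them to whatever PhyloToL's scripts call. The helper name check_tools is hypothetical.

```shell
# Sketch: report which required command-line tools are missing from $PATH.
# The default tool names are assumptions; adjust to what PhyloToL calls.
# Usage: check_tools [tool ...]  (prints missing tools, returns 1 if any)
check_tools() {
  [ $# -gt 0 ] || set -- usearch trimal mafft raxmlHPC p4
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Example, with the PhyloToL environment active:
#   check_tools && echo "all tools found"
```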

4. Once everything is installed, export the environment to a YAML file:

conda env export > PhyloToL.yml

name: PhyloToL
channels:
  - bioconda
  - anaconda
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=4.5=1_gnu
  - asttokens=2.0.5=pyhd8ed1ab_0
  - biopython=1.79=py36h8f6f2f9_0
  - bitarray=1.6.0=py36h7b6447c_0
  - ca-certificates=2020.10.14=0
  - certifi=2020.6.20=py36_0
  - dendropy=4.5.2=pyh3252c3a_0
  - executing=0.8.2=pyhd8ed1ab_0
  - ld_impl_linux-64=2.35.1=h7274673_9
  - libblas=3.9.0=11_linux64_openblas
  - libcblas=3.9.0=11_linux64_openblas
  - libffi=3.3=he6710b0_2
  - libgcc=7.2.0=h69d50b8_2
  - libgcc-ng=9.3.0=h5101ec6_17
  - libgfortran-ng=11.2.0=h69a702a_12
  - libgfortran5=11.2.0=h5c6108e_12
  - libgomp=9.3.0=h5101ec6_17
  - liblapack=3.9.0=11_linux64_openblas
  - libopenblas=0.3.17=pthreads_h8fe5266_1
  - libstdcxx-ng=9.3.0=hd4cf53a_17
  - mafft=7.310=h1b792b2_4
  - ncurses=6.3=h7f8727e_2
  - nlopt=2.7.0=py36he9b8a8a_1
  - numpy=1.19.5=py36hfc0c790_2
  - openssl=1.1.1m=h7f8727e_0
  - perl=5.26.2=h14c3975_0
  - perl-bioperl=1.6.924=4
  - perl-threaded=5.32.1=hdfd78af_1
  - perl-yaml=1.29=pl526_0
  - pip=21.2.2=py36h06a4308_0
  - python=3.6.13=h12debd9_1
  - python-devtools=0.8.0=pyhd8ed1ab_0
  - python_abi=3.6=2_cp36m
  - raxml=8.2.12=h779adbc_3
  - readline=8.1.2=h7f8727e_1
  - setuptools=58.0.4=py36h06a4308_0
  - six=1.16.0=pyh6c4a22f_0
  - sqlite=3.37.0=hc218d9a_0
  - tk=8.6.11=h1ccaba5_0
  - trimal=1.4.1=h7d875b9_5
  - wheel=0.37.1=pyhd3eb1b0_0
  - xz=5.2.5=h7b6447c_0
  - zlib=1.2.11=h7f8727e_4
prefix: /home/xizhang/.conda/envs/PhyloToL
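The exported file above pins exact build strings (for example biopython=1.79=py36h8f6f2f9_0) and ends with a prefix: line recording where the environment lives on this particular machine; both can make the file harder to reuse elsewhere. conda env export accepts --no-builds to drop the build strings and --from-history to list only explicitly requested packages. The sketch below combines --no-builds with removing the prefix: line; the helper name strip_prefix is hypothetical.

```shell
# Sketch: make an exported environment file more portable.
# Usage: strip_prefix <exported.yml> <portable.yml>
# Removes the machine-specific "prefix:" line written by conda env export.
strip_prefix() {
  grep -v '^prefix:' "$1" > "$2"
}

# Example (requires conda, run with the PhyloToL environment active):
#   conda env export --no-builds > PhyloToL.full.yml
#   strip_prefix PhyloToL.full.yml PhyloToL.yml
# or, listing only the explicitly installed packages:
#   conda env export --from-history > PhyloToL.yml
```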

Execute YAML file

With this YAML file, a new user can install PhyloToL's software dependencies (except P4 and USEARCH) as follows:

source ~/.bashrc

conda env create -f PhyloToL.yml

When it finishes, conda prints:

#
# To activate this environment, use
#
# $ conda activate PhyloToL
#
# To deactivate an active environment, use
#
# $ conda deactivate

<Last updated by Xi Zhang on Feb 8th, 2022>
