**How to build a YAML file to set up a Conda environment**
"//Users should be able to focus in their science, not in installing and managing software, so make it easy to them is fundamental.// " - Advice for Bioinformaticians.
**Background**
Setting up installation dependencies (i.e., folder directories, hardcoded $PATH entries, and versions) is often annoying, especially when the software developer does not follow good programming practices, such as avoiding hardcoded paths to files or scripts.
To make tools and pipelines more user-friendly and reproducible, the installation process must be concise and straightforward, so that most debugging effort can be avoided.
To speed up the installation process, developers usually prepare a Conda environment definition file that lists all the dependencies, and/or a Dockerfile to build an image (ideally both). The software is then installed into the Conda environment or otherwise made findable through the $PATH environment variable.
**Build the YAML file**
Take the software pipeline PhyloToL (Ceron-Romero et al., 2019) as an example. It requires the following dependencies:
- Biopython (https://biopython.org/)
- DendroPy (https://dendropy.org/)
- P4 (http://p4.nhm.ac.uk/)
- Bioperl (https://bioperl.org/)
- MAFFT (v7; https://mafft.cbrc.jp/alignment/software/)
- USEARCH (any version; https://www.drive5.com/usearch/)
- Guidance (v2.02; http://guidance.tau.ac.il/overview.html)
- trimAl (v1.3; http://trimal.cgenomics.org/)
- RAxML (v8; https://cme.h-its.org/exelixis/web/software/raxml/index.html)
1. First, create a Conda environment and give it a name.
conda create -n PhyloToL python=3.6
# Most scripts in PhyloToL are written in Python 3, so we start by installing the Python dependencies.
# Software distributions differ between macOS and Linux. I tested this on Perun, which runs Ubuntu Linux, so all the packages here are Linux builds and are not compatible with macOS.
2. Biopython, DendroPy, Bioperl, MAFFT, Guidance, trimAl, and RAxML are easy to install with conda.
For example, for DendroPy (https://anaconda.org/bioconda/dendropy):
conda install -c bioconda dendropy
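The remaining conda-installable dependencies can be requested in a single command rather than one at a time. The following is a minimal sketch, not a tested recipe: the package names are the ones that appear in the exported PhyloToL.yml on this page (Guidance is left out because it does not appear there, so its package name should be checked on anaconda.org first), the channel order is an assumption, and `RUN_INSTALL` is a hypothetical switch that keeps the script a dry run by default.

```shell
# Environment name used throughout this page.
ENV_NAME="PhyloToL"

# Package names as they appear in the exported PhyloToL.yml.
PACKAGES="biopython dendropy perl-bioperl mafft trimal raxml"

# Build the full command first so that it can be inspected before running.
INSTALL_CMD="conda install -y -n $ENV_NAME -c conda-forge -c bioconda $PACKAGES"

# RUN_INSTALL is a hypothetical switch: only install when it is set to 1
# and conda is actually available; otherwise just print the command.
if [ "${RUN_INSTALL:-0}" = "1" ] && command -v conda >/dev/null 2>&1; then
    $INSTALL_CMD
else
    echo "Dry run: $INSTALL_CMD"
fi
```

Running it with `RUN_INSTALL=1` performs the installation; without it, the command is only printed.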
3. P4 and USEARCH, however, are not as easy to set up and have to be installed manually.
# USEARCH
# Download the binary from https://www.drive5.com/usearch/download.html, make it executable, and add the directory that holds it to $PATH.
chmod +x /usr/bin/usearch6.0.98_i86linux32
export PATH=/misc/scratch2/xizhang/PhyloTol/Conda:$PATH
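If an `export PATH=...` line like this is placed in `~/.bashrc`, the same directory is added again every time the file is sourced. A small guard avoids that. This is only a sketch: `path_prepend` and `TOOLS_DIR` are hypothetical names, and the default directory is simply the one used in the export line on this page.

```shell
# Prepend a directory to PATH only if it is not already listed.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                # already present, nothing to do
        *) PATH="$1:$PATH" ;;
    esac
    export PATH
}

# TOOLS_DIR is a hypothetical variable; it defaults to the directory used on this page.
TOOLS_DIR="${TOOLS_DIR:-/misc/scratch2/xizhang/PhyloTol/Conda}"
path_prepend "$TOOLS_DIR"
```

Setting `TOOLS_DIR` before sourcing the snippet lets another user point it at their own download location without editing the script.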
# P4
# Installation instructions: https://p4.nhm.ac.uk/installation.html
conda install scipy gsl nlopt bitarray
# Download the source code from https://github.com/pgfoster/p4-phylogenetics
export PYTHONPATH=$PYTHONPATH:/misc/scratch2/xizhang/PhyloTol/Conda/p4-phylogenetics-master
export PATH=$PATH:/misc/scratch2/xizhang/PhyloTol/Conda/p4-phylogenetics-master/bin
# The hardcoded paths in setup.py must be changed to point to the Conda environment:
my_include_dirs = ["/home/xizhang/.conda/envs/PhyloToL/include"]
my_lib_dirs = ["/home/xizhang/.conda/envs/PhyloToL/lib"]
conda install -c conda-forge nlopt
conda install -c anaconda bitarray
python3 setup.py build_ext -i
p4 -help
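The two paths edited in setup.py are specific to one user's home directory. Conda sets the `CONDA_PREFIX` environment variable when an environment is activated, so the values can be derived from it instead of typed by hand. The sketch below only prints the two lines to paste into setup.py; `p4_setup_lines` is a hypothetical helper name, and the fallback path is the one shown on this page.

```shell
# Print the two setup.py lines for a given environment prefix.
p4_setup_lines() {
    printf 'my_include_dirs = ["%s/include"]\n' "$1"
    printf 'my_lib_dirs = ["%s/lib"]\n' "$1"
}

# CONDA_PREFIX is set by `conda activate`; fall back to the path used on this page.
ENV_PREFIX="${CONDA_PREFIX:-/home/xizhang/.conda/envs/PhyloToL}"
p4_setup_lines "$ENV_PREFIX"
```

Run it with the PhyloToL environment activated so that the printed paths match that environment.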
4. With all of this done, the user can export the environment to a YAML file.
conda env export > PhyloToL.yml
name: PhyloToL
channels:
- bioconda
- anaconda
- conda-forge
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=4.5=1_gnu
- asttokens=2.0.5=pyhd8ed1ab_0
- biopython=1.79=py36h8f6f2f9_0
- bitarray=1.6.0=py36h7b6447c_0
- ca-certificates=2020.10.14=0
- certifi=2020.6.20=py36_0
- dendropy=4.5.2=pyh3252c3a_0
- executing=0.8.2=pyhd8ed1ab_0
- ld_impl_linux-64=2.35.1=h7274673_9
- libblas=3.9.0=11_linux64_openblas
- libcblas=3.9.0=11_linux64_openblas
- libffi=3.3=he6710b0_2
- libgcc=7.2.0=h69d50b8_2
- libgcc-ng=9.3.0=h5101ec6_17
- libgfortran-ng=11.2.0=h69a702a_12
- libgfortran5=11.2.0=h5c6108e_12
- libgomp=9.3.0=h5101ec6_17
- liblapack=3.9.0=11_linux64_openblas
- libopenblas=0.3.17=pthreads_h8fe5266_1
- libstdcxx-ng=9.3.0=hd4cf53a_17
- mafft=7.310=h1b792b2_4
- ncurses=6.3=h7f8727e_2
- nlopt=2.7.0=py36he9b8a8a_1
- numpy=1.19.5=py36hfc0c790_2
- openssl=1.1.1m=h7f8727e_0
- perl=5.26.2=h14c3975_0
- perl-bioperl=1.6.924=4
- perl-threaded=5.32.1=hdfd78af_1
- perl-yaml=1.29=pl526_0
- pip=21.2.2=py36h06a4308_0
- python=3.6.13=h12debd9_1
- python-devtools=0.8.0=pyhd8ed1ab_0
- python_abi=3.6=2_cp36m
- raxml=8.2.12=h779adbc_3
- readline=8.1.2=h7f8727e_1
- setuptools=58.0.4=py36h06a4308_0
- six=1.16.0=pyh6c4a22f_0
- sqlite=3.37.0=hc218d9a_0
- tk=8.6.11=h1ccaba5_0
- trimal=1.4.1=h7d875b9_5
- wheel=0.37.1=pyhd3eb1b0_0
- xz=5.2.5=h7b6447c_0
- zlib=1.2.11=h7f8727e_4
prefix: /home/xizhang/.conda/envs/PhyloToL
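The exported file ends with a `prefix:` line that points to the author's home directory, and every package is pinned to an exact build string, which ties the file to one platform. A hedged sketch of a more portable export follows: `conda env export` accepts `--no-builds` to drop the build strings, and the `prefix:` line can be filtered out afterwards. `strip_prefix` and the output file name `PhyloToL.portable.yml` are hypothetical names chosen for illustration.

```shell
# Remove the machine-specific "prefix:" line from an exported environment file.
strip_prefix() {
    grep -v '^prefix:' "$1"
}

# Typical use, assuming conda is available and the environment is named PhyloToL:
#   conda env export -n PhyloToL --no-builds > PhyloToL.yml
#   strip_prefix PhyloToL.yml > PhyloToL.portable.yml
```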
**Execute the YAML file**
With this YAML file in hand, a new user can easily install the PhyloToL software dependencies (except P4 and USEARCH) as shown below.
source ~/.bashrc
conda env create -f PhyloToL.yml
#
# To activate this environment, use
#
# $ conda activate PhyloToL
#
# To deactivate an active environment, use
#
# $ conda deactivate
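After activating the environment and installing P4 and USEARCH by hand, it is worth confirming that every tool is actually visible on $PATH. The following is a minimal sketch of such a check. `check_tools` is a hypothetical helper, and the executable names (`raxmlHPC`, `usearch`, `p4`, and so on) are assumptions that may differ from the names used in a given installation.

```shell
# Report which required executables are visible on PATH.
# The return status is the number of missing tools (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Executable names are assumptions; adjust them to match the local installation.
check_tools python3 mafft trimal raxmlHPC usearch p4 || echo "some tools are not on PATH yet"
```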