How to build a YAML file to set up a Conda environment
“Users should be able to focus in their science, not in installing and managing software, so make it easy to them is fundamental.” - Advice for Bioinformaticians.
Background
It is often annoying to set up installation dependencies (i.e., folder directories, hardcoded $PATH entries, and versions), especially when the software developer does not follow good programming practices, such as avoiding hardcoded paths to files or scripts.
To make tools and pipelines more user-friendly and reproducible, the installation process needs to be concise and straightforward, so that most of the debugging effort can be avoided.
To speed up installation, developers usually prepare a conda environment definition file that lists all the dependencies, and/or a Dockerfile to build an image (ideally both). Users then install the software into the conda environment or make it findable through the $PATH environment variable.
Build the YAML file
Take the software pipeline PhyloToL (Ceron-Romero et al., 2019) as an example. It requires the dependencies installed in the steps below.
1. First, create a named Conda environment.
    conda create -n PhyloToL python=3.6
    # Most scripts in PhyloToL are written in Python 3, so we start by installing Python.
    # Software distributions differ between macOS and Linux. I tested this on Perun, which runs Ubuntu Linux,
    # so all the packages here are Linux builds and are not compatible with macOS.
2. Biopython, DendroPy, BioPerl, MAFFT, Guidance, trimAl, and RAxML are easy to install with conda.
For example, for DendroPy (https://anaconda.org/bioconda/dendropy):
    conda install -c bioconda dendropy
3. P4 and USEARCH, however, are not as easy to set up.
    # USEARCH
    # Download: https://www.drive5.com/usearch/download.html
    chmod +x /usr/bin/usearch6.0.98_i86linux32
    export PATH=/misc/scratch2/xizhang/PhyloTol/Conda:$PATH
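The USEARCH commands above use paths specific to the author's machine. A minimal sketch of the same step without hardcoded locations follows. It assumes the binary has already been downloaded by hand into the current directory. `TOOLS_DIR` is a hypothetical variable introduced only for this example, and its default location is an arbitrary choice.

```shell
# Hypothetical install directory (an assumption); override by setting TOOLS_DIR beforehand.
TOOLS_DIR="${TOOLS_DIR:-$PWD/tools/bin}"
mkdir -p "$TOOLS_DIR"

# File name as used above; adjust it to the release you downloaded.
USEARCH_BIN="usearch6.0.98_i86linux32"

if [ -f "$USEARCH_BIN" ]; then
  chmod +x "$USEARCH_BIN"          # make the downloaded binary executable
  mv "$USEARCH_BIN" "$TOOLS_DIR/"  # keep it in one known place
else
  echo "Download $USEARCH_BIN into $PWD first."
fi

# Prepend TOOLS_DIR to PATH only if it is not already there.
case ":$PATH:" in
  *":$TOOLS_DIR:"*) ;;
  *) export PATH="$TOOLS_DIR:$PATH" ;;
esac
```

To make the change permanent, the `export` line can be added to `~/.bashrc` with the directory written out in full.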
    # P4
    # Installation notes: https://p4.nhm.ac.uk/installation.html
    conda install scipy gsl nlopt bitarray
    # Source code: https://github.com/pgfoster/p4-phylogenetics
    export PYTHONPATH=$PYTHONPATH:/misc/scratch2/xizhang/PhyloTol/Conda/p4-phylogenetics-master
    export PATH=$PATH:/misc/scratch2/xizhang/PhyloTol/Conda/p4-phylogenetics-master/bin
    # The hardcoded paths in setup.py must be changed to point at your own environment:
    #   my_include_dirs = ["/home/xizhang/.conda/envs/PhyloToL/include"]
    #   my_lib_dirs = ["/home/xizhang/.conda/envs/PhyloToL/lib"]
    conda install -c conda-forge nlopt
    conda install -c anaconda bitarray
    python3 setup.py build_ext -i
    p4 -help
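The two setup.py paths above have to be edited by hand for every user. One way to avoid that is to rewrite them from `CONDA_PREFIX`, which conda sets when an environment is activated. The sketch below runs on a stand-in file so it can be tried safely. It assumes the two assignments each start at the beginning of a line, which may not match the real setup.py. `SETUP_PY` and `ENV_PREFIX` are names made up for this example.

```shell
# Stand-in for p4's setup.py; for real use, point SETUP_PY at the actual file instead.
SETUP_PY="$(mktemp)"
cat > "$SETUP_PY" <<'EOF'
my_include_dirs = ["/home/xizhang/.conda/envs/PhyloToL/include"]
my_lib_dirs = ["/home/xizhang/.conda/envs/PhyloToL/lib"]
EOF

# CONDA_PREFIX is set by `conda activate`; the fallback is only a placeholder for the demo.
ENV_PREFIX="${CONDA_PREFIX:-/path/to/envs/PhyloToL}"

# Rewrite both assignments; write to a new file rather than relying on `sed -i`,
# whose syntax differs between GNU and BSD sed.
sed -e "s|^my_include_dirs = .*|my_include_dirs = [\"$ENV_PREFIX/include\"]|" \
    -e "s|^my_lib_dirs = .*|my_lib_dirs = [\"$ENV_PREFIX/lib\"]|" \
    "$SETUP_PY" > "$SETUP_PY.new" && mv "$SETUP_PY.new" "$SETUP_PY"

cat "$SETUP_PY"
```

Run this with the PhyloToL environment active so that `CONDA_PREFIX` points at the right place, then build with `python3 setup.py build_ext -i` as before.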
4. With all of this done, you can export the environment to a YAML file.
    conda env export > PhyloToL.yml
    name: PhyloToL
    channels:
      - bioconda
      - anaconda
      - conda-forge
      - defaults
    dependencies:
      - _libgcc_mutex=0.1=main
      - _openmp_mutex=4.5=1_gnu
      - asttokens=2.0.5=pyhd8ed1ab_0
      - biopython=1.79=py36h8f6f2f9_0
      - bitarray=1.6.0=py36h7b6447c_0
      - ca-certificates=2020.10.14=0
      - certifi=2020.6.20=py36_0
      - dendropy=4.5.2=pyh3252c3a_0
      - executing=0.8.2=pyhd8ed1ab_0
      - ld_impl_linux-64=2.35.1=h7274673_9
      - libblas=3.9.0=11_linux64_openblas
      - libcblas=3.9.0=11_linux64_openblas
      - libffi=3.3=he6710b0_2
      - libgcc=7.2.0=h69d50b8_2
      - libgcc-ng=9.3.0=h5101ec6_17
      - libgfortran-ng=11.2.0=h69a702a_12
      - libgfortran5=11.2.0=h5c6108e_12
      - libgomp=9.3.0=h5101ec6_17
      - liblapack=3.9.0=11_linux64_openblas
      - libopenblas=0.3.17=pthreads_h8fe5266_1
      - libstdcxx-ng=9.3.0=hd4cf53a_17
      - mafft=7.310=h1b792b2_4
      - ncurses=6.3=h7f8727e_2
      - nlopt=2.7.0=py36he9b8a8a_1
      - numpy=1.19.5=py36hfc0c790_2
      - openssl=1.1.1m=h7f8727e_0
      - perl=5.26.2=h14c3975_0
      - perl-bioperl=1.6.924=4
      - perl-threaded=5.32.1=hdfd78af_1
      - perl-yaml=1.29=pl526_0
      - pip=21.2.2=py36h06a4308_0
      - python=3.6.13=h12debd9_1
      - python-devtools=0.8.0=pyhd8ed1ab_0
      - python_abi=3.6=2_cp36m
      - raxml=8.2.12=h779adbc_3
      - readline=8.1.2=h7f8727e_1
      - setuptools=58.0.4=py36h06a4308_0
      - six=1.16.0=pyh6c4a22f_0
      - sqlite=3.37.0=hc218d9a_0
      - tk=8.6.11=h1ccaba5_0
      - trimal=1.4.1=h7d875b9_5
      - wheel=0.37.1=pyhd3eb1b0_0
      - xz=5.2.5=h7b6447c_0
      - zlib=1.2.11=h7f8727e_4
    prefix: /home/xizhang/.conda/envs/PhyloToL
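The exported file pins exact build strings and ends with a `prefix:` line that records the author's home directory, neither of which means much on another machine. A minimal sketch of one way to trim both is shown below, run on a shortened stand-in copy of the file so it works without conda installed. `WORKDIR` and the output file name are made up for this example.

```shell
# Work in a scratch directory so no real environment file is overwritten.
WORKDIR="$(mktemp -d)"

# Shortened stand-in for the exported file (a few lines taken from the export above).
cat > "$WORKDIR/PhyloToL.yml" <<'EOF'
name: PhyloToL
channels:
  - bioconda
  - defaults
dependencies:
  - biopython=1.79=py36h8f6f2f9_0
  - mafft=7.310=h1b792b2_4
prefix: /home/xizhang/.conda/envs/PhyloToL
EOF

# 1) Drop the machine-specific prefix line.
# 2) Turn "name=version=build" into "name=version".
grep -v '^prefix:' "$WORKDIR/PhyloToL.yml" \
  | sed -E 's/^( *- [^=]+=[^=]+)=[^=]+$/\1/' \
  > "$WORKDIR/PhyloToL.portable.yml"

cat "$WORKDIR/PhyloToL.portable.yml"

# With conda available, the same result can be produced at export time:
#   conda env export --no-builds | grep -v '^prefix:' > PhyloToL.yml
```

Dropping build strings gives the solver more room on another Linux machine, at the cost of slightly weaker reproducibility. It does not make the Linux-only packages installable on macOS.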
Execute the YAML file
With this YAML file in hand, a new user can easily install the PhyloToL software dependencies (except P4 and USEARCH) as shown below.
    source ~/.bashrc
    conda env create -f PhyloToL.yml
    #
    # To activate this environment, use
    #
    #     $ conda activate PhyloToL
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate
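After the environment has been created and activated, and P4 and USEARCH have been set up by hand, it can help to confirm that the expected commands are actually on `$PATH`. The sketch below is one way to do that. `check_tools` is a helper written for this example, and the executable names passed to it are assumptions that may differ on your installation.

```shell
# Report which of the given commands are missing from PATH.
# Prints one line per tool and returns non-zero if any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "MISSING  $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Executable names are assumptions; adjust them to what your installation provides.
check_tools mafft trimal raxmlHPC p4 || echo "Some tools are not on PATH yet."
```

A `MISSING` line for a conda-installed tool usually means the environment is not active, while one for P4 or USEARCH points back to the manual `PATH` exports in step 3.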
<Last updated by Xi Zhang on Feb 8th, 2022>