Installation
MetaPathways can be installed with Conda and Pip in a 64-bit Linux environment, or run from a container image with Docker or Singularity. If you do not have administrator (i.e., “root”) access to your computer, we recommend installing Miniconda if you do not already have Conda set up. If you want to use MetaPathways in an academic grid computing environment, we recommend using the container image via Singularity. The two supported installation options are described below.
Container Install
Our container images are hosted at Quay.io.
The following commands assume that you are already familiar with installing and running Docker containers
via the docker or singularity executables:
Using Docker:
sudo docker pull quay.io/hallamlab/metapathways
Using Singularity:
singularity build metapathways.sif docker://quay.io/hallamlab/metapathways:latest
More advanced container-related commands are available as Make targets in the Makefile.
Installing with Pip and Conda
Summary
Assuming that you have all prerequisites satisfied, installing can be as simple as:
conda create --name metapathways python=3.10
conda activate metapathways
pip3 install git+https://bitbucket.org/BCB2/metapathways.git@dev#egg=MetaPathways
metapathways-install-deps.sh
Read on to learn the details.
Detailed Install
Currently, the MetaPathways Python package is installed with Pip, and all of its dependencies are installed with Conda. We do not yet have a Conda package for MetaPathways; one is in the works for a future release.
For this to work, we assume that you have the following already set up in your command line environment:
You have Python 3 (python3) and pip3 installed
You have already installed Conda, and it is activated
Development files for zlib, liblzma and libbz2 (required to install PySAM via pip)
You have wget installed
If you are using a version of Linux that uses apt, and you have root access, then you can execute the
following to get all of the dependencies except Conda:
sudo apt-get update -y
sudo apt-get install -y \
python3 \
python3-pip \
zlib1g-dev \
liblzma-dev \
libbz2-dev \
wget
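A quick way to confirm that the command-line prerequisites are in place is to probe for each executable. The following is a minimal sketch and not part of MetaPathways: the helper name missing_commands is hypothetical, and it does not check for the zlib, liblzma, or libbz2 development files.

```shell
# Print the name of every listed command that is not found on PATH.
# (missing_commands is a hypothetical helper, not shipped with MetaPathways.)
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

missing=$(missing_commands python3 pip3 wget conda)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisite commands found."
fi
```

If anything is reported as missing, install it before continuing with the Pip step.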
Installing Python Package as Root
If you have root/administrator access, install the MetaPathways Python package using the following command:
pip3 install git+https://bitbucket.org/BCB2/metapathways.git@dev#egg=MetaPathways
Installing Python Package as an Unprivileged User
Use this form to install the package to the user’s home directory:
pip3 install --user git+https://bitbucket.org/BCB2/metapathways.git@dev#egg=MetaPathways
Make sure to add $HOME/.local/bin to your $PATH environment variable. This will allow you
to use the programs without having to type the full path each time.
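One way to do this for the current shell session is sketched below. The helper name add_to_path is hypothetical; it appends the directory only when it is not already present, so the snippet is safe to run more than once.

```shell
# Append directory $1 to PATH unless it is already listed.
# (add_to_path is a hypothetical helper name used for illustration.)
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

add_to_path "$HOME/.local/bin"
export PATH
```

To make the change permanent, the same lines can be placed in your shell startup file (for example ~/.bashrc, assuming you use Bash).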
Conda-Based Setup
Once you have installed the Python package, the following executables will be available either in the system Python install path or in ~/.local/bin. Make sure the relevant directory is in your $PATH environment variable.
MetaPathways
metapathways-install-deps.sh
metapathways-data-install.sh
metacount
fastal
fastdb
Execute metapathways-install-deps.sh to install pipeline dependencies using Conda.
Reference Sequences
Summary
Assuming that you have MetaPathways installed, installing the reference DB can be as simple as:
metapathways-data-install.sh /media/ref-db-dir stage_fast_full
Read on for detailed instructions.
Details
MetaPathways relies on reference databases of sequences to assign functional and taxonomic annotations to the user’s sequences. The reference databases, and the index files for each database, take up a significant amount of disk storage. See below for an anecdotal example.
These large reference databases cannot be installed within the container. Instead, keep them in a directory on a disk with plenty of capacity, and use the bind options of Docker or Singularity to mount that external directory within the container. Here’s an example using Singularity:
singularity shell --bind /mnt/sandbox/user:/data docker://quay.io/hallamlab/metapathways:latest
The above example binds the host operating system’s
/mnt/sandbox/user directory within the running container as
/data.
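The same bind can be expressed for Docker with the -v (volume) option of docker run. The following is a minimal sketch, assuming that the image can be started interactively with docker run -it, which this document does not confirm. The helper name docker_bind_cmd is hypothetical, and it only prints the command so that it can be reviewed before being run.

```shell
# Print (rather than run) a docker command that bind-mounts host directory $1
# at container path $2. docker_bind_cmd is a hypothetical helper name.
docker_bind_cmd() {
    printf 'sudo docker run --rm -it -v %s:%s quay.io/hallamlab/metapathways:latest\n' "$1" "$2"
}

# Mirrors the Singularity example: host /mnt/sandbox/user appears as /data.
docker_bind_cmd /mnt/sandbox/user /data
```

Paths containing spaces would need quoting before the printed command is pasted into a shell.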
Warning: Circa 2021-10, using a beefy computer with many cores and
plenty of RAM, performing the staging of the full Blast databases may
take an hour, and staging the full set of FAST databases will
take around 24 hours. The Blast refseq_protein databases take up ~90 GB of disk
capacity, while the FAST refseq_protein database takes up ~375
GB. The combination of other staged databases (including both Blast
and FAST versions) consumes an additional ~20 GB. Please make sure you
have adequate disk capacity before starting the database staging.
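A simple pre-flight check along these lines can catch a disk that is too small before a long staging run begins. This is a sketch only: the helper name has_free_gb is hypothetical, the 400 GB threshold is an assumption obtained by rounding up the ~375 GB and ~20 GB figures quoted above for the full FAST set, and the check counts in units of 1024 × 1024 KB, so it is approximate.

```shell
# Succeed if the filesystem holding directory $1 has at least $2 GB available.
# (has_free_gb is a hypothetical helper; it fails if $1 cannot be inspected.)
has_free_gb() {
    # df -P gives one POSIX-format line per filesystem; column 4 is the
    # available space, reported in 1024-byte blocks because of -k.
    avail_kb=$(df -Pk "$1" 2>/dev/null | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

ref_dir=/media/ref-db-dir   # reference database directory used in this guide
required_gb=400             # assumed threshold for the full FAST databases

if has_free_gb "$ref_dir" "$required_gb"; then
    echo "$ref_dir has at least ${required_gb} GB available."
else
    echo "$ref_dir is missing or has less than ${required_gb} GB available."
fi
```

Adjust required_gb to match the staging option you plan to use.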
We use Snakemake to automate the staging of reference databases
needed by MetaPathways. We have installed Snakemake via Conda. If
you are using the Docker container, then Conda is already
initialized. If you are using the container via Singularity, you must
first initialize Conda as follows (note the space between the period
character, and the first slash character):
. /opt/conda/etc/profile.d/conda.sh
Now we can run the metapathways-data-install.sh script:
metapathways-data-install.sh /media/ref-db-dir stage_fast_lite
… where /media/ref-db-dir is the reference database installation directory (make sure this directory has adequate capacity for the data to be installed).
Above we issued the stage_fast_lite command to Snakemake, as an example
that runs quickly. There are actually four options for staging the data:
All databases, indexed for use with Blast: stage_blast_full
All databases except RefSeq Proteome, indexed for use with Blast: stage_blast_lite
All databases, indexed for use with FAST: stage_fast_full
All databases except RefSeq Proteome, indexed for use with FAST: stage_fast_lite
So, first decide whether you want to use Blast or FAST, and then
decide whether you have the disk space and the install time to install
the NCBI RefSeq Proteome reference database. FAST runs faster than Blast,
with comparable sensitivity. And the RefSeq Proteome is currently required
for MetaPathways to accurately annotate contigs taxonomically. Thus, we
recommend running stage_fast_full, if you have the disk storage and
the time to let it run.
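The two decisions above (Blast or FAST, full or lite) map directly onto the four target names. The following sketch shows that mapping. The helper name stage_target is hypothetical and not part of MetaPathways; it only assembles and validates the target name.

```shell
# Build a staging target name from an aligner (blast|fast) and a size (full|lite).
# (stage_target is a hypothetical helper name used for illustration.)
stage_target() {
    case "$1" in
        blast|fast) ;;
        *) echo "unknown aligner: $1 (expected blast or fast)" >&2; return 1 ;;
    esac
    case "$2" in
        full|lite) ;;
        *) echo "unknown size: $2 (expected full or lite)" >&2; return 1 ;;
    esac
    printf 'stage_%s_%s\n' "$1" "$2"
}

stage_target fast full    # prints: stage_fast_full
```

The result can then be passed as the second argument of metapathways-data-install.sh in place of a hard-coded target name.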