Overview
MetaPathways [MP2013] is a meta’omic pipeline for the annotation and analysis of environmental sequence information. MetaPathways accepts metagenomic or metatranscriptomic sequence data in one of several file formats (.fasta, .gff, or .gbk). The stages of the pipeline are described in the Pipeline Overview.
Pipeline Overview
MetaPathways is composed of four general stages, encompassing a number of analytical or data handling steps (Figure 1):
Quality Control: Basic quality control (QC) is performed, which includes filtering out sequences below a set length threshold (default 180 bp). Duplicate sequences can optionally be removed at this stage.
Feature Prediction: Several sequence features can be predicted on the QC’ed contigs. Open reading frames (ORFs) are predicted by default; ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) can optionally be predicted as well. To improve runtime and efficiency, Prodigal [PRODIGAL2010] is run through a parallel version (pProdigal) [PPRODIGAL2022], and tRNAscan-SE [TRNASCANSE2021] is run using a wrapper script that allows for more efficient multi-threading. BARRNAP [BARRNAP2019] is used to predict rRNAs, including 16S, 23S, and 5S. MetaPathways provides overlap-aware feature identification so that users can make informed decisions about features that overlap each other. Additionally, users can define the minimum length of ORFs to keep for downstream analysis.
Functional Annotation: Using a seed-and-extend homology search algorithm, either BLAST [BLAST] or FAST [FAST], users can search against functional databases and, optionally, taxonomic databases. Currently supported databases include UniProt SwissProt [SWISSPROT], UniProt UniRef90 [UNIREF90], MetaCyc [METACYC], and CAZymes [CAZy]. Users can also build custom databases from any other preferred source. Optionally, reads can be used to calculate abundance information at both the contig and ORF level.
Pathway Inference: MetaPathways then predicts MetaCyc pathways using the Pathway Tools software and its pathway prediction algorithm, PathoLogic [KARP11]. The result is a community-level environmental Pathway/Genome Database (ePGDB), a data structure that integrates sequences, genes, pathways, and literature annotations for interpretation. Optionally, if metagenome-assembled genomes (MAGs) are available for the metagenome, these MAGs can be used to create population-level ePGDBs. MetaCyc pathways are exported in a tabular format for downstream analysis.
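To make the Quality Control and Feature Prediction stages more concrete, the following Python sketch illustrates the kind of logic they describe: dropping sequences shorter than a length threshold, optionally removing duplicates, and reporting features whose coordinates overlap. It is not MetaPathways code. The function names (read_fasta, quality_control, overlapping_features) are hypothetical, and treating duplicates as case-insensitive exact sequence matches is an assumption made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]         # (header, sequence)
Feature = Tuple[str, int, int]   # (name, start, end), 1-based inclusive


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def quality_control(records: Iterable[Record],
                    min_length: int = 180,
                    remove_duplicates: bool = True) -> List[Record]:
    """Keep sequences of at least min_length; optionally drop duplicates.

    Assumption: a duplicate is a case-insensitive exact sequence match.
    """
    seen = set()
    kept: List[Record] = []
    for header, seq in records:
        if len(seq) < min_length:
            continue  # below the length threshold
        if remove_duplicates:
            key = seq.upper()
            if key in seen:
                continue  # an identical sequence was already kept
            seen.add(key)
        kept.append((header, seq))
    return kept


def overlapping_features(features: Iterable[Feature]) -> List[Tuple[str, str]]:
    """Return name pairs of features on one contig whose intervals overlap."""
    ordered = sorted(features, key=lambda f: (f[1], f[2]))
    pairs: List[Tuple[str, str]] = []
    for i, (name_a, _start_a, end_a) in enumerate(ordered):
        for name_b, start_b, _end_b in ordered[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so no later feature can overlap
            pairs.append((name_a, name_b))
    return pairs
```

For example, calling quality_control(read_fasta(open("contigs.fasta"))) on a hypothetical contigs.fasta would return the contigs of at least 180 bp with exact duplicates removed. Passing the ORF, rRNA, and tRNA coordinates predicted on a single contig to overlapping_features would list the pairs a user may want to inspect before deciding which feature to keep.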
Bibliography
Please see the following Zotero Library for a bibliography.