Getting started

Installing bripipetools

To install bripipetools, you’ll first need to install a copy of Anaconda by following the instructions at https://docs.conda.io/projects/conda/en/latest/user-guide/install/. Once you have Anaconda installed, you can execute the following commands:

git clone https://github.com/BenaroyaResearch/bripipetools
cd bripipetools
conda env create -n bripipetools -f environment.yml
conda activate bripipetools
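
Before installing, it can help to confirm that the new environment is actually the active one. The following is a minimal sketch, not part of bripipetools itself: the helper name env_is_active is hypothetical, and it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation (for environments created by name, as above, it holds the environment name).

```shell
# Succeed only if the active conda environment has the expected name.
# conda sets CONDA_DEFAULT_ENV when an environment is activated.
# env_is_active is a hypothetical helper, not a conda or bripipetools command.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active bripipetools; then
    echo "bripipetools environment is active"
else
    echo "bripipetools environment is not active; run: conda activate bripipetools"
fi
```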

After the Anaconda environment has been created and activated, you can install bripipetools from the root of the cloned repository. There are two installation methods. The first, an editable install, is currently recommended because the package is under active development:

pip install -e .
py.test

There is also an installation method designed for production:

pip install .
python setup.py test

Following successful installation, you should be able to run the command bripipetools to see usage information.
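
If the command is not found, the usual cause is that the conda environment is not active in the current shell. As a rough check, the sketch below tests whether the executable resolves on the PATH; the helper name have_command is hypothetical and only wraps the standard shell builtin command -v.

```shell
# Succeed if the given command name resolves in the current shell.
# have_command is a hypothetical helper around the POSIX builtin `command -v`.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command bripipetools; then
    echo "bripipetools found at: $(command -v bripipetools)"
else
    echo "bripipetools not found; check that the conda environment is active" >&2
fi
```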


Using bripipetools

The package modules currently serve three primary functions:

  • Generation of workflow instructions and submission of data processing batches

  • Collection and organization of output data from bioinformatics processing workflows

  • Annotation and import of pipeline input and output data into Mongo databases (i.e., the Research Database)

These features continue to expand and evolve.

Workflow batch submission

Batch submission to Globus Galaxy for bioinformatics data processing is currently managed through the bripipetools submit command, which uses the submission package.

Workflow batch data summarization

After a batch of Globus Galaxy jobs has been submitted and has completed successfully, the data can be summarized and inserted into the Research Database using the bripipetools wrapup command, which uses the packages described below.

Quality control

Quality control functions (for example, sex checking) are accessed using the bripipetools qc command, which accepts a path to a workflow batch file and performs the appropriate analyses for the data listed in that file.
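
When several batches need quality control, the command can be wrapped in a loop. The sketch below is illustrative only: it assumes the batch file path is passed as a positional argument (the text above says only that the command accepts a path), the function name qc_batches and the file names are hypothetical, and by default it prints the commands rather than running them.

```shell
# Run (or, by default, only print) one qc invocation per workflow batch file.
# ASSUMPTION: the batch file path is given as a positional argument; check
# the command's own usage information before relying on this form.
# Set DRY_RUN=0 to execute the commands instead of printing them.
qc_batches() {
    for batch_file in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "bripipetools qc $batch_file"
        else
            bripipetools qc "$batch_file"
        fi
    done
}

# Hypothetical batch file names, used only for illustration.
qc_batches batch_one.txt batch_two.txt
```

Keeping the dry run as the default makes it easy to review the exact commands before anything is executed.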

Post-processing

The entry point for tasks related to organizing processing output files (e.g., file stitching and cleanup) is the bripipetools postprocess command, which uses the postprocess package.

Data management

Annotation of sequencing and processing data, as well as the corresponding retrieval of data from and import of data into the Research Database, is performed through the bripipetools dbify command, which uses the dbify package.