Thanks to falling costs and faster sequencing technology, the amount of genomic data available for processing is growing exponentially. Our goal as a project is to build scalable pipelines for processing genomic data on top of high-performance distributed computing frameworks.


At the moment, we are working on three projects:

  • ADAM: A scalable API & CLI for genome processing
  • bdg-formats: Schemas for genomic data
  • avocado: A Variant Caller, Distributed

The source for these projects is available on GitHub.


All of our software is released under the Apache License, Version 2.0, an open source software (OSS) license. This license is non-viral: users may use, modify, and redistribute the software, including in proprietary products, provided they retain the license and attribution notices.