ADAM version 0.25.0 and Cannoli version 0.3.0 have been released!

Since the 0.24.0 release of ADAM, more than 40 issues have been closed, including bug fixes around indexed reads and attributes in VCF. New features include additional filter-by convenience methods and multi-sample coverage. The ADAM Python APIs now support Python 3.

Based on feedback from the 2018 GCCBOSC bioinformatics community conference, the Cannoli API was refactored at the 2018 GCCBOSC CollaborationFest to greatly improve interactive use in cannoli-shell (a Scala REPL based on Spark Shell, similar to adam-shell) and in notebooks such as Jupyter, Zeppelin, and Spark Notebook.

For example, here is an entire variant calling pipeline based on bwa, ADAM, and Freebayes:

import org.bdgenomics.adam.rdd.ADAMContext._
import org.bdgenomics.cannoli.cli._
import org.bdgenomics.cannoli.cli.Cannoli._

val sample = "sample"
val reference = "ref.fa"

// Load paired-end FASTQ files as fragments (sc is the SparkContext provided by the shell)
val reads = sc.loadPairedFastqAsFragments(sample + "_1.fq", sample + "_2.fq")

// Configure and run alignment with bwa
val bwaArgs = new BwaArgs()
bwaArgs.sample = sample
bwaArgs.indexPath = reference

val alignments = reads.alignWithBwa(bwaArgs)

// Sort the alignments and mark duplicates with ADAM
val sorted = alignments.sortReadsByReferencePositionAndIndex()
val markdup = sorted.markDuplicates()

// Configure and run variant calling with Freebayes
val freebayesArgs = new FreebayesArgs()
freebayesArgs.referencePath = reference

val variantContexts = markdup.callVariantsWithFreebayes(freebayesArgs)

// Save the called variants as VCF
variantContexts.saveAsVcf(sample + ".freebayes.vcf.bgzf")
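
The method-style calls in the pipeline, such as reads.alignWithBwa(bwaArgs), appear to rely on the implicit methods added in this release (Cannoli issue #131) and brought into scope by the Cannoli._ import. The following is only a rough sketch of the general Scala implicit class pattern, not Cannoli's actual implementation. Every name in it (Fragments, Alignments, AlignArgs, Enrichments, align) is a hypothetical stand-in and none of it uses Spark, ADAM, or Cannoli.

```scala
// Hypothetical stand-ins; these are NOT the real ADAM or Cannoli classes.
final case class Fragments(names: Seq[String])
final case class Alignments(names: Seq[String], sample: String)

// Hypothetical mutable argument holder, loosely modeled on the args objects above.
class AlignArgs {
  var sample: String = ""
}

object Enrichments {
  // The implicit class wraps a Fragments value. Importing Enrichments._
  // makes align(...) callable as if it were a method on Fragments itself.
  implicit class FragmentsOps(fragments: Fragments) {
    def align(args: AlignArgs): Alignments =
      Alignments(fragments.names, args.sample)
  }
}

object Demo {
  import Enrichments._

  def run(): Alignments = {
    val args = new AlignArgs()
    args.sample = "sample"
    // Resolved by the compiler as new FragmentsOps(fragments).align(args)
    Fragments(Seq("read1", "read2")).align(args)
  }
}
```

With this pattern a single wildcard import is enough to make each step read as a method on the previous step's result, which suits interactive use in a REPL or notebook.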

Changes since Previous Releases

The full lists of changes to ADAM since version 0.24.0 and to Cannoli since version 0.2.0 are below.

ADAM version 0.25.0

Closed issues:

Expand illumina metadata regex to include “N” character #2079
Remove support for Hadoop 2.6 #2073
NumberFormatException: For input string: “nan” in VCF #2068
Support Spark 2.3.2 #2062
Arrays should be passed to HTSJDK in the JVM primitive type #2059
toCoverage() function for alignments does not distinguish samples #2049
Building from adam-core module directory fails to generate Scala code for sql package #2047
Data Sets #2043
saveAsBed writes missing score values as ‘.’ instead of ‘0’ #2039
Fix GFF3 parser to handle trailing FASTA #2037
Add StorageLevel as an optional parameter to loadPairedFastq #2032
Error: File name too long when building on encrypted file system #2031
Fail to transform a VCF file containing multiple genome data (Muliple sample) #2029
Dataset and RDD constructors are missing from CoverageRDD #2027
How to create a single RDD[Genotype] object out of multiple VCF files? #2025
ReadTheDocs github banner is broken #2020
-realign_indels throws serialization error with instrumentation enabled #2007
Support 0 length FASTQ reads #2006
Speed of Reading into ADAM RDDs from S3 #2003
Support Python 3 #1999
Unordered list of region join types in doc is missing nested levels #1997
Add VariantContextRDD.saveAsPartitionedParquet, ADAMContext.loadPartitionedParquetVariantContexts #1996
VCF annotation question #1994
Fastq reader clips long reads at 10,000 bp #1992
adam-submit Error: Number of executors must be a positive number on EMR 5.13.0/Spark 2.3.0 #1991
Test against Spark 2.3.1, Parquet 1.8.3 #1989
END does not get set when writing a gVCF #1988
Support saving single files to filesystems that don’t implement getScheme #1984
Add additional filter by convenience methods #1978
Limiting FragmentRDD pipe paralellism #1977
Consider javadoc.io for API documentation linking #1976
FASTQ Reader leaks connections #1974
Update bioconda recipe for version 0.24.0 #1971
Update homebrew formula at brewsci/homebrew-bio for version 0.24.0 #1970
loadPartitionedParquetAlignments fails with Reference.all #1967
Caused by: java.lang.VerifyError: class com.fasterxml.jackson.module.scala.ser.ScalaIteratorSerializer overrides final method withResolved #1953
FASTQ input format needs to support index sequences #1697
Changelog must be edited and committed manually during release process #936

Merged and closed pull requests:

added pyspark mock modules for API documentation #2084 (akmorrow13)
Added mock python modules for API python documentation #2082 (akmorrow13)
[ADAM-2079] Expand illumina metadata regex to include “N” character #2081 (pauldwolfe)
ADAM-2079 Added “N” to regexs for illumina metadata #2080 (pauldwolfe)
Update docs with new template and documentation #2078 (akmorrow13)
[ADAM-1992] Make maximum FASTQ read length configurable. #2077 (heuermh)
[ADAM-2059] Properly pass back primitive typed arrays to HTSJDK. #2075 (heuermh)
Update dependency versions, including htsjdk to 2.16.1 and guava to 27.0-jre #2072 (heuermh)
[ADAM-1999] Support Python 3 #2070 (akmorrow13)
[ADAM-2068] Prevent NumberFormatException for nan vs NaN in VCF files. #2069 (heuermh)
Update python MAKE file #2067 (Georgehe4)
Update python MAKE file #2066 (Georgehe4)
Update jenkins script to test python 3.6 #2060 (Georgehe4)
[ADAM-2062] Update Spark version to 2.3.2 #2055 (heuermh)
Clean up fields and doc in fragment. #2054 (heuermh)
[ADAM-2037] Support GFF3 files containing FASTA formatted sequences. #2053 (heuermh)
modified CoverageRDD and FeatureRDD to extend MultisampleGenomicDataset #2051 (akmorrow13)
Multi-sample coverage #2050 (akmorrow13)
[ADAM-2047] Use source directory relative to project.basedir for adam codegen. #2048 (heuermh)
[ADAM-2039] Adding support for writing BED format per UCSC definition #2042 (heuermh)
Update Jenkins Spark version to 2.2.2 #2035 (akmorrow13)
[ADAM-2032] Add StorageLevel as an optional parameter to loadPairedFastq #2033 (heuermh)
[ADAM-2027] Add RDD and Dataset constructors to CoverageRDD. #2028 (heuermh)
Allow for export of query name sorted SAM files #2026 (karenfeng)
[ADAM-2020] Fix ReadTheDocs Github banner. #2021 (fnothaft)
[ADAM-1988] Add copyVariantEndToAttribute method to support gVCF END attribute … #2017 (heuermh)
[ADAM-936] Use github-changes-maven-plugin to update CHANGES.md. #2014 (heuermh)
[ADAM-1992] Make maximum FASTQ read length configurable. #2011 (fnothaft)
[ADAM-1697] Expand Illumina metadata regex to cover interleaved index sequences. #2010 (heuermh)
[ADAM-2007] Make IndelRealignmentTarget implement Serializable. #2009 (fnothaft)
[ADAM-2006] Support loading 0-length reads as FASTQ. #2008 (fnothaft)
[ADAM-1697] Expand Illumina metadata regex to cover index sequences #2004 (pauldwolfe)
[ADAM-1996] Load and save VariantContexts as partitioned Parquet. #2001 (heuermh)
[ADAM-1997] Nest list of region join types in joins doc. #1998 (heuermh)
[ADAM-1877] Add filterToReferenceName(s) to SequenceDictionary. #1995 (heuermh)
[ADAM-1984] Support file systems that don’t set the scheme. #1985 (fnothaft)
[ADAM-1978] Add additional filter by convenience methods. #1983 (heuermh)
Adding printAttribute methods for alignment records, features, and samples. #1982 (heuermh)
Fix partitioning code to use Long instead of Int #1980 (fnothaft)
[ADAM-1976] Adding core API documentation link and badge. #1979 (heuermh)
[ADAM-1974] Close unclosed stream in FastqInputFormat. #1975 (fnothaft)
Set defaults to schemas #1972 (ffinfo)
Add loadPairedFastqAsFragments method. #1866 (heuermh)
Adding loadPairedFastqAsFragments method #1828 (ffinfo)

Cannoli version 0.3.0

Closed issues:

Add implicit methods that attach to source RDD #131
Flip function and command line class names around #130
Add API documentation link and badge #128
Add homebrew formula at brewsci/homebrew-bio #124
Add bioconda recipe #123
Support validation stringency in out formatters #122
Add Ensembl Variant Effect Predictor (VEP) for variant annotation #112
Add Minimap2 for alignment #111

Merged and closed pull requests:

Update release script for changelog. #143 (heuermh)
[CANNOLI-141] Update ADAM dependency to 0.25.0. #142 (heuermh)
Update default docker image for bowtie2. #140 (heuermh)
[CANNOLI-138] Update Cannoli per latest ADAM snapshot changes. #139 (heuermh)
[CANNOLI-131] Add implicits on Cannoli function source data sets. #133 (heuermh)
[CANNOLI-130] Extract function classes to core package. #132 (heuermh)
[CANNOLI-128] Adding API documentation link and badge. #129 (heuermh)
[CANNOLI-112] Adding Ensembl Variant Effect Predictor (VEP) for variant annotation #127 (heuermh)
[CANNOLI-122] Support validation stringency in out formatters. #126 (heuermh)
[CANNOLI-111] Adding Minimap2 for alignment. #119 (heuermh)
