https://github.com/bigdatagenomics/    https://twitter.com/bigdatagenomics/

ADAM 0.10.0 is now available.

Parquet 1.4.3 upgrade

As developers, you need to know about a small (but significant) change to way Parquet projections work now. In the past, any record fields that were not in the projections were returned as null values. With our upgrade to Parquet 1.4.3, fields that are not in the your projections will be set to their default value (if one exists).

BQSR Performance and Concordance

This release adds the cycle covariate to BQSR. Our BQSR implementation can process the 1000g NA12878 High-Coverage Whole Genome in under 45 minutes using 100 EC2 nodes. Our concordance with GATK is high (>99.9%) for the data we’ve tested. We’ll publish the scripts we use for concordance testing in an upcoming release to allow broader, automated testing. Additionally, ADAM now processes the OQ tag to make comparing original quality scores easier.

Lots of other bug fixes and features…

  • ISSUE 242: Upgrade to Parquet 1.4.3
  • ISSUE 241: Fixes to FASTA code to properly handle indices.
  • ISSUE 239: Make ADAMVCFOutputFormat public
  • ISSUE 233: Build up reference information during cigar processing
  • ISSUE 234: Predicate to filter conversion
  • ISSUE 235: Remove unused contiglength field
  • ISSUE 232: Add -pretty and -o to the print command
  • ISSUE 230: Remove duplicate mdtag field
  • ISSUE 231: Helper scripts to run an ADAM Console.
  • ISSUE 226: Fix ReferenceRegion from ADAMRecord
  • ISSUE 225: Change Some to Option to check for unmapped reads
  • ISSUE 223: Use SparkConf object to configure SparkContext
  • ISSUE 217: Stop using reference IDs and use reference names instead
  • ISSUE 220: Update SAM to ADAM conversion
  • ISSUE 213: BQSR updates

Comments