ADAM 0.21.0 Released
ADAM version 0.21.0 has been released!
Due to major changes between Spark versions 1.6 and 2.0, we now build for four combinations of Apache Spark and Scala versions: Spark 1.x and Scala 2.10, Spark 1.x and Scala 2.11, Spark 2.x and Scala 2.10, and Spark 2.x and Scala 2.11. The Spark 2.x build-time dependency will be bumped to version 2.1.0 in the next release of ADAM; see issue #1330.
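As a rough illustration of what the four combinations mean for a downstream build, the Scala sketch below enumerates Maven coordinates for one module. It assumes the naming convention used for the published artifacts: Spark 2.x builds carry a -spark2 suffix, and every build carries the usual _<Scala binary version> suffix. AdamArtifacts, artifactId and allCoordinates are hypothetical helpers, so check the actual coordinates on Maven Central before depending on them.

```scala
object AdamArtifacts {
  // The Spark major versions and Scala binary versions named in the release notes.
  val sparkMajors: Seq[Int] = Seq(1, 2)
  val scalaBinaries: Seq[String] = Seq("2.10", "2.11")

  // Hypothetical helper. It assumes Spark 2.x builds carry a "-spark2" suffix and
  // every build carries the usual "_<Scala binary version>" suffix.
  def artifactId(module: String, sparkMajor: Int, scalaBinary: String): String = {
    val sparkSuffix = if (sparkMajor >= 2) "-spark2" else ""
    s"$module${sparkSuffix}_$scalaBinary"
  }

  // All four Spark/Scala combinations for one module, as groupId:artifactId:version.
  def allCoordinates(module: String, version: String): Seq[String] =
    for {
      sparkMajor <- sparkMajors
      scalaBinary <- scalaBinaries
    } yield s"org.bdgenomics.adam:${artifactId(module, sparkMajor, scalaBinary)}:$version"

  def main(args: Array[String]): Unit =
    allCoordinates("adam-core", "0.21.0").foreach(println)
}
```

Under these assumptions, a project built against Spark 2.x and Scala 2.11 would use the last coordinate in the list.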
One focus of this release was documentation. This covers both the developer API level, with extensive Javadoc and Scaladoc source code comments, and the user level (e.g. https://github.com/bigdatagenomics/adam/tree/master/docs/source). The user docs can be compiled to PDF or HTML with pandoc, but to be honest they look better rendered as Markdown on GitHub.
Another focus was to follow the VCF specification(s) more closely when reading from and writing to VCF. To this end, we made significant changes to our variant and variant annotation schemas and added support for version 1.0 of the VCF INFO ‘ANN’ key specification. This work will continue for our genotype and genotype annotation schemas in the next version of ADAM.
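For readers unfamiliar with the ANN key: an ANN value holds one or more comma-separated annotations. Each annotation has 16 pipe-delimited fields in the order fixed by the v1.0 specification (Allele, Annotation, Annotation_Impact, Gene_Name, Gene_ID, Feature_Type, Feature_ID, Transcript_BioType, Rank, HGVS.c, HGVS.p, cDNA.pos / cDNA.length, CDS.pos / CDS.length, AA.pos / AA.length, Distance, ERRORS / WARNINGS / INFO). The Scala sketch below splits such a value into records. It is not ADAM's parser: AnnEntry and AnnParser are hypothetical names, most fields are kept as raw strings, and the specification's character escaping is ignored.

```scala
// Hypothetical record for one ANN annotation (not ADAM's schema); field order follows ANN v1.0.
final case class AnnEntry(
  allele: String,
  consequences: Seq[String], // Annotation; several terms are joined with '&'
  impact: String,            // Annotation_Impact: HIGH, MODERATE, LOW or MODIFIER
  geneName: String,
  geneId: String,
  featureType: String,
  featureId: String,
  biotype: String,
  rank: String,              // "rank/total", kept as a raw string here
  hgvsC: String,
  hgvsP: String,
  cdnaPos: String,
  cdsPos: String,
  aaPos: String,
  distance: String,
  messages: Seq[String]      // ERRORS / WARNINGS / INFO; several are joined with '&'
)

object AnnParser {
  val FieldCount = 16

  // Split the INFO ANN value into one AnnEntry per comma-separated annotation.
  def parse(info: String): Seq[AnnEntry] =
    info.split(",").toSeq.map(parseEntry)

  private def parseEntry(entry: String): AnnEntry = {
    // A limit of -1 keeps trailing empty fields, which are common in ANN values.
    val f = entry.split("\\|", -1)
    require(f.length == FieldCount, s"expected $FieldCount fields, got ${f.length}: $entry")
    def multi(s: String): Seq[String] = if (s.isEmpty) Seq.empty else s.split("&").toSeq
    AnnEntry(f(0), multi(f(1)), f(2), f(3), f(4), f(5), f(6), f(7), f(8), f(9), f(10),
      f(11), f(12), f(13), f(14), multi(f(15)))
  }
}
```

The -1 limit on split matters because trailing fields such as Distance and ERRORS / WARNINGS / INFO are often empty, and Java's default split would silently drop them.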
The full list of changes since version 0.20.0 is below.
Closed issues:
- Update Markdown docs with ValidationStringency in VCF<->ADAM CLI #1342
- Variant VCFHeaderLine metadata does not handle wildcards properly #1339
- Close called multiple times on VCF header stream #1337
- BroadcastRegionJoin has serialization failures #1334
- adam-cli uses git-commit-id-plugin which breaks release? #1322
- move_to_xyz scripts should have interlocks… #1317
- Lineage for partitionAndJoin in ShuffleRegionJoin causes StackOverflow Errors #1308
- Add move_to_spark_1.sh script and update README to mention #1307
- adam-submit transform fails with Exception in thread “main” java.lang.IncompatibleClassChangeError: Implementing class #1306
- private ADAMContext constructor? #1296
- AlignmentRecord.mateAlignmentEnd never set #1290
- how to submit my own driver class via adam-submit? #1289
- ReferenceRegion on Genotype seems busted? #1286
- Clarify strandedness in ReferenceRegion apply methods #1285
- Parquet and CRAM debug logging during unit tests #1280
- Add more ANN field parsing unit tests #1273
- loadVariantAnnotations returns empty RDD #1271
- Implement joinVariantAnnotations with region join #1259
- Count how many chromosome in the range of the kmer #1249
- ADAM minor release to support htsjdk 2.7.0? #1248
- how to config kryo.registrator programmatically #1245
- Does the nested record Flattener drop Maps/Arrays? #1244
- Dead-ish code cleanup in org.bdgenomics.adam.utils #1242
- java.io.FileNotFoundException for old adam file after upgrade to adam0.20 #1240
- please add maven-source-plugin into the pom file #1239
- Assembly jar doesn’t get rebuilt on CLI changes #1238
- how to compare with the last the column for the same chromosome name? #1237
- Need a way for users to add VCF header lines #1233
- Enhancements to VCF save #1232
- Must we split multi-allelic sites in our Genotype model? #1231
- Can’t override default -collapse in reads2coverage #1228
- Reads2coverage NPEs on unmapped reads #1227
- Strand bias doesn’t get exported #1226
- Move ADAMFunSuite helper functions upstream to SparkFunSuite #1225
- broadcast join using interval tree #1224
- Instrumentation is lost in ShuffleRegionJoin #1222
- Bump Spark, Scala, Hadoop dependency versions #1221
- GenomicRDD shuffle region join passes partition count to partition size #1220
- Scala compile errors downstream of Spark 2 Scala 2.11 artifacts #1218
- Javac error: incompatible types: SparkContext cannot be converted to ADAMContext #1217
- Release 0.20.0 artifacts failed Sonatype Nexus validation #1212
- Release script failed for 0.20.0 release #1211
- gVCF – can’t load multi-allelic sites #1202
- Allow open-ended intervals in loadIndexedBam #1196
- Interval tree join in ADAM #1171
- spark-submit throw exception in spark-standalone using .adam which transformed from .vcf #1121
- BroadcastRegionJoin is not a broadcast join #1110
- Improve test coverage of VariantContextConverter #1107
- Variant dbsnp rs id tracking in vcf2adam and ADAM2Vcf #1103
- Document core ADAM transform methods #1085
- Document deploying ADAM on Toil #1084
- Clean up packages #1083
- VariantCallingAnnotations is getting populated with INFO fields #1063
- How to load DatabaseVariantAnnotation information ? #1049
- Release ADAM version 0.20.0 #1048
- Support VCF annotation ANN field in vcf2adam and adam2vcf #1044
- How to create a rich(er) VariantContext RDD? Reconstruct VCF INFO fields. #878
- Add biologist targeted section to the README #497
- Update usage docs running for EC2 and CDH #493
- Add docs about building downstream apps on top of ADAM #291
- Variant filter representation #194
Merged and closed pull requests:
- [ADAM-1342] Update CLI docs after #1288 merged. #1343 (fnothaft)
- [ADAM-1339] Use glob-safe method to load VCF header metadata for Parquet #1340 (fnothaft)
- [ADAM-1337] Remove os.{flush,close} calls after writing VCF header. #1338 (fnothaft)
- [ADAM-1334] Clean up serialization issues in Broadcast region join. #1336 (fnothaft)
- [ADAM-1307] move_to_spark_2 fails after moving to scala 2.11. #1329 (fnothaft)
- unroll/optimize some JavaConversions #1326 (ryan-williams)
- clean up *Join type-params/scaldocs #1325 (ryan-williams)
- [ADAM-1322] Skip git commit plugin if .git is missing. #1323 (fnothaft)
- Supports access to indexed fa and fasta files #1320 (akmorrow13)
- Add interlocks for move_to_xyz scripts. #1319 (fnothaft)
- [ADAM-1307] Add script for moving to Spark 1. #1318 (fnothaft)
- Update move_to_spark_2.sh #1316 (creggian)
- [ADAM-1308] Fix stack overflow in join with custom iterator impl. #1315 (fnothaft)
- Why Adam? section added to README.md #1310 (tverbeiren)
- Add docs about using ADAM’s Kryo registrator from another Kryo registrator. #1305 (fnothaft)
- Add docs about building downstream applications #1304 (heuermh)
- [ADAM-493] Add ADAM-on-Spark-on-YARN docs. #1301 (fnothaft)
- Code style fixes #1299 (heuermh)
- Make ADAMContext and JavaADAMContext constructors public #1298 (heuermh)
- Remove back reference between VariantAnnotation and Variant #1297 (fnothaft)
- [ADAM-1280] Silence CRAM logging in tests. #1294 (fnothaft)
- HBase as a separate repo #1293 (jpdna)
- Reference region cleanup #1291 (fnothaft)
- Clean rewrite of VariantContextConverter #1288 (fnothaft)
- add function:filterByOverlappingRegions #1287 (liamlee)
- Populate fields on VariantAnnotation #1283 (heuermh)
- Add VCF headers for fields in Variant and VariantAnnotation records #1281 (heuermh)
- CGCloud deploy docs #1279 (jpdna)
- some style nits #1278 (ryan-williams)
- use ParsedLoci in loadIndexedBam #1277 (ryan-williams)
- Increasing unit test coverage for VariantContextConverter #1276 (heuermh)
- Expose FeatureRDD to public #1275 (Georgehe4)
- Clean up CLI operation categories and names, and add documentation for CLI #1274 (fnothaft)
- Rename org.bdgenomics.adam.rdd.variation package to o.b.a.rdd.variant #1270 (heuermh)
- use testFile in some tests #1268 (ryan-williams)
- [ADAM-1083] Cleaning up org.bdgenomics.adam.models. #1267 (fnothaft)
- make py file py3-forward-compatible #1266 (ryan-williams)
- rm accidentally-added file #1265 (fnothaft)
- Finishing up the cleanup on org.bdgenomics.adam.rdd. #1264 (fnothaft)
- Clean up org.bdgenomics.adam.rich package. #1263 (fnothaft)
- Add docs for transform pipeline, ADAM-on-Toil #1262 (fnothaft)
- updates for bdg utils 0.2.9-SNAPSHOT #1261 (akmorrow13)
- [ADAM-1233] Expose header lines in Variant-related GenomicRDDs #1260 (fnothaft)
- [ADAM-1221] Bump Spark/Hadoop versions. #1258 (fnothaft)
- Rename org.bdgenomics.adam.rdd.features package to o.b.a.rdd.feature #1256 (heuermh)
- Clean up documentation in org.bdgenomics.adam.projection. #1255 (fnothaft)
- [ADAM-1221] Bump Spark/Hadoop versions. #1254 (fnothaft)
- Misc shuffle join fixes. #1253 (fnothaft)
- [ADAM-1196] Add support for open ReferenceRegions. #1252 (fnothaft)
- [ADAM-1225] Move helper functions from ADAMFunSuite to SparkFunSuite. #1251 (fnothaft)
- Merge VariantAnnotation and DatabaseVariantAnnotation records #1250 (heuermh)
- Miscellaneous VCF fixes #1247 (fnothaft)
- HBase backend for Genotypes #1246 (jpdna)
- [ADAM-1242] Clean up dead code in org.bdgenomics.adam.util. #1243 (fnothaft)
- Small cleanup of “replacing uses of deprecated class SAMFileReader” #1236 (fnothaft)
- replacing uses of deprecated class SAMFileReader #1235 (lbergelson)
- [ADAM-1224] Replace BroadcastRegionJoin with tree based algo. #1234 (fnothaft)
- Fix reads2coverage issues #1230 (fnothaft)
- [ADAM-1212] Add empty assembly object, allows Maven build to create sources and javadoc artifacts #1215 (heuermh)
- [ADAM-1211] Fix call to move_to_scala_2.sh, reorder Spark 2.x Scala 2.10 and 2.10 sections #1214 (heuermh)
- demonstrate multi-allelic gVCF failure – test added #1205 (jpdna)
- Merge VariantAnnotation and DatabaseVariantAnnotation records #1144 (heuermh)
- Upgrade to bdg-formats-0.10.0 #1135 (fnothaft)