GATK Queue support

Use

We have code that integrates GATK Queue with Arvados, creating child jobs for each work unit spawned by GATK Queue. You invoke it following the usual documentation for GATK Queue. Just make sure you're using a jar that includes the Arvados integration.

For example, to run IndelRealigner with GATK Queue using run-command, your command parameter will look like:

["java", "-jar", "$(dir $(queue))/Queue.jar", "--script", "$(dir $(scala_script))/ArvadosIndelRealigner.scala", more...]

The GATK Queue integration creates and monitors child jobs for chunks of work that GATK Queue organizes. To find the runtime constraints for each GATK component's child jobs, it reads a script parameter named runtime_constraints. (Do not put these per-component entries in the pipeline component's own top-level runtime_constraints; they may confuse Arvados.) The relevant part of your pipeline template will look like this:

"script_parameters": {
  "runtime_constraints": {
    "value": {
      "HaplotypeCaller":        { "min_cores_per_node": N1, more... },
      "RealignerTargetCreator": { "min_cores_per_node": N2, more... },
      "IndelRealigner":         { "min_cores_per_node": N3, more... },
      "CatVariants":            { "min_cores_per_node": N4, more... },
      "MergeSamFiles":          { "min_cores_per_node": N5, more... },
      "GenotypeGVCFs":          { "min_cores_per_node": N6, more... },
      "SelectVariants":         { "min_cores_per_node": N7, more... },
      "VariantFiltration":      { "min_cores_per_node": N8, more... },
      "CombineVariants":        { "min_cores_per_node": N9, more... }
    }
  },
  more...
}

Not every component is relevant to every job, so you don't need to specify each one, but this is the complete list of components that our GATK Queue integration currently supports and recognizes. If you don't specify runtime constraints for a component, the integration won't set any on its child jobs either, so they will usually run on the smallest node size.
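
The lookup behavior described above can be illustrated with a small Python sketch. This is not the integration's own code. The function names, the example core counts, and the misspelled key are all hypothetical, and the sketch assumes the script parameters have the same shape as the template fragment shown earlier (a runtime_constraints entry wrapping a "value" map keyed by component name).

```python
# Component names listed in this document as recognized by the integration.
SUPPORTED_COMPONENTS = (
    "HaplotypeCaller", "RealignerTargetCreator", "IndelRealigner",
    "CatVariants", "MergeSamFiles", "GenotypeGVCFs",
    "SelectVariants", "VariantFiltration", "CombineVariants",
)


def _per_component(script_parameters):
    """Return the component -> constraints map, or {} if it is absent."""
    return script_parameters.get("runtime_constraints", {}).get("value", {})


def constraints_for(component, script_parameters):
    """Hypothetical helper: constraints for one component, {} if unspecified."""
    return dict(_per_component(script_parameters).get(component, {}))


def unrecognized_components(script_parameters):
    """Hypothetical helper: keys that match no supported component name."""
    return sorted(
        name for name in _per_component(script_parameters)
        if name not in SUPPORTED_COMPONENTS
    )


# Illustrative values only; choose core counts that suit your own cluster.
example = {
    "runtime_constraints": {
        "value": {
            "HaplotypeCaller": {"min_cores_per_node": 8},
            "IndelRealigner": {"min_cores_per_node": 2},
            "HaplotypCaller": {"min_cores_per_node": 4},  # deliberate typo
        }
    }
}

if __name__ == "__main__":
    print(constraints_for("HaplotypeCaller", example))
    print(constraints_for("CatVariants", example))  # unspecified component
    print(unrecognized_components(example))
```

A check like unrecognized_components can be useful before submitting a pipeline, because a misspelled component name would presumably be ignored and its child jobs would then run without the constraints you intended.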

Development

Source is on GitHub

Build process:

  1. Install Maven, dependencies, and plug-ins: sudo aptitude install openjdk-7-jdk maven libmaven-jar-plugin-java libmaven-shared-jar-java libmaven-compiler-plugin-java
  2. Build and install the Arvados Java SDK. In arvados/sdk/java, run: mvn package -Dmaven.test.skip=true && mvn install -Dmaven.test.skip=true
  3. Build GATK Queue. In gatk-protected, run: mvn package

If the build succeeds, you'll have target/Queue.jar. Use arv keep put to upload it to a collection where you need it.
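
If you rebuild often, the Maven steps above can be scripted. The following is a minimal Python sketch, not part of the project. The function names are hypothetical, and it assumes the arvados and gatk-protected checkouts sit side by side in the current directory. It only prints the commands unless you pass dry_run=False, and it leaves out the package installation and the upload to Keep.

```python
import subprocess

MVN_SKIP_TESTS = "-Dmaven.test.skip=true"


def build_steps():
    """Return (working directory, argv) pairs for the Maven build steps."""
    return [
        # Build and install the Arvados Java SDK.
        ("arvados/sdk/java", ["mvn", "package", MVN_SKIP_TESTS]),
        ("arvados/sdk/java", ["mvn", "install", MVN_SKIP_TESTS]),
        # Build GATK Queue; on success this produces target/Queue.jar.
        ("gatk-protected", ["mvn", "package"]),
    ]


def run_steps(steps, dry_run=True):
    """Print each step, and execute it only when dry_run is False."""
    for cwd, argv in steps:
        print("(cd %s && %s)" % (cwd, " ".join(argv)))
        if not dry_run:
            # check=True stops the build at the first failing step.
            subprocess.run(argv, cwd=cwd, check=True)


if __name__ == "__main__":
    run_steps(build_steps())  # dry run: prints the commands only
```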