GATK Queue support » History » Version 2
Brett Smith, 03/10/2016 03:53 PM
h1. GATK Queue support

h2. Use

We have code that integrates GATK Queue with Arvados, creating a child job for each work unit that GATK Queue spawns. Invoke it as described in the usual GATK Queue documentation; just make sure you're using a jar that includes the Arvados integration.

For example, to run IndelRealigner with GATK Queue using run-command, your @command@ parameter will look like:

<pre><code class="javascript">["java", "-jar", "$(dir $(queue))/Queue.jar", "--script", "$(dir $(scala_script))/ArvadosIndelRealigner.scala", more...]
</code></pre>

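If you generate pipeline templates programmatically, the @command@ array can be assembled and serialized as JSON. The following is a minimal Python sketch, not part of the integration: @build_queue_command@ is a hypothetical helper, and the arguments elided as @more...@ in the example are left for the caller to supply.

```python
import json


def build_queue_command(script_name, extra_args=()):
    """Return a run-command style command list for a GATK Queue script.

    The $(dir $(queue)) and $(dir $(scala_script)) substitutions are copied
    from the example above and kept as literal strings; this sketch does not
    expand them.
    """
    command = [
        "java", "-jar", "$(dir $(queue))/Queue.jar",
        "--script", "$(dir $(scala_script))/" + script_name,
    ]
    # Any further arguments (elided in the example) come from the caller.
    command.extend(extra_args)
    return command


print(json.dumps(build_queue_command("ArvadosIndelRealigner.scala")))
```
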
The GATK Queue integration creates and monitors child jobs for the chunks of work that GATK Queue organizes. To get the runtime constraints for each component, the integration reads a *script parameter* named @runtime_constraints@. (Do *not* put these in the component's main @runtime_constraints@; they may confuse Arvados.) The relevant part of your pipeline template will look like this:

<pre><code class="javascript">"script_parameters": {
  "runtime_constraints": {
    "value": {
      "HaplotypeCaller": { "min_cores_per_node": N1, more... },
      "RealignerTargetCreator": { "min_cores_per_node": N2, more... },
      "IndelRealigner": { "min_cores_per_node": N3, more... },
      "CatVariants": { "min_cores_per_node": N4, more... },
      "MergeSamFiles": { "min_cores_per_node": N5, more... },
      "GenotypeGVCFs": { "min_cores_per_node": N6, more... },
      "SelectVariants": { "min_cores_per_node": N7, more... },
      "VariantFiltration": { "min_cores_per_node": N8, more... },
      "CombineVariants": { "min_cores_per_node": N9, more... }
    }
  },
  more...
}
</code></pre>

Not every component is relevant to every job, so you don't need to specify each one, but this is the complete list of components that our GATK Queue integration currently recognizes. If you don't specify runtime constraints for a component, the GATK Queue integration won't set any either, so you'll usually get the smallest node size.

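Because only the listed component names are recognized, it can help to check names before writing them into a template. The following Python sketch builds the @runtime_constraints@ script parameter for a subset of components. It is an illustration under assumptions: @build_runtime_constraints@ is a hypothetical helper, the core counts are made-up values, and rejecting unknown names is this sketch's own choice, not documented behavior of the integration.

```python
import json

# Component names recognized by the GATK Queue integration, per the list above.
SUPPORTED_COMPONENTS = frozenset([
    "HaplotypeCaller", "RealignerTargetCreator", "IndelRealigner",
    "CatVariants", "MergeSamFiles", "GenotypeGVCFs",
    "SelectVariants", "VariantFiltration", "CombineVariants",
])


def build_runtime_constraints(per_component):
    """Wrap per-component constraints in the script parameter structure.

    Raises ValueError for names outside the supported list, so a typo does
    not silently leave a component without constraints.
    """
    unknown = sorted(set(per_component) - SUPPORTED_COMPONENTS)
    if unknown:
        raise ValueError("unsupported components: " + ", ".join(unknown))
    value = {name: dict(c) for name, c in per_component.items()}
    return {"runtime_constraints": {"value": value}}


# Illustrative values only; choose core counts that suit your own jobs.
params = build_runtime_constraints({
    "HaplotypeCaller": {"min_cores_per_node": 4},
    "IndelRealigner": {"min_cores_per_node": 2},
})
print(json.dumps(params, indent=2))
```

Components left out of the mapping simply get no entry, which matches the behavior described above: no constraints are passed on for them.
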
h2. Development

"Source is on GitHub":https://github.com/curoverse/gatk-protected

Build process:

# Install Maven, dependencies, and plug-ins: @sudo aptitude install openjdk-7-jdk maven libmaven-jar-plugin-java libmaven-shared-jar-java libmaven-compiler-plugin-java@
# Build and install the Arvados Java SDK. In @arvados/sdk/java@, run: @mvn package -Dmaven.test.skip=true && mvn install -Dmaven.test.skip=true@
# Build GATK Queue. In @gatk-protected@, run: @mvn package@

If the build succeeds, you'll have @target/Queue.jar@. Use @arv keep put@ to upload it to a collection where you need it.
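
The Maven steps above can be scripted so that a failure in one step stops the build. The following Python sketch is one way to do that, under assumptions: @run_build@ is a hypothetical helper, the @arvados@ and @gatk-protected@ checkouts are assumed to sit side by side under @base_dir@, and the package installation step is left out because it needs root. By default it only reports what it would run.

```python
import os
import subprocess

# (working directory, command) pairs taken from the build steps above.
BUILD_STEPS = [
    ("arvados/sdk/java", ["mvn", "package", "-Dmaven.test.skip=true"]),
    ("arvados/sdk/java", ["mvn", "install", "-Dmaven.test.skip=true"]),
    ("gatk-protected", ["mvn", "package"]),
]


def run_build(base_dir=".", dry_run=True):
    """Run the build steps in order and return the list of planned steps.

    With dry_run=True nothing is executed. Otherwise each command runs in
    its directory, and check=True raises CalledProcessError on the first
    failing step so later steps are skipped.
    """
    planned = []
    for subdir, cmd in BUILD_STEPS:
        workdir = os.path.join(base_dir, subdir)
        planned.append((workdir, cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=workdir, check=True)
    return planned


for workdir, cmd in run_build():
    print(workdir + ": " + " ".join(cmd))
```

Call @run_build(base_dir, dry_run=False)@ to actually run Maven once the printed plan looks right.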