Crunch runner » History » Version 4

Peter Amstutz, 10/23/2015 03:16 PM

h1. Crunch runner

Crunch runner is a Go program designed to be injected into a Docker container, where it bootstraps another command-line program, uploads the results, and reports task success or failure to the API server.  It is similar to the Python crunch script run-command, but because it is a compiled Go binary it has a much lighter footprint than run-command (which requires the Arvados Python SDK and all its dependencies), and so can run in a wider variety of container environments.

Example job invocation:

<pre>
{
  "script": "crunchrunner",
  "script_parameters": {
    "tasks": [
      {
        "command": ["cat", "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input1.txt", "-", "input3.txt"],
        "task.stdin": "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input2.txt",
        "task.stdout": "output.txt",
        "task.env": {
          "BARFOO": "foobar"
        },
        "task.vwd": {
          "input3.txt": "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input99.txt"
        },
        "task.successCodes": [0],
        "task.temporaryFailCodes": [1, 2],
        "task.permanentFailCodes": [3]
      }
    ]
  }
}
</pre>

If there is a single task in "tasks", it runs in task 0.  If there are multiple tasks, they are scheduled as job_tasks.

* "command" is the command line to execute, given as a list of arguments.
* "task.stdin" is the path to a file that will be attached to standard input.
* "task.stdout" is the path to a file that will be attached to standard output.  It must be a relative path in the output directory.  Subdirectories are permitted.
* "task.env" sets environment variables.
* "task.vwd" lists files to be symbolically linked into the output directory.  In the example above, each key is the link name and each value is the path it points to.
* "task.successCodes" is a list of exit codes that are considered success.
* "task.temporaryFailCodes" is a list of exit codes that indicate temporary failure (the task can be retried).
* "task.permanentFailCodes" is a list of exit codes that indicate permanent failure (the task cannot be retried).

Everything except "command" is optional.
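
The parameters above map naturally onto a Go struct.  The following is only a sketch of how they could be decoded and checked, not crunchrunner's actual code: the names @TaskDef@ and @ParseTasks@ are hypothetical, and the field types are inferred from the example invocation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskDef is a hypothetical type mirroring one entry of "tasks".
// Only Command is required; omitted optional fields decode to nil or "".
type TaskDef struct {
	Command            []string          `json:"command"`
	Stdin              string            `json:"task.stdin"`
	Stdout             string            `json:"task.stdout"`
	Env                map[string]string `json:"task.env"`
	Vwd                map[string]string `json:"task.vwd"`
	SuccessCodes       []int             `json:"task.successCodes"`
	TemporaryFailCodes []int             `json:"task.temporaryFailCodes"`
	PermanentFailCodes []int             `json:"task.permanentFailCodes"`
}

// ParseTasks decodes a "script_parameters" object and rejects any task
// that has no command.
func ParseTasks(scriptParameters []byte) ([]TaskDef, error) {
	var params struct {
		Tasks []TaskDef `json:"tasks"`
	}
	if err := json.Unmarshal(scriptParameters, &params); err != nil {
		return nil, err
	}
	for i, t := range params.Tasks {
		if len(t.Command) == 0 {
			return nil, fmt.Errorf("task %d: \"command\" is required", i)
		}
	}
	return params.Tasks, nil
}

func main() {
	tasks, err := ParseTasks([]byte(`{"tasks": [{"command": ["echo", "hello"], "task.stdout": "output.txt"}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(tasks[0].Command, tasks[0].Stdout)
}
```

Because every optional field has a usable zero value, a task that specifies only "command" decodes without special handling.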

The initial working directory of the command is the output directory.  The umask is set to 0022.

If successCodes, temporaryFailCodes, and permanentFailCodes are not specified, or the exit code is not found in any of the arrays, default Unix semantics apply (zero is success, nonzero is failure).
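
The exit code rule can be sketched as a small classification function.  This is an illustration under assumptions, not the actual implementation: @Outcome@ and @Classify@ are hypothetical names, the order in which the lists are checked is a guess, and the separate @DefaultFail@ value exists only because the text does not say whether a default failure may be retried.

```go
package main

import "fmt"

// Outcome is a hypothetical result type for this sketch.
type Outcome string

const (
	Success       Outcome = "success"
	TemporaryFail Outcome = "temporary failure"
	PermanentFail Outcome = "permanent failure"
	// DefaultFail is used when no list matches and the exit code is nonzero.
	DefaultFail Outcome = "failure"
)

// contains reports whether code appears in codes.
func contains(codes []int, code int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// Classify checks the three lists first and falls back to Unix semantics
// (zero is success, nonzero is failure) when the code is in none of them.
// The lookup order is an assumption of this sketch.
func Classify(exitCode int, successCodes, temporaryFailCodes, permanentFailCodes []int) Outcome {
	switch {
	case contains(successCodes, exitCode):
		return Success
	case contains(permanentFailCodes, exitCode):
		return PermanentFail
	case contains(temporaryFailCodes, exitCode):
		return TemporaryFail
	case exitCode == 0:
		return Success
	default:
		return DefaultFail
	}
}

func main() {
	// Uses the lists from the example job invocation.
	for _, code := range []int{0, 1, 3, 7} {
		fmt.Println(code, Classify(code, []int{0}, []int{1, 2}, []int{3}))
	}
}
```

With the lists from the example invocation, exit code 7 appears in none of them and so falls through to the default rule.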

There are three substitution parameters: $(task.tmpdir), $(task.outdir), and $(task.keep).  These resolve to their respective paths on the file system.  Substitution is applied to command line arguments, task.stdin, task.env values, and task.vwd values.
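
As a rough illustration of the substitution step, the sketch below expands the three parameters in a single string.  The function name @Substitute@ and the directory paths in @main@ are hypothetical; the real paths are chosen at run time.

```go
package main

import (
	"fmt"
	"strings"
)

// Substitute expands $(task.tmpdir), $(task.outdir) and $(task.keep) in s.
// strings.Replacer works in a single pass, so text inserted by one
// replacement is not expanded again.
func Substitute(s, tmpdir, outdir, keep string) string {
	r := strings.NewReplacer(
		"$(task.tmpdir)", tmpdir,
		"$(task.outdir)", outdir,
		"$(task.keep)", keep,
	)
	return r.Replace(s)
}

func main() {
	// The three directory paths here are made up for the example.
	arg := "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input1.txt"
	fmt.Println(Substitute(arg, "/tmp/crunch-tmp", "/tmp/crunch-out", "/keep"))
}
```

The same function would be applied to each command line argument, to task.stdin, and to each task.env and task.vwd value.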