Crunch runner

Note: This is a Crunch1 artifact, not to be confused with Crunch2 run and the Containers API.

Crunch runner is a Go program designed to be injected into a Docker container, where it bootstraps another command line program, uploads the results, and reports task success or failure to the API server. It is similar to the Python crunch script run-command, but because it is a compiled Go binary it has a much lighter footprint than run-command (which requires the Arvados Python SDK and all its dependencies), so it can run in a wider variety of container environments.

Example job invocation:

  {
    "script": "crunchrunner",
    "script_parameters": {
      "tasks": [
        {
          "command": ["cat", "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input1.txt", "-", "input3.txt"],
          "task.stdin": "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input2.txt",
          "task.stdout": "output.txt",
          "task.env": {
            "BARFOO": "foobar"
          },
          "task.vwd": {
            "input3.txt": "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input99.txt"
          },
          "task.successCodes": [0],
          "task.temporaryFailCodes": [1, 2],
          "task.permanentFailCodes": [3]
        }
      ]
    }
  }

If there is a single task in "tasks", it runs in task 0. If there are multiple tasks, they are scheduled as job_tasks.

  • "command" is the command line to execute
  • "task.stdin" is a path to a file that will be attached to standard input
  • "task.stdout" is a path to a file that will be attached to standard output. Must be a relative path in the output directory. Subdirectories are permitted.
  • "task.env" allows setting environment variables.
  • "task.vwd" maps file names in the output directory to the paths they will be symbolically linked to.
  • "task.successCodes" is a list of exit codes that are considered success.
  • "task.temporaryFailCodes" is a list of exit codes that are considered temporary failures (can be retried).
  • "task.permanentFailCodes" is a list of exit codes that are considered permanent failures (cannot be retried).

Everything except "command" is optional.

The initial working directory of the command is the output directory. The umask is set to 0022.

If successCodes/temporaryFailCodes/permanentFailCodes are not specified, or the exit code isn't found in any of the arrays, default Unix semantics apply (zero is success, nonzero is failure).
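
The exit code rule can be sketched in Go as follows. This is a minimal illustration under assumptions, not crunchrunner's actual logic: the Outcome type and the Classify and inCodes helpers are hypothetical, the order in which the three lists are consulted is a guess, and the page does not say whether a default nonzero failure is retried, so it gets its own plain "failure" label here.

```go
package main

import "fmt"

// Outcome is a hypothetical label for how a finished task is reported.
type Outcome string

const (
	Success       Outcome = "success"
	TemporaryFail Outcome = "temporary failure"
	PermanentFail Outcome = "permanent failure"
	Fail          Outcome = "failure" // default Unix semantics, retry behavior unspecified
)

// inCodes reports whether code appears in codes.
func inCodes(code int, codes []int) bool {
	for _, c := range codes {
		if c == code {
			return true
		}
	}
	return false
}

// Classify checks the three configured lists first (lookup order is an
// assumption) and falls back to zero = success, nonzero = failure.
func Classify(code int, success, temporary, permanent []int) Outcome {
	switch {
	case inCodes(code, success):
		return Success
	case inCodes(code, temporary):
		return TemporaryFail
	case inCodes(code, permanent):
		return PermanentFail
	case code == 0:
		return Success
	default:
		return Fail
	}
}

func main() {
	// Lists taken from the example job invocation on this page.
	for _, code := range []int{0, 1, 3, 42} {
		fmt.Println(code, Classify(code, []int{0}, []int{1, 2}, []int{3}))
	}
}
```

With the lists from the example invocation, exit code 42 appears in none of them and so falls through to the default rule.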

There are three substitution parameters, $(task.tmpdir), $(task.outdir) and $(task.keep). These resolve to their respective paths on the file system. Substitution is applied to command line arguments, task.stdin, task.env values, and task.vwd values.
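
A simple way to picture the substitution is plain string replacement, sketched below in Go. The Substitute function and the directory paths passed to it are hypothetical values chosen for illustration; the real paths are determined at run time, and crunchrunner may implement the expansion differently.

```go
package main

import (
	"fmt"
	"strings"
)

// Substitute expands the three substitution parameters in s.
// The function name and signature are illustrative assumptions.
func Substitute(s, tmpdir, outdir, keep string) string {
	r := strings.NewReplacer(
		"$(task.tmpdir)", tmpdir,
		"$(task.outdir)", outdir,
		"$(task.keep)", keep,
	)
	return r.Replace(s)
}

func main() {
	// Hypothetical paths, for illustration only.
	tmpdir, outdir, keep := "/tmp/task", "/tmp/task/out", "/keep"

	command := []string{"cat", "$(task.keep)/d3b07384d113edec49eaa6238ad5ff00+4/input1.txt", "-", "input3.txt"}
	for i, arg := range command {
		command[i] = Substitute(arg, tmpdir, outdir, keep)
	}
	fmt.Println(command)
}
```

Arguments that contain no parameter, such as "-" and "input3.txt", pass through unchanged. The same function would apply to task.stdin, task.env values, and task.vwd values.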

Updated by Tom Clegg over 8 years ago · 5 revisions