Feature #4579

[Documentation] Run-command docs should remind user how & why to exit non-zero on failure.

Added by Bryan Cosca almost 5 years ago. Updated almost 5 years ago.

Status: New
Priority: Normal
Assigned To: -
Category: Documentation
Target version:
Start date: 11/18/2014
Due date:
% Done: 0%
Estimated time:
Story points: 0.5

Description

Some jobs encounter errors that should be fatal, but still report success.

For example, qr1hi-8i9sb-mtxaffgfw6athnp:

grep: //c09a19ea17f72c8da97f8cb64a9b333b+743: No such file or directory

or qr1hi-8i9sb-gn0jmhwp88j3a8z:

ls: cannot access /keep//keep/c09a19ea17f72c8da97f8cb64a9b333b+743/*.vcf: No such file or directory

These jobs should report failure.


Related issues

Related to Arvados - Story #3044: [Documentation] Improve documentation for authoring crunch scripts (Closed)

History

#1 Updated by Tim Pierce almost 5 years ago

  • Subject changed from "Crunch is able to detect unique errors within scripts?" to "[Crunch] failed jobs are incorrectly reported as succeeding"
  • Description updated
  • Category set to Crunch

#2 Updated by Tim Pierce almost 5 years ago

  • Target version set to Bug Triage

#3 Updated by Tom Clegg almost 5 years ago

  • Tracker changed from Feature to Bug

If you use run-command, the only way to indicate success/failure is exit status. In both of these cases it looks like the script exits 0, run-command sets success=true on the task, and Crunch sets state=Complete. Crunch's part of this looks correct.

The script itself, however, incorrectly exits 0 after encountering errors. Fixing this could be as simple (or not simple) as using "set -e" and "set -o pipefail" in all the right places.
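
To illustrate the point about "set -e" and "set -o pipefail", here is a minimal sketch rather than the actual job scripts. It assumes bash is available, and the path /nonexistent-dir is a made-up placeholder standing in for the missing Keep paths in the examples above. It runs the same failing pipeline twice and compares the exit status that a caller such as run-command would see:

```shell
# Hypothetical demo: the same failing pipeline, with and without
# strict error handling. /nonexistent-dir is a placeholder path.

# Lenient: ls fails, but the pipeline's status is that of its last
# stage (wc), and the script carries on to its final command.
lenient_status=0
bash -c 'ls /nonexistent-dir/*.vcf 2>/dev/null | wc -l >/dev/null; echo finished >/dev/null' \
  || lenient_status=$?

# Strict: pipefail makes the pipeline report the failure from ls,
# and set -e makes the script stop right there with that status.
strict_status=0
bash -c 'set -e; set -o pipefail; ls /nonexistent-dir/*.vcf 2>/dev/null | wc -l >/dev/null; echo finished >/dev/null' \
  || strict_status=$?

echo "lenient=$lenient_status strict=$strict_status"
```

The lenient variant exits 0 even though ls failed, which is the situation described in this ticket; the strict variant exits non-zero, so the failure is visible to whatever launched the script. Note that "set -e" has well-known exceptions (for example, commands tested in an "if" or on the left of "||"), hence "in all the right places".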

Aside 1: The run-command documentation could certainly be more forthcoming with advice about how to write scripts for it to use. (Currently exit codes are only mentioned in the context of the "ignore exit code" feature, which incidentally should probably be adjusted to explain what a terrible, terrible idea it is to use that feature.)

Aside 2: When you're at the point of giving run-command a shell script which in turn builds and runs another shell script, you're doing it wrong. At some point our docs failed you, by steering you toward using run-command for this instead of writing a Python program that calls one_task_per_input_file...

#4 Updated by Tom Clegg almost 5 years ago

  • Status changed from New to Feedback

#5 Updated by Tom Clegg almost 5 years ago

  • Subject changed from "[Crunch] failed jobs are incorrectly reported as succeeding" to "[Documentation] Run-command docs should remind user how & why to exit non-zero on failure."
  • Category changed from Crunch to Documentation

#6 Updated by Tom Clegg almost 5 years ago

  • Tracker changed from Bug to Feature
  • Status changed from Feedback to New
  • Story points set to 0.5

#7 Updated by Tom Clegg almost 5 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints
