Bug #4920

closed

[Crunch] Installing Docker image from [...] exited 1

Added by Bryan Cosca over 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assigned To: Brett Smith
Category: Crunch
Target version: -
Story points: -

Description

examples: qr1hi-8i9sb-8aaiz3f6ll1bx5z, qr1hi-8i9sb-azre7zigwmyfsi3, qr1hi-8i9sb-haprdga2d1q0hsh

1/7/2015 11:10:42 AM            crunch        running from /usr/local/arvados/src/sdk/cli/bin/crunch-job with arvados-cli Gem version(s) 0.1.20141209151444, 0.1.20141014201516, 0.1.20140919104705, 0.1.20140905165259, 0.1.20140827170424, 0.1.20140825141611, 0.1.20140812162850, 0.1.20140708213257, 0.1.20140707162447, 0.1.20140630151639, 0.1.20140513131358, 0.1.20140513101345, 0.1.20140414145041
1/7/2015 11:10:42 AM            crunch        check slurm allocation
1/7/2015 11:10:42 AM            crunch        node compute29 - 8 slots
1/7/2015 11:10:42 AM            crunch        start
1/7/2015 11:10:42 AM            crunch        Clean work dirs
1/7/2015 11:10:45 AM            crunch        Cleanup command exited 1
1/7/2015 11:10:47 AM            crunch        Installing Docker image from 142e99c2dec346e621fd3eeb30a63387+1050 exited 1 at /usr/local/arvados/src/sdk/cli/bin/crunch-job line 402
1/7/2015 11:10:47 AM            crunch        Freeze not implemented
1/7/2015 11:10:47 AM            crunch        collate

Actions #1

Updated by Tim Pierce over 9 years ago

  • Description updated (diff)
  • Category set to Crunch
  • Assigned To set to Tim Pierce
  • Target version set to Bug Triage
Actions #2

Updated by Tim Pierce over 9 years ago

  • Description updated (diff)
Actions #3

Updated by Tom Clegg over 9 years ago

  • Subject changed from Cleanup command exited 1 to [Crunch] Installing Docker image from [...] exited 1
Actions #4

Updated by Brett Smith about 9 years ago

  • Status changed from New to Resolved
  • Assigned To changed from Tim Pierce to Brett Smith
  • Target version deleted (Bug Triage)

These happened during the time that compute nodes were registering themselves prematurely more often due to a configuration change. Since we addressed that—first temporarily by reverting the change, and now more permanently by making compute node registration more conservative—we haven't seen the issue recur. At this point, I feel pretty confident saying that was the root cause, and it's fixed, so I'm closing the bug. (Note that the cleanup script also exited 1, which should happen more or less never unless things are very broken.)

I did review the crunch-job code here to see if there were opportunities to provide better error messages. However, the code's pretty clean in this regard: it runs with -e, and never redirects the stderr of anything. Docker itself seems to be noisy enough (for example, if the Docker daemon isn't running, you get an error on stderr about that). It seems like whatever failed was intentionally very quiet about it.
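
For illustration only: the sketch below is not crunch-job's code, and `run_step` is a hypothetical helper name. It shows one way a wrapper could capture a step's stderr and fold it into an "exited N" message, so that a completely silent failure is at least reported as silent. It assumes the step is an external command run as a subprocess.

```python
import subprocess
import sys


def run_step(description, argv):
    """Run argv as a subprocess and raise a descriptive error on failure.

    Hypothetical helper: stderr is captured so it can be included in the
    error message alongside the exit code.
    """
    result = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        # Make a silent failure explicit instead of printing nothing.
        detail = result.stderr.strip() or "(no output on stderr)"
        raise RuntimeError(
            f"{description} exited {result.returncode}: {detail}"
        )
    return result


if __name__ == "__main__":
    # Stand-in for a step that fails without writing anything to stderr.
    quiet_failure = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        run_step("Installing Docker image", quiet_failure)
    except RuntimeError as err:
        print(err)  # Installing Docker image exited 1: (no output on stderr)
```

The trade-off is that capturing stderr delays it until the step finishes. Leaving stderr unredirected, as the code reviewed above does, keeps output live but cannot label a failure that prints nothing.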
