Bug #4920
closed
[Crunch] Installing Docker image from [...] exited 1
Added by Bryan Cosca over 9 years ago.
Updated over 9 years ago.
Description
examples: qr1hi-8i9sb-8aaiz3f6ll1bx5z, qr1hi-8i9sb-azre7zigwmyfsi3, qr1hi-8i9sb-haprdga2d1q0hsh
1/7/2015 11:10:42 AM crunch running from /usr/local/arvados/src/sdk/cli/bin/crunch-job with arvados-cli Gem version(s) 0.1.20141209151444, 0.1.20141014201516, 0.1.20140919104705, 0.1.20140905165259, 0.1.20140827170424, 0.1.20140825141611, 0.1.20140812162850, 0.1.20140708213257, 0.1.20140707162447, 0.1.20140630151639, 0.1.20140513131358, 0.1.20140513101345, 0.1.20140414145041
1/7/2015 11:10:42 AM crunch check slurm allocation
1/7/2015 11:10:42 AM crunch node compute29 - 8 slots
1/7/2015 11:10:42 AM crunch start
1/7/2015 11:10:42 AM crunch Clean work dirs
1/7/2015 11:10:45 AM crunch Cleanup command exited 1
1/7/2015 11:10:47 AM crunch Installing Docker image from 142e99c2dec346e621fd3eeb30a63387+1050 exited 1 at /usr/local/arvados/src/sdk/cli/bin/crunch-job line 402
1/7/2015 11:10:47 AM crunch Freeze not implemented
1/7/2015 11:10:47 AM crunch collate
- Description updated (diff)
- Category set to Crunch
- Assigned To set to Tim Pierce
- Target version set to Bug Triage
- Description updated (diff)
- Subject changed from Cleanup command exited 1 to [Crunch] Installing Docker image from [...] exited 1
- Status changed from New to Resolved
- Assigned To changed from Tim Pierce to Brett Smith
- Target version deleted (Bug Triage)
These failures happened during the period when compute nodes were registering themselves prematurely more often because of a configuration change. Since we addressed that (first temporarily by reverting the change, and now more permanently by making compute node registration more conservative), we haven't seen the issue recur. At this point, I feel pretty confident saying that was the root cause and that it's fixed, so I'm closing the bug. (Note that the cleanup script also exited 1, which should happen more or less never unless things are very broken.)
I did review the crunch-job code here to see if there were opportunities to provide better error messages. However, the code's pretty clean in this regard: it runs with -e, and never redirects the stderr of anything. Docker itself seems to be noisy enough (for example, if the Docker daemon isn't running, you get an error on stderr about that). It seems like whatever failed was intentionally very quiet about it.
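For anyone trying to reproduce a quiet failure like this outside of crunch-job, here is a rough Python sketch of the behaviour described above. It is not the crunch-job code. It assumes that "-e" refers to the shell's exit-on-error option, and `run_step` is a hypothetical helper name. It runs a shell snippet under `sh -e`, and when the step fails it reports the exit code together with whatever the step wrote to stderr, stating explicitly when stderr was empty.

```python
import subprocess


def run_step(label, script):
    """Run a shell snippet under `sh -e` and describe how it ended.

    Hypothetical helper for illustration only; not part of crunch-job.
    Returns a tuple of (exit_code, message).
    """
    # `sh -e` stops at the first command that fails, so the exit code
    # reported here belongs to that first failing command.
    proc = subprocess.run(
        ["sh", "-e", "-c", script],
        stderr=subprocess.PIPE,  # captured only so it can be echoed with the label
        text=True,
    )
    if proc.returncode == 0:
        return 0, f"{label} succeeded"
    stderr_text = proc.stderr.strip()
    # Make a silent failure visible instead of printing an empty detail.
    detail = stderr_text if stderr_text else "(no output on stderr)"
    return proc.returncode, f"{label} exited {proc.returncode}: {detail}"
```

Calling `run_step("quiet step", "false; echo unreachable")` yields exit code 1 with the "(no output on stderr)" marker, which is roughly the situation in the log above: a non-zero exit and nothing to explain it.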