Bug #4920
[Crunch] Installing Docker image from [...] exited 1 (closed)
Description
Examples: qr1hi-8i9sb-8aaiz3f6ll1bx5z, qr1hi-8i9sb-azre7zigwmyfsi3, qr1hi-8i9sb-haprdga2d1q0hsh
1/7/2015 11:10:42 AM crunch running from /usr/local/arvados/src/sdk/cli/bin/crunch-job with arvados-cli Gem version(s) 0.1.20141209151444, 0.1.20141014201516, 0.1.20140919104705, 0.1.20140905165259, 0.1.20140827170424, 0.1.20140825141611, 0.1.20140812162850, 0.1.20140708213257, 0.1.20140707162447, 0.1.20140630151639, 0.1.20140513131358, 0.1.20140513101345, 0.1.20140414145041
1/7/2015 11:10:42 AM crunch check slurm allocation
1/7/2015 11:10:42 AM crunch node compute29 - 8 slots
1/7/2015 11:10:42 AM crunch start
1/7/2015 11:10:42 AM crunch Clean work dirs
1/7/2015 11:10:45 AM crunch Cleanup command exited 1
1/7/2015 11:10:47 AM crunch Installing Docker image from 142e99c2dec346e621fd3eeb30a63387+1050 exited 1 at /usr/local/arvados/src/sdk/cli/bin/crunch-job line 402
1/7/2015 11:10:47 AM crunch Freeze not implemented
1/7/2015 11:10:47 AM crunch collate
Updated by Tim Pierce over 9 years ago
- Description updated (diff)
- Category set to Crunch
- Assigned To set to Tim Pierce
- Target version set to Bug Triage
Updated by Tom Clegg over 9 years ago
- Subject changed from Cleanup command exited 1 to [Crunch] Installing Docker image from [...] exited 1
Updated by Brett Smith about 9 years ago
- Status changed from New to Resolved
- Assigned To changed from Tim Pierce to Brett Smith
- Target version deleted (Bug Triage)
These failures happened during a period when compute nodes were registering themselves prematurely more often, due to a configuration change. Since we addressed that (first temporarily, by reverting the change, and now more permanently, by making compute node registration more conservative), we haven't seen the issue recur. At this point, I feel pretty confident saying that was the root cause and that it's fixed, so I'm closing the bug. (Note that the cleanup script also exited 1, which should almost never happen unless things are very broken.)
I did review the crunch-job code here to see if there were opportunities to provide better error messages. However, the code's pretty clean in this regard: it runs with -e and never redirects the stderr of anything. Docker itself seems noisy enough when it fails (for example, if the Docker daemon isn't running, you get an error about that on stderr). It seems like whatever failed was intentionally very quiet about it.
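To illustrate why a log line like "Installing Docker image from ... exited 1" can be the only trace of a failure, here is a minimal Python sketch of the general pattern described above. It is not crunch-job's actual implementation; the helper name `run_step` and the step description are hypothetical. The child's stderr is inherited rather than redirected, so any error message it prints would reach the log; if the child exits nonzero without printing anything, the wrapper can only report the exit status.

```python
import subprocess
import sys


def run_step(description, argv):
    """Hypothetical helper: run argv with stderr inherited (not redirected).

    Returns an "<description> exited <code>" message if the command fails,
    or None if it succeeds.
    """
    # stderr is not redirected, so anything the child writes to it goes
    # straight to this process's stderr (i.e. the job log).
    result = subprocess.run(argv)
    if result.returncode != 0:
        # This is all the wrapper knows when the child failed silently.
        return f"{description} exited {result.returncode}"
    return None


if __name__ == "__main__":
    # A child that fails without printing anything to stderr.
    quiet_failure = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_step("example step", quiet_failure))
```

A noisy failure (such as the Docker daemon not running) would show its own message on stderr ahead of the "exited" line; a quiet one leaves only the exit code.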