Bug #13959
Closed
crunch-dispatch-slurm / Go SDK Dispatcher can block indefinitely on d.Arv.List("containers", params, &list)
Description
In the main loop of the Dispatcher's checkForUpdates function, the API List request that fetches a batch of matching containers sometimes appears to block forever. I'm not sure why this happens, but it is probably due to a network or API server issue. In any case, there should be a client-side timeout that prevents this loop from hanging.
Our current workaround is a cron job that runs `systemctl restart crunch-dispatch-slurm` every hour, so a hung dispatcher is restarted at the top of the next hour.
Updated by Tom Morris over 6 years ago
- Target version changed from To Be Groomed to 2018-09-05 Sprint
Updated by Tom Clegg over 6 years ago
- default 5-minute timeout (instead of no timeout) on API calls in sdk/go/arvadosclient (we already have this in sdk/go/arvados)
- move crunch-dispatch-slurm, crunch-dispatch-local, and dispatch library logging to logrus, making it easier to add Debugf() for future debugging
testWithServerStub(c, apiStubResponses, "echo",
- `After echo process termination, container state for Running is "zzzzz-dz642-xxxxxxxxxxxxxx2". Updating it to "Cancelled"`)
+ `after "echo" process termination, container state for zzzzz-dz642-xxxxxxxxxxxxxx2 is "Running"; updating it to "Cancelled"`)
Updated by Tom Clegg over 6 years ago
13959-timeouts-and-logging @ 3ae5b174a710fca21895a1ad7273b6f61bf9ac1e https://ci.curoverse.com/job/developer-run-tests/861/
Updated by Peter Amstutz over 6 years ago
nit, from https://github.com/Sirupsen/logrus README:
It's in the past been possible to import Logrus as both upper- and lower-case. Due to the Go package environment, this caused issues in the community and we needed a standard. Some environments experienced problems with the upper-case variant, so the lower-case was decided. Everything using logrus will need to use the lower-case: github.com/sirupsen/logrus. Any package that isn't, should be changed.
Rest LGTM.
Updated by Tom Clegg over 6 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|794cecd80b3fc3c9812687fa9e4e269fb1e79c60.