crunch-dispatch-slurm / Go SDK Dispatcher can block indefinitely on d.Arv.List("containers", params, &list)
In the main loop of the Dispatcher's checkForUpdates function, the API List request that fetches a batch of matching containers sometimes appears to block forever. I'm not sure why this happens, but it is probably due to a network or API server issue. In any case, there should be a client-side timeout that prevents this loop from hanging.
Our current workaround is to have a cron job that calls `systemctl restart crunch-dispatch-slurm` on an hourly basis so that if the dispatcher gets hung it will be fixed at the next top of the hour.
Updated by Tom Clegg about 4 years ago
- default 5-minute timeout (instead of no timeout) on API calls in sdk/go/arvadosclient (we already have this in sdk/go/arvados)
- move crunch-dispatch-slurm, crunch-dispatch-local, and dispatch library logging to logrus, making it easier to add Debugf() for future debugging
 testWithServerStub(c, apiStubResponses, "echo",
- `After echo process termination, container state for Running is "zzzzz-dz642-xxxxxxxxxxxxxx2". Updating it to "Cancelled"`)
+ `after "echo" process termination, container state for zzzzz-dz642-xxxxxxxxxxxxxx2 is "Running"; updating it to "Cancelled"`)
Updated by Peter Amstutz about 4 years ago
nit, from https://github.com/Sirupsen/logrus README:
It's in the past been possible to import Logrus as both upper- and lower-case. Due to the Go package environment, this caused issues in the community and we needed a standard. Some environments experienced problems with the upper-case variant, so the lower-case was decided. Everything using logrus will need to use the lower-case: github.com/sirupsen/logrus. Any package that isn't, should be changed.