Bug #13959
Closed
crunch-dispatch-slurm / Go SDK Dispatcher can block indefinitely on d.Arv.List("containers", params, &list)
Story points: -
Release:
Release relationship: Auto
Description
In the main loop of the Dispatcher's checkForUpdates function, the API List request that fetches a batch of matching containers sometimes appears to block forever. I'm not sure why this happens, but it is probably due to a network or API server issue. In any case, there should be a client-side timeout that prevents this loop from hanging.
Our current workaround is a cron job that runs `systemctl restart crunch-dispatch-slurm` hourly, so that a hung dispatcher is recovered at the top of the next hour.