Project

General

Profile

Actions

Bug #13959

closed

crunch-dispatch-slurm / Go SDK Dispatcher can block indefinitely on d.Arv.List("containers", params, &list)

Added by Joshua Randall over 5 years ago. Updated over 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Story points:
-
Release:
Release relationship:
Auto

Description

In the main loop of the Dispatcher's checkForUpdates function, the API List request to get a batch of matching containers appears to sometimes block forever. I'm not sure why this happens, but it is probably due to a network or API server issue. In any case, there should be some client side timeout that prevents this loop from hanging.

Our current workaround is to have a cron job that calls `systemctl restart crunch-dispatch-slurm` on an hourly basis so that if the dispatcher gets hung it will be fixed at the next top of the hour.


Subtasks 1 (0 open1 closed)

Task #14055: Review 13959-timeouts-and-loggingResolvedTom Clegg08/21/2018Actions
Actions #1

Updated by Tom Morris over 5 years ago

  • Target version set to To Be Groomed
Actions #2

Updated by Tom Morris over 5 years ago

  • Target version changed from To Be Groomed to 2018-09-05 Sprint
Actions #3

Updated by Tom Clegg over 5 years ago

  • Assigned To set to Tom Clegg
Actions #4

Updated by Tom Clegg over 5 years ago

13959-timeouts-and-logging @ f739d736bbb60a8463f04f5d56c18d09157d820e
  • default 5-minute timeout (instead of no timeout) on API calls in sdk/go/arvadosclient (we already have this in sdk/go/arvados)
  • move crunch-dispatch-slurm, crunch-dispatch-local, and dispatch library logging to logrus, making it easier to add Debugf() for future debugging
  •         testWithServerStub(c, apiStubResponses, "echo",
    -               `After echo process termination, container state for Running is "zzzzz-dz642-xxxxxxxxxxxxxx2".  Updating it to "Cancelled"`)
    +               `after "echo" process termination, container state for zzzzz-dz642-xxxxxxxxxxxxxx2 is "Running"; updating it to "Cancelled"`)
    

https://ci.curoverse.com/job/developer-run-tests/858/

Actions #6

Updated by Peter Amstutz over 5 years ago

nit, from https://github.com/Sirupsen/logrus README:

It's in the past been possible to import Logrus as both upper- and lower-case. Due to the Go package environment, this caused issues in the community and we needed a standard. Some environments experienced problems with the upper-case variant, so the lower-case was decided. Everything using logrus will need to use the lower-case: github.com/sirupsen/logrus. Any package that isn't, should be changed.

Rest LGTM.

Actions #7

Updated by Tom Clegg over 5 years ago

  • Status changed from New to In Progress
Actions #8

Updated by Tom Clegg over 5 years ago

  • Status changed from In Progress to Resolved
Actions #9

Updated by Ward Vandewege over 5 years ago

  • Release set to 13
Actions

Also available in: Atom PDF