Bug #9996

[keep-balance] Stop retrieving collections from API if the run is going to be aborted anyway

Added by Tom Clegg about 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: Keep
Target version: -
Start date: 09/08/2016
Due date:
% Done: 100%
Estimated time:
Story points: -

Description

Background

Currently, if one of the srv.Index() goroutines encounters an error, GetCurrentState returns, but its other goroutines keep running. This is wasteful (it puts load on the API server, and the results will never be used) and makes logs confusing (you can get interleaved "collections: x/y" messages from the doomed run and a subsequent run).

The EachCollection loop already checks len(errs)>0, but len(errs)>0 is only true for a very short time after the first error, because "return <-errs" consumes that error. Therefore, if only one error happens, the EachCollection loop probably won't notice that it should stop.
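To illustrate why the len(errs)>0 check is unreliable, here is a minimal sketch. It assumes errs is a buffered error channel (the description implies this, since len() of an unbuffered channel is always 0). The helper name pendingBeforeAndAfter is hypothetical and is not part of keep-balance.

```go
package main

import (
	"errors"
	"fmt"
)

// pendingBeforeAndAfter is a hypothetical helper, not keep-balance code.
// It queues one error, then receives it, and reports len(errs) at both points.
func pendingBeforeAndAfter() (before, after int) {
	errs := make(chan error, 1) // assumption: errs is buffered
	errs <- errors.New("index failed")
	before = len(errs) // the error is still queued
	<-errs             // what "return <-errs" does: it consumes the error
	after = len(errs)  // nothing left for other goroutines to observe
	return before, after
}

func main() {
	before, after := pendingBeforeAndAfter()
	fmt.Println(before, after) // prints "1 0"
}
```

Once the receive has happened, any loop that polls len(errs)>0 sees an empty channel and keeps going.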

Proposed fix

At the end of GetCurrentState, don't call wg.Wait() from a goroutine and rely on errs to decide when to return. Instead, call wg.Wait() directly and then check len(errs) to decide whether to return <-errs or nil.
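The proposed fix might look roughly like the sketch below. This is not the actual keep-balance code: fetchAll is a hypothetical stand-in for GetCurrentState, the tasks stand in for the srv.Index() and EachCollection goroutines, and errs is assumed to be buffered with room for one error per goroutine.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errIndexFailed is a hypothetical error used only for this illustration.
var errIndexFailed = errors.New("index failed")

// fetchAll is a hypothetical stand-in for GetCurrentState. It runs each
// task in its own goroutine and returns the first queued error, if any.
func fetchAll(tasks []func() error) error {
	// Buffered so every goroutine can report an error without blocking.
	errs := make(chan error, len(tasks))
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(task func() error) {
			defer wg.Done()
			if len(errs) > 0 {
				// Another goroutine already failed; skip the work.
				return
			}
			if err := task(); err != nil {
				errs <- err
			}
		}(task)
	}
	// Wait in the calling goroutine. Nothing receives from errs until
	// every goroutine has finished, so len(errs) > 0 stays true for
	// the rest of the run once the first error is queued.
	wg.Wait()
	if len(errs) > 0 {
		return <-errs
	}
	return nil
}

func main() {
	err := fetchAll([]func() error{
		func() error { return nil },
		func() error { return errIndexFailed },
		func() error { return nil },
	})
	fmt.Println(err) // prints "index failed"
}
```

The trade-off in this sketch is that the function no longer returns at the first error. It returns only after all goroutines have noticed the failure and stopped, which is what keeps a doomed run from overlapping with the next one.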


Related issues

Related to Arvados - Bug #9918: keep-balance fails with "Malformed index line" error (Resolved, 09/01/2016)

Associated revisions

Revision 78468434
Added by Tom Clegg about 5 years ago

Merge branch '9996-stop-on-error'

closes #9996

History

#1 Updated by Tom Clegg about 5 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:7846843453df9846c346f85c20a8d6d051066f52.

#2 Updated by Joshua Randall about 5 years ago

Thanks, Tom!
