Bug #4012

closed

[Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition

Added by Ward Vandewege almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 1.0

Subtasks: 1 (0 open, 1 closed)

Task #4109: Review 4012-crunch-job-api-retries-wip | Resolved | Peter Amstutz | 10/05/2014
#1

Updated by Tom Clegg almost 10 years ago

  • Subject changed from [Crunch] crunch-job bandaid: wrap api calls that can fail in an eval and retry to [Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition
#2

Updated by Ward Vandewege almost 10 years ago

  • Assigned To set to Brett Smith
  • Story points changed from 0.5 to 1.0
#3

Updated by Peter Amstutz almost 10 years ago

This looks pretty good; just one comment on the algorithm. If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up. For example, a five-minute job (300 seconds) would get 8 retries; with a 60-second timeout on each request plus the backoff, it would spend well over 10 minutes retrying before giving up. Suggest tweaking retry_op() to try/retry (with backoff) until the wait time is exceeded, while still trying at least three times? Something like:

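# Body of retry_op(): retry with exponential backoff until the give-up
# time has passed, but always make at least three attempts.
my $wait = 1;
my $giveup_time = time + calculate_giveup_time();
my $attempts = 0;
my $last_error;
while ($attempts < 3 || time < $giveup_time) {
    sleep($wait) if $wait > 1;   # no sleep before the first attempt
    $wait *= 2;
    $attempts++;
    my $result = eval { $operation->(@_); };
    return $result if !$@;       # success: hand back the result
    $last_error = $@;            # remember why this attempt failed
}
die $last_error;                 # every attempt failed; propagate the error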
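A rough check of the "well over 10 minutes" estimate above, assuming each of the 8 attempts hangs for the full 60-second timeout and the backoff sleeps double from 1 second (1, 2, 4, ..., 128 s). These per-attempt figures are illustrative assumptions, not measurements from crunch-job:

```latex
T_{\text{worst}}
  = \underbrace{8 \times 60\,\text{s}}_{\text{timeouts}}
  + \underbrace{\sum_{k=0}^{7} 2^{k}\,\text{s}}_{\text{backoff}}
  = 480\,\text{s} + 255\,\text{s}
  = 735\,\text{s} \approx 12.25\,\text{min}
```

That is roughly 2.45 times the 300-second budget of a five-minute job, which is why bounding the loop by wall-clock time, rather than by a fixed retry count, keeps the total wait close to $timediff.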
#4

Updated by Brett Smith almost 10 years ago

Peter Amstutz wrote:

If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up

Yeah, that's an issue. Fixed in 7e35706 and ready for another look. Thanks.

#5

Updated by Brett Smith almost 10 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:344c6dcdbae76310879c85a736e4e6cce05d5645.
