Bug #4012

closed

[Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition

Added by Ward Vandewege over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 1.0

Subtasks: 1 (0 open, 1 closed)

Task #4109: Review 4012-crunch-job-api-retries-wip (Resolved, Peter Amstutz, 10/05/2014)
Actions #1

Updated by Tom Clegg over 9 years ago

  • Subject changed from "[Crunch] crunch-job bandaid: wrap api calls that can fail in an eval and retry" to "[Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition"
Actions #2

Updated by Ward Vandewege over 9 years ago

  • Assigned To set to Brett Smith
  • Story points changed from 0.5 to 1.0
Actions #3

Updated by Peter Amstutz over 9 years ago

This looks pretty good, just one comment on the algorithm. If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up. For example, a five-minute job (300 seconds) would do 8 retries; with a 60-second timeout plus backoff, it would spend well over 10 minutes retrying before giving up. I suggest tweaking retry_op() to try and retry (with backoff) until the wait time is exceeded, and at least three times. Something like:

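my $wait = 1;
my $tries = 0;
my $giveup_time = time + calculate_giveup_time();
my $result;
# Keep trying until the give-up time passes, but always try at least 3 times.
while ($tries < 3 or time < $giveup_time) {
    # Back off before every attempt except the first.
    if ($tries++ > 0) {
        sleep($wait);
        $wait *= 2;
    }
    $result = eval { $operation->(@_); };
    return $result if !$@;
}
die $@;  # every attempt failed; re-raise the last error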
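
For anyone who wants to try the idea in isolation, here is a minimal, self-contained sketch of the same approach. It is not the code from the branch: retry_until(), its parameters, and the injectable $sleeper callback are hypothetical names made up for this example, and the give-up time is passed in directly instead of coming from calculate_giveup_time().

```perl
use strict;
use warnings;

# Hypothetical helper (not from crunch-job): call $operation, retrying with
# exponential backoff until $max_wait seconds have elapsed, and always
# making at least $min_tries attempts. $sleeper is injectable so the
# example can run without real delays; it defaults to the builtin sleep.
sub retry_until {
    my ($operation, $max_wait, $min_tries, $sleeper) = @_;
    $sleeper ||= sub { sleep($_[0]) };
    my $giveup_time = time + $max_wait;
    my $wait = 1;
    my $tries = 0;
    my $last_error;
    while ($tries < $min_tries or time < $giveup_time) {
        # Back off before every attempt except the first.
        if ($tries > 0) {
            $sleeper->($wait);
            $wait *= 2;
        }
        $tries++;
        my $result;
        # The trailing 1 lets us tell success apart from a false result.
        my $ok = eval { $result = $operation->(); 1 };
        return $result if $ok;
        $last_error = $@ || "unknown error";
    }
    die "giving up after $tries tries: $last_error";
}

# Usage: an operation that fails twice with a transient error, then succeeds.
my $calls = 0;
my @slept;
my $value = retry_until(
    sub { die "transient\n" if ++$calls < 3; return "ok" },
    0, 3, sub { push @slept, $_[0] });
print "$value after $calls calls, backoff: @slept\n";
```

Because the loop checks the clock before each attempt rather than counting a fixed number of retries, slow timeouts use up the budget instead of multiplying it. A single attempt can still run past the give-up time, since the check happens only between attempts.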
Actions #4

Updated by Brett Smith over 9 years ago

Peter Amstutz wrote:

If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up

Yeah, that's an issue. Fixed in 7e35706 and ready for another look. Thanks.

Actions #5

Updated by Brett Smith over 9 years ago

  • Status changed from New to Resolved

Applied in changeset arvados|commit:344c6dcdbae76310879c85a736e4e6cce05d5645.
