Bug #4012
closed
[Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition
100%
Updated by Tom Clegg over 8 years ago
- Subject changed from [Crunch] crunch-job bandaid: wrap api calls that can fail in an eval and retry to [Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition
Updated by Ward Vandewege over 8 years ago
- Assigned To set to Brett Smith
- Story points changed from 0.5 to 1.0
Updated by Peter Amstutz over 8 years ago
This looks pretty good, just one comment on the algorithm. If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up: a five-minute job (300 seconds) would do 8 retries, and with a 60-second timeout plus backoff it would spend well over 10 minutes retrying before giving up. I suggest tweaking retry_op() to try/retry (with backoff) until the wait time is exceeded, and at least three times. Something like:
my $wait = 1;
my $giveup_time = time + calculate_giveup_time();
while (time < $giveup_time) {
  sleep($wait) if $wait > 1;
  $wait *= 2;
  my $result = eval { $operation->(@_); };
  if (!$@) {
    return $result;
  }
}
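For illustration, here is a slightly fuller sketch of the same idea that also covers the "at least three times" part and reports the last error when it gives up. It is not the code that landed in crunch-job: the name retry_until_deadline and the options giveup_after, min_tries, first_wait, and sleeper are hypothetical, chosen only to make the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper (not the real retry_op): call $operation until it
# succeeds, backing off exponentially between attempts. Keep going until
# the deadline has passed, but always make at least min_tries attempts.
sub retry_until_deadline {
    my ($operation, %opt) = @_;
    my $giveup_after = $opt{giveup_after} // 60;  # seconds; assumed default
    my $min_tries    = $opt{min_tries}    // 3;
    my $wait         = $opt{first_wait}   // 1;   # first backoff delay
    # The sleeper is injectable so callers (and tests) can avoid real delays.
    my $sleeper      = $opt{sleeper}      // sub { sleep($_[0]) };

    my $giveup_time = time + $giveup_after;
    my ($tries, $error) = (0, undef);
    while ($tries < $min_tries || time < $giveup_time) {
        if ($tries > 0) {
            $sleeper->($wait);   # no delay before the first attempt
            $wait *= 2;
        }
        $tries++;
        my $result;
        # "eval { ...; 1 }" reports success reliably, without inspecting $@.
        my $ok = eval { $result = $operation->(); 1 };
        return $result if $ok;
        $error = $@;
    }
    die "giving up after $tries tries: $error";
}
```

Because the deadline is measured in wall-clock time, a slow timeout on each request uses up the budget instead of extending it. The final backoff sleep can still run past the deadline; capping each delay at the time remaining would tighten that further.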
Updated by Brett Smith over 8 years ago
Peter Amstutz wrote:
If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up
Yeah, that's an issue. Fixed in 7e35706 and ready for another look. Thanks.
Updated by Brett Smith over 8 years ago
- Status changed from New to Resolved
Applied in changeset arvados|commit:344c6dcdbae76310879c85a736e4e6cce05d5645.