Bug #4012
closed
[Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition
Updated by Tom Clegg about 10 years ago
- Subject changed from [Crunch] crunch-job bandaid: wrap api calls that can fail in an eval and retry to [Crunch] crunch-job bandaid: use eval/retry to wrap api calls that are likely to fail a long-running job due to a transient error condition
Updated by Ward Vandewege about 10 years ago
- Assigned To set to Brett Smith
- Story points changed from 0.5 to 1.0
Updated by Peter Amstutz about 10 years ago
This looks pretty good, just one comment on the algorithm. If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up: a five-minute job (300 seconds) would do 8 retries, and with a 60-second timeout plus backoff it would spend well over 10 minutes retrying before giving up. Suggest tweaking retry_op() to try/retry (with backoff) until the wait time is exceeded, and at least three times? Something like:
    my $wait = 1;
    my $giveup_time = time + calculate_giveup_time();
    while (time < $giveup_time) {
        sleep($wait) if $wait > 1;
        $wait *= 2;
        my $result = eval { $operation->(@_); };
        if (!$@) {
            return $result;
        }
    }
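
For anyone following along, here is a minimal sketch of the same idea with the "at least three times" floor added, written in Python rather than crunch-job's Perl so it is easy to run standalone. Everything named here (retry_until_deadline, giveup_after, min_attempts, the injectable clock and sleep) is hypothetical and for illustration only; it is not the code that was committed, and the backoff schedule differs slightly from the snippet above.

```python
import time


def retry_until_deadline(operation, giveup_after, min_attempts=3,
                         clock=time.monotonic, sleep=time.sleep):
    """Retry operation() with doubling backoff.

    Stops retrying once the deadline has passed AND at least
    min_attempts calls have been made, then re-raises the last error.
    All parameter names are illustrative assumptions.
    """
    deadline = clock() + giveup_after
    wait = 1          # seconds to sleep before the next retry
    attempts = 0
    last_error = None
    while attempts < min_attempts or clock() < deadline:
        if attempts > 0:
            # Back off before every retry, never before the first call.
            sleep(wait)
            wait *= 2
        attempts += 1
        try:
            return operation()
        except Exception as error:
            # Treat any exception as transient (an assumption of this sketch).
            last_error = error
    raise last_error
```

The point of checking the clock instead of counting retries is that slow failures no longer multiply the total wait: 8 attempts that each hit a 60-second timeout already cost 480 seconds before any backoff sleep is added, whereas a deadline check stops scheduling new attempts once the budget is spent. The last attempt can still overrun the deadline by up to one timeout.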
Updated by Brett Smith about 10 years ago
Peter Amstutz wrote:
If the requests are timing out instead of failing fast, you may end up waiting significantly longer than $timediff before giving up
Yeah, that's an issue. Fixed in 7e35706 and ready for another look. Thanks.
Updated by Brett Smith about 10 years ago
- Status changed from New to Resolved
Applied in changeset arvados|commit:344c6dcdbae76310879c85a736e4e6cce05d5645.