Bug #10468
Closed: [Keepstore] configurable timeout on blob storage requests
Updated by Tom Clegg about 8 years ago
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
Updated by Tom Clegg about 8 years ago
10468-blob-storage-timeouts
Updated by Peter Amstutz about 8 years ago
- Default timeout of 10 minutes seems unreasonably long. I can't think of a situation where you would actually want that behavior. Should be more like 2 minutes or even shorter (20 seconds?)
- Azure has const azureDefaultRequestTimeout but S3 hardcodes defaults in S3Volume.Start().
Rest LGTM.
Updated by Tom Clegg about 8 years ago
Peter Amstutz wrote:
- Default timeout of 10 minutes seems unreasonably long. I can't think of a situation where you would actually want that behavior. Should be more like 2 minutes or even shorter (20 seconds?)
10 minutes might be unreasonable for the installation you're thinking of, but 20 seconds might be unreasonably short for someone else's site (e.g., S3 requests often take >30 seconds on our test cluster). Rather than try to guess a useful-but-not-too-aggressive timeout for all setups/endpoints, I figured we should start with a long timeout: a too-long timeout doesn't break anything.
I propose we revisit the defaults/examples/recommendations after we have some real-world experience.
Meanwhile, the rationale for having a default timeout is really just to avoid holding resources forever if the server somehow doesn't get notified that a request has failed.
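To illustrate the rationale above, here is a minimal Go sketch of a last-resort timeout that releases a caller stuck on a backend that never answers. It is not Keepstore's actual code: callWithTimeout, hungBackend, fastBackend, timedOut, and demoTimeout are hypothetical names, and the short duration is chosen only so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// demoTimeout is an illustrative value only; it is far shorter than any
// realistic default so the example finishes quickly.
const demoTimeout = 50 * time.Millisecond

// hungBackend is a hypothetical stand-in for a blob storage request whose
// failure is never reported: it blocks until its context ends.
func hungBackend(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// fastBackend is a hypothetical stand-in for a request that succeeds at once.
func fastBackend(ctx context.Context) error {
	return nil
}

// callWithTimeout runs call with a deadline so that a stuck request cannot
// hold the caller's resources forever.
func callWithTimeout(timeout time.Duration, call func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // release the timer as soon as the call returns
	return call(ctx)
}

// timedOut reports whether err was caused by the deadline expiring.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("hung backend timed out:", timedOut(callWithTimeout(demoTimeout, hungBackend)))
	fmt.Println("fast backend error:", callWithTimeout(demoTimeout, fastBackend))
}
```

A timeout that is too long only delays the cleanup, while one that is too short turns slow but healthy requests into failures, which is the trade-off discussed in this thread.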
- Azure has const azureDefaultRequestTimeout but S3 hardcodes defaults in S3Volume.Start().
Fixed, thanks.
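The pattern under review, a named default constant instead of a literal buried in Start(), might look roughly like the Go sketch below. All names here (exampleVolume, exampleDefaultRequestTimeout, RequestTimeout) are hypothetical and are not taken from the Keepstore source. The 10-minute value is the default debated in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// exampleDefaultRequestTimeout is a hypothetical named default, analogous in
// spirit to azureDefaultRequestTimeout. The value is the one debated above.
const exampleDefaultRequestTimeout = 10 * time.Minute

// exampleVolume is a hypothetical volume type with a configurable timeout.
// A zero RequestTimeout means "not configured".
type exampleVolume struct {
	RequestTimeout time.Duration
}

// Start applies the named default only when no timeout was configured, so the
// default lives in one visible place rather than as a literal inside Start.
func (v *exampleVolume) Start() error {
	if v.RequestTimeout == 0 {
		v.RequestTimeout = exampleDefaultRequestTimeout
	}
	return nil
}

func main() {
	unset := &exampleVolume{}
	custom := &exampleVolume{RequestTimeout: 2 * time.Minute}
	for _, v := range []*exampleVolume{unset, custom} {
		if err := v.Start(); err != nil {
			fmt.Println("start failed:", err)
			continue
		}
		fmt.Println("effective timeout:", v.RequestTimeout)
	}
}
```

Keeping the default in a constant means a later change to the recommended value touches a single line.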
Updated by Peter Amstutz about 8 years ago
Tom Clegg wrote:
Peter Amstutz wrote:
- Default timeout of 10 minutes seems unreasonably long. I can't think of a situation where you would actually want that behavior. Should be more like 2 minutes or even shorter (20 seconds?)
10 minutes might be unreasonable for the installation you're thinking of, but 20 seconds might be unreasonably short for someone else's site (e.g., S3 requests often take >30 seconds on our test cluster). Rather than try to guess a useful-but-not-too-aggressive timeout for all setups/endpoints, I figured we should start with a long timeout: a too-long timeout doesn't break anything.
Well, in the Python SDK, the default connection timeout is 2 seconds and the read timeout is 256 seconds. So having the default timeouts for keepstore talking to blob store be an order of magnitude longer than the client timeouts is counterproductive because the SDK will have long since hung up.
I propose we revisit the defaults/examples/recommendations after we have some real-world experience.
I agree we should look at the logs and get some accurate numbers but it's not like we don't have lots of data already.
Meanwhile, the rationale for having a default timeout is really just to avoid holding resources forever if the server somehow doesn't get notified that a request has failed.
By that rationale the default timeout could be 75 years, which is also less than forever.
However, please go ahead and merge; we can litigate the defaults later.
Updated by Tom Clegg about 8 years ago
Peter Amstutz wrote:
Well, in the Python SDK, the default connection timeout is 2 seconds and the read timeout is 256 seconds. So having the default timeouts for keepstore talking to blob store be an order of magnitude longer than the client timeouts is counterproductive because the SDK will have long since hung up.
Guessing the client's timeout isn't the right way to address the problem of releasing server resources after the client hangs up (see #10467).
Timeouts are last resorts. If we find ourselves fine-tuning timeouts, that's probably a sign something else needs to be fixed...
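One way to release server resources when the client hangs up, without guessing the client's timeout, is to tie the backend call to the incoming request's context. The Go sketch below assumes that approach. It is not a description of how #10467 was resolved, and readBlock, handleGet, and simulateHangup are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// readBlock is a hypothetical stand-in for a slow backend read. In this
// sketch it never completes on its own and returns only when ctx ends.
func readBlock(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

// handleGet passes the request's context to the backend call. net/http
// cancels that context when the client's connection closes, so the backend
// work stops without the server having to guess any client-side timeout.
func handleGet(w http.ResponseWriter, r *http.Request) {
	if err := readBlock(r.Context()); err != nil {
		http.Error(w, err.Error(), http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// simulateHangup runs the handler with an already-canceled context, which
// stands in for a client that has disconnected, and returns the status code.
func simulateHangup() int {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	req := httptest.NewRequest(http.MethodGet, "/", nil).WithContext(ctx)
	rec := httptest.NewRecorder()
	handleGet(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("status after simulated hangup:", simulateHangup())
}
```

With cancellation propagated this way, a timeout remains only as a backstop for the case where the server is never told that the request has failed.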
Updated by Tom Clegg about 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:c3cc1d58b64940a2bd79f27a9d0fdc50318dbb99.