Bug #10468 (closed)

[Keepstore] configurable timeout on blob storage requests

Added by Ward Vandewege over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tom Clegg
Category: -
Target version: 2016-11-09 sprint
Story points: -

Subtasks: 1 (0 open, 1 closed)

Task #10474: Review 10468-blob-storage-timeouts (Resolved, Peter Amstutz, 11/07/2016)
#1

Updated by Tom Clegg over 7 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg
#2

Updated by Tom Clegg over 7 years ago

10468-blob-storage-timeouts

test 39536d8dd7f0a6ab89e106cd065830f1cbb067b1

#3

Updated by Tom Clegg over 7 years ago

  • Target version set to 2016-11-09 sprint
#4

Updated by Peter Amstutz over 7 years ago

  • Default timeout of 10 minutes seems unreasonably long. I can't think of a situation where you would actually want that behavior. Should be more like 2 minutes or even shorter (20 seconds?)
  • Azure has const azureDefaultRequestTimeout but S3 hardcodes defaults in S3Volume.Start().

Rest LGTM.

#5

Updated by Tom Clegg over 7 years ago

Peter Amstutz wrote:

> • Default timeout of 10 minutes seems unreasonably long. I can't think of a situation where you would actually want that behavior. Should be more like 2 minutes or even shorter (20 seconds?)

10 minutes might be unreasonable for the installation you're thinking of, but 20 seconds might be unreasonably short for someone else's site (e.g., S3 requests often take >30 seconds on our test cluster). Rather than try to guess a useful-but-not-too-aggressive timeout for all setups/endpoints, I figured we should start with a long timeout: a too-long timeout doesn't break anything.

I propose we revisit the defaults/examples/recommendations after we have some real-world experience.

Meanwhile, the rationale for having a default timeout is really just to avoid holding resources forever if the server somehow doesn't get notified that a request has failed.

> • Azure has const azureDefaultRequestTimeout but S3 hardcodes defaults in S3Volume.Start().

Fixed, thanks.

#6

Updated by Peter Amstutz over 7 years ago

Tom Clegg wrote:

> Peter Amstutz wrote:
>
> > • Default timeout of 10 minutes seems unreasonably long. I can't think of a situation where you would actually want that behavior. Should be more like 2 minutes or even shorter (20 seconds?)
>
> 10 minutes might be unreasonable for the installation you're thinking of, but 20 seconds might be unreasonably short for someone else's site (e.g., S3 requests often take >30 seconds on our test cluster). Rather than try to guess a useful-but-not-too-aggressive timeout for all setups/endpoints, I figured we should start with a long timeout: a too-long timeout doesn't break anything.

Well, in the Python SDK, the default connection timeout is 2 seconds and the read timeout is 256 seconds. So having the default timeouts for keepstore talking to blob store be an order of magnitude longer than the client timeouts is counterproductive because the SDK will have long since hung up.

> I propose we revisit the defaults/examples/recommendations after we have some real-world experience.

I agree we should look at the logs and get some accurate numbers, but it's not like we don't have lots of data already.

> Meanwhile, the rationale for having a default timeout is really just to avoid holding resources forever if the server somehow doesn't get notified that a request has failed.

By that rationale the default timeout could be 75 years, which is also less than forever.

However please go ahead and merge, we can litigate the defaults later.

#7

Updated by Tom Clegg over 7 years ago

Peter Amstutz wrote:

> Well, in the Python SDK, the default connection timeout is 2 seconds and the read timeout is 256 seconds. So having the default timeouts for keepstore talking to blob store be an order of magnitude longer than the client timeouts is counterproductive because the SDK will have long since hung up.

Guessing the client's timeout isn't the right way to address the problem of releasing server resources after the client hangs up (see #10467).

Timeouts are last resorts. If we find ourselves fine-tuning timeouts, that's probably a sign something else needs to be fixed...

#8

Updated by Tom Clegg over 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:c3cc1d58b64940a2bd79f27a9d0fdc50318dbb99.
