Project

General

Profile

Actions

Bug #16480

closed

[keep-balance] should not timeout/fail when keepstore index takes more than 5 minutes

Added by Tom Clegg almost 4 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
Story points:
-
Release relationship:
Auto

Description

Currently sdk/go/arvados.Client uses a 5-minute request timeout when the caller (like keep-balance) doesn't provide a custom *http.Client. This isn't enough time to complete a keepstore index request on a cloud install with lots of data. When the timeout expires, keep-balance just gives up on the current iteration, goes to sleep, and waits for the next one, which is unlikely to fare any better.

It may be worth having an idle timer to detect hung connections, but aside from that keep-balance should wait as long as it takes for keepstore to return an index response.


Subtasks 1 (0 open1 closed)

Task #16486: Review 16480-keep-balance-index-timeoutResolvedWard Vandewege06/16/2020Actions
Actions #2

Updated by Tom Clegg almost 4 years ago

  • Target version set to 2020-06-17 Sprint
  • Assigned To set to Tom Clegg
Actions #4

Updated by Tom Clegg almost 4 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Tom Clegg almost 4 years ago

16480-keep-balance-index-timeout @ 2bc1a7a89597ab02aaeef84b82fdc51f8e375b79 -- developer-run-tests: #1919

This changes the timeout to 24h. I'm inclined to punt on the "idle timer to detect hung connections" part.

Actions #6

Updated by Ward Vandewege almost 4 years ago

Tom Clegg wrote:

16480-keep-balance-index-timeout @ 2bc1a7a89597ab02aaeef84b82fdc51f8e375b79 -- developer-run-tests: #1919

This changes the timeout to 24h. I'm inclined to punt on the "idle timer to detect hung connections" part.

OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?

Review comment:

Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?

Otherwise, LGTM, thanks.

Actions #7

Updated by Tom Clegg almost 4 years ago

  • Target version changed from 2020-06-17 Sprint to 2020-07-01 Sprint
Actions #8

Updated by Tom Clegg almost 4 years ago

Ward Vandewege wrote:

OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?

I suppose the only way to avoid choosing a bad timeout is to make it configurable so the admin can choose a bad timeout themselves, so I made a configurable timeout for the entire keep-balance operation, defaulting to 6h. Is that a good compromise between annoyingly-long-for-small-sites and annoyingly-short-for-big-sites?

Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?

Controller already uses a context deadline on incoming requests to enforce API.RequestTimeout, and propagates that context to outgoing proxy requests. So we were effectively limiting outgoing reqs min(5m, RequestTimeout) -- using only RequestTimeout seems better.

Generally, I'm thinking it's OK to change the exported default clients' timeouts to zero -- that's how http.DefaultClient does it, probably a hint that we should let the caller provide a timeout/context when needed.

16480-keep-balance-index-timeout @ fd080b34a321cbd6593d69f427b9eaeab890712f -- developer-run-tests: #1925

Actions #9

Updated by Ward Vandewege almost 4 years ago

Tom Clegg wrote:

Ward Vandewege wrote:

OK to punt on the "idle timer", that's a nice to have. But maybe shorten the timeout a bit, 24 hours seems a bit excessive?

I suppose the only way to avoid choosing a bad timeout is to make it configurable so the admin can choose a bad timeout themselves, so I made a configurable timeout for the entire keep-balance operation, defaulting to 6h. Is that a good compromise between annoyingly-long-for-small-sites and annoyingly-short-for-big-sites?

Removing the 5 minute timeout from the definition of DefaultSecureClient and InsecureHTTPClient means that the setup() function in lib/controller/handler.go now creates a secureClient and insecureClient without timeout. Is that intentional? Since DefaultSecureClient and InsecureHTTPClient are exported, shouldn't there be a default timeout on them?

Controller already uses a context deadline on incoming requests to enforce API.RequestTimeout, and propagates that context to outgoing proxy requests. So we were effectively limiting outgoing reqs min(5m, RequestTimeout) -- using only RequestTimeout seems better.

Generally, I'm thinking it's OK to change the exported default clients' timeouts to zero -- that's how http.DefaultClient does it, probably a hint that we should let the caller provide a timeout/context when needed.

16480-keep-balance-index-timeout @ fd080b34a321cbd6593d69f427b9eaeab890712f -- developer-run-tests: #1925

Cool, this LGTM.

Actions #10

Updated by Anonymous almost 4 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved
Actions #11

Updated by Peter Amstutz over 3 years ago

  • Release set to 25
Actions

Also available in: Atom PDF