Story #9005

[SDKs] Go SDK's arvadosclient and keepclient should share http.Client objects by default

Added by Tom Clegg over 1 year ago. Updated 4 months ago.

Status:ResolvedStart date:05/29/2017
Priority:NormalDue date:
Assignee:Tom Clegg% Done:

100%

Category:SDKs
Target version:2017-06-07 sprint
Story points1.0Remaining (hours)0.00 hour
Velocity based estimate-

Description

Background

As an example, keep-web currently
  • gets an arvadosclient.ArvadosClient from an arvadosclient.ClientPool
  • configures it with the current token
  • passes it to keepclient.MakeKeepClient() in order to get a keepclient.Client

(The public APIs for creating a keepclient.Client all involve passing in an arvadosclient.ArvadosClient, so we can't just make a pool of keepclient.Clients and assume they won't leak credentials across requests.)

At this point, keepclient.MakeKeepClient() creates a new http.Client object. This means
  • Consecutive keep-web requests cannot reuse http connections to keep services
  • (Currently) the HTTP connections to keep services are left open anyway, because we don't tell the http.Client that we won't be reusing it

In general, sharing an http.Client across requests is beneficial because it reduces TCP and TLS handshaking.

Proposed improvements

keepclient should use kc.Arvados.Client instead of its own Client field unless the caller specifies otherwise.
  • keepclient.New() should leave the Client field set to nil by default.
  • keepclient methods that use an http client should get it from a private httpClient() method that works along the lines of the one in source:sdk/go/arvados/client.go: use the KeepClient field if non-nil, otherwise the Arvados one.

(This change alone should reduce the proliferation of http.Client objects significantly.)

Also, arvadosclient.MakeArvadosClient should reuse the same two http.Client objects for all new ArvadosClient objects -- e.g., arvadosclient.DefaultHTTPClient when ApiInsecure is false, arvadosclient.DefaultInsecureHTTPClient when true. The caller can override it later if desired.

arvadosclient.DefaultHTTPClient should default to http.DefaultClient. (keepclient modifies client timeouts and keepalive settings according to proxy/non-proxy config, and we shouldn't muck with the http package globals this way.)

(This change should reduce the proliferation of http.Client objects further, especially in programs that don't use an arvados client pool.)


Subtasks

Task #11773: use RefreshKeepServers in keep-webResolvedTom Clegg

Task #11761: Review 9005-keep-http-clientResolvedPeter Amstutz

Task #11772: Share http.Client and http.Transport by defaultResolvedTom Clegg

Task #11795: Review 9005-share-discoveryResolvedTom Clegg

Associated revisions

Revision b51781b9
Added by Tom Clegg over 1 year ago

Merge branch '9004-close-keep-connections'

refs #9004
refs #9005

Revision 303424c3
Added by Tom Clegg 4 months ago

Merge branch '9005-conn-leak'

refs #9005

Revision 138eedbe
Added by Tom Clegg 4 months ago

Merge branch '9005-disable-keepalive'

refs #9005
refs #11726
refs #11729

Revision 63cfe7a9
Added by Tom Clegg 4 months ago

Merge branch '9005-keep-http-client'

refs #9005

Revision cb230b07
Added by Tom Clegg 4 months ago

Merge branch '9005-share-discovery'

closes #9005

History

#1 Updated by Tom Clegg over 1 year ago

  • Description updated (diff)

#2 Updated by Brett Smith over 1 year ago

  • Tracker changed from Bug to Story

#3 Updated by Brett Smith over 1 year ago

  • Target version set to Arvados Future Sprints

#4 Updated by Peter Amstutz 4 months ago

A couple thoughts:

API client and keepstore have different socket settings, which probably why they were not combined in the first place.

The pointer to the Arvados client object a public field of the KeepClient struct. If it's okay to pool ArvadosClient objects and update the ApiToken field, then it should be okay to pool the KeepClient objects and update the Arvados field.

#5 Updated by Tom Clegg 4 months ago

  • Description updated (diff)

#6 Updated by Tom Clegg 4 months ago

  • Description updated (diff)

Peter Amstutz wrote:

API client and keepstore have different socket settings, which probably why they were not combined in the first place.

I assume you're referring to the code that sets timeout and keepalive settings in source:sdk/go/keepclient/support.go after seeing the list of keep services.

It would seem rude to alter http.DefaultClient this way, but I think it would be reasonable if the default behavior used just one http.Client object per arvadosclient -- i.e., keepclient should default to using arvadosclient's http.Client instead of creating its own, and it should not do anything that sabotages arvadosclient's own use of that http.Client.

The pointer to the Arvados client object a public field of the KeepClient struct. If it's okay to pool ArvadosClient objects and update the ApiToken field, then it should be okay to pool the KeepClient objects and update the Arvados field.

True (at least with the current code). I think it would be even better if keepclients were efficient enough to make pools irrelevant. Perhaps if the list of keep services were cached in arvadosclient instead of keepclient (and we make the above changes to http.Client usage) then there would be no remaining motivation for pooling?

(I suppose ideally clients like keep-web shouldn't even need to know about a "keepclient" thing -- they should just say stuff like arv.GetCollection(uuid).Open(filename))

#7 Updated by Tom Clegg 4 months ago

  • Category set to Keep
  • Status changed from New to In Progress
  • Assignee set to Tom Clegg

#8 Updated by Peter Amstutz 4 months ago

  • Category deleted (Keep)
  • Status changed from In Progress to New
  • Assignee deleted (Tom Clegg)

Tom Clegg wrote:

Peter Amstutz wrote:

API client and keepstore have different socket settings, which probably why they were not combined in the first place.

I assume you're referring to the code that sets timeout and keepalive settings in source:sdk/go/keepclient/support.go after seeing the list of keep services.

It would seem rude to alter http.DefaultClient this way, but I think it would be reasonable if the default behavior used just one http.Client object per arvadosclient -- i.e., keepclient should default to using arvadosclient's http.Client instead of creating its own, and it should not do anything that sabotages arvadosclient's own use of that http.Client.

I don't see how we can share the same HTTP client for both API requests and keep blocks when we have settings like this:

client.Timeout = 20 * time.Second

We allow API calls to take a lot longer than that to return results.

The pointer to the Arvados client object a public field of the KeepClient struct. If it's okay to pool ArvadosClient objects and update the ApiToken field, then it should be okay to pool the KeepClient objects and update the Arvados field.

True (at least with the current code). I think it would be even better if keepclients were efficient enough to make pools irrelevant. Perhaps if the list of keep services were cached in arvadosclient instead of keepclient (and we make the above changes to http.Client usage) then there would be no remaining motivation for pooling?

Keeping the list of keep services on ArvadosClient would be an improvement, but it still needs to be updatable (I'm thinking the keep service roots should get their own struct with a lock that ArvadosClient can point to). Although we can do that and have the keep services list stay on the keep client object, and then the keep client object is safely copyable again.

#9 Updated by Tom Clegg 4 months ago

Peter Amstutz wrote:

I don't see how we can share the same HTTP client for both API requests and keep blocks when we have settings like this:

Good point. ArvadosClient would need two http.Clients, one for Keep and one for API.

Keeping the list of keep services on ArvadosClient would be an improvement, but it still needs to be updatable (I'm thinking the keep service roots should get their own struct with a lock that ArvadosClient can point to). Although we can do that and have the keep services list stay on the keep client object, and then the keep client object is safely copyable again.

In order for this to be a reasonable experience for callers I think we just have to use the appropriate number of http.Client objects, discovery doc requests, and keep_services/accessible requests, no matter how many times the caller does this:

arv := arvados.NewClientFromEnv()
arv.Token = token
arv.KeepClient().Get(...)

It should be trivial for callers to get efficient resource usage without juggling pools of ArvadosClient and KeepClient objects.

#10 Updated by Tom Clegg 4 months ago

Some related fixes:

9005-conn-leak @ cad24bba2240b47f59bc5719a035e85ff5eb60ef

#11 Updated by Tom Morris 4 months ago

  • Target version changed from Arvados Future Sprints to 2017-06-07 sprint

#12 Updated by Tom Morris 4 months ago

  • Assignee set to Tom Clegg

#13 Updated by Tom Clegg 4 months ago

  • Category set to SDKs
  • Status changed from New to In Progress

#14 Updated by Tom Clegg 4 months ago

9005-keep-http-client @ 6fe6390690471cee8ba23984e3560fc4ced8b180

I'd also like to make keep-web remember the list of keep services from one request to the next. In the meantime the above branch should bring back HTTP keepalive without leaking unlimited FDs.

#15 Updated by Peter Amstutz 4 months ago

Seems like this should be in sync with either the "disk" or "non-disk" keep client settings. How about declaring some constants in keepclient.go for all the various timeouts just so its a bit more visible, then use those constants here?

    h := &proxyHandler{
        Handler:    rest,
        KeepClient: kc,
        timeout:    timeout,
        transport: &http.Transport{
            Dial: (&net.Dialer{
                Timeout:   20 * time.Second,
                KeepAlive: 10 * time.Second,
            }).Dial,
            TLSClientConfig:     arvadosclient.MakeTLSConfig(kc.Arvados.ApiInsecure),
            TLSHandshakeTimeout: 10 * time.Second,
        },
        ApiTokenCache: &ApiTokenCache{
            tokens:     make(map[string]int64),
            expireTime: 300,
        },
    }

#16 Updated by Peter Amstutz 4 months ago

Also, instead of spinning up a goroutine to periodically call CloseIdleConnections() there are settings on http.Transport we could use instead:

https://golang.org/pkg/net/http/

 For control over proxies, TLS configuration, keep-alives, compression, and other settings, create a Transport:

tr := &http.Transport{
    MaxIdleConns:       10,
    IdleConnTimeout:    30 * time.Second,
    DisableCompression: true,
}
client := &http.Client{Transport: tr}
resp, err := client.Get("https://example.com")

#17 Updated by Tom Clegg 4 months ago

9005-keep-http-client @ 75c5b123e0b4cbfebed9b15364a97c2209f94740

SDK & keepproxy now copy http.DefaultTransport and modify the timeouts instead of building from scratch, so we can benefit from the default Go settings (non-zero MaxIdleConns, etc).

#18 Updated by Tom Clegg 4 months ago

9005-share-discovery @ 916cf89b440fd13a9b9c055d817b34d339442ea3

TBD
  • keepclient.ClearCache() - better name? maybe RefreshServiceDiscovery()?

#19 Updated by Peter Amstutz 4 months ago

So the strategy is to keep a global map of API server address to keep service list, spin up a goroutine which keeps that service list up to date, and request the latest service list on a channel whenever we need it.

  • It looks like it uses the first API token passed to it as the API token for subsequent keep services requests? That should probably be reflected in the comment to MakeKeepClient().
  • Agree that "ClearCache" is not a good name. RefreshServiceDiscovery() is fine, or ResetServiceCache()
  • Nitpick, inconsistent implementation of parsing booleans:

NewClientFromEnv:

strings.ToLower(os.Getenv("ARVADOS_API_HOST_INSECURE")); s == "1" || s == "yes" || s == "true" 

MakeArvadosClient:

     var matchTrue = regexp.MustCompile("^(?i:1|yes|true)$")

Workbench CollectionUploadTest is failing.

#20 Updated by Tom Clegg 4 months ago

9005-share-discovery @ 0f5295ae31362eefe182f3a2329b3903d6f82a3b
  • Made a "StringBool" function and used that everywhere I found a copy&pasted matchTrue regexp.
  • Renamed ClearCache to RefreshServiceDiscovery
  • Punted RWMutex change

passed https://ci.curoverse.com/job/developer-run-tests/313/

Seems like it'll be a problem to remember the first token we see for a given API host and reuse it forever.

Is there any reason we shouldn't make the /arvados/v1/keep_services/accessible API exempt from the token-checking middleware, like we do for the discovery doc?

#21 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Seems like it'll be a problem to remember the first token we see for a given API host and reuse it forever.

Is there any reason we shouldn't make the /arvados/v1/keep_services/accessible API exempt from the token-checking middleware, like we do for the discovery doc?

The SDK will need to be backwards compatible with current API servers that require a token. Otherwise I don't have a problem with making the "accessible" route exempt from token checking.

#22 Updated by Peter Amstutz 4 months ago

Another option is to use the anonymous user token, which could be published in the discovery doc if it isn't already?

#23 Updated by Tom Clegg 4 months ago

9005-share-discovery @ 5d03f6499055ef109ca2c8d5d59941b25de1fa47
  • ignores auth status for keep_services/accessible, so it doesn't matter which token the client uses.
  • ideally we'd clear the api token in keepclient/discover.go and omit the Auth header if token is empty... but for now it seems more useful to at least have a chance of working with older API servers.

Probably the sanest way of dealing with the upgrade is to add a note to Upgrading to master, "if you upgrade keep-web, make sure to upgrade API server too" ...?

#25 Updated by Tom Clegg 4 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 75 to 100

Applied in changeset arvados|commit:cb230b07e0125d819991bc74a1f528740068157d.

Also available in: Atom PDF