Story #9005

[SDKs] Go SDK's arvadosclient and keepclient should share http.Client objects by default

Added by Tom Clegg about 1 year ago. Updated 4 days ago.

Status:NewStart date:
Priority:NormalDue date:
Assignee:Tom Clegg% Done:

0%

Category:-
Target version:2017-06-07 sprint
Story points1.0Remaining (hours)0.00 hour
Velocity based estimate-

Description

Background

As an example, keep-web currently
  • gets an arvadosclient.ArvadosClient from an arvadosclient.ClientPool
  • configures it with the current token
  • passes it to keepclient.MakeKeepClient() in order to get a keepclient.Client

(The public APIs for creating a keepclient.Client all involve passing in an arvadosclient.ArvadosClient, so we can't just make a pool of keepclient.Clients and assume they won't leak credentials across requests.)

At this point, keepclient.MakeKeepClient() creates a new http.Client object. This means
  • Consecutive keep-web requests cannot reuse http connections to keep services
  • (Currently) the HTTP connections to keep services are left open anyway, because we don't tell the http.Client that we won't be reusing it

In general, sharing an http.Client across requests is beneficial because it reduces TCP and TLS handshaking.

Proposed improvements

keepclient should use kc.Arvados.Client instead of its own Client field unless the caller specifies otherwise.
  • keepclient.New() should leave the Client field set to nil by default.
  • keepclient methods that use an http client should get it from a private httpClient() method that works along the lines of the one in source:sdk/go/arvados/client.go: use the KeepClient field if non-nil, otherwise the Arvados one.

(This change alone should reduce the proliferation of http.Client objects significantly.)

Also, arvadosclient.MakeArvadosClient should reuse the same two http.Client objects for all new ArvadosClient objects -- e.g., arvadosclient.DefaultHTTPClient when ApiInsecure is false, arvadosclient.DefaultInsecureHTTPClient when true. The caller can override it later if desired.

arvadosclient.DefaultHTTPClient should default to http.DefaultClient. (keepclient modifies client timeouts and keepalive settings according to proxy/non-proxy config, and we shouldn't muck with the http package globals this way.)

(This change should reduce the proliferation of http.Client objects further, especially in programs that don't use an arvados client pool.)


Subtasks

Task #11761: ReviewNewPeter Amstutz

Associated revisions

Revision b51781b9
Added by Tom Clegg about 1 year ago

Merge branch '9004-close-keep-connections'

refs #9004
refs #9005

Revision 303424c3
Added by Tom Clegg 9 days ago

Merge branch '9005-conn-leak'

refs #9005

Revision 138eedbe
Added by Tom Clegg 9 days ago

Merge branch '9005-disable-keepalive'

refs #9005
refs #11726
refs #11729

History

#1 Updated by Tom Clegg about 1 year ago

  • Description updated (diff)

#2 Updated by Brett Smith about 1 year ago

  • Tracker changed from Bug to Story

#3 Updated by Brett Smith about 1 year ago

  • Target version set to Arvados Future Sprints

#4 Updated by Peter Amstutz 10 days ago

A couple thoughts:

API client and keepstore have different socket settings, which probably why they were not combined in the first place.

The pointer to the Arvados client object a public field of the KeepClient struct. If it's okay to pool ArvadosClient objects and update the ApiToken field, then it should be okay to pool the KeepClient objects and update the Arvados field.

#5 Updated by Tom Clegg 10 days ago

  • Description updated (diff)

#6 Updated by Tom Clegg 10 days ago

  • Description updated (diff)

Peter Amstutz wrote:

API client and keepstore have different socket settings, which probably why they were not combined in the first place.

I assume you're referring to the code that sets timeout and keepalive settings in source:sdk/go/keepclient/support.go after seeing the list of keep services.

It would seem rude to alter http.DefaultClient this way, but I think it would be reasonable if the default behavior used just one http.Client object per arvadosclient -- i.e., keepclient should default to using arvadosclient's http.Client instead of creating its own, and it should not do anything that sabotages arvadosclient's own use of that http.Client.

The pointer to the Arvados client object a public field of the KeepClient struct. If it's okay to pool ArvadosClient objects and update the ApiToken field, then it should be okay to pool the KeepClient objects and update the Arvados field.

True (at least with the current code). I think it would be even better if keepclients were efficient enough to make pools irrelevant. Perhaps if the list of keep services were cached in arvadosclient instead of keepclient (and we make the above changes to http.Client usage) then there would be no remaining motivation for pooling?

(I suppose ideally clients like keep-web shouldn't even need to know about a "keepclient" thing -- they should just say stuff like arv.GetCollection(uuid).Open(filename))

#7 Updated by Tom Clegg 9 days ago

  • Category set to Keep
  • Status changed from New to In Progress
  • Assignee set to Tom Clegg

#8 Updated by Peter Amstutz 9 days ago

  • Category deleted (Keep)
  • Status changed from In Progress to New
  • Assignee deleted (Tom Clegg)

Tom Clegg wrote:

Peter Amstutz wrote:

API client and keepstore have different socket settings, which probably why they were not combined in the first place.

I assume you're referring to the code that sets timeout and keepalive settings in source:sdk/go/keepclient/support.go after seeing the list of keep services.

It would seem rude to alter http.DefaultClient this way, but I think it would be reasonable if the default behavior used just one http.Client object per arvadosclient -- i.e., keepclient should default to using arvadosclient's http.Client instead of creating its own, and it should not do anything that sabotages arvadosclient's own use of that http.Client.

I don't see how we can share the same HTTP client for both API requests and keep blocks when we have settings like this:

client.Timeout = 20 * time.Second

We allow API calls to take a lot longer than that to return results.

The pointer to the Arvados client object a public field of the KeepClient struct. If it's okay to pool ArvadosClient objects and update the ApiToken field, then it should be okay to pool the KeepClient objects and update the Arvados field.

True (at least with the current code). I think it would be even better if keepclients were efficient enough to make pools irrelevant. Perhaps if the list of keep services were cached in arvadosclient instead of keepclient (and we make the above changes to http.Client usage) then there would be no remaining motivation for pooling?

Keeping the list of keep services on ArvadosClient would be an improvement, but it still needs to be updatable (I'm thinking the keep service roots should get their own struct with a lock that ArvadosClient can point to). Although we can do that and have the keep services list stay on the keep client object, and then the keep client object is safely copyable again.

#9 Updated by Tom Clegg 9 days ago

Peter Amstutz wrote:

I don't see how we can share the same HTTP client for both API requests and keep blocks when we have settings like this:

Good point. ArvadosClient would need two http.Clients, one for Keep and one for API.

Keeping the list of keep services on ArvadosClient would be an improvement, but it still needs to be updatable (I'm thinking the keep service roots should get their own struct with a lock that ArvadosClient can point to). Although we can do that and have the keep services list stay on the keep client object, and then the keep client object is safely copyable again.

In order for this to be a reasonable experience for callers I think we just have to use the appropriate number of http.Client objects, discovery doc requests, and keep_services/accessible requests, no matter how many times the caller does this:

arv := arvados.NewClientFromEnv()
arv.Token = token
arv.KeepClient().Get(...)

It should be trivial for callers to get efficient resource usage without juggling pools of ArvadosClient and KeepClient objects.

#10 Updated by Tom Clegg 9 days ago

Some related fixes:

9005-conn-leak @ cad24bba2240b47f59bc5719a035e85ff5eb60ef

#11 Updated by Tom Morris 5 days ago

  • Target version changed from Arvados Future Sprints to 2017-06-07 sprint

#12 Updated by Tom Morris 4 days ago

  • Assignee set to Tom Clegg

Also available in: Atom PDF