Feature #16347

crunch-run runs local keepstore

Added by Peter Amstutz almost 4 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assigned To: Tom Clegg
Category: Keep
Target version: 2021-11-10 sprint
Story points: -
Release relationship: Auto

Description

  • Run keepstore (via arvados-server, the same binary as crunch-run) in a subprocess when crunch-run starts up
  • Config parameter N controlling the number of keepstore buffers (N * number of allocated CPU cores)
  • Use the ARVADOS_KEEP_SERVICES environment variable to override the keepstore server used by arv-mount, and inside the container when {API: true}
  • Shut down the keepstore service when crunch-run shuts down
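
For illustration only, a minimal sketch of how these pieces fit together, using the config key and 64 MiB buffer size settled on later in this ticket; the cluster ID is a placeholder and the localhost port is chosen by crunch-run at runtime:

    Clusters:
      zzzzz:
        Containers:
          # crunch-run starts a local keepstore with
          # (this value * container VCPUs) 64 MiB block buffers;
          # 0 disables the local keepstore entirely.
          LocalKeepBlobBuffersPerVCPU: 1

Inside the container (and for arv-mount), clients are then pointed at the local server via an override like ARVADOS_KEEP_SERVICES=http://localhost:35471.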

Subtasks (0 open, 1 closed)

Task #18229: Review 16347-container-local-keepstore (Resolved, Ward Vandewege, 10/08/2021)

Related issues

Related to Arvados Epics - Idea #16516: Run Keepstore on local compute nodes (Resolved, 10/01/2021 to 11/30/2021)
Related to Arvados - Feature #18992: Enable local keepstore on slurm/lsf if cluster config file already exists on compute node (Resolved, Tom Clegg, 04/14/2022)
Related to Arvados - Bug #19054: [documentation] clarify AWS credentials needed for local keepstore on compute (Resolved, Ward Vandewege, 04/25/2022)
#2

Updated by Peter Amstutz almost 4 years ago

#4

Updated by Peter Amstutz about 3 years ago

  • Related to Idea #16516: Run Keepstore on local compute nodes added
#5

Updated by Peter Amstutz about 3 years ago

#6

Updated by Peter Amstutz about 3 years ago

  • Subject changed from "run keepstore locally on compute node in standard configuration" to "crunch-run runs local keepstore"
#7

Updated by Peter Amstutz over 2 years ago

  • Target version set to 2021-10-13 sprint
#8

Updated by Peter Amstutz over 2 years ago

  • Description updated (diff)
#9

Updated by Peter Amstutz over 2 years ago

  • Category set to Keep
#10

Updated by Peter Amstutz over 2 years ago

  • Assigned To set to Tom Clegg
#11

Updated by Tom Clegg over 2 years ago

  • Status changed from New to In Progress
#12

Updated by Tom Clegg over 2 years ago

So far:

16347-container-local-keepstore @ e2aaf1be932a49cdcfa140167ca296e39b6e5158 -- developer-run-tests: #2726

  • new config Containers.LocalKeepBlobBuffersPerVCPU, default 0
  • tested successfully on ce8i5, see ce8i5-xvhdp-bwrxfm4znro10b0
  • adds keepstore.txt file to container log
  • fixes keepstore's Azure driver log spam (404 when checking whether a block already exists before writing it)
TBD:
  • figure out a test case
  • decide how to spell the keepstore.txt config options:
    • log everything (too verbose, unless debugging)
    • suppress "request" and "successful response" lines -- only log startup messages, error responses, backend errors, anything else that might get logged by third-party libraries, etc.
    • discard all logs (more secure)
  • decide how to spell the config option that enables "send a copy of all logs to systemd-cat" (currently always enabled if systemd-cat exists, which is not ideal: journald does a ridiculous amount of IO, and the logs typically disappear via instance shutdown before anyone bothers to look at them anyway)
#13

Updated by Tom Clegg over 2 years ago

16347-container-local-keepstore @ bda9093be4d24d45a6fff29148fbb5438e283897 -- developer-run-tests: #2729

  • adds config for keepstore logging -- LocalKeepLogsToContainerLog: none/all/errors ("errors" isn't a perfect description of what it does, though: it also includes startup messages like "starting keepstore on http://localhost:35471", "started volume zzzzz-nyw5e-000000000000001", etc. Which is wrong, the name or the behavior?)
  • adds test case

I think the logging-to-systemd question deserves its own issue. For now I've just made it skip the systemd-cat setup (instead of preventing anything from working at all) when systemd-cat is not in PATH.
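
(For reference, a sketch of how this knob might read in a cluster config; the cluster ID is a placeholder and the value descriptions paraphrase the notes above:)

    Clusters:
      zzzzz:
        Containers:
          # all    = log every request and response (verbose; debugging only)
          # errors = startup messages, error responses, backend errors, etc.
          # none   = discard all keepstore logs (more secure)
          LocalKeepLogsToContainerLog: errors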

#14

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
#15

Updated by Ward Vandewege over 2 years ago

Reviewing bda9093be4d24d45a6fff29148fbb5438e283897

In lib/config/config.default.yml

+      # This feature has security implications. (1) Container logs
+      # will include keepstore log files, which typically reveal some
+      # volume configuration details, error messages from the cloud
+      # storage provider, etc., which are not otherwise visible to
+      # users. 

This should probably be updated to indicate that the container logs are disabled by default?

It may also be worth noting here that if this feature is enabled, communication with the local keepstore process happens over a TCP connection on the localhost interface, without SSL.

+      LocalKeepBlobBuffersPerVCPU: 0

I wonder if that is the correct default. It's backwards compatible, but probably not ideal in most scenarios.

I think we should also add a page to the Architecture section of our docs that describes this feature and why it is important. I can write that if you want, I suppose we can do a followup story for that.

No comments on the rest of the diff, it looks great. LGTM, thanks!

#16

Updated by Tom Clegg over 2 years ago

I forgot to mention some limitations that need to be documented until we can fix them:
  • if all of your volumes have AccessViaHosts, this won't work because the local keepstore process won't have any volumes. This is detected by config-check etc. as an invalid config.
  • if some of your volumes have AccessViaHosts, those volumes will be inaccessible, which means some blocks in a container's inputs or docker image collection might be inaccessible, causing containers to fail. We could address this by having keepstore fall back on proxying to other keepstores, but that pretty much requires our future keep architecture.
  • if any volume has lower replication than the default collection replication level, writes will fail with "insufficient replicas" when keepstore decides to write to those volumes (because keepstore returns after writing to a single volume even if the client wanted more replicas). We could address this by adding an "I'm the only keepstore" flag that causes keepstore to write to multiple volumes instead.

Perhaps the 2nd and 3rd points should throw config-check warnings or errors?

Or, we could interpret the config as "...if possible" and just disable it automatically in these troublesome situations. That way the default could be non-zero.
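
(For concreteness, a hypothetical volume stanza that would trigger the limitations above; the UUID, hostname, and values are placeholders:)

    Clusters:
      zzzzz:
        Volumes:
          zzzzz-nyw5e-000000000000000:
            # AccessViaHosts restricts access to the listed keepstore
            # hosts, so crunch-run's local keepstore can't use this volume.
            AccessViaHosts:
              "http://keep0.zzzzz.example.com:25107/": {}
            # Replication below the default collection replication level
            # leads to the "insufficient replicas" failure described above.
            Replication: 1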

> I think we should also add a page to the Architecture section of our docs that describes this feature and why it is important.

Agreed

> This should probably be updated to indicate that the container logs are disabled by default?

Oops, fixed.

16347-container-local-keepstore @ c8d252f51c23484484e4aa023fcd1f86ee961eab

#17

Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

> I forgot to mention some limitations that need to be documented until we can fix them:
>   • if all of your volumes have AccessViaHosts, this won't work because the local keepstore process won't have any volumes. This is detected by config-check etc. as an invalid config.
>   • if some of your volumes have AccessViaHosts, those volumes will be inaccessible, which means some blocks in a container's inputs or docker image collection might be inaccessible, causing containers to fail. We could address this by having keepstore fall back on proxying to other keepstores, but that pretty much requires our future keep architecture.
>   • if any volume has lower replication than the default collection replication level, writes will fail with "insufficient replicas" when keepstore decides to write to those volumes (because keepstore returns after writing to a single volume even if the client wanted more replicas). We could address this by adding an "I'm the only keepstore" flag that causes keepstore to write to multiple volumes instead.
>
> Perhaps the 2nd and 3rd points should throw config-check warnings or errors?

Yeah the 2nd point should also be a config-check error.

Would the 3rd point be a problem in general? Not just when Containers.LocalKeepBlobBuffersPerVCPU is non-zero?

> Or, we could interpret the config as "...if possible" and just disable it automatically in these troublesome situations. That way the default could be non-zero.

That would remove the need for configuration, so that sounds good. Would there be a way for an admin to tell from the container (request) logs whether a compute node used a local keepstore? We'd need that, and we would need to clarify how to do that in the docs.

>> I think we should also add a page to the Architecture section of our docs that describes this feature and why it is important.
>
> Agreed
>
>> This should probably be updated to indicate that the container logs are disabled by default?
>
> Oops, fixed.
>
> 16347-container-local-keepstore @ c8d252f51c23484484e4aa023fcd1f86ee961eab

Thanks!

#18

Updated by Tom Clegg over 2 years ago

16347-container-local-keepstore @ 68259bcde57277cb709296fc24e86826d9c131d5 -- developer-run-tests: #2735
  • (re-ran the "remainder" job, which failed on a port number collision: developer-run-tests-remainder: #2843)
  • default is LocalKeepBlobBuffersPerVCPU: 1
  • if LocalKeepBlobBuffersPerVCPU>0 but the volume config is unsuitable, crunch-run logs a message to that effect and then proceeds as if LocalKeepBlobBuffersPerVCPU=0
  • reserves extra memory when choosing the node type

Is the plain http point worth mentioning if it's only listening on localhost? It seems to me we do this in various places like nginx→controller so I wouldn't think it would surprise anyone.

How about something like this for the release notes:

When Arvados runs a container, the crunch-run supervisor process now brings up its own keepstore server to handle I/O for mounted collections, outputs, and logs. With the default configuration, the keepstore process allocates one 64 MiB block buffer per VCPU requested by the container. For most workloads this will increase throughput, reduce total network traffic, and make it possible to run more containers at once without provisioning additional keepstore nodes to handle the I/O load.
  • If you have containers that can effectively handle multiple I/O threads per VCPU, consider increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature.
  • This feature is enabled only if no volumes use AccessViaHosts, and no volumes have underlying Replication less than the default collection replication. If the feature is configured but cannot be enabled due to an incompatible volume configuration, this will be noted in the crunch-run.txt file in the container log.

> Would the 3rd point be a problem in general? Not just when Containers.LocalKeepBlobBuffersPerVCPU is non-zero?

(the "insufficient replicas" thing) isn't a problem when a client can access multiple keepstore servers (client writes to multiple keepstore servers until reaching desired replication) or a single keepproxy server (keepproxy writes to multiple keepstore servers). So I think this would only come up elsewhere if you manually set ARVADOS_KEEP_SERVICES to a single keepstore server, or if you only provision one keepstore node.

#19

Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

> 16347-container-local-keepstore @ 68259bcde57277cb709296fc24e86826d9c131d5 -- developer-run-tests: #2735
>   • (re-ran the "remainder" job, which failed on a port number collision: developer-run-tests-remainder: #2843)
>   • default is LocalKeepBlobBuffersPerVCPU: 1
>   • if LocalKeepBlobBuffersPerVCPU>0 but the volume config is unsuitable, crunch-run logs a message to that effect and then proceeds as if LocalKeepBlobBuffersPerVCPU=0
>   • reserves extra memory when choosing the node type
>
> Is the plain http point worth mentioning if it's only listening on localhost? It seems to me we do this in various places like nginx→controller so I wouldn't think it would surprise anyone.

Yeah, I think it's ok to omit it.

> How about something like this for the release notes:
>
> When Arvados runs a container, the crunch-run supervisor process now brings up its own keepstore server to handle I/O for mounted collections, outputs, and logs. With the default configuration, the keepstore process allocates one 64 MiB block buffer per VCPU requested by the container. For most workloads this will increase throughput, reduce total network traffic, and make it possible to run more containers at once without provisioning additional keepstore nodes to handle the I/O load.
>   • If you have containers that can effectively handle multiple I/O threads per VCPU, consider increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature.

The "or setting it to 0 to disable this feature" is a bit of a non-sequitor? Should that be a separate bullet?

>   • This feature is enabled only if no volumes use AccessViaHosts, and no volumes have underlying Replication less than the default collection replication. If the feature is configured but cannot be enabled due to an incompatible volume configuration, this will be noted in the crunch-run.txt file in the container log.
>
>> Would the 3rd point be a problem in general? Not just when Containers.LocalKeepBlobBuffersPerVCPU is non-zero?
>
> (the "insufficient replicas" thing) isn't a problem when a client can access multiple keepstore servers (client writes to multiple keepstore servers until reaching desired replication) or a single keepproxy server (keepproxy writes to multiple keepstore servers). So I think this would only come up elsewhere if you manually set ARVADOS_KEEP_SERVICES to a single keepstore server, or if you only provision one keepstore node.

OK, got it.

LGTM, thanks!

#20

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
#21

Updated by Tom Clegg over 2 years ago

Re the non sequitur: I was thinking of the "already have plenty of keepstore nodes provisioned" situation -- is this any better?

If you have containers that can effectively handle multiple I/O threads per VCPU, consider either increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature and preserve the previous behavior of sending container I/O traffic to your separately provisioned keepstore servers.

#22

Updated by Tom Clegg over 2 years ago

with upgrade note added and main merged:

16347-container-local-keepstore @ 83c996d75698093446fbfff89ea4abeb36cbc8c4 -- developer-run-tests: #2748

with fix for unrelated flaky keep-web test:

16347-container-local-keepstore @ 39a723673e92b842233b1da5fde27aa595fcc59f -- developer-run-tests: #2749

#23

Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

> Re the non sequitur: I was thinking of the "already have plenty of keepstore nodes provisioned" situation -- is this any better?
>
> If you have containers that can effectively handle multiple I/O threads per VCPU, consider either increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature and preserve the previous behavior of sending container I/O traffic to your separately provisioned keepstore servers.

Yes! Much better.

#24

Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

> with upgrade note added and main merged:
>
> 16347-container-local-keepstore @ 83c996d75698093446fbfff89ea4abeb36cbc8c4 -- developer-run-tests: #2748

LGTM thanks!

> with fix for unrelated flaky keep-web test:
>
> 16347-container-local-keepstore @ 39a723673e92b842233b1da5fde27aa595fcc59f -- developer-run-tests: #2749

Nice improvement, LGTM, thanks!

#25

Updated by Tom Clegg over 2 years ago

16347-container-local-keepstore @ 54836b787450bf23abcf7be291831799093a17b0

#26

Updated by Ward Vandewege over 2 years ago

Tom Clegg wrote:

> 16347-container-local-keepstore @ 54836b787450bf23abcf7be291831799093a17b0

That's perfect, thank you, LGTM!

#27

Updated by Tom Clegg over 2 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

Applied in changeset arvados|f2788dd5fc3ef725969d6c0fcc4ddee3754829fc.

#28

Updated by Peter Amstutz about 2 years ago

  • Release set to 46
#29

Updated by Ward Vandewege almost 2 years ago

  • Related to Feature #18992: Enable local keepstore on slurm/lsf if cluster config file already exists on compute node added
#30

Updated by Ward Vandewege almost 2 years ago

  • Related to Bug #19054: [documentation] clarify AWS credentials needed for local keepstore on compute added