Feature #16347 (closed): crunch-run runs local keepstore

Added by Peter Amstutz over 4 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: Keep
Target version: -
Story points: -
Release relationship: Auto

Description

  • Runs keepstore (via arvados-server, the same binary as crunch-run) in a subprocess when crunch-run starts up
  • Config parameter that sets the number of keepstore buffers to N × the number of allocated CPU cores (see the config sketch after this list)
  • Uses the ARVADOS_KEEP_SERVICES environment variable to point arv-mount, and the container itself when {API: true}, at the local keepstore server
  • Shuts down the keepstore service when crunch-run shuts down
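
A minimal sketch of where these knobs ended up in the cluster config, using the key names that emerged in the notes below (LocalKeepBlobBuffersPerVCPU in note #12, LocalKeepLogsToContainerLog in note #13) and the usual Clusters layout of config.yml; the cluster ID zzzzz is a placeholder:

    Clusters:
      zzzzz:
        Containers:
          # Number of 64 MiB keepstore block buffers to allocate per VCPU
          # requested by the container; 0 disables the container-local
          # keepstore entirely.
          LocalKeepBlobBuffersPerVCPU: 1
          # How much keepstore output to copy into the container log:
          # "none", "errors" (error responses plus startup messages), or "all".
          LocalKeepLogsToContainerLog: none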

Subtasks (0 open, 1 closed)

Task #18229: Review 16347-container-local-keepstore (Resolved, Ward Vandewege, 10/08/2021)

Related issues

Related to Arvados Epics - Idea #16516: Run Keepstore on local compute nodes (Resolved, 10/01/2021 to 11/30/2021)
Related to Arvados - Feature #18992: Enable local keepstore on slurm/lsf if cluster config file already exists on compute node (Resolved, Tom Clegg, 04/14/2022)
Related to Arvados - Bug #19054: [documentation] clarify AWS credentials needed for local keepstore on compute (Resolved, Ward Vandewege, 04/25/2022)
#2

Updated by Peter Amstutz over 4 years ago

#4

Updated by Peter Amstutz almost 4 years ago

  • Related to Idea #16516: Run Keepstore on local compute nodes added
#5

Updated by Peter Amstutz almost 4 years ago

#6

Updated by Peter Amstutz almost 4 years ago

  • Subject changed from "run keepstore locally on compute node in standard configuration" to "crunch-run runs local keepstore"
#7

Updated by Peter Amstutz about 3 years ago

  • Target version set to 2021-10-13 sprint
#8

Updated by Peter Amstutz about 3 years ago

  • Description updated (diff)
#9

Updated by Peter Amstutz about 3 years ago

  • Category set to Keep
#10

Updated by Peter Amstutz about 3 years ago

  • Assigned To set to Tom Clegg
#11

Updated by Tom Clegg about 3 years ago

  • Status changed from New to In Progress
#12

Updated by Tom Clegg about 3 years ago

So far:

16347-container-local-keepstore @ e2aaf1be932a49cdcfa140167ca296e39b6e5158 -- developer-run-tests: #2726

  • new config Containers.LocalKeepBlobBuffersPerVCPU, default 0
  • tested successfully on ce8i5, see ce8i5-xvhdp-bwrxfm4znro10b0
  • adds keepstore.txt file to container log
  • fixes keepstore's Azure driver log spam (404 when checking whether a block already exists before writing it)
TBD:
  • figure out a test case
  • decide how to spell the keepstore.txt config options (a toy version of the candidates is sketched after this list):
    • log everything (too verbose, unless debugging)
    • suppress "request" and "successful response" lines -- only log startup messages, error responses, backend errors, anything else that might get logged by third-party libraries, etc.
    • discard all logs (more secure)
  • decide how to spell the config option to enable "send a copy of all logs to systemd-cat" (currently always enabled if systemd-cat exists, which is not ideal: journald does a ridiculous amount of I/O, and the logs typically disappear via instance shutdown before anyone bothers to look at them anyway)
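
A toy Go filter sketching those three candidate spellings (the mode names and line matching here are purely illustrative assumptions; the option that actually shipped is LocalKeepLogsToContainerLog: none/all/errors, see note #13):

    package main

    import (
        "fmt"
        "strings"
    )

    // keepLogLine sketches the three candidate behaviors for keepstore.txt:
    // log everything, suppress routine request/response lines, or discard all.
    func keepLogLine(mode, line string) bool {
        switch mode {
        case "all": // too verbose, unless debugging
            return true
        case "errors":
            // Suppress "request" and "successful response" lines; keep startup
            // messages, error responses, backend errors, and anything logged by
            // third-party libraries. (Prefix matching is an assumption here.)
            return !strings.HasPrefix(line, "request") &&
                !strings.HasPrefix(line, "successful response")
        default: // "none": discard all logs (more secure)
            return false
        }
    }

    func main() {
        // "starting keepstore..." is a real startup message quoted in note #13;
        // the "request ..." line is a made-up example.
        for _, line := range []string{
            "starting keepstore on http://localhost:35471",
            "request GET /d41d8cd98f00b204e9800998ecf8427e",
        } {
            fmt.Println(keepLogLine("errors", line), line)
        }
    }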
#13

Updated by Tom Clegg about 3 years ago

16347-container-local-keepstore @ bda9093be4d24d45a6fff29148fbb5438e283897 -- developer-run-tests: #2729

  • adds config for keepstore logging -- LocalKeepLogsToContainerLog: none/all/errors ("errors" isn't a perfect description of what it does, though: it also includes startup messages like "starting keepstore on http://localhost:35471", "started volume zzzzz-nyw5e-000000000000001", etc. Which is wrong, the name or the behavior?)
  • adds test case

I think the logging-to-systemd question deserves its own issue. For now I've just made it skip the systemd-cat setup (instead of preventing anything from working at all) when systemd-cat is not in PATH.
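
A small Go illustration of that behavior, using the standard library's exec.LookPath (a sketch of the described logic, not the actual crunch-run code):

    package main

    import (
        "log"
        "os/exec"
    )

    func main() {
        // Only set up the "copy logs to systemd-cat" pipeline when the
        // systemd-cat binary is actually in PATH; otherwise skip that setup
        // and continue, instead of preventing anything from working at all.
        if path, err := exec.LookPath("systemd-cat"); err == nil {
            log.Printf("systemd-cat found at %s; piping keepstore logs through it", path)
            // ... wire keepstore's log output through systemd-cat here ...
        } else {
            log.Print("systemd-cat not in PATH; skipping journald logging setup")
        }
    }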

#14

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
#15

Updated by Ward Vandewege about 3 years ago

Reviewing bda9093be4d24d45a6fff29148fbb5438e283897

In lib/config/config.default.yml

+      # This feature has security implications. (1) Container logs
+      # will include keepstore log files, which typically reveal some
+      # volume configuration details, error messages from the cloud
+      # storage provider, etc., which are not otherwise visible to
+      # users. 

This should probably be updated to indicate that the container logs are disabled by default?

It may also be worth noting here that if this feature is enabled, the communication with the local keepstore process will happen over a TCP connection on the localhost interface without SSL.

+      LocalKeepBlobBuffersPerVCPU: 0

I wonder if that is the correct default. It's backwards compatible, but probably not what's ideal in most scenarios.

I think we should also add a page to the Architecture section of our docs that describes this feature and why it is important. I can write that if you want; I suppose we can do a followup story for that.

No comments on the rest of the diff, it looks great. LGTM, thanks!

#16

Updated by Tom Clegg about 3 years ago

I forgot to mention some limitations that need to be documented until we can fix them.
  • if all of your volumes have AccessViaHost, this won't work because the local keepstore process won't have any volumes. This is detected by config-check etc. as an invalid config.
  • if some of your volumes have AccessViaHost, those volumes will be inaccessible, which means some blocks in a container's inputs or docker image collection might be inaccessible, causing containers to fail. We could address this by having keepstore fall back on proxying to other keepstores, but that pretty much requires our future keep architecture.
  • if any volume has lower replication than the default collection replication level, writes will fail with "insufficient replicas" when keepstore decides to write to those volumes (because keepstore returns after writing to a single volume even if the client wanted more replicas). We could address this by adding an "I'm the only keepstore" flag to keepstore that causes it to write to multiple volumes instead.

Perhaps the 2nd and 3rd points should throw config-check warnings or errors?

Or, we could interpret the config as "...if possible" and just disable it automatically in these troublesome situations. That way the default could be non-zero.

> I think we should also add a page to the Architecture section of our docs that describes this feature and why it is important.

Agreed

> This should probably be updated to indicate that the container logs are disabled by default?

Oops, fixed.

16347-container-local-keepstore @ c8d252f51c23484484e4aa023fcd1f86ee961eab

#17

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

> I forgot to mention some limitations that need to be documented until we can fix them.
>   • if all of your volumes have AccessViaHost, this won't work because the local keepstore process won't have any volumes. This is detected by config-check etc. as an invalid config.
>   • if some of your volumes have AccessViaHost, those volumes will be inaccessible, which means some blocks in a container's inputs or docker image collection might be inaccessible, causing containers to fail. We could address this by having keepstore fall back on proxying to other keepstores, but that pretty much requires our future keep architecture.
>   • if any volume has lower replication than the default collection replication level, writes will fail with "insufficient replicas" when keepstore decides to write to those volumes (because keepstore returns after writing to a single volume even if the client wanted more replicas). We could address this by adding an "I'm the only keepstore" flag to keepstore that causes it to write to multiple volumes instead.
>
> Perhaps the 2nd and 3rd points should throw config-check warnings or errors?

Yeah, the 2nd point should also be a config-check error.

Would the 3rd point be a problem in general? Not just when Containers.LocalKeepBlobBuffersPerVCPU is non-zero?

> Or, we could interpret the config as "...if possible" and just disable it automatically in these troublesome situations. That way the default could be non-zero.

That would remove the need for configuration, so that sounds good. Would there be a way for an admin to tell from the container (request) logs whether a compute node used a local keepstore? We'd need that, and we would need to clarify how to do that in the docs.

> > I think we should also add a page to the Architecture section of our docs that describes this feature and why it is important.
>
> Agreed
>
> > This should probably be updated to indicate that the container logs are disabled by default?
>
> Oops, fixed.
>
> 16347-container-local-keepstore @ c8d252f51c23484484e4aa023fcd1f86ee961eab

Thanks!

#18

Updated by Tom Clegg about 3 years ago

16347-container-local-keepstore @ 68259bcde57277cb709296fc24e86826d9c131d5 -- developer-run-tests: #2735
  • (re-ran the "remainder" job, which failed on a port number collision: developer-run-tests-remainder: #2843)
  • default is LocalKeepBlobBuffersPerVCPU: 1
  • if LocalKeepBlobBuffersPerVCPU>0 but volume config is unsuitable, crunch-run logs a message to that effect and then proceeds as if LocalKeepBlobBuffersPerVCPU=0 (roughly as sketched below)
  • reserves extra memory when choosing node type
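
A minimal Go sketch of that suitability check and fallback, with illustrative struct shapes (the real cluster config types live in the Arvados Go SDK and differ in detail; the log messages are made up):

    package main

    import "log"

    // Illustrative config types, not the real SDK definitions.
    type Volume struct {
        AccessViaHosts map[string]bool // non-empty: volume is accessed via designated hosts
        Replication    int
    }

    type Cluster struct {
        Volumes                     map[string]Volume
        DefaultReplication          int // default collection replication level
        LocalKeepBlobBuffersPerVCPU int
    }

    // localKeepstoreUsable reports whether the container-local keepstore can be
    // used: no volume may use AccessViaHosts, and no volume may have lower
    // replication than the default collection replication level. On failure it
    // logs the reason and the caller proceeds as if LocalKeepBlobBuffersPerVCPU=0.
    func localKeepstoreUsable(c Cluster) bool {
        if c.LocalKeepBlobBuffersPerVCPU == 0 {
            return false
        }
        for name, v := range c.Volumes {
            if len(v.AccessViaHosts) > 0 {
                log.Printf("not using local keepstore: volume %s uses AccessViaHosts", name)
                return false
            }
            if v.Replication < c.DefaultReplication {
                log.Printf("not using local keepstore: volume %s has replication %d < default %d",
                    name, v.Replication, c.DefaultReplication)
                return false
            }
        }
        return true
    }

    func main() {
        c := Cluster{
            Volumes:                     map[string]Volume{"zzzzz-nyw5e-000000000000001": {Replication: 2}},
            DefaultReplication:          2,
            LocalKeepBlobBuffersPerVCPU: 1,
        }
        log.Printf("local keepstore enabled: %v", localKeepstoreUsable(c))
    }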

Is the plain http point worth mentioning if it's only listening on localhost? It seems to me we do this in various places like nginx→controller so I wouldn't think it would surprise anyone.

How about something like this for the release notes:

When Arvados runs a container, the crunch-run supervisor process now brings up its own keepstore server to handle I/O for mounted collections, outputs, and logs. With the default configuration, the keepstore process allocates one 64 MiB block buffer per VCPU requested by the container. For most workloads this will increase throughput, reduce total network traffic, and make it possible to run more containers at once without provisioning additional keepstore nodes to handle the I/O load.
  • If you have containers that can effectively handle multiple I/O threads per VCPU, consider increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature.
  • This feature is enabled only if no volumes use AccessViaHosts, and no volumes have underlying Replication less than the default collection replication. If the feature is configured but cannot be enabled due to an incompatible volume configuration, this will be noted in the crunch-run.txt file in the container log.

> Would the 3rd point be a problem in general? Not just when Containers.LocalKeepBlobBuffersPerVCPU is non-zero?

(the "insufficient replicas" thing) isn't a problem when a client can access multiple keepstore servers (client writes to multiple keepstore servers until reaching desired replication) or a single keepproxy server (keepproxy writes to multiple keepstore servers). So I think this would only come up elsewhere if you manually set ARVADOS_KEEP_SERVICES to a single keepstore server, or if you only provision one keepstore node.

#19

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

> 16347-container-local-keepstore @ 68259bcde57277cb709296fc24e86826d9c131d5 -- developer-run-tests: #2735
>   • (re-ran the "remainder" job, which failed on a port number collision: developer-run-tests-remainder: #2843)
>   • default is LocalKeepBlobBuffersPerVCPU: 1
>   • if LocalKeepBlobBuffersPerVCPU>0 but volume config is unsuitable, crunch-run logs a message to that effect and then proceeds as if LocalKeepBlobBuffersPerVCPU=0
>   • reserves extra memory when choosing node type
>
> Is the plain http point worth mentioning if it's only listening on localhost? It seems to me we do this in various places like nginx→controller so I wouldn't think it would surprise anyone.

Yeah, I think it's ok to omit it.

> How about something like this for the release notes:
>
> When Arvados runs a container, the crunch-run supervisor process now brings up its own keepstore server to handle I/O for mounted collections, outputs, and logs. With the default configuration, the keepstore process allocates one 64 MiB block buffer per VCPU requested by the container. For most workloads this will increase throughput, reduce total network traffic, and make it possible to run more containers at once without provisioning additional keepstore nodes to handle the I/O load.
>   • If you have containers that can effectively handle multiple I/O threads per VCPU, consider increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature.

The "or setting it to 0 to disable this feature" part is a bit of a non sequitur? Should that be a separate bullet?

>   • This feature is enabled only if no volumes use AccessViaHosts, and no volumes have underlying Replication less than the default collection replication. If the feature is configured but cannot be enabled due to an incompatible volume configuration, this will be noted in the crunch-run.txt file in the container log.
>
> > Would the 3rd point be a problem in general? Not just when Containers.LocalKeepBlobBuffersPerVCPU is non-zero?
>
> The "insufficient replicas" thing isn't a problem when a client can access multiple keepstore servers (the client writes to multiple keepstore servers until reaching the desired replication) or a single keepproxy server (keepproxy writes to multiple keepstore servers). So I think this would only come up elsewhere if you manually set ARVADOS_KEEP_SERVICES to a single keepstore server, or if you only provision one keepstore node.

OK, got it.

LGTM, thanks!

#20

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
#21

Updated by Tom Clegg about 3 years ago

Re non-sequitur, I was thinking of the "already have plenty of keepstore nodes provisioned" situation -- is this any better?

If you have containers that can effectively handle multiple I/O threads per VCPU, consider either increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature and preserve the previous behavior of sending container I/O traffic to your separately provisioned keepstore servers.

#22

Updated by Tom Clegg about 3 years ago

with upgrade note added and main merged:

16347-container-local-keepstore @ 83c996d75698093446fbfff89ea4abeb36cbc8c4 -- developer-run-tests: #2748

with fix for unrelated flaky keep-web test:

16347-container-local-keepstore @ 39a723673e92b842233b1da5fde27aa595fcc59f -- developer-run-tests: #2749

#23

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

> Re non-sequitur, I was thinking of the "already have plenty of keepstore nodes provisioned" situation -- is this any better?
>
> If you have containers that can effectively handle multiple I/O threads per VCPU, consider either increasing the Containers.LocalKeepBlobBuffersPerVCPU value, or setting it to 0 to disable this feature and preserve the previous behavior of sending container I/O traffic to your separately provisioned keepstore servers.

Yes! Much better.

#24

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

> with upgrade note added and main merged:
>
> 16347-container-local-keepstore @ 83c996d75698093446fbfff89ea4abeb36cbc8c4 -- developer-run-tests: #2748

LGTM thanks!

> with fix for unrelated flaky keep-web test:
>
> 16347-container-local-keepstore @ 39a723673e92b842233b1da5fde27aa595fcc59f -- developer-run-tests: #2749

Nice improvement, LGTM, thanks!

#25

Updated by Tom Clegg about 3 years ago

16347-container-local-keepstore @ 54836b787450bf23abcf7be291831799093a17b0

#26

Updated by Ward Vandewege about 3 years ago

Tom Clegg wrote:

> 16347-container-local-keepstore @ 54836b787450bf23abcf7be291831799093a17b0

That's perfect, thank you, LGTM!

#27

Updated by Tom Clegg about 3 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|f2788dd5fc3ef725969d6c0fcc4ddee3754829fc.

#28

Updated by Peter Amstutz over 2 years ago

  • Release set to 46
#29

Updated by Ward Vandewege over 2 years ago

  • Related to Feature #18992: Enable local keepstore on slurm/lsf if cluster config file already exists on compute node added
#30

Updated by Ward Vandewege over 2 years ago

  • Related to Bug #19054: [documentation] clarify AWS credentials needed for local keepstore on compute added