Feature #19166
closed
Container shell support for SLURM and LSF dispatchers
Description
Unlike the arvados-dispatch-cloud case, the dispatcher doesn't know which HPC compute node will run the container, and the HPC compute node isn't necessarily even reachable from controller. To work around this, we will make an initial connection in the opposite direction and set up a tunnel.
- crunch-run connects to new controller API arvados/v1/containers/{uuid}/gateway_tunnel, authenticated using the container key (GatewayAuthSecret)
- controller registers its own internalURL as the container’s GatewayAddress, and routes incoming container_ssh connections to crunch-run through the tunnel
- there can be multiple controller hosts/processes; the container_ssh API on controller A will sometimes need to proxy through the same API on controller B
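The reverse-connection idea in the steps above can be sketched in a few lines of Go. This is only an illustration, not the code on the branch: tunnelRegistry, register, route, and demo are hypothetical names, net.Pipe stands in for the outbound connection crunch-run would make, the UUID is a placeholder, and GatewayAuthSecret authentication, multiplexing, and controller-to-controller proxying are all left out.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net"
	"sync"
)

var errNoTunnel = errors.New("no tunnel registered for container")

// tunnelRegistry is a hypothetical controller-side table of tunnels that
// crunch-run processes have opened, keyed by container UUID.
type tunnelRegistry struct {
	mtx     sync.Mutex
	tunnels map[string]net.Conn
}

func newTunnelRegistry() *tunnelRegistry {
	return &tunnelRegistry{tunnels: map[string]net.Conn{}}
}

// register records a connection that was initiated by crunch-run.
func (r *tunnelRegistry) register(uuid string, tunnel net.Conn) {
	r.mtx.Lock()
	defer r.mtx.Unlock()
	r.tunnels[uuid] = tunnel
}

// route relays one incoming client connection through the registered
// tunnel. Assumption: one session per tunnel, no multiplexing.
func (r *tunnelRegistry) route(uuid string, client net.Conn) error {
	defer client.Close()
	r.mtx.Lock()
	tunnel, ok := r.tunnels[uuid]
	delete(r.tunnels, uuid)
	r.mtx.Unlock()
	if !ok {
		return errNoTunnel
	}
	defer tunnel.Close()
	go io.Copy(tunnel, client)        // client -> crunch-run
	_, err := io.Copy(client, tunnel) // crunch-run -> client
	return err
}

// demo wires up an in-memory "crunch-run" and "client" and returns the
// reply the client receives through the tunnel.
func demo() (string, error) {
	reg := newTunnelRegistry()
	const uuid = "zzzzz-dz642-000000000000000" // placeholder UUID

	// crunch-run connects outward; the controller keeps its end.
	crunchRunEnd, controllerEnd := net.Pipe()
	reg.register(uuid, controllerEnd)

	// crunch-run answers one request arriving through the tunnel.
	go func() {
		defer crunchRunEnd.Close()
		line, err := bufio.NewReader(crunchRunEnd).ReadString('\n')
		if err != nil {
			return
		}
		fmt.Fprintf(crunchRunEnd, "crunch-run got: %s", line)
	}()

	// A client connection arrives at the controller and is routed.
	clientEnd, incoming := net.Pipe()
	go reg.route(uuid, incoming)

	fmt.Fprint(clientEnd, "echo ok\n")
	reply, err := bufio.NewReader(clientEnd).ReadString('\n')
	clientEnd.Close()
	return reply, err
}

func main() {
	reply, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(reply)
}
```

The point of the sketch is that the controller never dials the compute node: it only uses a connection that crunch-run opened toward it, which is why the node does not need to be reachable from controller.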
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-07-20 to 2022-06-22 Sprint
Updated by Tom Clegg over 2 years ago
- Related to Idea #17207: services running in containers added
Updated by Tom Clegg over 2 years ago
- Status changed from New to In Progress
- Description updated
Updated by Tom Clegg over 2 years ago
19166-gateway-tunnel @ b44ac131b7385af241acdcbf3835f743ea590b6a -- developer-run-tests: #3180
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-06-22 Sprint to 2022-07-06
Updated by Tom Clegg over 2 years ago
19166-gateway-tunnel @ 3fae0f0626c5152a5aa6f39f0874f0190f2131db -- developer-run-tests: #3196
Includes a doc page about HPC with a description of how the multiplex-tunnel setup works, and an update to the InternalURLs info in the install docs to reflect that it relies on controller-to-controller connections.
Updated by Tom Clegg over 2 years ago
As discussed in chat, TODO: crunch-run should not set up a tunnel if it won't actually be used by controller (i.e., if crunch-run won't be saving the tunnel endpoint in the container record because $GatewayAddress is set).
Updated by Tom Clegg over 2 years ago
- don't set up tunnel if it won't be used
- add required glue to slurm and lsf dispatchers (pass GatewayAuthSecret env var)
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-07-06 to 2022-07-20
Updated by Tom Clegg over 2 years ago
- add arvados-server dispatch-slurm subcommand (missed in #18947)
- add crunch-run -version
- improve some log/debug messages
- fix plumbing so "shell {uuid} echo ok" exits after running, instead of hanging
- tested on 9tee4 using slurm+singularity (works, although it's a bit disconcerting that you land in root@compute0:~# because singularity doesn't set up an imaginary hostname inside the container like docker does)
- tested on 9tee4 using lsf+singularity (doesn't work on 9tee4 because firewall rules prohibit outgoing connections from non-root users to 127.0.0.1, and unlike Slurm, LSF on 9tee4 is configured to run crunch-run as the "crunch" user; but the error message shows that the LSF part per se is working)
todo: add an API handler to "GET .../ssh" so an old arvados-client returns a helpful "upgrade your client" error instead of a mysterious "405 method not allowed".
Updated by Tom Clegg over 2 years ago
19166-gateway-tunnel @ 2261d1fd9e1b69d0a60f1f7fe9029317aeb4cf52 -- developer-run-tests: #3219
Example result using old arvados-client:
$ arvados-client shell 9tee4-xvhdp-49i6665mzesonf3
connecting to container 9tee4-dz642-zluu70frgwkb5ke
error setting up tunnel: server did not provide a tunnel: API endpoint is obsolete -- please upgrade your arvados-client program (HTTP 410)
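For illustration, a handler that produces this kind of response could look roughly like the following. It is a minimal sketch using only the Go standard library, not the handler on the branch: sshHandler and probe are hypothetical names, the request path is a placeholder (the real route is not spelled out here), and the 501 reply for other methods merely stands in for the current API.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Message text taken from the example output above.
const obsoleteMsg = "API endpoint is obsolete -- please upgrade your arvados-client program"

// sshHandler is a hypothetical handler: GET requests (the old client's
// method) get an explicit 410 Gone with an upgrade hint, rather than a
// generic 405.
func sshHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method == http.MethodGet {
		http.Error(w, obsoleteMsg, http.StatusGone)
		return
	}
	// Placeholder: the current API is outside the scope of this sketch.
	http.Error(w, "not part of this sketch", http.StatusNotImplemented)
}

// probe sends one in-memory request to the handler and returns the
// status code and trimmed response body.
func probe(method string) (int, string) {
	req := httptest.NewRequest(method, "/hypothetical/ssh", nil) // placeholder path
	rec := httptest.NewRecorder()
	sshHandler(rec, req)
	return rec.Code, strings.TrimSpace(rec.Body.String())
}

func main() {
	code, body := probe(http.MethodGet)
	fmt.Printf("%s (HTTP %d)\n", body, code)
}
```

Answering with 410 and a message body lets an old client surface the upgrade hint in its own error output, which is what the transcript above shows.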
Updated by Tom Clegg over 2 years ago
- Target version changed from 2022-07-20 to 2022-08-03 Sprint
Updated by Peter Amstutz over 2 years ago
Let's go ahead and merge this, otherwise it's going to sit forever. LGTM.
Updated by Tom Clegg over 2 years ago
(re-testing after merging main)
19166-gateway-tunnel @ 2e03d03bc55b5a612c2bf04d878a72f2ee420d99 -- developer-run-tests: #3246
Updated by Tom Clegg over 2 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|c9b8b9b9c78a77dd30b828914c8bee9fa8dcbb90.