Pull Docker images without requiring Docker on client
Web-only users do not have access to Docker on the client. In addition, some shell users don't have access to Docker for security reasons (for example, visitors to cloud.curoverse.com). There should be a mechanism to submit a request to pull a Docker image for use in Arvados.
Unprivileged pull inside a normal container request
There's at least one utility for pulling/manipulating images:
However, last I checked, it doesn't support conversion to the "docker save" tarfile dump that we use. Maybe we could add support.
Special container request
A special-format container request that crunch-run recognizes: crunch-run executes "docker pull" (instead of "docker run") and produces the image collection as output.
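A rough sketch of what such a request body might look like, written as a Python dict. The attribute names (name, state, priority, command, output_path, mounts, runtime_constraints) are ordinary container request attributes; everything else is an assumption. In particular, the proposal does not say how crunch-run would recognize the request, so using a command that starts with "docker pull" as the marker is purely hypothetical, as is the build_pull_request helper.

```python
import json


def build_pull_request(image_ref: str) -> dict:
    """Return a hypothetical 'pull only' container request body.

    Nothing here is an existing Arvados feature: the marker that
    crunch-run would look for is an assumption made for illustration.
    """
    return {
        "name": f"docker pull {image_ref}",
        "state": "Committed",
        "priority": 1,
        # Hypothetical marker: crunch-run would see this command and run
        # "docker pull" itself instead of "docker run".
        "command": ["docker", "pull", image_ref],
        # The image collection would be produced as the request's output.
        "output_path": "/output",
        "mounts": {"/output": {"kind": "tmp", "capacity": 1 << 30}},
        "runtime_constraints": {"vcpus": 1, "ram": 256 << 20},
    }


if __name__ == "__main__":
    print(json.dumps(build_pull_request("debian:stable"), indent=2))
```

Note that no container_image is given, since the request has no image to run; whether the API server would accept that is one of the open questions for this option.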
Dedicated "docker pull" service and/or WES
New microservice with API for "pull image". This would avoid the overhead of starting up a dedicated VM to run a download process that usually only takes a few seconds.
Note: the Workflow Execution Service (WES) server sort of already does this; if a workflow run is submitted that requires pulling Docker images, it will pull them and upload them. This is existing arvados-cwl-runner behavior which normally requires Docker on the client, but in the case of WES, the WES gateway is the agent that runs arvados-cwl-runner and not the original client.
So there's also an option to migrate clients that submit workflows (a-c-r, workbench, composer) to use Arvados WES instead of directly creating container requests.
Arvados Docker registry service
Deploy https://github.com/docker/distribution or implement the API described at https://docs.docker.com/registry/spec/api/. Store layers in Keep instead of whole image tarballs. Regular "docker push" and "docker pull" work. Unprivileged import is more tractable because it avoids format conversion.
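To make the "store layers" idea concrete, here is a minimal sketch of how the registry HTTP API v2 addresses an image: a manifest is fetched by name and reference, and each layer it lists is a blob fetched by digest. Those per-digest blobs are the units that could be stored individually. The host registry.example.com, the repository name, and the digests in the sample manifest are placeholders, and the helper functions are hypothetical; no network access is performed.

```python
# Media type a client sends in the Accept header to request a v2 manifest.
MANIFEST_V2 = "application/vnd.docker.distribution.manifest.v2+json"


def manifest_url(registry: str, name: str, reference: str) -> str:
    """URL of an image manifest; reference is a tag or a digest."""
    return f"https://{registry}/v2/{name}/manifests/{reference}"


def blob_url(registry: str, name: str, digest: str) -> str:
    """URL of a single content-addressed blob (layer or config)."""
    return f"https://{registry}/v2/{name}/blobs/{digest}"


def layer_urls(registry: str, name: str, manifest: dict) -> list:
    """List the blob URLs for every layer named in a v2 manifest."""
    return [blob_url(registry, name, layer["digest"])
            for layer in manifest["layers"]]


# Placeholder manifest with made-up digests, shaped like a schema 2 manifest.
SAMPLE_MANIFEST = {
    "schemaVersion": 2,
    "mediaType": MANIFEST_V2,
    "config": {"digest": "sha256:" + "c" * 64},
    "layers": [
        {"digest": "sha256:" + "a" * 64},
        {"digest": "sha256:" + "b" * 64},
    ],
}

if __name__ == "__main__":
    print(manifest_url("registry.example.com", "library/debian", "stable"))
    for url in layer_urls("registry.example.com", "library/debian",
                          SAMPLE_MANIFEST):
        print(url)
```

Because layers are addressed by digest, images that share base layers would share storage, which whole-image tarballs cannot do.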
Additional consideration: to access private registries, we need to provide credentials. Secrets handling is available for container requests.
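As an illustration of how credentials might travel with a container request, the sketch below builds a secret_mounts entry containing a Docker client config.json with an "auths" section. The mount path, the helper name, and the idea that the pulling process would read credentials from that path are all assumptions; only the general shape of secret_mounts (a mount of kind "json" with inline content) and of the Docker auth file is relied upon.

```python
import base64


def registry_secret_mount(registry: str, username: str, password: str,
                          path: str = "/secrets/docker/config.json") -> dict:
    """Build a hypothetical secret_mounts value holding registry credentials.

    The default path is an assumption for illustration; the process doing
    the pull would need to be pointed at it.
    """
    # Docker's config.json stores base64("username:password") under "auth".
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {
        path: {
            "kind": "json",
            "content": {"auths": {registry: {"auth": token}}},
        }
    }


if __name__ == "__main__":
    mounts = registry_secret_mount("registry.example.com", "alice", "s3cret")
    print(sorted(mounts))
```

The returned dict would be passed as the secret_mounts attribute of the container request, alongside its regular mounts.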
Updated by Tom Clegg about 2 months ago
- Status changed from New to In Progress
- Doesn't require docker on system nodes, only on compute nodes where it is typically already installed
- Maintains the reproducibility feature of retaining the exact image that was used to run each workflow step (except the "pull" process itself, which is inherently not reproducible)