Feature #22551
Containers can expose HTTP endpoints
Description
https://dev.arvados.org/projects/arvados/wiki/Service_containers
API changes
Containers and container requests
They get two new fields:
"service" | boolean | does the container represent a long-lived service or a once-through batch run? If "service" is true, container reuse is disabled. |
"published_ports" | dict | dictionary with keys being the port on the container, and values described below |
published_port values
"access" | string | "public" or "private". "public" means unauthenticated connections are allowed. "private" means the client must provide an unscoped Arvados API token belonging to the user who owns the container |
"label" | string | text describing the service to be displayed in workbench |
Example:
{ "published_ports": { "80": { "access": "private", "label": "My great web app" } } }
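To make the shape of this field concrete, here is a minimal Python sketch of a validator for a "published_ports" value. The function name `validate_published_ports`, the port-range check, and the assumption that both "access" and "label" are required are illustrative only and are not specified in this ticket.

```python
def validate_published_ports(published_ports):
    """Return a list of problems found in a published_ports dict (empty if none).

    Hypothetical helper: the checks mirror the field descriptions above.
    """
    if not isinstance(published_ports, dict):
        return ["published_ports must be a dict"]
    problems = []
    for port, entry in published_ports.items():
        # Keys are container ports; JSON object keys are strings, e.g. "80".
        is_port = (
            isinstance(port, str)
            and port.isascii()
            and port.isdigit()
            and 1 <= int(port) <= 65535
        )
        if not is_port:
            problems.append(f"invalid port key: {port!r}")
            continue
        if not isinstance(entry, dict):
            problems.append(f"port {port}: value must be a dict")
            continue
        if entry.get("access") not in ("public", "private"):
            problems.append(f"port {port}: access must be 'public' or 'private'")
        # Assumption: a label is required and must be a string.
        if not isinstance(entry.get("label"), str):
            problems.append(f"port {port}: label must be a string")
    return problems
```

Applied to the example above, the function returns an empty list.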
New "link" type
The "published_port" link lets us assign friendly, stable names to services. This is discussed further in the "Controller/crunch-run changes" section.
link_class | "published_port" |
owner_uuid | normal ownership (user or project) |
name | a hostname for which requests will be proxied to the given port on the container associated with the given container request. Must be a valid DNS name containing no invalid characters and no dots |
head_uuid | container request uuid |
properties | has a single key, "port", which is an integer e.g. {"port": 80} |
The API server shall have a new unique constraint index so that only one link object with a given (link_class=published_port, name) can exist at a time.
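As a rough illustration of the link described above, the following Python sketch builds the attributes for such a link and checks the name rule. The helper names `is_valid_service_name` and `make_published_port_link` are hypothetical, and the regular expression is one common reading of "valid DNS name without dots" (a single label of 1 to 63 letters, digits, or hyphens); the ticket does not pin down the exact rule.

```python
import re

# Assumption: a single DNS label of letters, digits and hyphens, 1-63
# characters, not starting or ending with a hyphen, and containing no dots.
_DNS_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_valid_service_name(name):
    """Return True if name is usable as a single hostname label."""
    return _DNS_LABEL.fullmatch(name) is not None


def make_published_port_link(name, container_request_uuid, port, owner_uuid):
    """Build the attributes of a "published_port" link (hypothetical helper)."""
    if not is_valid_service_name(name):
        raise ValueError(f"not a valid single DNS label: {name!r}")
    return {
        "link_class": "published_port",
        "owner_uuid": owner_uuid,
        "name": name,
        "head_uuid": container_request_uuid,
        "properties": {"port": int(port)},
    }
```

The uniqueness of (link_class, name) is left to the API server's index; this sketch does not attempt to enforce it client-side.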
Controller/crunch-run changes
Requests to a virtual host in the configured HTTP container domain (e.g. *.containers.zzzzz.arvadosapi.com) are directed to controller and intercepted.
1. Controller uses the hostname to determine which container the request applies to. Either:
- The hostname has the format "uuid-port", e.g. zzzzz-xvhdp-iiiiiiiiiiiiiii-1234.containers.zzzzz.arvadosapi.com (if the port is missing, it might be nice to default to port 80 on the target), or
- The hostname has been claimed with a "hostname" link type as described above (it should fall back to this after checking uuid to prevent someone from taking over a uuid, I think?)
This yields a container request uuid and target port. From the container request, it gets the container uuid.
If the container does not exist, return a 404 error.
2. Check for "?api_token=" in the URL. If found, remove "?api_token" from the query, and return a redirect setting a cookie containing the token (this should use the same mechanism that keep-web uses).
3. Get "published_ports" from the container record and look up the port the user has requested to connect to. If access is not marked "public", it is considered "private".
If it is "private", check for an authorization cookie and/or "Authorization" header to get the token. The token must be valid and correspond to the user that owns the container.
If the port is private and either no token was supplied (by cookie or header) or the token's user does not match, return a 403 error.
4. If the request passes access control, the request is proxied to the container's crunch-run process using the mechanism previously developed for container shell and container logs.
5. The container's crunch-run process receives the request and proxies it to the container on the specified port.
If the port is not open on the container, return an error (should be a 404 or a 502? depends on whether we want the client to retry)
6. The response is relayed back through crunch-run and controller.
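Steps 1 to 3 above can be sketched in Python as follows. This is only an illustration under stated assumptions, not controller's implementation: the function names, the `named_links` dict (standing in for a lookup of name links), the default to port 80, and the 404 for an unpublished port are all hypothetical choices.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Container request uuid (cluster id, "xvhdp" infix, 15-character suffix)
# optionally followed by "-port".
_HOST_RE = re.compile(r"([0-9a-z]{5}-xvhdp-[0-9a-z]{15})(?:-(\d{1,5}))?")


def parse_service_host(host, domain, named_links=None):
    """Map a request hostname to (container_request_uuid, port), or None."""
    suffix = "." + domain
    if not host.endswith(suffix):
        return None
    label = host[: -len(suffix)]
    m = _HOST_RE.fullmatch(label)
    if m:
        # Assumption: default to port 80 when the port is omitted.
        return m.group(1), int(m.group(2) or 80)
    # Fall back to named links only after the uuid form fails to match,
    # so a link cannot take over a uuid-based hostname.
    return (named_links or {}).get(label)


def strip_api_token(url):
    """Return (url_without_api_token, token); token is None if absent."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    tokens = [v for k, v in query if k == "api_token"]
    rest = [(k, v) for k, v in query if k != "api_token"]
    clean = urlunsplit(parts._replace(query=urlencode(rest)))
    return clean, (tokens[0] if tokens else None)


def access_status(published_ports, port, token_user_uuid, owner_uuid):
    """Return the HTTP status of the access check (200 means: proxy it)."""
    entry = published_ports.get(str(port))
    if entry is None:
        return 404  # Assumption: unpublished ports are treated as not found.
    if entry.get("access") == "public":
        return 200
    # Anything not marked "public" is considered "private".
    if token_user_uuid is None or token_user_uuid != owner_uuid:
        return 403
    return 200
```

In this sketch, `token_user_uuid` is the user that a validated token belongs to (or None if no token was supplied); validating the token and setting the redirect cookie are left out.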
Workbench changes
If a container request is running and has a non-empty "published_ports", Workbench should display those ports prominently at the top as something the user can click on. It should check for "hostname" links pointing to the container request and preferentially use those when constructing the URL. The link should include "?api_token=".
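The URL construction described here might look like the following Python sketch (Workbench itself is not written in Python; this only illustrates the logic). The function name `service_url` and the `named_links` mapping, which stands in for a query of name links pointing at the container request, are hypothetical.

```python
from urllib.parse import urlencode


def service_url(cr_uuid, port, domain, api_token, named_links=None):
    """Build a clickable URL for one published port of a container request.

    named_links maps (cr_uuid, port) to a friendly name, if one was claimed.
    """
    name = (named_links or {}).get((cr_uuid, port))
    # Prefer the friendly name; otherwise fall back to the "uuid-port" form.
    label = name if name else f"{cr_uuid}-{port}"
    # urlencode percent-encodes characters such as "/" in the token.
    return f"https://{label}.{domain}/?{urlencode({'api_token': api_token})}"
```

Because controller strips "?api_token=" and redirects with a cookie, the token appears in the URL only for the first request.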
arvados-cwl-runner changes
Introduce an extension to CommandLineTool that corresponds to setting the "service" and "published_ports" fields.
In the future, we may want to be able to launch a service and then return its endpoint as output that can be passed downstream; this is out of scope for this ticket.
Updated by Peter Amstutz 18 days ago
- Target version changed from Development 2025-02-26 to Future
Updated by Peter Amstutz 18 days ago
- Related to Idea #17207: services running in containers added
Updated by Peter Amstutz 11 days ago
- Target version changed from Future to Development 2025-02-26
Updated by Tom Clegg 11 days ago
- Idea #17207: services running in containers
- details about http-forwarding
- vague ideas about follow-up features
- Feature #22551: Containers can expose HTTP endpoints
- hand-waving about http-forwarding
- details about adding simple access controls
- details about adding human-friendly container names
- Service containers wiki
- thoughts about service containers generally, and some more-specific ideas about http forwarding
- port-forwarding mechanism based on {uuid}-{port} in dns name -- #17207 is already this ticket, if we delete "phase 2+" or rename it to "possible future work we're not doing here"
- update docs/DNS/proxy/config scripts to make the relevant traffic go to controller so it can actually work IRL (or perhaps this should be included in #17207?)
- add "published_endpoints" attribute to containers and container_requests, and update controller to bypass the token check in the 'public' case
- add "service" attribute to containers and container_requests
- add a-c-r features to set "published_endpoints" and "service" (assuming these are trivial enough that it makes sense to just combine them although they're not technically interdependent)
- add special behaviors/index for the new "named port" link_class
- update controller to check for "named port" links when routing if the requested host does not match {uuid} or {uuid}-{port}
- update workbench to show buttons/links for published ports when viewing a CR
- update workbench to browse/search/add "named port" links (UI TBD)
- link_class="hostname" doesn't sound great to me (it leans too far to "name of machine" as opposed to [a prefix of] the hostname part of an HTTP URL). In the sense that we're doing the HTTP-only subset of a more general "exposed ports" feature, the link signifies an "externally accessible port" or "externally accessible service".
- the feature is "proxy https://NAME to http://localhost:PORT in container CTR". So perhaps link_class="proxy" or "http_proxy" would make sense?
- currently the link refers to "port", the container record refers to "published_endpoints", and ports and endpoints seem to be synonymous. Maybe we can say "published_ports" instead, like "exposed ports" and "published ports" in Docker?
Updated by Peter Amstutz 10 days ago
- Related to Feature #22581: Implement API server changes described in #22551 added