Service containers » History » Version 12
Peter Amstutz, 02/05/2025 08:28 PM
h1. Service containers

Concept: Containers that are launched via the Crunch infrastructure and provide a network port that clients can connect to.

Arvados epic: https://dev.arvados.org/issues/17207

h2. Use cases

* Applications providing an API
** A large amount of data needs to be loaded into RAM before it can be used, queried, or computed on
** e.g. large language models, databases, function-as-a-service
** Makes sense when the time spent on any given query is much smaller than the loading time

* User-facing web applications
** e.g. Integrative Genomics Viewer (IGV), Jupyter notebooks
** Also includes web applications that interact with an API (first bullet)

* Cluster maintenance services
** Services that react to events on the cluster, such as kicking off a workflow when a collection appears in a certain project, or checking projects for metadata conformance. These currently run outside of the cluster, but could benefit from Arvados features if they were also managed by the cluster.
** This doesn't strictly require the ability to expose web services, but would benefit from other tweaks to better accommodate long-lived containers.

h2. Fundamental requirement

Crunch launches a container and makes it possible for an outside client to communicate with the container.

h2. Discussion points

h3. Who can communicate with the container

Exposing services primarily to outside clients and communicating between containers on the inside have different requirements.

Outside: Clients must be able to connect from outside. Because containers are on a private network, some kind of proxying or network address translation (NAT) is required.

Inside: Assuming containers are on the same private network and can route to each other, they can communicate directly. They need to be able to discover how to contact other containers. (We might even want a way of declaring exactly which containers can connect to which other containers.)

h3. HTTP only, or arbitrary TCP connections?

HTTP only: Can proxy HTTP requests using wildcard DNS and "Host:" headers; we already have machinery and operational experience doing that. Can apply Arvados authentication to requests, e.g. setting a cookie with an Arvados token so the client can only communicate with containers that it has read access to. Cannot host services that don't use HTTP.

Arbitrary TCP: Would need to apply NAT or connection tunneling to connections on an arbitrary external port that is associated with the container. We don't currently have machinery to do this. Authentication is left up to the service. Can host services that have their own protocols, such as PostgreSQL or SSH.

Container shell uses connection tunneling: it makes an HTTP connection and then does a connection upgrade to SSH. This requires special cooperation between arvados-client and ssh, which doesn't generalize.

For internal-only connections (between containers), it may be a bit easier to orchestrate arbitrary TCP connections without tunneling. Authentication is still left up to the container, or requires fiddling with firewall rules on the fly to control who can access the container.

h3. Redundancy with other platforms

Kubernetes orchestrates services, so this feature overlaps with Kubernetes. We don't have the resources to compete with Kubernetes. However, with Arvados as a data analytics platform where scheduling and running code is a core feature, a carefully scoped feature for hosting services could give us significant new capability relative to the amount of work.

h3. Long-lived containers

We might want to limit certain kinds of logging, such as the stats from crunchstat, hoststat, and arv-mount, because a container running for weeks will accumulate a _lot_ of logs.

h3. Container naming

If you start a service, use it for a bit, shut it down, then submit a new container request to bring it back up again, it will get a new UUID. This is a problem if the new session represents the same service and people have the old URL bookmarked, written into scripts, etc.

It would be great to be able to assign a friendly hostname to a running container. For example, instead of https://zzzzz-xvhdp-iiiiiiiiiiiiiii.svc.zzzzz.arvadosapi.com/ you could go to https://ollama.svc.zzzzz.arvadosapi.com/

h2. Initial proposal

1. Container request zzzzz-xvhdp-iiiiiiiiiiiiiii is submitted with:

<pre>
{
  "runtime_constraints": {
    "expose_http_from": 80
  }
}
</pre>

This means "expose the HTTP service running inside the container on port 80". It must be an unencrypted HTTP endpoint.
This creates a corresponding container zzzzz-dz642-iiiiiiiiiiiiiii.

2. For running containers with "expose_http_from", a user can visit a URL proxied by controller:
https://zzzzz-xvhdp-iiiiiiiiiiiiiii.svc.zzzzz.arvadosapi.com/foo?baz&api_token=v2/foo/bar
This does a cookie-setting redirect to:
https://zzzzz-xvhdp-iiiiiiiiiiiiiii.svc.zzzzz.arvadosapi.com/foo?baz

On each request, the proxy checks the API token to determine whether the user has read access to the container request.
The proxy also adds X-Arvados-User-UUID to the request.
If the container is in a project shared with the anonymous user, no API token is required.

3. Controller forwards the request to the container and returns the response using the mechanism that has been developed for container shell and container logs.

Visiting the container request on Workbench gives an easy-to-click link to "https://zzzzz-xvhdp-iiiiiiiiiiiiiii.svc.zzzzz.arvadosapi.com/?api_token=v2/foo/bar"

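The cookie-setting redirect in step 2 could look roughly like the following. This is only a sketch of the URL handling, not the controller's actual code: the function name @split_token_redirect@ is hypothetical, and how the token is then stored in a cookie is left out. It removes the @api_token@ parameter and leaves every other query parameter untouched.

```python
from urllib.parse import unquote, urlsplit, urlunsplit


def split_token_redirect(url):
    """Return (redirect_url, token) for a URL that may carry api_token.

    Hypothetical helper: the token is taken out of the query string so the
    proxy can move it into a cookie; all other parameters are kept verbatim.
    token is None when the URL has no api_token parameter.
    """
    parts = urlsplit(url)
    token = None
    kept = []
    for param in parts.query.split("&"):
        if param.startswith("api_token="):
            # Tokens may arrive percent-encoded in a URL.
            token = unquote(param[len("api_token="):])
        elif param:
            kept.append(param)
    redirect = urlunsplit(parts._replace(query="&".join(kept)))
    return redirect, token
```

With the example URL from step 2, this yields the redirect target shown there plus the token @v2/foo/bar@. Splitting the query string by hand, rather than parsing and re-encoding it, keeps value-less parameters such as @baz@ exactly as the client sent them.
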
h2. Engineering meeting notes

We are considering the notion of a service container (a long-lived container process) and a container that is available over HTTP to be distinct features.

h3. Service container request

Service containers can _only_ reuse running containers.

Need to double-check container cancellation behavior; we might want to be able to do a graceful shutdown.

Should be a new top-level database field of containers and container requests.

h3. HTTP endpoints

Mulling over the idea of being able to connect to arbitrary ports but also have named, published endpoints.

The default name is the UUID followed by the port; requests to that name are proxied as HTTP to the container at that port:

https://zzzzz-xvhdp-iiiiiiiiiiiiiii-1234.containers.zzzzz.arvadosapi.com/

Arbitrary ports are only available to the user that owns the container.

Should be a new top-level database field of containers and container requests.

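As an illustration of the default naming scheme, a proxy could recover the UUID and port from the "Host:" header along these lines. This is a sketch under assumptions: the function name @parse_default_endpoint@ and the constant @ENDPOINT_DOMAIN@ are made up for this example, and the domain is simply the one used in the example URL above.

```python
import re

# Assumed wildcard domain, taken from the example URL; a real cluster
# would configure its own.
ENDPOINT_DOMAIN = ".containers.zzzzz.arvadosapi.com"

# Arvados UUID (5-5-15 alphanumeric characters) followed by "-<port>".
DEFAULT_NAME = re.compile(r"([a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15})-([0-9]{1,5})")


def parse_default_endpoint(host_header):
    """Return (uuid, port) for a default endpoint name, or None."""
    host = host_header.lower().split(":")[0]  # drop any ":443" suffix
    if not host.endswith(ENDPOINT_DOMAIN):
        return None
    label = host[:-len(ENDPOINT_DOMAIN)]
    match = DEFAULT_NAME.fullmatch(label)
    if match is None:
        return None
    port = int(match.group(2))
    if not 1 <= port <= 65535:
        return None
    return match.group(1), port
```

A host that does not match the pattern returns @None@, which leaves room for a separate lookup of named endpoints.
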
Published endpoints have access control:

* private (owner only)
* public (anybody)

A future version could have more access levels in between:

* can_manage
* can_write
* can_read

Something like:

<pre>
"published_endpoints": [
  {
    "port": 80,
    "access": "public",
    "hostname": "toms-great-container",
    "label": "Tom's great container"
  }
]
</pre>

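The access rules above could be checked roughly as follows. This is a sketch, not a proposed implementation: @may_connect@ is a hypothetical name, and treating a missing "access" key as private is an assumption made here, not something decided in the notes.

```python
def may_connect(published_endpoints, port, requester_uuid, owner_uuid):
    """Decide whether requester_uuid may reach the given container port.

    published_endpoints is a list of dicts shaped like the example above.
    requester_uuid is None for an unauthenticated client.
    """
    is_owner = requester_uuid is not None and requester_uuid == owner_uuid
    for endpoint in published_endpoints:
        if endpoint["port"] == port:
            # Assumption: an endpoint without "access" defaults to private.
            access = endpoint.get("access", "private")
            if access == "public":
                return True
            if access == "private":
                return is_owner
            raise ValueError(f"unknown access level: {access!r}")
    # Ports that are not published are available to the owner only.
    return is_owner
```

The intermediate levels (can_manage, can_write, can_read) would need a permission lookup and are deliberately left out.
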
Published endpoints get listed in Workbench. Non-published endpoints are considered private, but the owner can still connect to them if they are actually open on the container.

Hostnames are first come, first served, and are owned by the user until the link is deleted (by the user or an admin).

link_class: hostname
owner_uuid: user or project
head_uuid: container request
name: hostname
properties: { port: 80 }

For all links where 'link_class=hostname', 'name' must be unique.

This makes it possible to access containers with a "friendly" name:

https://friendlyname.containers.zzzzz.arvadosapi.com

The API server should validate that the hostname in "name" is valid for DNS and doesn't contain periods, and reject it if not.

This scheme for links can be used to assign friendly names to collections in the future.
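
The hostname validation could be as small as the sketch below, which checks that "name" is a single DNS label: letters, digits and hyphens, 1 to 63 characters, not starting or ending with a hyphen. The function name @is_valid_hostname@ is hypothetical; the real check would live in the API server, and the uniqueness constraint on hostname links is not covered here.

```python
import re

# One DNS label: 1-63 characters, alphanumeric at both ends, hyphens
# allowed in between. Periods never match, so multi-label names fail.
DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def is_valid_hostname(name):
    """Return True if name is usable as a single-label endpoint hostname."""
    return DNS_LABEL.fullmatch(name.lower()) is not None
```

For example, @toms-great-container@ and @ollama@ pass, while @a.b@, @-bad@ and the empty string are rejected.
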