Service containers » History » Version 4

Peter Amstutz, 02/04/2025 09:16 PM

h1. Service containers

Concept: Containers that are launched via the Crunch infrastructure and provide a network port that clients can connect to.

Arvados epic: https://dev.arvados.org/issues/17207

h2. Use cases

* Applications providing an API
** A large amount of data needs to be loaded into RAM before it can be used, queried, or computed on
** e.g. large language models, databases, function-as-a-service
** Makes sense when the time spent on any given query is much smaller than the loading time

* User-facing web applications
** e.g. Integrative Genomics Viewer (IGV), Jupyter notebooks
** Also includes web applications that interact with an API (first bullet)

* Cluster maintenance services
** Services that react to events on the cluster, such as kicking off a workflow when a collection appears in a certain project, or checking projects for metadata conformance. These currently run outside of the cluster, but could benefit from Arvados features if they were also managed by the cluster.

h2. Fundamental requirement

Crunch launches a container and makes it possible for an outside client to communicate with the container.

h2. Discussion points

h3. Who can communicate with the container

Exposing services primarily to outside clients and enabling communication between containers on the inside have different requirements.

Outside: Clients must be able to connect from outside the cluster. Because containers are on a private network, some kind of proxying or network address translation (NAT) is required.

Inside: Assuming containers are on the same private network and can route to each other, they can communicate directly. They need to be able to discover how to contact other containers. (We might even want a way of declaring exactly which containers can connect to which other containers.)
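
One way to picture such a declaration is an explicit allow-list keyed by the connecting container. This is only a sketch of the idea: the service names and the @may_connect@ helper are hypothetical and do not correspond to any existing Arvados feature.

```python
# Hypothetical allow-list: source service -> set of services it may connect to.
# Anything not listed here is denied by default.
ALLOWED_CONNECTIONS = {
    "web-frontend": {"query-api"},
    "query-api": {"database"},
}


def may_connect(source: str, destination: str) -> bool:
    """Return True only if the policy explicitly allows source -> destination."""
    return destination in ALLOWED_CONNECTIONS.get(source, set())
```

With this table, @may_connect("web-frontend", "query-api")@ is allowed, while a direct connection from @web-frontend@ to @database@ is not.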

h3. HTTP only, or arbitrary TCP connections?

HTTP only: Can proxy HTTP requests using wildcard DNS and "Host:" headers; we already have machinery and operational experience doing that. Can apply Arvados authentication to requests, e.g. setting a cookie with an Arvados token so the client can only communicate with containers that it has read access to. Cannot host services that don't use HTTP.
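
As a rough illustration of the HTTP-only option, the sketch below maps a "Host:" header under a wildcard DNS domain to a container backend. The domain @containers.example.com@, the routing table, the example UUID, and the function name are all hypothetical; they are not existing Arvados code or configuration.

```python
from typing import Optional, Tuple

# Hypothetical wildcard domain: *.containers.example.com resolves to the proxy.
WILDCARD_SUFFIX = ".containers.example.com"

# Hypothetical routing table: container UUID -> (private IP, port).
BACKENDS = {
    "zzzzz-dz642-0123456789abcde": ("10.0.0.17", 8080),
}


def backend_for_host(host_header: str) -> Optional[Tuple[str, int]]:
    """Return the (ip, port) backend for a Host header, or None if unknown."""
    # Drop an optional ":port" suffix and normalize case.
    hostname = host_header.split(":", 1)[0].strip().lower()
    if not hostname.endswith(WILDCARD_SUFFIX):
        return None
    label = hostname[: -len(WILDCARD_SUFFIX)]
    # Only a single DNS label is expected in front of the wildcard suffix.
    if not label or "." in label:
        return None
    return BACKENDS.get(label)
```

A real proxy would also check the Arvados token (for example from a cookie) before forwarding the request; that step is omitted here.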

Arbitrary TCP: Would need to apply NAT or connection tunneling to connections on an arbitrary external port that is associated with the container. We don't currently have machinery to do this. Authentication is left up to the service. Can host services that have their own protocols, such as PostgreSQL or SSH.
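
To make "an external port that is associated with the container" concrete, here is a minimal sketch of the bookkeeping such a scheme would need. The @PortAllocator@ class and the port range are assumptions for illustration only; the actual NAT or tunneling machinery is not shown and does not exist yet.

```python
class PortAllocator:
    """Hand out external ports from a fixed range, one per container."""

    def __init__(self, first_port: int, last_port: int) -> None:
        self._free = list(range(first_port, last_port + 1))
        self._by_container = {}

    def allocate(self, container_uuid: str) -> int:
        """Return the container's external port, assigning one if needed."""
        if container_uuid in self._by_container:
            return self._by_container[container_uuid]
        if not self._free:
            raise RuntimeError("no external ports available")
        port = self._free.pop(0)
        self._by_container[container_uuid] = port
        return port

    def release(self, container_uuid: str) -> None:
        """Return the container's port to the pool when the container stops."""
        port = self._by_container.pop(container_uuid, None)
        if port is not None:
            self._free.append(port)
```

The port range is a finite resource, so the number of simultaneously exposed TCP services is bounded by its size.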

Container shell uses connection tunneling: it makes an HTTP connection and then does a connection upgrade to SSH. This requires special cooperation between arvados-client and ssh, which doesn't generalize.

For internal-only connections (between containers), it may be a bit easier to orchestrate arbitrary TCP connections without tunneling. Authentication is still left up to the container, or requires adjusting firewall rules on the fly to control who can access the container.

h3. Redundancy with other platforms

Kubernetes orchestrates services. This feature overlaps with Kubernetes. We don't have the resources to compete with Kubernetes. However, with Arvados as a data analytics platform where scheduling and running code is a core feature, a carefully scoped feature for hosting services could give us significant new capability relative to the amount of work.

h2. Initial proposal