Admin CLI for managing dispatcher cloud VMs » History » Version 2
Tom Clegg, 05/02/2023 06:25 PM

h1. Admin CLI for managing dispatcher / cloud VMs

Background: arvados-dispatch-cloud has a management interface for reporting on and controlling the container queue and cloud VM instances. In principle, this is useful for tasks such as identifying and killing stuck containers and instances. In practice, it is cumbersome to use, because the only available frontend is a generic HTTP client like curl.

We should address this by adding some @arvados-server@ subcommands (alongside @cloudtest@, @config-check@, etc.).

For now, these commands will error out if arvados-dispatch-cloud is not running. In the future, some commands may also work with crunch-dispatch-slurm et al.

Like other @arvados-server@ commands, these are intended to be run on a server node, i.e., where @/etc/arvados/config.yml@ is readable and services' InternalURLs are reachable.

h2. Proposed commands

h3. @arvados-server instance list@

Display all known instance IDs, one per line, each followed by its state (unknown/booting/idle/...), the time since its last successful probe ("-" if none), and its last container UUID ("-" if none).

h3. @arvados-server instance kill -reason "optional reason" {instanceID|containerUUID}@

Terminate the specified instance.

If a container UUID is given, terminate whichever instance is running that container.

h3. @arvados-server instance hold|drain|resume {instanceID|containerUUID}@

Set the idle behavior of the specified instance to hold, drain, or resume. See https://doc.arvados.org/main/api/dispatch.html

If a container UUID is given, operate on whichever instance is running that container.
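
A sketch of how the three subcommands could map onto requests, assuming the dispatch API exposes one POST endpoint per idle behavior of the form @/arvados/v1/dispatch/instances/{behavior}?instance_id=...@ and that "resume" corresponds to the dispatcher's normal @run@ behavior. Both the paths and the mapping are assumptions to check against https://doc.arvados.org/main/api/dispatch.html; @idleBehaviorRequest@ is a hypothetical helper. A container UUID would be resolved to an instance ID before this step, the same way as for @instance kill@.

```go
package main

import (
	"fmt"
	"net/url"
)

// idleBehaviors maps the proposed subcommands to dispatcher idle behaviors.
// Assumption: "resume" restores the normal "run" behavior.
var idleBehaviors = map[string]string{
	"hold":   "hold",
	"drain":  "drain",
	"resume": "run",
}

// idleBehaviorRequest returns the (assumed) path and query for a POST that
// sets the idle behavior of instanceID.
func idleBehaviorRequest(subcommand, instanceID string) (string, url.Values, error) {
	behavior, ok := idleBehaviors[subcommand]
	if !ok {
		return "", nil, fmt.Errorf("unknown subcommand %q: want hold, drain, or resume", subcommand)
	}
	if instanceID == "" {
		return "", nil, fmt.Errorf("%s: missing instance ID", subcommand)
	}
	return "/arvados/v1/dispatch/instances/" + behavior, url.Values{"instance_id": {instanceID}}, nil
}

func main() {
	for _, sub := range []string{"hold", "drain", "resume"} {
		path, query, err := idleBehaviorRequest(sub, "i-0abc")
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%-6s -> POST %s?%s\n", sub, path, query.Encode())
	}
}
```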

h3. @arvados-server container kill -reason "optional reason" {containerUUID}@

Terminate the specified container. Unlike clicking "cancel" in workbench2, which resets the container request's priority to 0, this signals the crunch-run supervisor process to terminate immediately.