Admin CLI for managing dispatcher / cloud VMs

Background: arvados-dispatch-cloud has a management interface for reporting on and controlling the container queue and cloud VM instances. In principle, this is useful for tasks such as identifying and killing stuck containers or instances. In practice, it is cumbersome to use because the only available frontend is a generic HTTP client such as curl.

We should address this by adding some arvados-server subcommands (alongside cloudtest, config-check, etc.).

For now, these commands will error out if arvados-dispatch-cloud is not running. In the future, some commands may also work with crunch-dispatch-slurm et al.

Like other arvados-server commands, these are intended to be run on a server node, i.e., where /etc/arvados/config.yml is readable and services' InternalURLs are reachable.

Proposed commands

arvados-server instance list

Display all known instances, one per line: instance ID, state (unknown/booting/idle/...), time since the last successful probe ("-" if none), and last container UUID ("-" if none).

arvados-server instance kill -reason "optional reason" {instanceID|containerUUID}

Terminate the specified instance.

If a container UUID is given, terminate whichever instance is running that container.
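A sketch of the argument handling for `instance kill`: parse the optional `-reason` flag, then decide whether the target is an instance ID or a container UUID. The same resolution step would apply to hold/drain/resume. It assumes container UUIDs have the usual `xxxxx-dz642-xxxxxxxxxxxxxxx` shape and that an instance running a container reports the state `running`; `parseKillArgs`, `resolveInstance`, and `instanceInfo` are hypothetical names.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"regexp"
)

// containerUUIDPattern matches Arvados container UUIDs, whose type infix is "dz642".
var containerUUIDPattern = regexp.MustCompile(`^[0-9a-z]{5}-dz642-[0-9a-z]{15}$`)

// instanceInfo is a hypothetical, trimmed-down view of one instance list entry.
type instanceInfo struct {
	ID                string
	State             string
	LastContainerUUID string
}

// parseKillArgs parses "[-reason text] {instanceID|containerUUID}".
func parseKillArgs(args []string) (reason, target string, err error) {
	fs := flag.NewFlagSet("instance kill", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // callers print their own usage message
	fs.StringVar(&reason, "reason", "", "optional reason for the kill")
	if err = fs.Parse(args); err != nil {
		return "", "", err
	}
	if fs.NArg() != 1 {
		return "", "", errors.New("usage: arvados-server instance kill [-reason text] {instanceID|containerUUID}")
	}
	return reason, fs.Arg(0), nil
}

// resolveInstance returns target unchanged unless it looks like a container
// UUID, in which case it returns the ID of the instance currently running that
// container. Assumption: such instances report State "running".
func resolveInstance(target string, instances []instanceInfo) (string, error) {
	if !containerUUIDPattern.MatchString(target) {
		return target, nil
	}
	for _, inst := range instances {
		if inst.State == "running" && inst.LastContainerUUID == target {
			return inst.ID, nil
		}
	}
	return "", fmt.Errorf("no instance is currently running container %s", target)
}

func main() {
	instances := []instanceInfo{
		{ID: "i-0123", State: "idle", LastContainerUUID: "zzzzz-dz642-abcdefghijklmno"},
		{ID: "i-0456", State: "running", LastContainerUUID: "zzzzz-dz642-pqrstuvwxyz0123"},
	}
	reason, target, err := parseKillArgs([]string{"-reason", "stuck for 3 days", "zzzzz-dz642-pqrstuvwxyz0123"})
	if err != nil {
		fmt.Println(err)
		return
	}
	id, err := resolveInstance(target, instances)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("would kill instance %s (reason: %q)\n", id, reason)
}
```

Matching on state as well as the last container UUID matters because, judging by the `instance list` output described above, an idle instance can still report the last container it ran.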

arvados-server instance hold|drain|resume {instanceID|containerUUID}

Set the idle behavior of the specified instance to hold, drain, or resume. See https://doc.arvados.org/main/api/dispatch.html

If a container UUID is given, operate on whichever instance is running that container.

arvados-server container kill -reason "optional reason" {containerUUID}

Terminate the specified container. Unlike clicking "cancel" in workbench2, which resets the container request priority to 0, this signals the crunch-run supervisor process to terminate the container immediately.

Updated by Tom Clegg over 1 year ago · 2 revisions