Story #15026

Updated by Tom Clegg 6 months ago

Provide an arvados-server "cloudtest" subcommand (lib/cloud/test) that uses the configured credentials (from cluster config file) to verify that
* the selected driver implements the cloud.Driver interface properly (empty and non-empty instance tag sets; no implicit filtering of instances list; Instances() includes the new instance if called immediately after Create() returns success; Destroy() works)
* the cloud provider accepts the configured credentials
* resulting VMs accept the configured SSH private key and run commands as root

This has three main uses:
# Dev tests when creating/modifying a driver
# CI tests
# Verify/debug config while creating/updating a real cluster

Specs:
* By default, use InstanceSetID "cloudtest-$(whoami)@$(hostname)" so that a later run can recognize instances abandoned by earlier aborted or broken runs. Accept a command line argument @-instance-set-id=string@ to override.
* Use the selected driver directly: don't use a worker.Pool, rateLimitingInstanceSet, etc.
* Start by listing all instances and checking whether any are tagged with the selected InstanceSetID.
** If so, and a @-clear@ command line flag was given: destroy them, get an updated list, and repeat until they're all gone.
** If so, and a @-clear@ command line flag was not given: log a message mentioning the @-clear@ option, and error out.
* Create an instance, using a {"CloudTestPID":"$PID","InstanceSetID":"$InstanceSetID"} tag plus any ResourceTags in the cluster config. If an error is returned, log it (and exit non-zero later), but keep going in case an instance was created.
* Verify that the Tags() on the returned instance match the ones passed to Create().
* List all instances. If an error is returned, log it but keep going so the test instance (if any) can be destroyed.
* Verify that the instance list has an instance with the same ID as the one returned from Create(). If not, keep going, but log an error: the instance is supposed to appear in the very next Instances() call after Create() returns.
* Verify that the instance returned in the list has the same tags.
* If a new instance was created (either Create() succeeded or Instances() returned an instance with our InstanceSetID):
** Poll Instances() until the instance has a non-empty Address() or TimeoutBooting expires.
** Use ssh_executor to run BootProbeCommand on the instance (or "true" if that's empty). Retry until it succeeds or TimeoutBooting expires.
** Destroy the instance.
* Exit 0 if everything succeeded, otherwise 1.
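
The first few specs (default InstanceSetID, the @-instance-set-id@ and @-clear@ flags, the leftover check, and the 0/1 exit code) could be sketched in Go roughly as follows. The @lister@ interface, @fakeCloud@, @clearLeftovers@, and @run@ are hypothetical names made up for this example and are not part of any existing package. The fake provider keeps the sketch self-contained.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"os/user"
)

// defaultInstanceSetID builds "cloudtest-$(whoami)@$(hostname)".
func defaultInstanceSetID() (string, error) {
	u, err := user.Current()
	if err != nil {
		return "", err
	}
	host, err := os.Hostname()
	if err != nil {
		return "", err
	}
	return "cloudtest-" + u.Username + "@" + host, nil
}

// lister is a hypothetical, minimal view of a driver (not the real API).
type lister interface {
	TaggedIDs(instanceSetID string) ([]string, error)
	Destroy(id string) error
}

var errLeftovers = errors.New("instances from an earlier run exist; use -clear to destroy them")

// clearLeftovers errors out if tagged instances exist and doClear is
// false; otherwise it destroys them and re-lists until none remain.
// A real tool would likely pause between iterations.
func clearLeftovers(d lister, setID string, doClear bool) error {
	for {
		ids, err := d.TaggedIDs(setID)
		if err != nil || len(ids) == 0 {
			return err
		}
		if !doClear {
			return errLeftovers
		}
		for _, id := range ids {
			if err := d.Destroy(id); err != nil {
				return err
			}
		}
	}
}

// fakeCloud maps instance ID to InstanceSetID; it stands in for a provider.
type fakeCloud map[string]string

func (c fakeCloud) TaggedIDs(setID string) ([]string, error) {
	var ids []string
	for id, tag := range c {
		if tag == setID {
			ids = append(ids, id)
		}
	}
	return ids, nil
}

func (c fakeCloud) Destroy(id string) error {
	delete(c, id)
	return nil
}

// run parses flags, performs the leftover check, and returns the exit code.
func run(args []string, cloud lister) int {
	fs := flag.NewFlagSet("cloudtest", flag.ContinueOnError)
	setID := fs.String("instance-set-id", "", "override the default InstanceSetID")
	doClear := fs.Bool("clear", false, "destroy instances left by earlier runs")
	if err := fs.Parse(args); err != nil {
		return 1
	}
	if *setID == "" {
		id, err := defaultInstanceSetID()
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return 1
		}
		*setID = id
	}
	if err := clearLeftovers(cloud, *setID, *doClear); err != nil {
		fmt.Fprintln(os.Stderr, err)
		return 1
	}
	fmt.Println("no leftover instances tagged", *setID)
	return 0
}

func main() {
	if code := run(os.Args[1:], fakeCloud{"i-1": "some-other-set"}); code != 0 {
		os.Exit(code)
	}
}
```

Returning an exit code from @run@ instead of calling os.Exit() inside it keeps the flag handling and the leftover check easy to call from dev and CI tests.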
