Idea #15026

Updated by Tom Clegg over 5 years ago

Provide an arvados-server "cloudtest" subcommand (lib/cloud/test) that uses the configured credentials (from cluster config file) to verify that 
 * the selected driver implements the cloud.Driver interface properly (empty and non-empty instance tag sets; no implicit filtering of instances list; Instances() includes the new instance if called immediately after Create() returns success; Destroy() works) 
 * the cloud provider accepts the configured credentials 
 * resulting VMs accept the configured SSH private key and run commands as root 
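
The driver checks listed above could be exercised roughly as follows. This is only a sketch: @instance@, @instanceSet@, @fakeSet@ and @checkLifecycle@ are hypothetical, simplified stand-ins invented for illustration, and the real interfaces in lib/cloud may differ (for example in the arguments to Create()). The fake driver exists only so the sketch runs without cloud credentials.

```go
package main

import (
	"errors"
	"fmt"
)

// tags is a hypothetical stand-in for the driver's instance tag type.
type tags map[string]string

// instance and instanceSet are simplified assumptions, not the real
// Arvados interfaces.
type instance interface {
	ID() string
	Tags() tags
	Destroy() error
}

type instanceSet interface {
	Create(t tags) (instance, error)
	Instances() ([]instance, error)
}

// checkLifecycle creates an instance, checks that it appears in the very
// next Instances() call, destroys it, and returns every problem found.
func checkLifecycle(is instanceSet, want tags) []error {
	var errs []error
	created, err := is.Create(want)
	if err != nil {
		// Log and keep going, as the ticket asks.
		errs = append(errs, fmt.Errorf("create: %w", err))
	}
	list, err := is.Instances()
	if err != nil {
		errs = append(errs, fmt.Errorf("instances: %w", err))
	}
	var found instance
	for _, inst := range list {
		if created != nil && inst.ID() == created.ID() {
			found = inst
		}
	}
	if created != nil && found == nil {
		errs = append(errs, errors.New("new instance missing from next Instances() call"))
	}
	if created != nil {
		if err := created.Destroy(); err != nil {
			errs = append(errs, fmt.Errorf("destroy: %w", err))
		}
	}
	return errs
}

// fakeInstance and fakeSet form an in-memory driver used only for this demo.
type fakeInstance struct {
	id  string
	t   tags
	set *fakeSet
}

func (i *fakeInstance) ID() string { return i.id }
func (i *fakeInstance) Tags() tags { return i.t }
func (i *fakeInstance) Destroy() error {
	delete(i.set.byID, i.id)
	return nil
}

type fakeSet struct {
	byID map[string]*fakeInstance
	lazy bool // if true, simulate a driver whose list lags behind Create()
}

func (s *fakeSet) Create(t tags) (instance, error) {
	inst := &fakeInstance{id: fmt.Sprintf("i-%d", len(s.byID)+1), t: t, set: s}
	if !s.lazy {
		s.byID[inst.id] = inst
	}
	return inst, nil
}

func (s *fakeSet) Instances() ([]instance, error) {
	var out []instance
	for _, inst := range s.byID {
		out = append(out, inst)
	}
	return out, nil
}

func main() {
	s := &fakeSet{byID: map[string]*fakeInstance{}}
	errs := checkLifecycle(s, tags{"InstanceSetID": "cloudtest-example"})
	fmt.Println("problems found:", len(errs))
}
```

Collecting errors instead of returning at the first one mirrors the "log it but keep going" wording in the specs, so a created instance still gets destroyed.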

 This has three main uses: 
 # Dev tests when creating/modifying a driver 
 # CI tests 
 # Verify/debug config while creating/updating a real cluster 

 Specs: 
 * By default, use InstanceSetID "cloudtest-$(whoami)@$(hostname)" so that a later run can recognize instances abandoned by earlier aborted/broken runs. Accept a command line argument -instance-set-id=string to override. 
 * Use the selected driver directly: don't use a worker.Pool, rateLimitingInstanceSet, etc. 
 * Start by listing all instances and checking whether any are tagged with the selected InstanceSetID. 
 ** If so, and a @-clear@ command line flag was given: destroy them, get an updated list, and repeat until they're all gone. 
 ** If so, and a @-clear@ command line flag was not given: log a message mentioning the "-clear" option, and error out. 
 * Create an instance, using a {"CloudTestPID":"$PID","InstanceSetID":"$InstanceSetID"} tag plus any ResourceTags in the cluster config. If an error is returned, log it (and exit non-zero later), but keep going in case an instance was created. 
 * Verify that the Tags() on the returned instance match the ones passed to Create(). 
 * List all instances. If an error is returned, log it but keep going so the test instance (if any) can be destroyed. 
 * Verify that the instance list has an instance with the same ID as the one returned from Create(). If not, keep going, but log an error: the instance is supposed to appear in the very next Instances() call after Create() returns. 
 * Verify that the instance returned in the list has the same tags. 
 * If a new instance was created (either Create() succeeded or Instances() returned an instance with our InstanceSetID): 
 ** Poll Instances() until the instance has a non-empty Address() or TimeoutBooting expires. 
 ** Use ssh_executor to run BootProbeCommand on the instance (or "docker ps -q" if that's empty). Retry until it succeeds or TimeoutBooting expires. 
 ** If the -pause-before-destroy flag is given, show a sample SSH command line for connecting to the instance, and wait for the user to press Enter before proceeding. 
 ** Destroy the instance. 
 ** Poll Instances() until the instance disappears. 
 * Exit 0 if everything succeeded, otherwise 1. 
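
A possible shape for the command line handling in the specs above, using Go's standard @flag@ package. This is a sketch under assumptions: the @options@ struct and the @defaultInstanceSetID@ and @parseOptions@ helpers are hypothetical names, and only the flags named in the spec list are defined here.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
	"os/user"
)

// options holds the command line settings named in the spec list.
// The struct and its field names are hypothetical.
type options struct {
	InstanceSetID      string
	Clear              bool
	PauseBeforeDestroy bool
	Quiet              bool
}

// defaultInstanceSetID builds "cloudtest-USER@HOST", the Go equivalent of
// "cloudtest-$(whoami)@$(hostname)".
func defaultInstanceSetID(username, hostname string) string {
	return "cloudtest-" + username + "@" + hostname
}

// parseOptions parses args, defaulting -instance-set-id from the given
// username and hostname.
func parseOptions(args []string, username, hostname string) (options, error) {
	var o options
	fs := flag.NewFlagSet("cloudtest", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // the caller decides how to report errors
	fs.StringVar(&o.InstanceSetID, "instance-set-id",
		defaultInstanceSetID(username, hostname), "InstanceSetID tag value")
	fs.BoolVar(&o.Clear, "clear", false,
		"destroy existing instances tagged with our InstanceSetID")
	fs.BoolVar(&o.PauseBeforeDestroy, "pause-before-destroy", false,
		"wait for Enter before destroying the test instance")
	fs.BoolVar(&o.Quiet, "quiet", false, "do not log progress to stdout")
	err := fs.Parse(args)
	return o, err
}

func main() {
	username := "unknown"
	if u, err := user.Current(); err == nil {
		username = u.Username
	}
	hostname, err := os.Hostname()
	if err != nil {
		hostname = "unknown"
	}
	o, err := parseOptions(os.Args[1:], username, hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```

Passing the username and hostname in as arguments keeps the default ID testable without depending on the machine running the test.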

 If the -quiet flag isn't given, log progress to stdout. 
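
Several steps in the specs share one pattern: poll until the instance has an address, retry the boot probe, and poll until the instance disappears, each bounded by a timeout. A minimal sketch of that loop follows; @pollUntil@ is a hypothetical helper, and its timeout and interval arguments only stand in for TimeoutBooting and the probe interval from the cluster config.

```go
package main

import (
	"fmt"
	"time"
)

// pollUntil calls probe immediately, then once per interval, until probe
// returns true or another wait would pass the deadline. It returns the
// number of attempts made and whether probe eventually succeeded.
func pollUntil(timeout, interval time.Duration, probe func(attempt int) bool) (int, bool) {
	deadline := time.Now().Add(timeout)
	for attempt := 1; ; attempt++ {
		if probe(attempt) {
			return attempt, true
		}
		// Give up if sleeping for another interval would overrun the deadline.
		if time.Now().Add(interval).After(deadline) {
			return attempt, false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Stand-in probe that succeeds on the third attempt.
	attempts, ok := pollUntil(time.Second, 10*time.Millisecond, func(attempt int) bool {
		fmt.Printf("probe attempt %d\n", attempt)
		return attempt >= 3
	})
	fmt.Println("attempts:", attempts, "succeeded:", ok)
}
```

Passing the attempt number to the probe makes it easy to produce log lines of the "(attempt N)" kind.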

 <pre> 
 $ arvados-server cloudtest -exec 'echo $(hostname) $(date)' -pause-before-destroy 
 getting instance list 
 got instance list (N=13) 
 no instances are tagged with our InstanceSetID (7 instances are not tagged with any InstanceSetID at all) 
 creating instance with tags map[CloudTestPID:1234, InstanceSetID:cloudtest-ops@4xphq] 
 created instance with id i-12345abcde 
 all requested tags are present 
 getting instance list 
 got instance list (N=14) 
 found our instance i-12345abcde in returned list 
 all requested tags are present 
 instance has no address 
 waiting probeInterval 10s 
 getting instance list 
 got instance list (N=14) 
 found our instance i-12345abcde in returned list 
 instance has no address 
 waiting probeInterval 10s 
 getting instance list 
 got instance list (N=14) 
 found our instance i-12345abcde in returned list 
 instance i-12345abcde has addr 10.2.3.4 
 executing command "docker ps -q" on i-12345abcde addr 10.2.3.4 port 2222 
 executing command failed (attempt 1): connection refused, output "" 
 waiting probeInterval 10s 
 executing command "docker ps -q" on i-12345abcde addr 10.2.3.4 port 2222 
 executing command failed (attempt 2): connection refused, output "" 
 waiting probeInterval 10s 
 executing command "docker ps -q" on i-12345abcde addr 10.2.3.4 port 2222 
 executing command succeeded (attempt 3), output "" 
 executing command "echo $(hostname) $(date)" on i-12345abcde addr 10.2.3.4 port 2222 
 executing command succeeded (attempt 1), output "i-12345abcde.cloud.example Tue Jun 11 11:28:23 EDT 2019\n" 
 instance is booted 
 ... you can connect with "ssh -p2222 debian@10.2.3.4" 
 ... hit Enter when you are finished, and ready to destroy the instance: {pause until user hits Enter} 
 destroying instance i-12345abcde 
 destroyed instance i-12345abcde 
 getting instance list 
 got instance list (N=14) 
 found our instance i-12345abcde in returned list 
 waiting probeInterval 10s 
 getting instance list 
 got instance list (N=14) 
 found our instance i-12345abcde in returned list 
 waiting probeInterval 10s 
 getting instance list 
 got instance list (N=14) 
 instance i-12345abcde not found in returned list 
 done 
 </pre> 
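
The "all requested tags are present" lines in the sample output suggest a comparison along these lines. This is a sketch only: @missingTags@ is a hypothetical helper, and ignoring extra tags on the instance is an assumption drawn from the wording of that log message, not something the specs state.

```go
package main

import (
	"fmt"
	"sort"
)

// missingTags returns, in sorted order, the keys of want that are absent
// from got or carry a different value there. Extra tags in got are ignored
// (an assumption, see the note above the code).
func missingTags(want, got map[string]string) []string {
	var missing []string
	for k, v := range want {
		if gv, ok := got[k]; !ok || gv != v {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	want := map[string]string{
		"CloudTestPID":  "1234",
		"InstanceSetID": "cloudtest-ops@4xphq",
	}
	got := map[string]string{
		"CloudTestPID":  "1234",
		"InstanceSetID": "cloudtest-ops@4xphq",
		"Name":          "extra tag added by the provider",
	}
	if m := missingTags(want, got); len(m) == 0 {
		fmt.Println("all requested tags are present")
	} else {
		fmt.Println("missing or mismatched tags:", m)
	}
}
```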
