Project

General

Profile

Actions

Bug #21258

closed

lib/dispatchcloud TestDispatchToStubDriver is racey

Added by Brett Smith 6 months ago. Updated 18 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Tests
Story points:
-
Release:
Release relationship:
Auto

Description

Failure mode 2

This failure was seen in a run that had b015c9e45f2a81b7069e5ecde3e0e9e0c5c619fa merged.

dispatcher_test.go:297:
    c.Check(resp.Body.String(), check.Matches, `(?ms).*boot_outcomes{outcome="failure"} [^0].*`)
... value string = "" +
...     "# HELP arvados_dispatchcloud_at_quota Flag indicating the cloud driver is reporting an at-quota condition.\n" +
...     "# TYPE arvados_dispatchcloud_at_quota gauge\n" +
...     "arvados_dispatchcloud_at_quota 0\n" +
...     "# HELP arvados_dispatchcloud_boot_outcomes Boot outcomes by type.\n" +
...     "# TYPE arvados_dispatchcloud_boot_outcomes counter\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"aborted\"} 7\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"disappeared\"} 5\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"failure\"} 0\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"success\"} 41\n" +
...     "# HELP arvados_dispatchcloud_containers_allocated_not_started Number of containers allocated to a worker but not started yet (worker is booting).\n" +
...     "# TYPE arvados_dispatchcloud_containers_allocated_not_started gauge\n" +
...     "arvados_dispatchcloud_containers_allocated_not_started 0\n" +
...     "# HELP arvados_dispatchcloud_containers_longest_wait_time_seconds Current longest wait time of any container since queuing, and before the start of crunch-run.\n" +
...     "# TYPE arvados_dispatchcloud_containers_longest_wait_time_seconds gauge\n" +
...     "arvados_dispatchcloud_containers_longest_wait_time_seconds 0\n" +
...     "# HELP arvados_dispatchcloud_containers_not_allocated_over_quota Number of containers not allocated to a worker because the system has hit a quota.\n" +
...     "# TYPE arvados_dispatchcloud_containers_not_allocated_over_quota gauge\n" +
...     "arvados_dispatchcloud_containers_not_allocated_over_quota 0\n" +
...     "# HELP arvados_dispatchcloud_containers_running Number of containers reported running by cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_containers_running gauge\n" +
...     "arvados_dispatchcloud_containers_running 0\n" +
...     "# HELP arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds Number of seconds between the queuing of a container and the start of crunch-run.\n" +
...     "# TYPE arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds summary\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.5\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.9\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.95\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.99\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds_sum 1.9461314997763523e+12\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds_count 211\n" +
...     "# HELP arvados_dispatchcloud_driver_operations Number of instance-create/destroy/list operations performed via cloud driver.\n" +
...     "# TYPE arvados_dispatchcloud_driver_operations counter\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"Create\"} 53\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"Destroy\"} 53\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"List\"} 111\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"SetTags\"} 25\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"Create\"} 15\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"Destroy\"} 7\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"List\"} 0\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"SetTags\"} 0\n" +
...     "# HELP arvados_dispatchcloud_instances_disappeared Number of occurrences of an instance disappearing from the cloud provider's list of instances.\n" +
...     "# TYPE arvados_dispatchcloud_instances_disappeared counter\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"idle\"} 0\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"running\"} 0\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"shutdown\"} 52\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"unknown\"} 0\n" +
...     "# HELP arvados_dispatchcloud_instances_price Price of cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_instances_price gauge\n" +
...     "arvados_dispatchcloud_instances_price{category=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_instances_price{category=\"hold\"} 0\n" +
...     "arvados_dispatchcloud_instances_price{category=\"idle\"} 0.984\n" +
...     "arvados_dispatchcloud_instances_price{category=\"inuse\"} 0\n" +
...     "arvados_dispatchcloud_instances_price{category=\"unknown\"} 0\n" +
...     "# HELP arvados_dispatchcloud_instances_run_probe_duration_seconds Number of seconds per runProbe call.\n" +
...     "# TYPE arvados_dispatchcloud_instances_run_probe_duration_seconds summary\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.5\"} 0.000299431\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.9\"} 0.00055759\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.95\"} 0.00072938\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.99\"} 0.00126165\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_sum{outcome=\"fail\"} 0.04786191400000002\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_count{outcome=\"fail\"} 128\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.5\"} 0.000334\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.9\"} 0.00065138\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.95\"} 0.00091094\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.99\"} 0.00174714\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_sum{outcome=\"success\"} 0.3429287439999997\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_count{outcome=\"success\"} 876\n" +
...     "# HELP arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds Number of seconds between the first shutdown attempt and the disappearance of the worker.\n" +
...     "# TYPE arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds summary\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.5\"} 0.00741143\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.9\"} 0.01473868\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.95\"} 0.01628774\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.99\"} 0.01721516\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds_sum 0.40030032699999996\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds_count 52\n" +
...     "# HELP arvados_dispatchcloud_instances_time_to_ready_for_container_seconds Number of seconds between the first successful SSH connection and ready to run a container.\n" +
...     "# TYPE arvados_dispatchcloud_instances_time_to_ready_for_container_seconds summary\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.5\"} 0.009143631\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.9\"} 0.01485901\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.95\"} 0.01578203\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.99\"} 0.0195959\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds_sum 0.37931822099999996\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds_count 41\n" +
...     "# HELP arvados_dispatchcloud_instances_time_to_ssh_seconds Number of seconds between instance creation and the first successful SSH connection.\n" +
...     "# TYPE arvados_dispatchcloud_instances_time_to_ssh_seconds summary\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.5\"} 0.0181355\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.9\"} 0.02653959\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.95\"} 0.02768423\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.99\"} 0.03185776\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds_sum 0.9493334640000001\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds_count 50\n" +
...     "# HELP arvados_dispatchcloud_instances_total Number of cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_instances_total gauge\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type8\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type8\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type8\"} 1\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type8\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type8\"} 0\n" +
...     "# HELP arvados_dispatchcloud_last_503_time Time of most recent 503 error received from API.\n" +
...     "# TYPE arvados_dispatchcloud_last_503_time gauge\n" +
...     "arvados_dispatchcloud_last_503_time 1.703100514e+09\n" +
...     "# HELP arvados_dispatchcloud_max_concurrent_containers Dynamically assigned limit on number of containers scheduled concurrency, set after receiving 503 errors from API.\n" +
...     "# TYPE arvados_dispatchcloud_max_concurrent_containers gauge\n" +
...     "arvados_dispatchcloud_max_concurrent_containers 5\n" +
...     "# HELP arvados_dispatchcloud_memory_bytes_total Total memory on all cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_memory_bytes_total gauge\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"hold\"} 0\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"idle\"} 8.589934592e+09\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"inuse\"} 0\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"unknown\"} 0\n" +
...     "# HELP arvados_dispatchcloud_probe_age_seconds_max Maximum number of seconds since an instance's most recent successful probe.\n" +
...     "# TYPE arvados_dispatchcloud_probe_age_seconds_max gauge\n" +
...     "arvados_dispatchcloud_probe_age_seconds_max 0.01719935\n" +
...     "# HELP arvados_dispatchcloud_probe_age_seconds_median Median number of seconds since an instance's most recent successful probe.\n" +
...     "# TYPE arvados_dispatchcloud_probe_age_seconds_median gauge\n" +
...     "arvados_dispatchcloud_probe_age_seconds_median 0.01719935\n" +
...     "# HELP arvados_dispatchcloud_vcpus_total Total VCPUs on all cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_vcpus_total gauge\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"hold\"} 0\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"idle\"} 8\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"inuse\"} 0\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"unknown\"} 0\n" 
... regex string = "(?ms).*boot_outcomes{outcome=\"failure\"} [^0].*" 

time="2023-12-20T19:28:35.182068484Z" level=info msg="instance disappeared in cloud" Instance="inst46,providertype8" WorkerState=shutdown
OOPS: 11 passed, 1 FAILED
--- FAIL: Test (1.25s)
FAIL
    git.arvados.org/arvados.git/lib/dispatchcloud    coverage: 75.1% of statements
FAIL    git.arvados.org/arvados.git/lib/dispatchcloud    1.271s
FAIL

Failure mode 1

FAIL: dispatcher_test.go:160: DispatcherSuite.TestDispatchToStubDriver
[way too many logs omitted, see attachment]
dispatcher_test.go:296:
    c.Check(resp.Body.String(), check.Matches, `(?ms).*boot_outcomes{outcome="failure"} [^0].*`)
... value string = "" +
...     "# HELP arvados_dispatchcloud_at_quota Flag indicating the cloud driver is reporting an at-quota condition.\n" +
...     "# TYPE arvados_dispatchcloud_at_quota gauge\n" +
...     "arvados_dispatchcloud_at_quota 0\n" +
...     "# HELP arvados_dispatchcloud_boot_outcomes Boot outcomes by type.\n" +
...     "# TYPE arvados_dispatchcloud_boot_outcomes counter\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"aborted\"} 9\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"disappeared\"} 4\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"failure\"} 0\n" +
...     "arvados_dispatchcloud_boot_outcomes{outcome=\"success\"} 49\n" +
...     "# HELP arvados_dispatchcloud_containers_allocated_not_started Number of containers allocated to a worker but not started yet (worker is booting).\n" +
...     "# TYPE arvados_dispatchcloud_containers_allocated_not_started gauge\n" +
...     "arvados_dispatchcloud_containers_allocated_not_started 0\n" +
...     "# HELP arvados_dispatchcloud_containers_longest_wait_time_seconds Current longest wait time of any container since queuing, and before the start of crunch-run.\n" +
...     "# TYPE arvados_dispatchcloud_containers_longest_wait_time_seconds gauge\n" +
...     "arvados_dispatchcloud_containers_longest_wait_time_seconds 0\n" +
...     "# HELP arvados_dispatchcloud_containers_not_allocated_over_quota Number of containers not allocated to a worker because the system has hit a quota.\n" +
...     "# TYPE arvados_dispatchcloud_containers_not_allocated_over_quota gauge\n" +
...     "arvados_dispatchcloud_containers_not_allocated_over_quota 0\n" +
...     "# HELP arvados_dispatchcloud_containers_running Number of containers reported running by cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_containers_running gauge\n" +
...     "arvados_dispatchcloud_containers_running 0\n" +
...     "# HELP arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds Number of seconds between the queuing of a container and the start of crunch-run.\n" +
...     "# TYPE arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds summary\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.5\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.9\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.95\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds{quantile=\"0.99\"} 9.223372036854776e+09\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds_sum 1.9830249879237712e+12\n" +
...     "arvados_dispatchcloud_containers_time_from_queue_to_crunch_run_seconds_count 215\n" +
...     "# HELP arvados_dispatchcloud_driver_operations Number of instance-create/destroy/list operations performed via cloud driver.\n" +
...     "# TYPE arvados_dispatchcloud_driver_operations counter\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"Create\"} 62\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"Destroy\"} 62\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"List\"} 103\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"0\",operation=\"SetTags\"} 21\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"Create\"} 13\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"Destroy\"} 9\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"List\"} 0\n" +
...     "arvados_dispatchcloud_driver_operations{error=\"1\",operation=\"SetTags\"} 0\n" +
...     "# HELP arvados_dispatchcloud_instances_disappeared Number of occurrences of an instance disappearing from the cloud provider's list of instances.\n" +
...     "# TYPE arvados_dispatchcloud_instances_disappeared counter\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"idle\"} 0\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"running\"} 0\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"shutdown\"} 62\n" +
...     "arvados_dispatchcloud_instances_disappeared{state=\"unknown\"} 0\n" +
...     "# HELP arvados_dispatchcloud_instances_price Price of cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_instances_price gauge\n" +
...     "arvados_dispatchcloud_instances_price{category=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_instances_price{category=\"hold\"} 0\n" +
...     "arvados_dispatchcloud_instances_price{category=\"idle\"} 0\n" +
...     "arvados_dispatchcloud_instances_price{category=\"inuse\"} 0\n" +
...     "arvados_dispatchcloud_instances_price{category=\"unknown\"} 0\n" +
...     "# HELP arvados_dispatchcloud_instances_run_probe_duration_seconds Number of seconds per runProbe call.\n" +
...     "# TYPE arvados_dispatchcloud_instances_run_probe_duration_seconds summary\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.5\"} 0.00025819\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.9\"} 0.00049002\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.95\"} 0.00068864\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"fail\",quantile=\"0.99\"} 0.0016936\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_sum{outcome=\"fail\"} 0.04495730099999998\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_count{outcome=\"fail\"} 138\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.5\"} 0.00025914\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.9\"} 0.00051182\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.95\"} 0.00081387\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds{outcome=\"success\",quantile=\"0.99\"} 0.001874889\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_sum{outcome=\"success\"} 0.2827972210000001\n" +
...     "arvados_dispatchcloud_instances_run_probe_duration_seconds_count{outcome=\"success\"} 843\n" +
...     "# HELP arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds Number of seconds between the first shutdown attempt and the disappearance of the worker.\n" +
...     "# TYPE arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds summary\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.5\"} 0.009525099\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.9\"} 0.016427759\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.95\"} 0.017818548\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds{quantile=\"0.99\"} 0.018422408\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds_sum 0.5827228839999999\n" +
...     "arvados_dispatchcloud_instances_time_from_shutdown_request_to_disappearance_seconds_count 62\n" +
...     "# HELP arvados_dispatchcloud_instances_time_to_ready_for_container_seconds Number of seconds between the first successful SSH connection and ready to run a container.\n" +
...     "# TYPE arvados_dispatchcloud_instances_time_to_ready_for_container_seconds summary\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.5\"} 0.007698769\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.9\"} 0.018579038\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.95\"} 0.021335438\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds{quantile=\"0.99\"} 0.036827616\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds_sum 0.4455547920000001\n" +
...     "arvados_dispatchcloud_instances_time_to_ready_for_container_seconds_count 49\n" +
...     "# HELP arvados_dispatchcloud_instances_time_to_ssh_seconds Number of seconds between instance creation and the first successful SSH connection.\n" +
...     "# TYPE arvados_dispatchcloud_instances_time_to_ssh_seconds summary\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.5\"} 0.017542368\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.9\"} 0.026203417\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.95\"} 0.028872557\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds{quantile=\"0.99\"} 0.033338397\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds_sum 1.1238655879999997\n" +
...     "arvados_dispatchcloud_instances_time_to_ssh_seconds_count 60\n" +
...     "# HELP arvados_dispatchcloud_instances_total Number of cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_instances_total gauge\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"booting\",instance_type=\"type8\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"hold\",instance_type=\"type8\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type2\"} 1\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"idle\",instance_type=\"type8\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"inuse\",instance_type=\"type8\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type1\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type16\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type2\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type3\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type4\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type6\"} 0\n" +
...     "arvados_dispatchcloud_instances_total{category=\"unknown\",instance_type=\"type8\"} 0\n" +
...     "# HELP arvados_dispatchcloud_last_503_time Time of most recent 503 error received from API.\n" +
...     "# TYPE arvados_dispatchcloud_last_503_time gauge\n" +
...     "arvados_dispatchcloud_last_503_time 1.701292139e+09\n" +
...     "# HELP arvados_dispatchcloud_max_concurrent_containers Dynamically assigned limit on number of containers scheduled concurrency, set after receiving 503 errors from API.\n" +
...     "# TYPE arvados_dispatchcloud_max_concurrent_containers gauge\n" +
...     "arvados_dispatchcloud_max_concurrent_containers 6\n" +
...     "# HELP arvados_dispatchcloud_memory_bytes_total Total memory on all cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_memory_bytes_total gauge\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"hold\"} 0\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"idle\"} 0\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"inuse\"} 0\n" +
...     "arvados_dispatchcloud_memory_bytes_total{category=\"unknown\"} 0\n" +
...     "# HELP arvados_dispatchcloud_probe_age_seconds_max Maximum number of seconds since an instance's most recent successful probe.\n" +
...     "# TYPE arvados_dispatchcloud_probe_age_seconds_max gauge\n" +
...     "arvados_dispatchcloud_probe_age_seconds_max 0\n" +
...     "# HELP arvados_dispatchcloud_probe_age_seconds_median Median number of seconds since an instance's most recent successful probe.\n" +
...     "# TYPE arvados_dispatchcloud_probe_age_seconds_median gauge\n" +
...     "arvados_dispatchcloud_probe_age_seconds_median 0\n" +
...     "# HELP arvados_dispatchcloud_vcpus_total Total VCPUs on all cloud VMs.\n" +
...     "# TYPE arvados_dispatchcloud_vcpus_total gauge\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"booting\"} 0\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"hold\"} 0\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"idle\"} 0\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"inuse\"} 0\n" +
...     "arvados_dispatchcloud_vcpus_total{category=\"unknown\"} 0\n" 
... regex string = "(?ms).*boot_outcomes{outcome=\"failure\"} [^0].*" 

Files

developer-run-tests-remainder-4133.log (1.33 MB) developer-run-tests-remainder-4133.log Brett Smith, 12/01/2023 05:05 PM
developer-run-tests-remainder-4175.log (1.31 MB) developer-run-tests-remainder-4175.log Full log exhibiting failure mode #2 Brett Smith, 12/28/2023 05:26 PM

Subtasks 2 (0 open2 closed)

Task #21270: Review 21258-flaky-adc-testResolvedTom Clegg12/06/2023Actions
Task #21352: Review 21258-flaky-adc-testResolvedTom Clegg12/06/2023Actions
Actions

Also available in: Atom PDF