Idea #20473

open

Automated scalability regression test

Added by Brett Smith 12 months ago. Updated 12 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
Tests
Target version:
-
Story points:
-
Release:
Release relationship:
Auto

Description

Write an automated test that

  1. brings up an Arvados cluster
  2. submits a large work queue
  3. lets it run for some short time (at least some containers should finish, but not all or even most of them)
  4. checks logs and metrics of all services afterwards, and fails if any of the following appear:
    • 5xx responses from web services
    • containers being retried or other signs of Crunch thrashing
    • Crunch not using the maximum number of compute nodes available to it
    • Other signs of trouble in Prometheus (tbd: what?)
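
As a rough illustration of step 4, the pass/fail decision could be sketched as below. This is only a sketch: `RunSummary` and `find_scalability_failures` are hypothetical names, and it assumes some earlier stage has already collected HTTP status codes, per-container attempt counts, and compute node counts from the logs and metrics. The Prometheus checks are left out because they are still to be decided.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RunSummary:
    """Hypothetical digest of logs and metrics gathered after the test run."""
    http_statuses: List[int]            # status codes seen in web service logs
    container_attempts: Dict[str, int]  # container UUID -> number of attempts
    peak_compute_nodes: int             # most compute nodes in use at once
    max_compute_nodes: int              # nodes Crunch was allowed to use


def find_scalability_failures(summary: RunSummary) -> List[str]:
    """Return a human-readable message for each failed check (empty = pass)."""
    failures = []

    # Any 5xx response from a web service fails the test.
    server_errors = sum(1 for s in summary.http_statuses if 500 <= s <= 599)
    if server_errors:
        failures.append(f"{server_errors} 5xx responses from web services")

    # A container attempted more than once is treated as a retry.
    retried = sorted(u for u, n in summary.container_attempts.items() if n > 1)
    if retried:
        failures.append(f"{len(retried)} containers retried: {', '.join(retried)}")

    # Crunch should have scaled up to every node available to it.
    if summary.peak_compute_nodes < summary.max_compute_nodes:
        failures.append(
            f"Crunch peaked at {summary.peak_compute_nodes} of "
            f"{summary.max_compute_nodes} available compute nodes"
        )

    return failures
```

The test would fail if the returned list is non-empty, and the messages give a starting point for investigation.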

This test is not expected to run on every branch or even commit to main. Instead we run it when we're testing a branch that could have significant scalability consequences, or when we're preparing a major release.

Implementation details (we're less wedded to these): The basic idea is to spin up a middle-sized cloud node, deploy a single-node Arvados cluster onto it, and run the tests there. We can submit large workflows to generate large-record-size container requests, but all workflows and workflow steps should have tiny resource requirements, so we can run a lot of them on the same node. For example, maybe download a multi-GiB collection to a temporary directory, and then confirm its portable data hash.
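
For the "confirm its portable data hash" part of such a step, a minimal sketch might look like the following. It assumes the portable data hash is the MD5 of the collection's manifest text followed by `+` and the manifest's length in bytes, and that the manifest passed in has already had any non-size block hints stripped. The function names are hypothetical, and a real workflow step would get the manifest from the Arvados SDK rather than from a string.

```python
import hashlib


def portable_data_hash(manifest_text: str) -> str:
    """Compute "<md5 hex>+<byte length>" for a stripped manifest (assumed format)."""
    data = manifest_text.encode("utf-8")
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def confirm_portable_data_hash(manifest_text: str, expected: str) -> bool:
    """Return True if the manifest hashes to the expected portable data hash."""
    return portable_data_hash(manifest_text) == expected
```

A step like this needs almost no CPU or RAM, which fits the goal of running many of them on one node.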

The cluster should use the default configuration as much as possible. The only configuration values that should change are the ones that are necessarily tied to the capabilities of the underlying hardware, like MaxComputeVMs.


Related issues

Related to Arvados - Feature #14922: Run multiple containers concurrently on a single cloud VM (New)

#1

Updated by Brett Smith 12 months ago

  • Category set to Tests
  • Description updated (diff)

#2

Updated by Brett Smith 12 months ago

  • Related to Feature #14922: Run multiple containers concurrently on a single cloud VM added