Bug #18264

closed

[CI] simplify the way we run the CWL tests

Added by Ward Vandewege over 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -
Release relationship: Auto

Description

We currently run the CWL tests on our test clusters by launching a custom script on the main Jenkins server, which copies a custom script to the shell node of the cluster and runs it there. This is convoluted and error-prone. Make some changes:

  • instead of running with -j1, increase the parallelism to whatever the target cluster can handle (the bottleneck is the machine that runs a-c-r, i.e. arvados-cwl-runner!)
  • instead of relying on a shell node, just start a Jenkins satellite with appropriate Arvados credentials for the target cluster and run the test suite that way
  • instead of having one CI job for both the upstream CWL test suite and our Arvados CWL tests, split them into 2 jobs and run them in parallel (if the target cluster can handle that)
  • if possible, instead of one CI job (or a pair of CI jobs) for each cluster, make a parameterized job that is launched with the appropriate parameters in the build pipeline for each cluster
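
A minimal sketch of the "2 jobs in parallel" idea at the shell level, assuming each suite can be started with a single command. The helper name run_suites_in_parallel is hypothetical and is not part of the existing CI scripts; in the real pipeline Jenkins itself would schedule the two jobs.

```shell
# Sketch only: run two test suites concurrently and fail if either one fails.
# run_suites_in_parallel is a hypothetical helper, not an Arvados CI script.
run_suites_in_parallel() {
  first_cmd="$1"
  second_cmd="$2"
  first_status=0
  second_status=0

  # Start both suites in the background so they run at the same time.
  sh -c "$first_cmd" &
  first_pid=$!
  sh -c "$second_cmd" &
  second_pid=$!

  # Wait for each one separately so neither exit status is lost.
  wait "$first_pid" || first_status=$?
  wait "$second_pid" || second_status=$?

  echo "first suite exit status: $first_status"
  echo "second suite exit status: $second_status"

  # Succeed only if both suites succeeded.
  [ "$first_status" -eq 0 ] && [ "$second_status" -eq 0 ]
}
```

Called with the two suite commands as arguments, the function returns non-zero if either suite fails, so a failure in one suite is not hidden by the other finishing cleanly.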

Subtasks 1 (0 open, 1 closed)

Task #18271: review (Resolved, Peter Amstutz, 10/26/2021)

Related issues

Blocked by Arvados - Bug #18238: CWL integration test failing (Resolved, Peter Amstutz)
Actions #1

Updated by Ward Vandewege over 2 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Ward Vandewege over 2 years ago

  • Description updated (diff)
Actions #3

Updated by Ward Vandewege over 2 years ago

Ready for review at commit:e4376aca8fd1e81a03b8534cab6cbd07220c45b9 on branch 18264-cwl-testing in the arvados-dev repo

I've set up an example of the corresponding CI changes I'm planning to make at:

https://ci.arvados.org/view/Developer/job/developer-diagnostics-9tee4/

That job has 2 downstream projects:

developer-run-tests-arvados-cwl
developer-run-tests-cwl-suite

which are invoked with the appropriate parameters (cluster_id 9tee4).

So, in the build pipeline, I'm planning to:

  • decommission run-cwl-test-9tee4
  • replace it with a copy of developer-run-tests-arvados-cwl and developer-run-tests-cwl-suite which will be invoked with $cluster_id set to 9tee4 and run in parallel
  • rinse and repeat for ce8i5 and tordo
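
To illustrate the plan above, here is a rough sketch that enumerates the job/parameter combinations for the three clusters. The job names and cluster IDs are taken from this note; the function name list_cwl_test_jobs is hypothetical, and the sketch only prints the combinations rather than triggering anything in Jenkins. The copies used in the build pipeline may end up with different job names.

```shell
# Sketch only: list the parameterized job invocations for each test cluster.
# list_cwl_test_jobs is a hypothetical helper; it prints, it does not trigger jobs.
list_cwl_test_jobs() {
  for cluster_id in "$@"; do
    for job in developer-run-tests-arvados-cwl developer-run-tests-cwl-suite; do
      echo "$job cluster_id=$cluster_id"
    done
  done
}

list_cwl_test_jobs 9tee4 ce8i5 tordo
```

This prints one line per job and cluster, six in total, which is the set of invocations that would replace the per-cluster run-cwl-test-XXXXX jobs.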

At some point we can do the equivalent cleanup for the deploy-to-XXXXX and diagnostics-XXXXX CI jobs, and consolidate those into one job with a parameter.

Actions #4

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
Actions #5

Updated by Ward Vandewege over 2 years ago

A few more things:

  • we should be running the 1.2 version of the conformance tests (done as of commit:7e0e0601f5f20003db4e8955503edfc8e003dd8f on branch 18264-cwl-testing in the arvados-dev repo)
  • there are test failures in the 1.2 version of the conformance tests, due to a bug in a-c-r, cf. https://dev.arvados.org/issues/18238#note-6, waiting for a fix there
  • sort out the use of git_hash among all the jobs (sometimes it's the arvados repo hash - correct - sometimes it is the arvados-dev repo)
  • make sure the CI job installs (on the satellite node) the exact version of the packages that corresponds to git_hash
Actions #6

Updated by Ward Vandewege over 2 years ago

  • Blocked by Bug #18238: CWL integration test failing added
Actions #7

Updated by Ward Vandewege over 2 years ago

Ward Vandewege wrote:

A few more things:

DONE * we should be running the 1.2 version of the conformance tests (done as of commit:7e0e0601f5f20003db4e8955503edfc8e003dd8f on branch 18264-cwl-testing in the arvados-dev repo)
DONE * there are test failures in the 1.2 version of the conformance tests, due to a bug in a-c-r, cf. https://dev.arvados.org/issues/18238#note-6, waiting for a fix there
DONE * sort out the use of git_hash among all the jobs (sometimes it's the arvados repo hash - correct - sometimes it is the arvados-dev repo)
DONE * make sure the CI job installs (on the satellite node) the exact version of the packages that corresponds to git_hash

There are 2 branches ready to review:

repo         commit                                           branch
arvados-dev  commit:7e0e0601f5f20003db4e8955503edfc8e003dd8f  18264-cwl-testing
arvados      1e8731c242c2e2926819e24856743d0ec7e70a56         18264-cwl-test-running-improvements

Example runs for both CI jobs:

developer-run-tests-arvados-cwl: #16
developer-run-tests-cwl-suite: #33
Actions #8

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
Actions #9

Updated by Peter Amstutz over 2 years ago

There's a small improvement you can make to run-cwl-test-suite.sh

You can get rid of "arvados-cwl-runner-with-checksum.sh" and pass additional parameters using EXTRA, so this should work:

./run_test.sh -j$JOBS --timeout=900 RUNNER=arvados-cwl-runner EXTRA="--compute-checksum --disable-reuse --eval-timeout 60" -Sdocker_entrypoint
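
A small sketch of how that command line could be assembled from a JOBS setting, without actually running the suite. The default of 1 for JOBS and the function name cwl_suite_command are assumptions made for illustration; the flags themselves are copied from the command above. Note that the EXTRA value contains spaces, so it has to stay quoted as a single argument.

```shell
# Sketch only: print the test-suite command line instead of executing it.
# JOBS defaulting to 1 and the cwl_suite_command name are assumptions.
JOBS="${JOBS:-1}"

cwl_suite_command() {
  # printf repeats the '%s ' format for every argument, joining them with spaces.
  printf '%s ' ./run_test.sh "-j$JOBS" --timeout=900 RUNNER=arvados-cwl-runner
  # EXTRA carries several runner options in one quoted argument.
  printf 'EXTRA="%s" ' "--compute-checksum --disable-reuse --eval-timeout 60"
  printf '%s\n' -Sdocker_entrypoint
}
```

Printing the command first makes it easy to check in the job log what parallelism and runner options a given run used.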

Actions #10

Updated by Peter Amstutz over 2 years ago

rest LGTM

Actions #11

Updated by Ward Vandewege over 2 years ago

Peter Amstutz wrote:

There's a small improvement you can make to run-cwl-test-suite.sh

You can get rid of "arvados-cwl-runner-with-checksum.sh" and pass additional parameters using EXTRA, so this should work:

./run_test.sh -j$JOBS --timeout=900 RUNNER=arvados-cwl-runner EXTRA="--compute-checksum --disable-reuse --eval-timeout 60" -Sdocker_entrypoint

Oh, excellent, running with that change at developer-run-tests-cwl-suite: #35

Actions #12

Updated by Ward Vandewege over 2 years ago

Peter Amstutz wrote:

rest LGTM

Thanks, will merge with the simplification suggested in note 9.

Actions #13

Updated by Ward Vandewege over 2 years ago

  • Status changed from In Progress to Resolved

Ward Vandewege wrote:

Peter Amstutz wrote:

rest LGTM

Thanks, will merge with the simplification suggested in note 9.

Done; the CI pipeline changes have also been made.

Actions #14

Updated by Peter Amstutz about 2 years ago

  • Release set to 46