Bug #18264
closed
[CI] simplify the way we run the CWL tests
Description
We currently run the CWL tests on our test clusters by launching a custom script on the main jenkins server, which copies a second custom script to the shell node of the cluster and runs it there. This is convoluted and error-prone. Make some changes:
- instead of running with -j1, increase the parallelism to whatever the target cluster can handle (the bottleneck is the machine that runs a-c-r, i.e. arvados-cwl-runner!)
- instead of relying on a shell node, just start a jenkins satellite with appropriate Arvados credentials for the target cluster and run the test suite that way
- instead of having one CI job for both the upstream CWL test suite and our Arvados CWL tests, split them into 2 jobs and run them in parallel (if the target cluster can handle that)
- if possible, instead of one (a pair of) CI jobs for each cluster, make a parameterized job that is launched with the appropriate parameters in the build pipeline for each cluster
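A minimal sketch of the "two suites in parallel, parameterized by cluster" idea, not the actual CI configuration. `run_suite` and `run_both` are hypothetical names, and `run_suite` only prints what it would do; a real job would invoke the test runner there.

```shell
# Hypothetical stand-in for the real test invocation: it only prints its arguments.
run_suite() {
  local name="$1" cluster_id="$2" jobs="$3"
  echo "running $name on $cluster_id with -j$jobs"
}

# Start both suites in the background, then wait for each one so that
# a failure in either suite is reflected in the return code.
run_both() {
  local cluster_id="$1" jobs="$2"
  local rc=0
  run_suite arvados-cwl "$cluster_id" "$jobs" &
  local pid_a=$!
  run_suite cwl-suite "$cluster_id" "$jobs" &
  local pid_b=$!
  wait "$pid_a" || rc=1
  wait "$pid_b" || rc=1
  return "$rc"
}
```

Calling `run_both 9tee4 4` would start both stand-in suites for cluster 9tee4 with a parallelism of 4. Waiting on each PID separately keeps one suite's failure from being masked by the other's success.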
Updated by Ward Vandewege over 3 years ago
- Status changed from New to In Progress
Updated by Ward Vandewege over 3 years ago
Ready for review at commit:e4376aca8fd1e81a03b8534cab6cbd07220c45b9 on branch 18264-cwl-testing in the arvados-dev repo
I've made an example of the corresponding CI changes that I'm going to set up in
https://ci.arvados.org/view/Developer/job/developer-diagnostics-9tee4/
That job has 2 downstream projects:
developer-run-tests-arvados-cwl
developer-run-tests-cwl-suite
which are invoked with the appropriate parameters (cluster_id 9tee4).
So, in the build pipeline, I'm planning to:
- decommission run-cwl-test-9tee4
- replace it with a copy of developer-run-tests-arvados-cwl and developer-run-tests-cwl-suite which will be invoked with $cluster_id set to 9tee4 and run in parallel
- rinse and repeat for ce8i5 and tordo
At some point we can do the equivalent cleanup for the deploy-to-XXXXX and diagnostics-XXXXX CI jobs, and consolidate those into one job with a parameter.
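As a rough illustration of the plan above, the fan-out over clusters can be written as a dry run that lists one invocation per job and cluster. The names of the pipeline copies of the jobs are not given here, so `run-tests-arvados-cwl` and `run-tests-cwl-suite` are placeholders, and `plan_invocations` is a hypothetical helper.

```shell
# Clusters named in the plan above.
clusters="9tee4 ce8i5 tordo"

# Placeholder job names (assumption): the real pipeline copies may be named differently.
pipeline_jobs="run-tests-arvados-cwl run-tests-cwl-suite"

# Print one line per (job, cluster) pair instead of triggering anything.
plan_invocations() {
  local cluster_id job
  for cluster_id in $clusters; do
    for job in $pipeline_jobs; do
      echo "$job cluster_id=$cluster_id"
    done
  done
}
```

With three clusters and two jobs this lists six invocations, compared with one hand-maintained run-cwl-test-XXXXX job per cluster.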
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
Updated by Ward Vandewege over 3 years ago
A few more things:
- we should be running the 1.2 version of the conformance tests (done as of commit:7e0e0601f5f20003db4e8955503edfc8e003dd8f on branch 18264-cwl-testing in the arvados-dev repo)
- there are test failures in the 1.2 version of the conformance tests, due to a bug in a-c-r, cf. https://dev.arvados.org/issues/18238#note-6, waiting for a fix there
- sort out the use of git_hash among all the jobs (sometimes it's the arvados repo hash - correct - sometimes it is the arvados-dev repo)
- make sure the ci job installs (on the satellite node) the exact version of the packages that corresponds to git_hash
Updated by Ward Vandewege over 3 years ago
- Blocked by Bug #18238: CWL integration test failing added
Updated by Ward Vandewege over 3 years ago
Ward Vandewege wrote:
A few more things:
DONE * we should be running the 1.2 version of the conformance tests (done as of commit:7e0e0601f5f20003db4e8955503edfc8e003dd8f on branch 18264-cwl-testing in the arvados-dev repo)
DONE * there are test failures in the 1.2 version of the conformance tests, due to a bug in a-c-r, cf. https://dev.arvados.org/issues/18238#note-6, waiting for a fix there
DONE * sort out the use of git_hash among all the jobs (sometimes it's the arvados repo hash - correct - sometimes it is the arvados-dev repo)
DONE * make sure the ci job installs (on the satellite node) the exact version of the packages that corresponds to git_hash
There are 2 branches ready to review:
repo | commit | branch |
---|---|---|
arvados-dev | commit:7e0e0601f5f20003db4e8955503edfc8e003dd8f | 18264-cwl-testing |
arvados | 1e8731c242c2e2926819e24856743d0ec7e70a56 | 18264-cwl-test-running-improvements |
Example runs for both CI jobs:
developer-run-tests-arvados-cwl: #16
developer-run-tests-cwl-suite: #33
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
Updated by Peter Amstutz over 3 years ago
There's a small improvement you can make to run-cwl-test-suite.sh
You can get rid of "arvados-cwl-runner-with-checksum.sh" and pass additional parameters using EXTRA, so this should work:
./run_test.sh -j$JOBS --timeout=900 RUNNER=arvados-cwl-runner EXTRA="--compute-checksum --disable-reuse --eval-timeout 60" -Sdocker_entrypoint
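A small sketch of how that command line might be assembled in a script, assuming JOBS is supplied by the CI job. The fallback value of 4 and the helper name `cwl_suite_command` are illustrative assumptions; the helper only prints the command rather than running it.

```shell
# Assumption: JOBS normally comes from the CI job; 4 is an arbitrary fallback.
JOBS="${JOBS:-4}"

# Flags that previously required the arvados-cwl-runner-with-checksum.sh wrapper.
EXTRA="--compute-checksum --disable-reuse --eval-timeout 60"

# Print the run_test.sh command line (dry run only).
cwl_suite_command() {
  printf './run_test.sh -j%s --timeout=900 RUNNER=arvados-cwl-runner EXTRA="%s" -Sdocker_entrypoint\n' "$JOBS" "$EXTRA"
}
```

Keeping the extra flags in one variable means the runner is called directly and there is no separate wrapper script to keep in sync.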
Updated by Ward Vandewege over 3 years ago
Peter Amstutz wrote:
There's a small improvement you can make to run-cwl-test-suite.sh
You can get rid of "arvados-cwl-runner-with-checksum.sh" and pass additional parameters using EXTRA, so this should work:
./run_test.sh -j$JOBS --timeout=900 RUNNER=arvados-cwl-runner EXTRA="--compute-checksum --disable-reuse --eval-timeout 60" -Sdocker_entrypoint
Oh, excellent, running with that change at developer-run-tests-cwl-suite: #35
Updated by Ward Vandewege over 3 years ago
Peter Amstutz wrote:
rest LGTM
Thanks, will merge with the simplification suggested in note 9.
Updated by Ward Vandewege over 3 years ago
- Status changed from In Progress to Resolved
Ward Vandewege wrote:
Peter Amstutz wrote:
rest LGTM
Thanks, will merge with the simplification suggested in note 9.
Done; the CI pipeline changes have also been made.