Bug #22579 (Closed)
Have most Jenkins jobs use a single shared image jenkins-image-arvados-tests
Description
Before #22489, the shell script(s) that we used to call from jenkins/packer-images/jenkins-image-with-docker.json installed tools on the image so they could run the build/run-build-packages*.sh scripts, and the build-packages-* Jenkins jobs were configured to use build images.
As part of #22489, I made sure these tools were installed from source:tools/ansible/install-test-env.yml; developers should be able to build packages as part of their work, after all. But this means the tools are no longer installed in the build image; instead, they're installed in the tests image. After #22489 merged, build-packages-multijob started failing because make is no longer installed on the image.
For a quick fix, I have addressed the issue by reconfiguring these jobs to use the tests images instead. In general, I don't understand why we have so many different images. The test-provision images should be as small as possible to provide the best test of installer completeness, sure. Other than that exception, I don't see what we gain. The only technical reason I can see to split up the images is to keep the disk size of Jenkins instances down: if different tasks need different sets of packages, separate images would leave enough space available for work. That's at least understandable. But making that happen would cost significant ops time and overhead.
We should try to be clear about why we install different pieces of software on the image; e.g., what specific Jenkins job(s) need them. Ansible can help us structure our deployment in a way that makes that clearer. But that can be done in our devops code. We don't need to split the output images to make it clear.
To discuss: are we cool with just having all our internal development jobs run from the same image with all of the tools we need installed? Packer, Docker, Arvados development libraries, package build tools, etc.?
- If yes: merge Ansible playbooks, configure Jenkins to make that happen, retire the jobs to build other images.
- If no: Update jenkins-arvados-packer.yml to install the package build tools (make, createrepo-c, dpkg-dev).
- Either way: Update the ops wiki that documents the different Packer builds with this information.
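The build-packages-multijob failure came down to a single missing executable on the image. One way to catch that kind of regression early is a preflight check that a job runs before doing any real work. This is a hypothetical sketch, not code from arvados-dev: the function name missing_tools and the mapping from each package named above to one executable it provides (createrepo_c for createrepo-c, dpkg-buildpackage for dpkg-dev) are assumptions.

```python
"""Hypothetical preflight check for tools a package-build job needs.

Assumption: the package-to-executable mapping below is illustrative and
is not taken from the arvados-dev repository.
"""
import shutil

# Each Debian package named in the ticket, mapped to one executable it
# is assumed to provide on PATH.
REQUIRED_TOOLS = {
    "make": "make",
    "createrepo-c": "createrepo_c",
    "dpkg-dev": "dpkg-buildpackage",
}


def missing_tools(required, which=shutil.which):
    """Return the sorted package names whose executable is not on PATH.

    `which` defaults to shutil.which; it is a parameter only so the check
    can be exercised without depending on the real PATH.
    """
    return sorted(pkg for pkg, exe in required.items() if which(exe) is None)
```

A job step could call missing_tools(REQUIRED_TOOLS) and fail immediately with a clear message when the returned list is non-empty, rather than failing partway through a package build.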
Updated by Brett Smith about 2 months ago
- Related to Feature #22489: Convert packer-build-jenkins-image-arvados-tests to use Ansible added
Updated by Brett Smith about 2 months ago
- Related to Feature #22559: Review, clean up the division of software on Jenkins images added
Updated by Brett Smith about 2 months ago
- Related to Feature #22560: Clean up Workbench 2 test image code added
Updated by Peter Amstutz about 2 months ago
Let's chat about it at standup, but I agree that having separate test and build images doesn't serve much purpose, especially since the actual building and packaging happen inside a Docker container anyway.
Updated by Brett Smith about 2 months ago
Discussed and agreed at standup that yes, we are happy to have a single Jenkins image that has all the tools necessary for all jobs (test-provision excepted). I have updated the ops wiki page to describe, somewhat optimistically, where we want to be. We should opportunistically clean up our Jenkins configuration and arvados-dev to match.
Updated by Brett Smith about 2 months ago
- Subject changed from Jenkins image with Docker used to include tools for build-packages, doesn't anymore, now what? to Have most Jenkins jobs use a single shared image jenkins-image-arvados-tests
Updated by Brett Smith about 2 months ago
With more Jenkins jobs using the same tests image, we are seeing more intermittent failures, probably because nodes are being reused more than they have been previously. At least one of those was clearly caused by running out of disk space (build-packages-ubuntu2004: #2059), and it seems plausible that others were as well:
- arvados-cwl-conformance-tests/label=tests,suite=conformance-v1.0: #1866
I have increased the disk size of Jenkins tests nodes from 40 GB to 60 GB. There are other things we can do to improve this situation, and I'm open to those, but they'll take some planning and implementation, and it doesn't seem like a good trade-off to put up with more random job failures until those actually happen.
Updated by Brett Smith about 1 month ago
- Status changed from New to In Progress
arvados-dev branch 22579-jenkins-image-cleanups @ commit:7aa13bbed77bf9ceec5cab31ff74f321f941d076
New image build: packer-build-jenkins-image-arvados-tests: #115
Test run with the new image: developer-run-tests: #4668. Note that this tested the main branch; no changes to the arvados repo are required. You can tell this is a very recent build because the run-tests.sh output reports having Go 1.23 very early during its checks, before it runs any install steps.
This branch removes the Ansible playbook, Packer template, and old shell scripts for jenkins-image-with-docker and jenkins-image-workbench2-tests (see #22560). The necessary functionality has been rolled into the jenkins-image-arvados-tests Ansible playbook.
- All agreed-upon points are implemented / addressed.
  - Everything that can be done in the branch
- Anything not implemented (discovered or discussed during work) has a follow-up story.
  - This ticket can remain open for the Jenkins configuration tasks that need to follow
- Code is tested and passing, both automated and manual; what manual testing was done is described
  - See above
- Documentation has been updated.
  - As noted earlier, I already updated the wiki page to reflect where we want to be. These changes bring us more in line with those goals.
- Behaves appropriately at the intended scale (describe intended scale).
  - N/A
- Considered backwards and forwards compatibility issues between client and server.
  - N/A
- Follows our coding standards and GUI style guidelines.
  - N/A (no applicable style guide)
Updated by Lucas Di Pentima about 1 month ago
Note that the OPS wiki page linked from the README file is not accessible by all. This is OK by me since we're going to be moving the opsy code to a private repo, but just wanted to mention it in case you expected it to be publicly readable.
LGTM.
Updated by Brett Smith about 1 month ago
Lucas Di Pentima wrote in #note-10:
Note that the OPS wiki page linked from the README file is not accessible by all. This is OK by me since we're going to be moving the opsy code to a private repo, but just wanted to mention it in case you expected to be publicly readable.
Nope, we're on the same page. I had the same thought: since we're expecting to make it private soon anyway, there was no reason not to do this. Thanks.
Updated by Brett Smith about 1 month ago
- Related to Support #22594: Remove unused Packer image builds from arvados-dev added
Updated by Brett Smith 28 days ago
- Status changed from In Progress to Resolved
I am going to go ahead and close this ticket. At this point, most jobs in active use have been reconfigured to use the tests node. Any that we've missed can be cleaned up opportunistically. The wiki documents the state of things, including the old images, so there are clear instructions on what to do.