Bug #22579

closed

Have most Jenkins jobs use a single shared image jenkins-image-arvados-tests

Added by Brett Smith about 2 months ago. Updated 28 days ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: CI
Target version:
Story points: -

Description

Before #22489, the shell script(s) that we used to call from jenkins/packer-images/jenkins-image-with-docker.json installed tools on the image so they could run the build/run-build-packages*.sh scripts, and the build-packages-* Jenkins jobs were configured to use build images.

As part of #22489, I made sure these tools were installed from source:tools/ansible/install-test-env.yml (developers should be able to build packages as part of their work, after all). But this means the tools are no longer installed in the build image; instead, they're installed in the tests image. After #22489 merged, build-packages-multijob started failing because make isn't installed on the build image anymore.

For a quick fix, I have addressed the issue by reconfiguring these jobs to use the tests images instead. In general, I don't understand why we have so many different images. The test-provision images should be as small as possible to provide the best test of installer completeness, sure. Other than that exception, I don't see what we gain. The only technical reason to split up the images that I can see is if we want to keep the disk size of Jenkins instances down, and we need different sets of packages for different tasks to leave enough space available for work. That's at least understandable. But it would cost significant ops time and overhead to make that happen.

We should try to be clear about why we install different pieces of software on the image; e.g., what specific Jenkins job(s) need them. Ansible can help us structure our deployment in a way that makes that clearer. But that can be done in our devops code. We don't need to split the output images to make it clear.

To discuss: are we cool with just having all our internal development jobs run from the same image with all of the tools we need installed? Packer, Docker, Arvados development libraries, package build tools, etc.?

  • If yes: merge Ansible playbooks, configure Jenkins to make that happen, retire the jobs that build other images.
  • If no: update jenkins-arvados-packer.yml to install the package build tools (make, createrepo-c, dpkg-dev).
  • Either way: Update the ops wiki that documents the different Packer builds with this information.
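
For the "if no" option above, the change might look roughly like the following Ansible task. This is only a sketch: it assumes the image is Debian-based so that the `ansible.builtin.apt` module applies, the task name is made up for illustration, and where the task would sit within jenkins-arvados-packer.yml is not specified in this ticket.

```yaml
# Hypothetical task; assumes a Debian/Ubuntu base image.
# The package names are the ones listed in the ticket description.
- name: Install package build tools
  become: true
  ansible.builtin.apt:
    name:
      - make
      - createrepo-c
      - dpkg-dev
    state: present
```

Naming the task after its purpose keeps it clear which Jenkins jobs need these packages, without splitting the output images.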

Subtasks 1 (0 open, 1 closed)

Task #22591: Review arvados-dev branch 22579-jenkins-image-cleanups (Resolved, Lucas Di Pentima, 02/18/2025)

Related issues 4 (0 open, 4 closed)

Related to Arvados - Feature #22489: Convert packer-build-jenkins-image-arvados-tests to use Ansible (Resolved, Brett Smith)
Related to Arvados - Feature #22559: Review, clean up the division of software on Jenkins images (Rejected)
Related to Arvados - Feature #22560: Clean up Workbench 2 test image code (Resolved, Brett Smith)
Related to Arvados - Support #22594: Remove unused Packer image builds from arvados-dev (Resolved, Brett Smith)
Actions #1

Updated by Brett Smith about 2 months ago

  • Related to Feature #22489: Convert packer-build-jenkins-image-arvados-tests to use Ansible added
Actions #2

Updated by Brett Smith about 2 months ago

  • Related to Feature #22559: Review, clean up the division of software on Jenkins images added
Actions #3

Updated by Brett Smith about 2 months ago

  • Related to Feature #22560: Clean up Workbench 2 test image code added
Actions #4

Updated by Peter Amstutz about 2 months ago

Let's chat about it at standup, but I agree that having separate test and build images doesn't serve much purpose, especially since the actual building and packaging happen inside a Docker container anyway.

Actions #5

Updated by Brett Smith about 2 months ago

Discussed and agreed at standup that yes, we are happy to have a single Jenkins image that has all the tools necessary for all jobs (test-provision excepted). I have updated the ops wiki page, somewhat optimistically, to describe where we want to be. We should opportunistically clean up our Jenkins configuration and arvados-dev to match.

Actions #6

Updated by Brett Smith about 2 months ago

  • Subject changed from "Jenkins image with Docker used to include tools for build-packages, doesn't anymore, now what?" to "Have most Jenkins jobs use a single shared image jenkins-image-arvados-tests"
Actions #7

Updated by Brett Smith about 2 months ago

With more Jenkins jobs using the same tests image, we are seeing more intermittent failures, probably because nodes are being reused more than they were previously. At least one of those was clearly caused by running out of disk space (build-packages-ubuntu2004: #2059), and it seems plausible that others were as well:

run-tests-remainder: #4976

arvados-cwl-conformance-tests/label=tests,suite=conformance-v1.0: #1866

I have increased the disk size of Jenkins tests nodes from 40 GB to 60 GB. There are other things we can do to improve this situation, and I'm open to those, but they'll take some planning and implementation, and it doesn't seem like a good trade-off to put up with more random job failures until those are in place.

Actions #8

Updated by Brett Smith about 1 month ago

  • Status changed from New to In Progress

arvados-dev branch 22579-jenkins-image-cleanups @ commit:7aa13bbed77bf9ceec5cab31ff74f321f941d076

New image build: packer-build-jenkins-image-arvados-tests: #115

Test run with the new image: developer-run-tests: #4668 - Note this tested the main branch; no changes to the arvados repo are required. You can tell this is a very recent build because output from run-tests.sh reports having Go 1.23 very early during its checks, before it runs any install steps.

This branch removes the Ansible playbook, Packer template, and old shell scripts for jenkins-image-with-docker and jenkins-image-workbench2-tests (see #22560). The necessary functionality has been rolled into the jenkins-image-arvados-tests Ansible playbook.

  • All agreed upon points are implemented / addressed.
    • Everything that can be done in the branch
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • This ticket can remain open for the Jenkins configuration tasks that need to follow
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See above
  • Documentation has been updated.
    • As noted earlier, I already updated the wiki page to reflect where we want to be. These changes bring us more in line with those goals.
  • Behaves appropriately at the intended scale (describe intended scale).
    • N/A
  • Considered backwards and forwards compatibility issues between client and server.
    • N/A
  • Follows our coding standards and GUI style guidelines.
    • N/A (no applicable style guide)
Actions #9

Updated by Brett Smith about 1 month ago

  • Subtask #22591 added
Actions #10

Updated by Lucas Di Pentima about 1 month ago

Note that the OPS wiki page linked from the README file is not accessible by all. This is OK by me since we're going to be moving the opsy code to a private repo, but just wanted to mention it in case you expected it to be publicly readable.

LGTM.

Actions #11

Updated by Brett Smith about 1 month ago

Lucas Di Pentima wrote in #note-10:

Note that the OPS wiki page linked from the README file is not accessible by all. This is OK by me since we're going to be moving the opsy code to a private repo, but just wanted to mention it in case you expected it to be publicly readable.

Nope, we're on the same page. I had the same thought: since we're expecting to make it private soon anyway, there was no reason not to do this. Thanks.

Actions #12

Updated by Brett Smith about 1 month ago

  • Related to Support #22594: Remove unused Packer image builds from arvados-dev added
Actions #13

Updated by Brett Smith 28 days ago

  • Status changed from In Progress to Resolved

I am going to go ahead and close this ticket. At this point, most jobs in active use have been reconfigured to use the tests node. Any that we've missed can be cleaned up opportunistically. The wiki documents the state of things, including the old images, so there's clear instruction on what to do.
