Idea #20846

closed

Support Ubuntu 22.04 LTS

Added by Brett Smith over 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Deployment
Start date: 10/30/2023
Due date:
Story points: 2.0
Release:
Release relationship: Auto

Description

(This was originally ticket #19213, which accidentally got deleted.)

  • build packages
  • test packages
  • set up package repository
  • test with installer formula

For future reference

Lucas: I started working on this before needing to switch to more urgent tasks. The last commit is 0489e69 (branch 19213-ubuntu2204-support).
Some packages were failing the test phase.

Brett: The packages are most likely failing because of inconsistent OpenSSL versions. Ubuntu 22.04 ships with OpenSSL 3.0. Ruby 2.7 does not support this version, only 1.1. Lucas's branch builds a custom OpenSSL, but then every C library you pull in, like libpq, has to be linked against that custom version. I haven't checked, but I expect tests are failing because Ruby segfaults when it pulls in both openssl and pg while trying to connect to the database.

We've decided #20300 is a blocker. You can build the whole Ruby 2.7 stack on OpenSSL 3.0 distributions; see this wiki page. But we currently don't package Ruby, so we don't have a way to ship that to users, and we should upgrade Rails/Ruby for other reasons anyway.


Files

20846-package-log.txt (574 KB), Tom Clegg, 10/25/2023 02:20 PM

Subtasks 5 (0 open, 5 closed)

  • Task #21155: Review 20846-ruby3 (Resolved, Brett Smith, 10/30/2023)
  • Task #21162: Review 20846-ruby3-compat (Resolved, Tom Clegg, 10/30/2023)
  • Task #21222: Review 20846-ubuntu2204 (Resolved, Tom Clegg, 11/28/2023)
  • Task #21231: Brett to take over (Resolved, Brett Smith, 10/30/2023)
  • Task #21335: Review 20846-package-build-fixes (Resolved, Brett Smith, 01/05/2024)

Related issues

  • Related to Arvados - Idea #21146: Replace ws4py dependency from PySDK (Resolved, Brett Smith, 12/01/2023)
  • Related to Arvados Epics - Idea #17001: Arvados uses WB2 by default (Resolved)
  • Related to Arvados - Bug #21169: Fix deprecated ERB usage in account setup email view (Resolved, Peter Amstutz)
  • Related to Arvados - Idea #20690: Remove workbench 1 from main branch !!!! (Resolved, Tom Clegg, 11/20/2023)
  • Related to Arvados - Idea #21230: Remove usage of global "pip install" in package build/test scripts (Resolved, Brett Smith, 01/12/2024)
  • Related to Arvados - Feature #21383: Update Salt installer to support Debian 12 (Resolved, Brett Smith)
  • Related to Arvados - Feature #21388: Update list of supported distributions everywhere (Resolved, Brett Smith)
  • Related to Arvados - Feature #21389: Update arvados/jobs Docker image to Debian 12 (New)
  • Related to Arvados - Bug #21390: Update arvados/dev-jobs Docker image to Debian 12 (New)
  • Related to Arvados - Feature #21391: Update arvbox to Debian 12 (New)
  • Related to Arvados - Idea #21453: Install Python package virtualenvs under /usr/lib/PKGNAME (Resolved, Brett Smith, 02/13/2024)
  • Related to Arvados - Idea #21454: Update required_ruby_version in all our gemspecs (Resolved, Brett Smith)
  • Blocked by Arvados - Idea #20300: RailsAPI upgrade from 5.2 to 7.0 (Resolved, Tom Clegg, 10/06/2023)
Actions #1

Updated by Brett Smith over 1 year ago

  • Blocked by Idea #20300: RailsAPI upgrade from 5.2 to 7.0 added
Actions #2

Updated by Peter Amstutz about 1 year ago

  • Target version changed from To be scheduled to Development 2023-10-25 sprint
Actions #3

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-10-25 sprint to Development 2023-10-11 sprint
Actions #4

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-10-11 sprint to Development 2023-10-25 sprint
Actions #5

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-10-25 sprint to Development 2023-11-08 sprint
Actions #6

Updated by Tom Clegg about 1 year ago

20846-ruby3 @ f75a35d375ec8e9ae0160d2a847d96e96c2974e7

Some preliminary work on Ruby 3.
  • `arvados-server install` chooses Ruby 3.2.2, and adds a couple of new dependency packages
  • source:services/api and source:sdk/ruby have a few syntax updates to make tests pass

On debian:12, `arvados-server install` fails on Python trouble. I'm pretty sure it's because ws4py is going stale.

On ubuntu:22.04, `arvados-server install` warns about the python/ws4py trouble but continues, and then fails on another Python problem while installing source:services/fuse:

go run ./cmd/arvados-package build -package-version=2.7.1rc1 -target-os ubuntu:22.04

...

Moving llfuse-1.5.0-py3.10-linux-x86_64.egg to /var/lib/arvados/lib/python/lib/python3.10/site-packages
Adding llfuse 1.5.0 to easy-install.pth file

Installed /var/lib/arvados/lib/python/lib/python3.10/site-packages/llfuse-1.5.0-py3.10-linux-x86_64.egg
error: setuptools 59.6.0 is installed but setuptools>=62.4.0 is required by {'python-daemon'}
Actions #7

Updated by Tom Clegg about 1 year ago

  • Status changed from New to In Progress
Actions #8

Updated by Tom Clegg about 1 year ago

The same setuptools problem quoted above also seems to happen on debian:11.

Actions #10

Updated by Brett Smith about 1 year ago

  • Related to Idea #21146: Replace ws4py dependency from PySDK added
Actions #11

Updated by Peter Amstutz about 1 year ago

  • Assigned To set to Brett Smith
Actions #12

Updated by Tom Clegg about 1 year ago

20846-ruby3 @ 067f70b50f31c37175a840e7e4e344983c468d10

Fixes setuptools issue, but there are still warnings about ws4py:

  File "build/bdist.linux-x86_64/egg/ws4py/async_websocket.py", line 87
    asyncio.async(closeit())
            ^^^^^
SyntaxError: invalid syntax

...and arvados-server install errors out while trying to install workbench1, which isn't surprising. Need to remove wb1 from lib/install.

20846-ruby3 @ 9ddc63bf01d4603bb373957ce9649da50e7ecd55

Actions #13

Updated by Brett Smith about 1 year ago

My local virtualenv works and this branch doesn't because I installed with pip, while the branch installs with setup.py install, which is deprecated and takes a worse codepath. From the transcript:

+ for src in "/arvados/sdk/python" "/arvados/services/fuse" 
+ rsync -a --delete-after /arvados/sdk/python/ /var/lib/arvados/tmp/python/
+ cd /var/lib/arvados/tmp/python
+ python3 setup.py install
running install
/var/lib/arvados/lib/python/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.

The use of the newly reserved async keyword is in ws4py.async_websocket. If you install with pip+wheel, this module never gets processed and so we sidestep the problem. If something actually processes it, as setup.py install does, you run into trouble:

Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ws4py
>>> import ws4py.async_websocket
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/arvados/arvenv/lib/python3.11/site-packages/ws4py/async_websocket.py", line 87
    asyncio.async(closeit())
            ^^^^^
SyntaxError: invalid syntax

The branch should be updated so this install code follows a pattern similar to 20432a4533136a5ab9fa52c2e2ec2d90a855ecfb: run setup.py build if necessary (shouldn't hurt), then install with pip install PATH.
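
To see the keyword problem in isolation, without installing ws4py, here is a minimal sketch using only the standard library. The helper name `parses_ok` is hypothetical, and `asyncio.ensure_future` is used purely as an example of a spelling that does not collide with the keyword. It assumes Python 3.7 or later, where `async` is reserved.

```python
import keyword


def parses_ok(source):
    """Return True if `source` compiles as Python on the running interpreter."""
    try:
        # compile() only parses the text; it never imports or runs it.
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# The line from ws4py/async_websocket.py quoted in the traceback above.
old_call = "asyncio.async(closeit())"
# The same call written with a name that is not a reserved word.
new_call = "asyncio.ensure_future(closeit())"

if __name__ == "__main__":
    print("async is reserved:", keyword.iskeyword("async"))
    print("old spelling parses:", parses_ok(old_call))
    print("new spelling parses:", parses_ok(new_call))
```

Any install path that byte-compiles every module in the package would hit the same SyntaxError, which is consistent with the warnings in the transcript.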

Actions #14

Updated by Tom Clegg about 1 year ago

Fixes python usage ("setup.py build" + "pip3 install $path"), and an unrelated workbench2 build issue:

20846-ruby3 @ b24f2530ae41644a7eb9cfe28679182d76468737

Actions #15

Updated by Tom Clegg about 1 year ago

Here's a branch with just the compatibility changes to our Ruby code that I needed to make tests pass in Ruby 3. I think it makes sense to merge this first, instead of simultaneously with the installer changes.

20846-ruby3-compat @ 88e18b7b9826b05e8485a6a99593ecda850969d7 -- developer-run-tests: #3879

Actions #16

Updated by Brett Smith about 1 year ago

Tom Clegg wrote in #note-15:

Here's a branch with just the compatibility changes to our Ruby code that I needed to make tests pass in Ruby 3. I think it makes sense to merge this first, instead of simultaneously with the installer changes.

This all looks fine. Just to make sure I'm following correctly, I believe this branch still maintains Ruby 2.7 compatibility, right? It's just making the keyword argument changes and gem updates necessary to also work on Ruby 3.0, yes?

Since I don't think Jenkins did it, can you please describe how you tested Ruby 3.0 compatibility? Did you run tests on a system where /var/lib/arvados/bin/ruby was Ruby 3.0? Did that run include the Ruby SDK tests?

Assuming I've got all that right, I think this is basically good to merge. One nit, in services/api/test/test_helper.rb in the last commit:

-    define_method method do |action, *args|
+    define_method method do |action, **args|

It seems like it would be good to rename args to kwargs here, since that's what it really is, and having a keyword argument hash named args is surprising. I know it makes a bigger diff but it seems like the right thing to do for readability long-term.

Thanks.

Actions #17

Updated by Tom Clegg about 1 year ago

Brett Smith wrote in #note-16:

This all looks fine. Just to make sure I'm following correctly, I believe this branch still maintains Ruby 2.7 compatibility, right? It's just making the keyword argument changes and gem updates necessary to also work on Ruby 3.0, yes?

Yes. Except that I just tried running tests on Ruby 3 again, and
  • this time I got an interesting new failure mode:
    + bin/rails db:environment:set
    rails aborted!
    LoadError: linked to incompatible 0W\226\322U - /home/tom/.gem/ruby/3.2.0/gems/ruby-prof-1.6.3/lib/ruby_prof.so
    ...
    + bin/rake db:setup
    rake aborted!
    LoadError: linked to incompatible `     nF\211U - /home/tom/.gem/ruby/3.2.0/gems/ruby-prof-1.6.3/lib/ruby_prof.so
    

    ...fixed (?) by removing $HOME/.gem/ruby/3.2.0/ and hitting install services/api again in run-tests.sh (maybe restarting run-tests.sh would have been enough, but I didn't think to try it first)
  • some tests failed, fixed in 14c8fb3d5a

Since I don't think Jenkins did it, can you please describe how you tested Ruby 3.0 compatibility? Did you run tests on a system where /var/lib/arvados/bin/ruby was Ruby 3.0?

I ran arvados-server install -type test from 20846-ruby3 at b24f2530ae to get Ruby 3.2.2 installed at /var/lib/arvados/bin/ruby, then switched to this branch and ran tests using run-tests.sh interactive mode.

Did that run include the Ruby SDK tests?

Huh, apparently not. sdk/ruby-google-api-client, sdk/ruby, and sdk/cli now pass though, after some fixes in c51e59e03b.

It seems like it would be good to rename args to kwargs here

Indeed. Fixed.

20846-ruby3-compat @ c51e59e03bc721de2837db7958415766bd7b46c8 -- developer-run-tests: #3882

Actions #18

Updated by Brett Smith about 1 year ago

Tom Clegg wrote in #note-17:

20846-ruby3-compat @ c51e59e03bc721de2837db7958415766bd7b46c8 -- developer-run-tests: #3882

Following the same process as you (except I just did install deps before tests), I also installed Ruby 3 and I think this all passes for me too. (The failures I mentioned in the meeting are because I was still on the installer branch, my bad.)

I do get a couple of consistent deprecation warnings. They don't need to be fixed in this branch, but if they don't get fixed in this ticket, they should get a follow-up story:

/home/brett/Curii/arvados/services/api/app/views/user_notifier/account_is_setup.text.erb:5: warning: Passing safe_level with the 2nd argument of ERB.new is deprecated. Do not use it, and specify other arguments as keyword arguments.
/home/brett/Curii/arvados/services/api/app/views/user_notifier/account_is_setup.text.erb:5: warning: Passing trim_mode with the 3rd argument of ERB.new is deprecated. Use keyword argument like ERB.new(str, trim_mode: ...) instead.

Either way, this looks good to me. Thanks.

Actions #19

Updated by Brett Smith about 1 year ago

  • Assigned To changed from Brett Smith to Tom Clegg
Actions #20

Updated by Tom Clegg about 1 year ago

  • Related to Idea #17001: Arvados uses WB2 by default added
Actions #21

Updated by Tom Clegg about 1 year ago

  • Related to Bug #21169: Fix deprecated ERB usage in account setup email view added
Actions #22

Updated by Tom Clegg about 1 year ago

Moved the ERB usage issue to #21169.

Actions #23

Updated by Tom Clegg about 1 year ago

20846-ruby3 (install/boot/deps) branch rebased on main after 20846-ruby3-compat merge:

20846-ruby3 @ b77707a7b06d57145a7829458d476baf8573317e

Actions #24

Updated by Tom Clegg about 1 year ago

  • Related to Idea #20690: Remove workbench 1 from main branch !!!! added
Actions #25

Updated by Peter Amstutz about 1 year ago

  • Target version changed from Development 2023-11-08 sprint to Development 2023-11-29 sprint
Actions #26

Updated by Tom Clegg about 1 year ago

arvados-server install now lets you specify an alternate Ruby version, so we will be able to use the latest arvados version to build two (or more) Jenkins images to confirm future changes pass tests on both Ruby versions.

Of course, debian12+ruby2.7.7 still doesn't work because of the openssl thing. But both should work on debian11.

Usage: arvados-server install [options] 
  -bundler-version version
        Bundler version to install (do not override in production mode) (default "2.2.19")
  -commit hash
        source commit hash to embed (blank means use 'git log' or all-zero placeholder)
  -eatmydata
        use eatmydata to speed up install
  -nodejs-version version
        Nodejs version to install (not applicable in production mode) (default "v12.22.12")
  -package-version string
        version string to embed in executable files (default "0.0.0")
  -ruby-version version
        Ruby version to install (do not override in production mode) (default "3.2.2")
  -singularity-version version
        Singularity version to install (do not override in production mode) (default "3.10.4")
  -source string
        source tree location (required for -type=package) (default "/arvados")
  -type type
        cluster type: development, test, production, or package (default "production")
  -version
        Write version information to stdout and exit 0
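
As a rough sketch of how the new flag could drive one image build per Ruby version, the snippet below only assembles command lines. The `-type` and `-ruby-version` flags and both version numbers come from this ticket; the helper name `install_command`, the `RUBY_VERSIONS` list, and the idea of looping over versions this way are assumptions, not the actual Jenkins configuration.

```python
import shlex

# Ruby versions mentioned in this ticket: the old 2.7.7 and the new default 3.2.2.
RUBY_VERSIONS = ["2.7.7", "3.2.2"]


def install_command(ruby_version):
    """Build the argv for a test-type install pinned to one Ruby version."""
    return [
        "arvados-server", "install",
        "-type", "test",
        "-ruby-version", ruby_version,
    ]


if __name__ == "__main__":
    # Print one command per image; nothing is executed here.
    for version in RUBY_VERSIONS:
        print(shlex.join(install_command(version)))
```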

20846-ruby3 @ 9fa5faed898bf23fcea8f4e7946e540473e42e08 -- developer-run-tests: #3904

retry developer-run-tests-sdk-python-ruby: #3230

Actions #27

Updated by Brett Smith about 1 year ago

Tom Clegg wrote in #note-26:

20846-ruby3 @ 9fa5faed898bf23fcea8f4e7946e540473e42e08 -- developer-run-tests: #3904

I just want to make one background thing explicit to make sure we're on the same page: because this branch removes install and boot support for Workbench 1, it's not slated for a 2.7.x release, only 3.0. If you think that too, then we're good, there's no problem there.

My only other comment would be a UI thing about the version arguments. It feels like it would be frustrating that you're expected to prefix the NodeJS version with a v, and there's nothing in the help output or anything to tell you that. I understand the argument that we're just mirroring what NodeJS does, but I don't think anyone who isn't staring at this code deeply would think about whether the v is actually part of the version number, or just a marker. I would suggest that, for any version number whose format we feel less sure about, the validation be ^\d+(\.\d+)*$, and then the code should add the v prefix as necessary for the NodeJS version. Then everything's pretty consistent from a UI standpoint. That said, if you disagree, I wouldn't hold up a merge over it or anything.

Thanks.
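
The version-argument suggestion in this note can be sketched as follows. The regular expression is the one proposed above; the function names `validate_version` and `nodejs_release_name` are hypothetical and say nothing about how `arvados-server install` actually implements its checks.

```python
import re

# The pattern proposed in the note: dot-separated runs of digits.
# re.ASCII keeps \d limited to 0-9.
VERSION_RE = re.compile(r"^\d+(\.\d+)*$", re.ASCII)


def validate_version(value):
    """Return `value` unchanged if it looks like a bare version number."""
    # fullmatch() also rejects a trailing newline, which `$` alone would allow.
    if not VERSION_RE.fullmatch(value):
        raise ValueError(f"invalid version number: {value!r}")
    return value


def nodejs_release_name(version):
    """Validate a bare version, then add the 'v' prefix NodeJS releases use."""
    return "v" + validate_version(version)


if __name__ == "__main__":
    print(nodejs_release_name("12.22.12"))
```

With this shape, every version flag accepts the same bare format, and the `v` becomes an internal detail of the NodeJS download step.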

Actions #28

Updated by Tom Clegg about 1 year ago

Brett Smith wrote in #note-27:

I just want to make one background thing explicit to make sure we're on the same page: because this branch removes install and boot support for Workbench 1, it's not slated for a 2.7.x release, only 3.0. If you think that too, then we're good, there's no problem there.

Confirmed @ standup that Peter will be cherry-picking commits to make 2.7.1, so we're good to merge 3.0 branches to main.

the code should add the v prefix as necessary for the NodeJS version

Yeah, that makes more sense. Fixed.

Actions #29

Updated by Tom Clegg almost 1 year ago

20846-ubuntu2204 @ a77d65e098bc014d05c4c16cc14c5baa00afdd68

  • adds ubuntu2204 target (from Lucas's 19213 branch)
  • updates ruby to 3.2.2 so it doesn't fail on the openssl issue
  • updates fpm (on all platforms) to a ruby3-compatible version
  • adds debian12 target (since it's also unblocked by ruby 3)
  • skips workbench1 packaging because that fails with ruby 3 (#20690 is about to remove it anyway)
Actions #30

Updated by Tom Clegg 12 months ago

20846-ubuntu2204 @ f7fe711598ccfbf7e5f35e959507fcab6fd62bd5
  • update Ruby install recipes (OS, RVM)
  • recommend OS over RVM unless OS package is too old
  • remove "install from source" option
  • add ubuntu 22.04 and debian 12 to list of supported distributions

Should we avoid merging that last item until we merge the earlier changes and update jenkins to actually start publishing the packages?

Actions #31

Updated by Tom Clegg 12 months ago

Backed out the "add ubuntu 22.04 and debian 12 to list of supported distributions" commit and left it in a separate branch (20846-document-2204-support) to revisit after we have successfully auto-published the packages.

20846-ubuntu2204 @ b4ebaa2edbd67c695ea23f89e74c946b7f4eb221

Actions #32

Updated by Brett Smith 12 months ago

Tom Clegg wrote in #note-31:

20846-ubuntu2204 @ b4ebaa2edbd67c695ea23f89e74c946b7f4eb221

  • build/README notes that to add a new target, you must "Update the package download code near the bottom of `test_package_presence` in `run-library.sh` so it can download packages for the new distribution." Please add the new distros.
  • Re the Python build changes in build/run-library.sh: The issue you noted in comments is going to keep coming up; see #20543. The Python standard library has included venv and ensurepip since 3.4, so the modern way to handle things would be to install Python; python3 -m venv VENVDIR; and then do everything inside the virtualenv, including installing setuptools (and upgrading pip if desired). I believe this should work on all our supported distributions.
    If you want to punt this, that's fine, but please make a follow-up story. Admittedly there's probably a lot of modernization we could do in our Python build stuff.
  • In doc/_includes/_install_ruby_and_bundler.liquid, h4. Alma/CentOS/Red Hat 7: There isn't an Alma 7 either, so you can drop that along with Rocky.
  • In the Makefiles, hardcoding make --jobs 8 seems mildly regressive, especially since we still count processors everywhere else. I don't feel too strongly about it, but what was your rationale for this change?
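
The virtualenv approach described in the second bullet can be sketched as an ordered list of commands. This is only an outline under stated assumptions: the helper name `venv_build_steps` and the example paths are hypothetical, and the real changes to build/run-library.sh may look quite different.

```python
def venv_build_steps(venv_dir, source_dirs):
    """Return the commands, in order, for a build done entirely inside a virtualenv."""
    # On POSIX systems, venv places the interpreter under bin/ in the new directory.
    venv_python = f"{venv_dir}/bin/python"
    steps = [
        # Create the virtualenv with the standard library's venv module.
        ["python3", "-m", "venv", venv_dir],
        # Upgrade pip and install setuptools inside the virtualenv only.
        [venv_python, "-m", "pip", "install", "--upgrade", "pip", "setuptools"],
    ]
    for src in source_dirs:
        # Install each source tree with the virtualenv's own pip.
        steps.append([venv_python, "-m", "pip", "install", src])
    return steps


if __name__ == "__main__":
    # Example paths only; print the plan without running anything.
    for step in venv_build_steps("/tmp/build-venv", ["/arvados/sdk/python"]):
        print(" ".join(step))
```

Because every pip call goes through the virtualenv's interpreter, nothing is installed into the system Python.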

Thanks.

Actions #33

Updated by Tom Clegg 12 months ago

  • Related to Idea #21230: Remove usage of global "pip install" in package build/test scripts added
Actions #34

Updated by Tom Clegg 12 months ago

  • Target version changed from Development 2023-11-29 sprint to Development 2024-01-03 sprint
Actions #35

Updated by Tom Clegg 12 months ago

Brett Smith wrote in #note-32:

  • build/README notes that to add a new target, you must "Update the package download code near the bottom of `test_package_presence` in `run-library.sh` so it can download packages for the new distribution." Please add the new distros.

Oops, added.

  • Re the Python build changes in build/run-library.sh: The issue you noted in comments is going to keep coming up; see #20543. The Python standard library has included venv and ensurepip since 3.4, so the modern way to handle things would be to install Python; python3 -m venv VENVDIR; and then do everything inside the virtualenv, including installing setuptools (and upgrading pip if desired). I believe this should work on all our supported distributions.
    If you want to punt this, that's fine, but please make a follow-up story. Admittedly there's probably a lot of modernization we could do in our Python build stuff.

I was trying to minimize the "while we're here, let's do x" since adding 2204/12 already requires touching a lot of things. Added #21230 and put it on next sprint so we don't lose it.

  • In doc/_includes/_install_ruby_and_bundler.liquid, h4. Alma/CentOS/Red Hat 7: There isn't an Alma 7 either, so you can drop that along with Rocky.

Fixed.

  • In the Makefiles, hardcoding make --jobs 8 seems mildly regressive, especially since we still count processors everywhere else. I don't feel too strongly about it, but what was your rationale for this change?

Commit message in 981de3b943cb6da04145fb9e7f1ffcba171c9300:

    20846: Fix shell command in env var.

    With Ruby 3, something uses the MAKE var without the expected
    shell-eval, so the number-of-processors trick stopped working.

    Arvados-DCO-1.1-Signed-off-by: Tom Clegg <tom@curii.com>

        make --jobs \$\(grep -c processor /proc/cpuinfo\) DESTDIR\=
        sitearchdir\=./.gem.20231120-15-fa6sx6 sitelibdir\=./.gem.20231120-15-fa6sx6
        clean
        make: invalid option -- 'c'
        Usage: make [options] [target] ...
        Options:
          -b, -m                      Ignored for compatibility.
          [...]

I just wasn't sure the speedup was worth the time to find the right way to spell it. Is "mildly regressive" a hint to follow up on this, or to not?
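
To illustrate why the number-of-processors trick breaks when nothing shell-evaluates the variable, here is a small sketch. It splits the value on whitespace the way a caller that skips the shell might, which is consistent with the escaped words and the `invalid option -- 'c'` error in the log above. The function names are hypothetical, and `os.cpu_count()` is just one possible way to choose a job count up front, not what the branch does.

```python
import os
import shlex

# The value from the log in the commit message, before any shell sees it.
MAKE_VALUE = "make --jobs $(grep -c processor /proc/cpuinfo)"


def split_without_shell(command):
    """Split into words on whitespace, with no command substitution."""
    return shlex.split(command)


def make_with_fixed_jobs():
    """Work out the job count up front so no command substitution is needed."""
    jobs = os.cpu_count() or 1
    return ["make", "--jobs", str(jobs)]


if __name__ == "__main__":
    # "-c" shows up as its own word, so make would see it as an option.
    print(split_without_shell(MAKE_VALUE))
    print(make_with_fixed_jobs())
```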

20846-ubuntu2204 @ 9329bd8bc74cdb4def31a0ced87a6013606db0a3

Actions #36

Updated by Brett Smith 12 months ago

Tom Clegg wrote in #note-35:

I was trying to minimize the "while we're here, let's do x" since adding 2204/12 already requires touching a lot of things. Added #21230 and put it on next sprint so we don't lose it.

Of course, that's fine, thanks.

Commit message in 981de3b943cb6da04145fb9e7f1ffcba171c9300:

Thank you for spelling that out, love that commit message. I honestly don't care whether we use 8 or the CPU count as a default; I think both have pros and cons. What bugs me more is that the Dockerfile isn't internally consistent. But working around others' bugs is an understandable reason to do that.

What I'd really like to see is this decision handled at the Dockerfile level. I'll file my own ticket for that. This is good to merge, thanks.

Actions #37

Updated by Tom Clegg 12 months ago

  • Assigned To changed from Tom Clegg to Brett Smith
Actions #38

Updated by Tom Clegg 12 months ago

todo:
  • ☐ fix ubuntu1804 and rocky8 builds (oops, I just noticed setuptools>=66 seems to have broken them, see build-packages-ubuntu1804: #2432 /console)
  • ☐ configure jenkins to build and publish packages for u2204/deb12, confirm packages get published
  • ☐ add u2204/deb12 to "supported distributions" list at source:doc/install/install-manual-prerequisites.html.textile.liquid
  • ☐ add u2204/deb12 to "enable repo" instructions at source:doc/install/packages.html.textile.liquid
  • ☐ update tests/tools that use debian11 as the default OS (git grep -Ei 'debian.?11|bullseye')
  • ☐ add u2204/deb12 to arvados-workbench2.git/Makefile and confirm workbench2 package gets built (or, we could just wait until workbench2 packaging merges into the arvados repo in #18874)
  • ☐ manual test installing new packages
  • ☐ update build/README with some of the above todo's?
Actions #39

Updated by Brett Smith 11 months ago

Tom Clegg wrote in #note-38:

todo:

This should be done in #21273. Since we want to drop support for Python<3.8 anyway ( #21087 ), the solution is going to depend on how we decide to approach that. See also #20838.

Actions #40

Updated by Brett Smith 11 months ago

Tom Clegg wrote in #note-38:

todo:
  • ☐ configure jenkins to build and publish packages for u2204/deb12, confirm packages get published

After backporting some changes from other branches into these Dockerfiles, the Jenkins builds are failing with:

node[6]: ../src/node_platform.cc:61:std::unique_ptr<long unsigned int> node::WorkerThreadsTaskRunner::DelayedTaskScheduler::Start(): Assertion `(0) == (uv_thread_create(t.get(), start_thread, this))' failed.
 1: 0xa1ae50 node::Abort() [node]
 2: 0xa1aece  [node]
 3: 0xa8c40a node::WorkerThreadsTaskRunner::WorkerThreadsTaskRunner(int) [node]
 4: 0xa8c4da node::NodePlatform::NodePlatform(int, v8::TracingController*) [node]
 5: 0x9eaa69 node::InitializeOncePerProcess(int, char**) [node]
 6: 0x9eaf61 node::Start(int, char**) [node]
 7: 0x7f2c67c75d90  [/lib/x86_64-linux-gnu/libc.so.6]
 8: 0x7f2c67c75e40 __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
 9: 0x982005  [node]
Aborted (core dumped)
The command '/bin/sh -c env -C /usr/local/node-v12.22.12-linux-x64/bin PATH="$PATH:." ./npm install -g yarn' returned a non-zero code: 134

Given the nature of the error (failed to spawn a thread→core dumped) and the fact that I can build it just fine on my Debian 12 system, I suspect that the problem is that the kernel on the Jenkins worker node and the libc in the Docker image are too far apart, and the latter is trying to do something that the former doesn't support.

I'll do a little more investigation, but if I'm right, the next step would be to figure out what that kernel is and what our options are for bringing them closer together.

Actions #41

Updated by Brett Smith 11 months ago

We are definitely not the only ones to run into this problem. GitHub issue, AskUbuntu thread

There seems to be consensus that upgrading Docker fixes it, although the reason is a little ambiguous. Either older Docker doesn't export the clone3 syscall, or it has something to do with seccomp. Maybe both are true at different specific versions.

The AskUbuntu thread includes a custom seccomp policy you can use to allow the necessary syscalls.

You can also run the Docker container with seccomp turned off (docker run --security-opt seccomp=unconfined) or fully privileged. Obviously those are both easiest short-term and worst long-term.
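
As a rough illustration of the custom-policy option, the sketch below appends one allow rule to a seccomp profile held as a Python dictionary. The profile shown is a tiny stand-in, not Docker's real default profile, and it is not the policy from the AskUbuntu thread. The helper name `allow_syscalls` is hypothetical, and `clone3` is used only because it is the syscall named in this note.

```python
import copy
import json


def allow_syscalls(profile, names):
    """Return a copy of a seccomp profile with an extra allow rule appended."""
    updated = copy.deepcopy(profile)
    updated.setdefault("syscalls", []).append(
        {"names": list(names), "action": "SCMP_ACT_ALLOW"}
    )
    return updated


# Stand-in for a real profile; a real one lists many more syscalls.
base_profile = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [{"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"}],
}

if __name__ == "__main__":
    print(json.dumps(allow_syscalls(base_profile, ["clone3"]), indent=2))
```

The resulting JSON could be saved to a file and passed to `docker run --security-opt seccomp=PATH`, which keeps the rest of the policy in force instead of turning seccomp off entirely.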

Actions #43

Updated by Peter Amstutz 11 months ago

  • Target version changed from Development 2024-01-03 sprint to Development 2024-01-17 sprint
Actions #44

Updated by Brett Smith 11 months ago

All the ubuntu2204 package tests fail like this:

START: arvados-api-server test on arvados/package-test:ubuntu2204                                                                        
Get:1 file:/arvados/packages/ubuntu2204  InRelease                                                                                       
Ign:1 file:/arvados/packages/ubuntu2204  InRelease                                                                                       
Get:2 file:/arvados/packages/ubuntu2204  Release [1220 B]                                                                                
Get:2 file:/arvados/packages/ubuntu2204  Release [1220 B]                                                                                
Get:3 file:/arvados/packages/ubuntu2204  Release.gpg                                                                                     
Ign:3 file:/arvados/packages/ubuntu2204  Release.gpg                                                                                     
Get:4 file:/arvados/packages/ubuntu2204  Packages [8093 B]                                                                               
Err:4 file:/arvados/packages/ubuntu2204  Packages                                                                                        
  Could not open file /var/lib/apt/lists/partial/_arvados_packages_ubuntu2204_Packages.gz - open (13: Permission denied)                 
Get:4 file:/arvados/packages/ubuntu2204  Packages [37.5 kB]                                                                              
Err:4 file:/arvados/packages/ubuntu2204  Packages                                                                                        
  Could not open file /var/lib/apt/lists/partial/_arvados_packages_ubuntu2204_Packages - open (13: Permission denied)                    
Hit:5 http://security.ubuntu.com/ubuntu jammy-security InRelease                                                                         
Hit:6 http://archive.ubuntu.com/ubuntu jammy InRelease                                                                                   
Hit:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease                                                                           
Hit:8 http://archive.ubuntu.com/ubuntu jammy-backports InRelease                                                                         
Reading package lists...                                                                                                                 
E: Failed to fetch store:/var/lib/apt/lists/partial/_arvados_packages_ubuntu2204_Packages  Could not open file /var/lib/apt/lists/partial/_arvados_packages_ubuntu2204_Packages - open (13: Permission denied)                                                                    
E: Some index files failed to download. They have been ignored, or old ones used instead.                                                
ERROR: arvados-api-server test on arvados/package-test:ubuntu2204 failed with exit status 100         
Actions #45

Updated by Brett Smith 11 months ago

Brett Smith wrote in #note-44:

All the ubuntu2204 package tests fail like this:

I can make this problem go away in an interactive container if I chmod -R go+rX /arvados first. But then the question is whether I can do something more refined in infrastructure.
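
For reference, here is a small sketch of what the symbolic mode `go+rX` does to a single permission value. It is a model of the chmod semantics only. The function name `apply_go_plus_rx` is hypothetical, and the sketch does not touch any files or say anything about why apt needs the access.

```python
import stat


def apply_go_plus_rx(mode, is_dir):
    """Return `mode` after applying the symbolic change go+rX."""
    # Lowercase r: group and others always gain read permission.
    new_mode = mode | stat.S_IRGRP | stat.S_IROTH
    # Capital X: add execute only for directories, or for files that
    # already have an execute bit set for someone.
    any_exec = mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    if is_dir or any_exec:
        new_mode |= stat.S_IXGRP | stat.S_IXOTH
    return new_mode


if __name__ == "__main__":
    print(oct(apply_go_plus_rx(0o600, is_dir=False)))
    print(oct(apply_go_plus_rx(0o700, is_dir=True)))
```

A more refined fix would presumably grant the same bits only on the package directory that apt reads, rather than on all of /arvados.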

Actions #46

Updated by Brett Smith 11 months ago

Next problems:

Traceback (most recent call last):                                                                                                       
  File "/usr/bin/cwltest", line 8, in <module>                                                                                           
    sys.exit(main())                                                                                                                     
  File "/usr/share/python3/dist/python3-cwltest/lib/python3.10/site-packages/cwltest/main.py", line 85, in main                          
    args = arg_parser().parse_args(sys.argv[1:])                                                                                         
  File "/usr/share/python3/dist/python3-cwltest/lib/python3.10/site-packages/cwltest/argparser.py", line 110, in arg_parser              
    pkg = pkg_resources.require("cwltest")                                                                                               
  File "/usr/share/python3/dist/python3-cwltest/lib/python3.10/site-packages/pkg_resources/__init__.py", line 900, in require            
    needed = self.resolve(parse_requirements(requirements))                                                                              
  File "/usr/share/python3/dist/python3-cwltest/lib/python3.10/site-packages/pkg_resources/__init__.py", line 786, in resolve            
    raise DistributionNotFound(req, requirers)                                                                                           
pkg_resources.DistributionNotFound: The 'importlib-metadata>=0.12; python_version < "3.8"' distribution was not found and is required by pytest
ERROR: python3-cwltest test on arvados/package-test:ubuntu2204 failed with exit status 1                                                 
Traceback (most recent call last):                                                                                                       
  File "/usr/bin/arvados-cwl-runner", line 8, in <module>                                                                                                                                                                                                                         
    sys.exit(main())                                                                                                                                                                                                                                                              
  File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.10/site-packages/arvados_cwl/__init__.py", line 306, in main                                                                                                                                               
    parser = arg_parser()                                                                                                                                                                                                                                                         
  File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.10/site-packages/arvados_cwl/__init__.py", line 89, in arg_parser                                                                                                                                          
    exgroup.add_argument("--version", action="version", help="Print version and exit", version=versionstring())                          
  File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.10/site-packages/arvados_cwl/__init__.py", line 61, in versionstring                                                                                                                                       
    arvcwlpkg = pkg_resources.require("arvados-cwl-runner")                                                                                                                                                                                                                       
  File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.10/site-packages/pkg_resources/__init__.py", line 900, in require                                                                                                                                          
    needed = self.resolve(parse_requirements(requirements))                                                                              
  File "/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.10/site-packages/pkg_resources/__init__.py", line 791, in resolve                                                                                                                                          
    raise VersionConflict(dist, req).with_context(dependent_req)                                                                                                                                                                                                                  
pkg_resources.ContextualVersionConflict: (zipp 3.17.0 (/usr/share/python3/dist/python3-arvados-cwl-runner/lib/python3.10/site-packages), Requirement.parse('zipp<3.16.0; python_version < "3.8"'), {'arvados-cwl-runner'})
ERROR: python3-arvados-cwl-runner test on arvados/package-test:ubuntu2204 failed with exit status 1                                                                                                                                                                               

pkg_resources is complaining that it wants to satisfy specific maximum library versions, but the version installed is later than that. But I don't understand why it's complaining, because the python_version < "3.8" condition means these requirements should be ignored.

These are extra weird because we have plenty of Python-versioned requirements like this, but these CWL packages are the only ones that fail.

At first I thought the CWL tests might be more thorough, but they're comparable to the arv-mount and docker-cleaner tests ("run the command with -h or --version"), so that doesn't seem to be the only factor.

Both tools specifically call pkg_resources.require and that's the thing that seems to be trying to satisfy requirements without regard to python_version.

Actions #47

Updated by Brett Smith 11 months ago

sigh

% pydoc pkg_resources.require
/opt/arvados/lib/python3.8/pydoc.py:343: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
Actions #48

Updated by Brett Smith 11 months ago

With a switch to importlib, ubuntu2204 packages are building and testing successfully on my system. On to debian12.
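
For reference, a version lookup through importlib.metadata could look roughly like the sketch below. Unlike pkg_resources.require, importlib.metadata.version only reads the named distribution's own metadata and does not resolve its whole dependency tree. The function name and the fallback value are assumptions made for illustration; this is not the code on the branch. It needs Python 3.8 or later.

```python
from importlib import metadata


def package_version(dist_name, fallback="unknown"):
    """Return the installed version of dist_name, or fallback if it is not installed.

    Hypothetical helper: only the distribution's own metadata is read,
    so requirements of its dependencies are never evaluated.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return fallback


if __name__ == "__main__":
    # Prints the installed version, or "unknown" where the package is absent.
    print(package_version("arvados-cwl-runner"))
```

A string like this can be passed straight to an argparse "version" action, which is where both tracebacks above were raised.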

Actions #49

Updated by Brett Smith 11 months ago

== Packages dependencies for ./usr/share/python3/dist/python3-cwltest/lib/python3.11/site-packages/msgpack/_cmsgpack.cpython-311-x86_64-linux-gnu.so ==
dpkg-query: no path found matching pattern /lib/x86_64-linux-gnu/libstdc++.so.6

== Packages dependencies for ./usr/share/python3/dist/python3-arvados-python-client/lib/python3.11/site-packages/pycurl.cpython-311-x86_64-linux-gnu.so ==
dpkg-query: no path found matching pattern /lib/x86_64-linux-gnu/libbrotlicommon.so.1
[... more...]

== Packages dependencies for ./usr/share/python3/dist/python3-crunchstat-summary/lib/python3.11/site-packages/pycurl.cpython-311-x86_64-linux-gnu.so ==
dpkg-query: no path found matching pattern /lib/x86_64-linux-gnu/libbrotlicommon.so.1
[... more...]

These errors are kind of a red herring. Searching for the libraries with dpkg-query isn't working because ldd lists their paths under /lib but dpkg installs them under /usr/lib. Our library-searching code needs to be updated to handle merged-/usr systems. But that's not the blocker. The blocker is this:

/jenkins/package-testing/test-package-python3-cwltest.sh: 8: cwltest: not found
/jenkins/package-testing/test-package-python3-arvados-python-client.sh: 8: arv-put: not found
/jenkins/package-testing/test-package-python3-crunchstat-summary.sh: 8: crunchstat-summary: not found

I'm lost on this. Looking at the built package, everything seems to be assembled okay and nothing's obviously missing, so I'm not sure why these packages specifically would report this problem. But this set of errors is consistent, so it's a real issue.

Actions #50

Updated by Brett Smith 11 months ago

Ah.

root@ad5cea8cacf8:/# cwltest
bash: /usr/bin/cwltest: cannot execute: required file not found
root@ad5cea8cacf8:/# head -n1 "$(which cwltest)" 
#!/usr/share/python3/dist/python3-cwltest/bin/python
root@ad5cea8cacf8:/# ls -l /usr/share/python3/dist/python3-cwltest/bin/python
lrwxrwxrwx 1 root root 7 Jan  4 21:44 /usr/share/python3/dist/python3-cwltest/bin/python -> python3
root@ad5cea8cacf8:/# ls -l /usr/share/python3/dist/python3-cwltest/bin/python3
lrwxrwxrwx 1 root root 16 Jan  4 21:44 /usr/share/python3/dist/python3-cwltest/bin/python3 -> /usr/bin/python3
root@ad5cea8cacf8:/# ls -l /usr/bin/python3
ls: cannot access '/usr/bin/python3': No such file or directory
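
The manual check above (read the shebang, then follow each symlink until something is missing) can be sketched in Python. This is only an illustration of the diagnosis, not part of the package tests; trace_interpreter and first_missing are hypothetical names.

```python
import os


def trace_interpreter(script_path, max_hops=16):
    """Return the shebang interpreter of script_path followed by each symlink hop."""
    with open(script_path, "rb") as script:
        first_line = script.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return []
    words = first_line[2:].split()
    if not words:
        return []
    path = words[0]
    chain = [path]
    for _ in range(max_hops):
        if not os.path.islink(path):
            break
        target = os.readlink(path)
        # Relative targets are resolved against the directory of the link;
        # os.path.join returns an absolute target unchanged.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain


def first_missing(chain):
    """Return the first path in chain that does not exist at all, or None."""
    for path in chain:
        if not os.path.lexists(path):
            return path
    return None
```

Run against /usr/bin/cwltest in the failing container, a helper like this would be expected to walk the same chain as the ls commands above and stop at /usr/bin/python3.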
Actions #51

Updated by Brett Smith 11 months ago

The "no /usr/bin/python3" thing is very weird because it doesn't affect all packages and I can't tell what's different between them. For example, python3-arvados-user-activity and python3-cwltest install the exact same set of dependencies in the exact same order, but the former passes and the latter fails.

I am trying to address it by building a virtualenv that links primarily to a specific minor version of Python, which prevents other problems besides this one.

Actions #52

Updated by Brett Smith 11 months ago

20846-package-build-fixes @ d9389508ca23405edc3cd120a181bcf89d

Tests: developer-run-tests: #3985

Older package builds still passing:

debian11 build: build-packages-debian11: #1136
ubuntu2004 build: build-packages-ubuntu2004: #1468

Newer package builds succeed and pass tests. They fail on the upload step, but that means everything before that went fine. Fixing the upload is going to be an ops task outside the branch.

debian12: build-packages-debian12: #7
ubuntu2204: build-packages-ubuntu2204: #6

  • All agreed upon points are implemented / addressed.
    • This does the second todo "configure jenkins to build and publish packages for u2204/deb12, confirm packages get published" except for the last ops bit as noted above. We should resolve that, then do further testing, before proceeding with the rest of the todos.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • More todos on this story, as noted above.
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See test results above.
  • Documentation has been updated.
    • Not yet, that's a later todo.
  • Behaves appropriately at the intended scale (describe intended scale).
    • N/A
  • Considered backwards and forwards compatibility issues between client and server.
    • While this does "modernize" our build process, it's not so modern that it won't work on the "oldstable" distributions, as noted above. I was using all these Python build techniques on Debian 10 (July 2019); they're not radical.
  • Follows our coding standards and GUI style guidelines.
    • Yes for the Python, N/A for the shell

The changes to sdk/cwl to migrate to importlib were the best way to address #20846#note-46 and #20846#note-47. This both resolves the build failure and gets us off a deprecated API.

The changes to the way we build virtualenvs (and the way we don't build virtualenvs to build metapackages anymore) have some overlap with #21230#note-5. I could not find any tooling, like the Salt installer, that was relying on the old metapackages to still exist. We also published upgrade notes about migrating to the python3 packages way back for 2.1.0, so I seriously doubt any users still need them.

Actions #53

Updated by Brett Smith 11 months ago

Brett Smith wrote in #note-49:

These errors are kind of a red herring. Searching for the libraries with dpkg-query isn't working because ldd lists their paths under /lib but dpkg installs them under /usr/lib. Our library-searching code needs to be updated to handle merged-/usr systems.

Added in 166e1ea3c71e594a5ede646b9d87763c338936f6. build-packages-debian12: #8 - Still fails the same way, but note the dependency searching now has parity with debian11.
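
To illustrate the merged-/usr idea (this is a sketch, not the code from the commit above): when ldd reports a path under a top-level directory that is a symlink into /usr, the search can also try the /usr-prefixed path. The function name and the list of directories are assumptions.

```python
import posixpath

# Top-level directories that are symlinks into /usr on merged-/usr systems.
# Assumed list for illustration.
MERGED_DIRS = ("bin", "sbin", "lib", "lib32", "lib64", "libx32")


def dpkg_search_candidates(ldd_path):
    """Return the paths worth searching for, starting with the path ldd reported."""
    path = posixpath.normpath(ldd_path)
    candidates = [path]
    parts = path.split("/")
    # For an absolute path, parts[0] is "" and parts[1] is the top-level directory.
    if len(parts) > 2 and parts[0] == "" and parts[1] in MERGED_DIRS:
        candidates.append("/usr" + path)
    return candidates
```

Each candidate could then be tried with dpkg-query --search until one of them matches an installed package.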

Actions #54

Updated by Tom Clegg 11 months ago

Tom Clegg wrote in #note-38:

  • ☐ update tests/tools that use debian11 as the default OS (git grep -Ei 'debian.?11|bullseye')

I still think we should do this (because I think our default OS for these purposes should always be either "oldstable" or "stable", and 11 is already "oldoldstable") ... but it probably makes more sense for it to be a separate ticket.
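
As a quick illustration of what that grep pattern catches, here is the same expression as a Python regex. The helper name is hypothetical and the sample strings in the note below are made up, not taken from the repository.

```python
import re

# Same pattern as the git grep -Ei call above: "debian", at most one
# arbitrary character, then "11"; or the release codename. Case-insensitive.
DEBIAN11_PATTERN = re.compile(r"debian.?11|bullseye", re.IGNORECASE)


def mentions_debian11(text):
    """Return True if text refers to Debian 11 by number or by codename."""
    return DEBIAN11_PATTERN.search(text) is not None
```

It matches spellings such as debian11, debian:11, Debian 11 and bullseye, and does not match debian12.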

Actions #55

Updated by Brett Smith 11 months ago

Tom Clegg wrote in #note-54:

Tom Clegg wrote in #note-38:

  • ☐ update tests/tools that use debian11 as the default OS (git grep -Ei 'debian.?11|bullseye')

I still think we should do this (because I think our default OS for these purposes should always be either "oldstable" or "stable", and 11 is already "oldoldstable") ... but it probably makes more sense for it to be a separate ticket.

I think there are some wires crossed somewhere. 12 is stable, 11 is oldstable.

Actions #56

Updated by Brett Smith 11 months ago

Did some additional cleanup work on the branch, mostly aimed at ensuring our entire build stack consistently uses the same minor version of Python. Also cleaned up some unused code. Also rolled the changes for rocky8 (#21273) into this branch, just because they're so small. See discussion on that ticket.

Now at a3b72ab765012aea0926571d97ebd31ddbf9ea68. No tests because there are only changes to the build scripts from here, so see previous results. Existing distros all green:

build-packages-debian11: #1140
build-packages-ubuntu2004: #1472
build-packages-rocky8: #278

New distros pass until the upload step:

build-packages-debian12: #9
build-packages-ubuntu2204: #7

Actions #57

Updated by Tom Clegg 11 months ago

This LGTM, thanks.

(Re rocky8 package names, switching from python3-xyz to python39-xyz -- and back again in future versions? -- sounds to me like a lot of unnecessary effort that we should skip.)

Actions #58

Updated by Brett Smith 11 months ago

Tom Clegg wrote in #note-57:

(Re rocky8 package names, switching from python3-xyz to python39-xyz -- and back again in future versions? -- sounds to me like a lot of unnecessary effort that we should skip.)

We wouldn't go back again for rocky8. Only, potentially, for rocky9 or other later releases.

The argument in favor of renaming is that the current naming violates user expectations. By Red Hat policy, a package named python3-foo should depend on python3 and not some other Python, especially since doing so changes the Python used system-wide. We can do it, and it works, but it can be surprising for system administrators, so it is unfriendly.

Actions #59

Updated by Peter Amstutz 11 months ago

Brett Smith wrote in #note-58:

Tom Clegg wrote in #note-57:

(Re rocky8 package names, switching from python3-xyz to python39-xyz -- and back again in future versions? -- sounds to me like a lot of unnecessary effort that we should skip.)

We wouldn't go back again for rocky8. Only, potentially, for rocky9 or other later releases.

The argument in favor of renaming is that the current naming violates user expectations. By Red Hat policy, a package named python3-foo should depend on python3 and not some other Python, especially since doing so changes the Python used system-wide. We can do it, and it works, but it can be surprising for system administrators, so it is unfriendly.

Oh, I think I understand the argument better now. Because the dependency is on python39, and in this situation python39 would replace the default python3, it would be more informative to name our packages like python39-arvados-python-client. But for rocky9, if we're using the system python3, it would just be python3-arvados-python-client. Is that right?

Actions #60

Updated by Brett Smith 11 months ago

Peter Amstutz wrote in #note-59:

Oh, I think I understand the argument better now. Because the dependency is on python39, and in this situation python39 would replace the default python3, it would be more informative to name our packages like python39-arvados-python-client. But for rocky9, if we're using the system python3, it would just be python3-arvados-python-client. Is that right?

Yes, that's all correct.

Actions #61

Updated by Peter Amstutz 11 months ago

Brett Smith wrote in #note-60:

Peter Amstutz wrote in #note-59:

Oh, I think I understand the argument better now. Because the dependency is on python39, and in this situation python39 would replace the default python3, it would be more informative to name our packages like python39-arvados-python-client. But for rocky9, if we're using the system python3, it would just be python3-arvados-python-client. Is that right?

Yes, that's all correct.

So in that case, naming the rocky8 packages with python39- instead of python3- would create an awkward special case in our package building, documentation, etc. So I think we're in agreement that while the guidelines exist for a reason, it doesn't seem to be worth the effort.

Actions #62

Updated by Brett Smith 10 months ago

Removed debian10 and ubuntu1804 from the test-provision multijob. Should create and add jobs for debian12 and ubuntu2204 once the packaging is uploading correctly (in progress now).

Actions #63

Updated by Brett Smith 10 months ago

  • Related to Feature #21383: Update Salt installer to support Debian 12 added
Actions #64

Updated by Brett Smith 10 months ago

Tom Clegg wrote in #note-38:

  • ☐ manual test installing new packages

IMO instead of testing manually we should add test-provision jobs to test the packages automatically. That's already underway in #21383.

  • ☐ update build/README with some of the above todo's?

I agree this whole process should be documented but I think it needs to go in a wiki since a lot of it involves ops tasks that aren't public.

Actions #66

Updated by Brett Smith 10 months ago

  • Related to Feature #21388: Update list of supported distributions everywhere added
Actions #67

Updated by Brett Smith 10 months ago

  • Related to Feature #21389: Update arvados/jobs Docker image to Debian 12 added
Actions #68

Updated by Brett Smith 10 months ago

  • Related to Bug #21390: Update arvados/dev-jobs Docker image to Debian 12 added
Actions #69

Updated by Brett Smith 10 months ago

Actions #70

Updated by Brett Smith 10 months ago

Tom Clegg wrote in #note-38:

#21388

  • ☐ update tests/tools that use debian11 as the default OS (git grep -Ei 'debian.?11|bullseye')

#21389, #21390, #21391, see also #21392

Actions #71

Updated by Peter Amstutz 10 months ago

  • Status changed from In Progress to Resolved
Actions #72

Updated by Brett Smith 10 months ago

  • Related to Idea #21453: Install Python package virtualenvs under /usr/lib/PKGNAME added
Actions #73

Updated by Brett Smith 10 months ago

  • Related to Idea #21454: Update required_ruby_version in all our gemspecs added
Actions #74

Updated by Peter Amstutz 7 months ago

  • Release set to 70