Bug #22406
openlib/controller/localdb tests build arvados-server, run it in debian:11 container, assume too much glibc compatibility
Description
These tests build an arvados-server
binary and then run it inside a Debian 11 container. This can fail if there's not enough glibc compatibility between the host system and Debian 11. For example, if you run the tests on Debian 12, it fails with a message like this:
/arvados-server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /arvados-server) /arvados-server: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /arvados-server)
Update the test to use an image that matches the host system.
Updated by Brett Smith about 2 months ago
- Subject changed from cmd/arvados-package tests assume too much glibc compatibility to Tests build arvados-server, run it in debian:11 container, assume too much glibc compatibility
source:lib/controller/localdb/login_ldap_docker_test.sh follows the same pattern.
Updated by Peter Amstutz about 2 months ago
Just to make sure I'm following, the binary is built in a Debian 12 environment and then brought backwards to a Debian 11 environment?
Updated by Brett Smith about 2 months ago
Peter Amstutz wrote in #note-2:
Just to make sure I'm following, the binary is built in a Debian 12 environment and then brought backwards to a Debian 11 environment?
Correct. The build environment is whatever distro you happen to be running tests on, and then the Docker container it tests inside is hardcoded to be Debian 11 no matter what.
Updated by Tom Clegg about 1 month ago
arvados-package build
(at least for now) only works when the host and target OSes are compatible in this way
- update the test to use a
-target-os=debian:X
arg compatible the current host OS (test case could have a map of "if host is ubuntu2404, test should build debian12 packages", etc) - ...and/or update the test to skip "unsupported platform, not expected to work" if
`lsb_release -si` != "Debian"
- have
arvados-package build
build anarvados-package
binary in a container running the target OS, instead of assuming/proc/self/exe
will run in the target OS
Updated by Brett Smith about 1 month ago
- Target version set to Development 2025-01-29
- Assigned To set to Brett Smith
- Status changed from New to In Progress
- Description updated (diff)
- Subject changed from Tests build arvados-server, run it in debian:11 container, assume too much glibc compatibility to lib/controller/localdb tests build arvados-server, run it in debian:11 container, assume too much glibc compatibility
I've updated the localdb tests so that they use a better image but they're still failing. I'm working on diagnosing that issue.
We should not bother fixing the arvados-package
tests anymore because of #22436.
Updated by Brett Smith 17 days ago
22406-ldap-docker-network @ 62cb379dfe8536c4083bd1fe673c000d73c10455 - developer-run-tests: #4612
Test rerun developer-run-tests: #4615 - Between both runs it passed all tests 🙄
This branch revamps the test to address several issues. See the commit message in 73893d9c6fdf37ecb52c613ea68f1ec234e7deab for gory details.
It addresses the issue in this ticket by teaching the test how to find the right Docker image for the host distribution, and running controller inside that. The logic works for all our supported versions of Debian and Ubuntu.
It also inverts the test's networking setup: now instead of exposing container ports and contacting them from the host, we create a dedicated Docker network and run all containers inside that. We set up forwarding TCP proxies for services running on the test host that these containers need to contact (PostgreSQL and RailsAPI). This should work on a wider variety of test host networking setups. In particular, it should help us run the test on GitHub Actions (although getting us the rest of the way there is out of scope for this ticket).
A lot of the test code has been ported from shell to Go. Probably the simplest way to convince yourself that we're testing the same things is to compare the assertions in each test to the check_contains
calls at the bottom of login_ldap_docker_test.sh
in main. While we check fewer specific fields, we unmarshal entire JSON responses to their corresponding Arvados structs, so we have more confidence the response has the right shape.
This branch adds yq as a test dependency (which transitively pulls in jq). Tom and I talked a lot about different ways to write the cluster configurations for these tests, and he felt strongly that we should do the cluster configuration "in YAML" and avoid creating purpose-built tools. So here it is. I would prefer not to relitigate this question in branch review if at all possible.
- All agreed upon points are implemented / addressed.
- And then some
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described
- See above. Also gets the test passing in my Debian 12 systemd-nspawn VM.
- Documentation has been updated.
- N/A, pure test change
- Behaves appropriately at the intended scale (describe intended scale).
- Improves the performance of the test by moving common test setup to a
SetUpSuite
function. We no longer start and configure an LDAP server for each individual test.
- Improves the performance of the test by moving common test setup to a
- Considered backwards and forwards compatibility issues between client and server.
- N/A
- Follows our coding standards and GUI style guidelines.
- Yes
Updated by Peter Amstutz 10 days ago
- Target version changed from Development 2025-01-29 to Development 2025-02-12
Updated by Tom Clegg 9 days ago
- && test_database=$("${VENV3DIR}/bin/python3" -c "import yaml; print(yaml.safe_load(open('$ARVADOS_CONFIG','r'))['Clusters']['zzzzz']['PostgreSQL']['Connection']['dbname'])") \
- && psql "$test_database" -c "SELECT pg_terminate_backend (pg_stat_activity.pid::int) FROM pg_stat_activity WHERE pg_stat_activity.datname = '$test_database';" 2>/dev/null
+ && psql arvados_test -c "SELECT pg_terminate_backend (pg_stat_activity.pid::int) FROM pg_stat_activity WHERE pg_stat_activity.datname = '$test_database';" 2>/dev/null
- Why hardcode the database name here, rather than change the python oneliner to yq?
- If this is what we're doing, there's another
'$test_database'
in the sql statement to update.
source:lib/controller/localdb/login_ldap_docker_test.go
Style nit: the "This is an integration test..." comment seems like it should be attached to type LoginDockerSuite struct
rather than the top of the file.
cmd.Env = append(...) out, err := cmd.Output() if err != nil { return "", err } lines := bytes.Split(out, []byte{'\n'}) slices.Reverse(lines) for _, line := range lines { if ip := net.ParseIP(string(line)); ip != nil { return ip.String(), nil } }
- (then we don't need to think about things like what happens when we call cmd.Wait() after a non-EOF read error)
setUpConfig should check the Copy error and also the Close error here:
_, err = io.Copy(dst, src) dst.Close() c.Assert(err, check.IsNil)
updateConfig takes a *map[string]interface{}
instead of a map[string]interface{}
, which is unusual since a map is already a reference -- is there a reason?
parseResponse could skip all the type-casting and let json.Unmarshal do the work:
var e struct { Errors []string } err = json.Unmarshal(respBody, &e) if err != nil { return err } e.Errors = append(e.Errors, "(oops, no error message in error response)") return fmt.Errorf("%s: %s", resp.Status, e.Errors[0])
In #20758 we updated osixia/openldap:1.3.0 to 1.5.0. I don't remember whether that changed anything for us, or just seemed like a sensible thing to do before debugging further. Either way we should probably update it in this branch too, since this branch ensures the ldap changes in #20758 are never going to merge.
In TearDownTest err never equals os.ErrNotExist because the error is wrapped -- should use os.IsNotExist(err) instead.
- c.Check(err, check.Equals, os.ErrNotExist)
+ c.Check(os.IsNotExist(err), check.Equals, true)
I don't love that the LDAP config is in source:sdk/python/run_test_server.py despite only being used by tests in source:lib/controller/localdb -- wouldn't it make more sense to use updateConfig() to set this up where it's needed? (I can see an argument for the PAM.DefaultEmailDomain entry being in run_test_server since it concerns assumptions made by the run-tests/run_test_server infrastructure, not just the PAM test case itself.)
I think we can delete source:lib/controller/localdb/login_pam_test.go now that (*LoginDockerSuite)TestLoginPAM()
exists.
Updated by Brett Smith 8 days ago
Tom Clegg wrote in #note-12:
- Why hardcode the database name here, rather than change the python oneliner to yq?
Because the Rails configuration loader does too. Added a comment about this.
- If this is what we're doing, there's another
'$test_database'
in the sql statement to update.
I extended this even further so configuration setup now sets all the environment variables psql
needs to do work on the test database.
updateConfig takes a
*map[string]interface{}
instead of amap[string]interface{}
, which is unusual since a map is already a reference -- is there a reason?
Nope, that's just my Go inexperience of not realizing maps are already references.
In #20758 we updated osixia/openldap:1.3.0 to 1.5.0. I don't remember whether that changed anything for us, or just seemed like a sensible thing to do before debugging further. Either way we should probably update it in this branch too, since this branch ensures the ldap changes in #20758 are never going to merge.
I went ahead and upgraded us all the way to bitnami/openldap:2.6
, because this seems to be the only OpenLDAP image that's actually maintained.
I don't love that the LDAP config is in source:sdk/python/run_test_server.py despite only being used by tests in source:lib/controller/localdb -- wouldn't it make more sense to use updateConfig() to set this up where it's needed?
While it's only used by the localdb tests right now, I would argue it's reusable by other tests. If we write any more tests in this style, they're likely to also use the same Docker image, since we've established that's the only one actively maintained. Most of the configuration follows from that. Even the hostname can be duplicated if other tests follow the same pattern of creating an explicit Docker network, which they should.
I think we can delete source:lib/controller/localdb/login_pam_test.go now that
(*LoginDockerSuite)TestLoginPAM()
exists.
Agreed, and extended this to login_ldap_test.go
as well. And now that I look at this, I hope all this effort I've put into restructuring this test can actually pay off, because now we're in a better position to write more login tests for more login scenarios.
22406-ldap-docker-network @ 00ded9fe1cd1f0efc40c10c5cfca7d29488df235 - developer-run-tests: #4635
Updated by Brett Smith 6 days ago
b154986a8b89ba902315baa5113cc6b0160d672a updates arvados-server boot
following the run-tests.sh
changes. It also better isolates the tool from the user's PostgreSQL configuration. developer-run-tests: #4638
81a29285b540d59d245d032a01205c431bb69f80 is a small consistency cleanup following that. developer-run-tests: #4640
Hopefully some combination passes all tests.
Updated by Tom Clegg 5 days ago
Brett Smith wrote in #note-13:
- Why hardcode the database name here, rather than change the python oneliner to yq?
Because the Rails configuration loader does too. Added a comment about this.
I get that using a different name currently doesn't work because Rails ignores it, but why double down on that by hardcoding it in more places?
I'm not convinced the Rails special case is even doing anything useful.
# Special case for test database where there's no database.yml,
# because the Arvados config.yml doesn't have a concept of multiple
# rails environments.
#
if ::Rails.env.to_s == "test" && db_config["test"].nil?
$arvados_config["PostgreSQL"]["Connection"]["dbname"] = "arvados_test"
end
Is there a situation where someone would actually want this? Run Rails in test mode, use the database username/password/etc taken from a dev/prod config, but just change the database name to arvados_test? The story I'm getting from the git history is that it was intended to set defaults when database settings were not provided, but the "not provided" logic assumed settings were only ever provided via database.yml in test mode, and we subsequently switched to providing them via config.yml.
- 0e5e80d054cc19c145f5bc02dfc3bd6271fde316 (2019-04-11) -- "Don't override name of test db if one was provided in database.yml"
- 7b4ec6d3c3ed209a42f542e1b646b8e672847fea (2019-07-24) -- "Refactor run_test_server.py and run_test.sh to use config.yml"
Rest LGTM, thanks.
Updated by Tom Clegg 4 days ago
- Related to Bug #22544: Rails config loader should not ignore database name given in config file, even in test mode added
Updated by Brett Smith 4 days ago
Tom Clegg wrote in #note-15:
I get that using a different name currently doesn't work because Rails ignores it, but why double down on that by hardcoding it in more places?
I'm not convinced the Rails special case is even doing anything useful.
Per discussion, I have removed the special case everywhere, including Rails. Testing that a different name actually works is out of scope and can be part of #22445, but at least now we're removing cruft instead of accumulating it.
22406-ldap-docker-network @ 55c54bb3356fa8fa53f1b9bca6261db9f28ab2f9 - developer-run-tests: #4646 - Tests ran one commit earlier but the latest is just a small comment cleanup.