Project

General

Profile

Actions

Task #22006

closed

Some lib/controller integration tests are failing when run inside Docker

Added by Lucas Di Pentima 5 days ago. Updated 3 days ago.

Status:
Resolved
Priority:
Normal
Assigned To:

Description

Steps to reproduce (using branch 20758-gh-action-full-test)

  1. Build docker Image
    docker build . -t arvados-test -f docker/run-tests/Dockerfile
    
  2. Launch docker container
    mkdir -p /tmp/arvados_docker_tmpdir && \
      docker run --rm -v .:/usr/src/arvados \
        -v /tmp/arvados_docker_tmpdir:/tmp/arvados_docker_tmpdir \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -e TMPDIR=/tmp/arvados_docker_tmpdir \
        -w /usr/src/arvados --privileged -ti arvados-test bash
    
  3. Run tests inside running container
    pg_isready || pg_ctlcluster 13 main start && \
      chgrp docker /var/run/docker.sock && \
      mkdir -p ./tmp && chown -R testuser . /tmp/arvados_docker_tmpdir && \
      python3 ./sdk/python/arvados_version.py && \
      su testuser -c 'WORKSPACE=$(pwd) build/run-tests.sh --temp $TMPDIR --only lib/controller'
    

(I sometimes get an error message like "WARNING: Nothing is listening on port 44471 (waited 10 seconds)." on the first run, but it works on subsequent executions)

The resulting run will output errors like:

----------------------------------------------------------------------
FAIL: integration_test.go:1174: IntegrationSuite.TestContainerInputOnDifferentCluster

user token: "v2/z1111-gj3su-rp20a2omqmyruiq/1bl62ztf24lskj908fkn5whjig55ey4knp4q8xa0n8lcqw5f4y" 
user UUID: "z1111-tpzed-qaw1vgyic2j18fx" 
[docker load]
[docker load output] The image busybox:uclibc already exists, renaming the old one with ID sha256:ce2a43b3c4deb43beea362b967c91672d685a3ce4d5c0e4a4439b3f70f32253a to empty string
Loaded image: busybox:uclibc

[arv-keepdocker]
[arv-keepdocker output]
2024-07-22 18:17:57 arvados.arv_put[33921] INFO: Creating new cache file at /home/testuser/.cache/arvados/arv-put/aa8276a33ce934d1f409b82bc0062c82
2024-07-22 18:17:57 arvados.arv_put[33921] INFO: Collection saved as 'Docker image busybox:uclibc sha256:38805'
z2222-4zz18-y8jo5ztiislorzv

container status: Queued, waiting for dispatch
container state changed to "Queued" 
container status: Locked, preparing runtime environment
container state changed to "Locked" 
container state changed to "Queued" 
container status: Queued, preparing runtime environment
container state changed to "Locked" 
container status: Queued, waiting for cloud resources: queue position 1
container state changed to "Queued" 
container status: Queued, waiting while cluster is running at capacity: queue position 1
container state changed to "Locked" 
container status: Locked, waiting for new instance to be ready
container status: Locked, preparing runtime environment
container state changed to "Queued" 
container status: Queued, preparing runtime environment
container status: Queued, waiting for cloud resources: queue position 1
integration_test.go:1183:
    outcoll, logcfs := s.runContainer(c, "z2222", ac.AuthToken, map[string]interface{}{
        "command":         []string{"ls", "/in"},
        "container_image": "busybox:uclibc",
        "cwd":             "/tmp",
        "environment":     map[string]string{},
        "mounts": map[string]arvados.Mount{
            "/in":  {Kind: "collection", PortableDataHash: z1coll.PortableDataHash},
            "/out": {Kind: "tmp", Capacity: 10000},
        },
        "output_path":         "/out",
        "runtime_constraints": arvados.RuntimeConstraints{RAM: 100000000, VCPUs: 1, KeepCacheRAM: 1 << 26},
        "priority":            1,
        "state":               arvados.ContainerRequestStateCommitted,
        "container_count_max": 1,
    }, -1)
integration_test.go:1294:
    c.Errorf("timed out, container state is %q", cr.State)
... Error: timed out, container state is "Committed" 

=== NO LOG COLLECTION saved for container

AFAICT, a-d-c's loopback driver is not working properly, and the test times out because the CR never runs.

Actions #1

Updated by Tom Clegg 4 days ago

Step 3 (pg_isready || ...) failed like this:

======= install sdk/python
Using /usr/src/arvados/sdk/python for version number calculation of /usr/src/arvados/sdk/python
Traceback (most recent call last):
  File "/usr/src/arvados/sdk/python/arvados_version.py", line 137, in <module>
    print(get_version())
  File "/usr/src/arvados/sdk/python/arvados_version.py", line 116, in get_version
    save_version(setup_dir, module, version)
  File "/usr/src/arvados/sdk/python/arvados_version.py", line 101, in save_version
    with Path(setup_dir, module, '_version.py').open('w') as fp:
  File "/usr/lib/python3.9/pathlib.py", line 1241, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.9/pathlib.py", line 1109, in _opener
    return self._accessor.open(self, flags, mode)
PermissionError: [Errno 13] Permission denied: '/usr/src/arvados/sdk/python/arvados/_version.py'
======= sdk/python install -- FAILED

It seems to work better when I retry after doing this:

root@624c77187f85:/usr/src/arvados# chown testuser:node sdk/python/arvados/_version.py
Actions #2

Updated by Tom Clegg 4 days ago

Looks like systemd-cat (in an environment where journald is not actually running) was the problem. Test passes for me with this change:

20758-gh-action-full-test @ 6ff1f7c6b45a2dea1221863797249eca6f6c51c5

Actions #3

Updated by Tom Clegg 4 days ago

  • Status changed from New to In Progress
Actions #4

Updated by Tom Clegg 3 days ago

  • Remaining (hours) set to 0.0
  • Status changed from In Progress to Resolved
  • Project changed from 40 to Arvados
Actions

Also available in: Atom PDF