Project

General

Profile

Actions

Idea #21408

closed

test-provision-debian11 fails loading workflow Docker image

Added by Brett Smith 4 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
-
Category:
Deployment
Start date:
Due date:
Story points:
-

Description

This started after merging #21230 and #21394.

Running test CWL workflow
INFO /usr/bin/cwl-runner 2.8.0.dev20240119143735, arvados-python-client 2.8.0.dev20240119143735, cwltool 3.1.20230601100705
INFO Resolved 'hasher-workflow.cwl' to 'file:///tmp/cluster_tests/hasher-workflow.cwl'
INFO hasher-workflow.cwl:36:7: Unknown hint WorkReuse
INFO hasher-workflow.cwl:50:7: Unknown hint WorkReuse
INFO hasher-workflow.cwl:64:7: Unknown hint WorkReuse
INFO Using cluster dbn11 (https://dbn11.local:443/)
DEBUG ENTER upload_job_order 1705679832.0055475
DEBUG ENTER scandeps 1705679832.007659
DEBUG EXIT scandeps 1705679832.0088325 0.0011734962463378906
DEBUG ENTER collect uuids 1705679832.0089567
DEBUG EXIT collect uuids 1705679832.0090437 8.702278137207031e-05
DEBUG ENTER collect uploads 1705679832.0090969
DEBUG EXIT collect uploads 1705679832.0091777 8.082389831542969e-05
DEBUG ENTER fetch_uuids 1705679832.0092268
DEBUG EXIT fetch_uuids 1705679832.0092647 3.790855407714844e-05
DEBUG ENTER mapper 1705679832.010263
INFO Upload local files: "test.txt" 
INFO Uploaded to f55e750025853f5b8ccae3ca79240f65+54 (dbn11-4zz18-efp6b3h7f97ysul)
DEBUG EXIT mapper 1705679832.0954368 0.0851738452911377
DEBUG ENTER collectloc 1705679832.0955882
DEBUG EXIT collectloc 1705679832.0957391 0.00015091896057128906
DEBUG ENTER setloc 1705679832.0959854
DEBUG EXIT setloc 1705679832.096145 0.0001595020294189453
DEBUG EXIT upload_job_order 1705679832.0962608 0.0907132625579834
INFO Uploading workflow dependencies
DEBUG ENTER upload_workflow_deps 1705679832.096431
DEBUG ENTER upload_docker 1705679832.096509
INFO ['docker', 'pull', 'arvados/jobs:2.8.0.dev20240119143735']
2.8.0.dev20240119143735: Pulling from arvados/jobs
[`docker pull` progress omitted]
Digest: sha256:f7991a0df2f3dd11b51a3b2aaba0ceb6af5c61195f5671daca6e963d4d8aa487
Status: Downloaded newer image for arvados/jobs:2.8.0.dev20240119143735
docker.io/arvados/jobs:2.8.0.dev20240119143735
INFO Uploading Docker image arvados/jobs:2.8.0.dev20240119143735
2024-01-19 15:57:30 arvados.arv_put[41618] INFO: Creating new cache file at /root/.cache/arvados/arv-put/f8be69c1db20b934481409a3f508ad24
2024-01-19 15:57:32 arvados.arv_put[41618] INFO: Collection saved as 'Docker image arvados jobs:2.8.0.dev20240119143735 sha256:b3d59'
dbn11-4zz18-cky0zrtpcygez8q
2024-01-19 15:57:32 arvados.cwl-runner.metrics[41618] DEBUG: EXIT upload_docker 1705679852.4451454 20.348636388778687 (X-Request-Id: req-z32hupda8gmghtycmu3f)
2024-01-19 15:57:32 arvados.cwl-runner.metrics[41618] DEBUG: EXIT upload_workflow_deps 1705679852.445314 20.348882913589478
2024-01-19 15:57:32 cwltool[41618] ERROR: Workflow error:
"filename 'b3d596985843bccad4c72223479cb9b8159cf6483fd0bfbe732ef152c10e7fcd.json' not found" 
Traceback (most recent call last):
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/arvados_cwl/arvdocker.py", line 141, in arv_docker_get_image
    arvados.commands.keepdocker.main(args, stdout=sys.stderr, install_sig_handlers=False, api=api_client)
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/arvados/commands/keepdocker.py", line 546, in main
    json_file = image_tar.extractfile(image_tar.getmember(json_filename))
  File "/usr/lib/python3.9/tarfile.py", line 1790, in getmember
    raise KeyError("filename %r not found" % name)
KeyError: "filename 'b3d596985843bccad4c72223479cb9b8159cf6483fd0bfbe732ef152c10e7fcd.json' not found" 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/cwltool/main.py", line 1301, in main
    (out, status) = real_executor(
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/arvados_cwl/executor.py", line 695, in arv_executor
    merged_map = upload_workflow_deps(self, tool, runtimeContext)
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/arvados_cwl/runner.py", line 741, in upload_workflow_deps
    upload_docker(arvrunner, tool, runtimeContext)
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/arvados_cwl/runner.py", line 557, in upload_docker
    upload_docker(arvrunner, s.embedded_tool, runtimeContext)
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/arvados_cwl/runner.py", line 551, in upload_docker
    arvados_cwl.arvdocker.arv_docker_get_image(arvrunner.api, docker_req, True, runtimeContext)
  File "/usr/share/python3.9/dist/python3-arvados-cwl-runner/lib/python3.9/site-packages/arvados_cwl/arvdocker.py", line 157, in arv_docker_get_image
    cached_lookups[dockerRequirement["dockerImageId"]] = pdh
  File "schema_salad/sourceline.py", line 250, in __exit__
cwltool.errors.WorkflowException: "filename 'b3d596985843bccad4c72223479cb9b8159cf6483fd0bfbe732ef152c10e7fcd.json' not found" 
Build step 'Execute shell' marked build as failure

Files

test-provision-debian11-583.log (1.61 MB) test-provision-debian11-583.log Brett Smith, 01/21/2024 09:07 PM

Related issues

Related to Arvados - Bug #21417: Stop trying to read image timestamp from docker metadata in arv-keepdockerResolvedBrett Smith02/02/2024Actions
Actions #2

Updated by Brett Smith 4 months ago

The file in question does exist in the image. Did something go awry saving/loading the tar archive?

$ docker save 'arvados/jobs:2.8.0.dev20240119143735' | tar -t | grep -F 'b3d596985843bccad4c72223479cb9b8159cf6483fd0bfbe732ef152c10e7fcd.json'; echo $?
b3d596985843bccad4c72223479cb9b8159cf6483fd0bfbe732ef152c10e7fcd.json
0
Actions #3

Updated by Brett Smith 4 months ago

Brett Smith wrote in #note-2:

The file in question does exist in the image. Did something go awry saving/loading the tar archive?

Doesn't seem like it, a rerun failed in the exact same way: test-provision-debian11: #584

Actions #4

Updated by Peter Amstutz 4 months ago

Brett Smith wrote in #note-3:

Brett Smith wrote in #note-2:

The file in question does exist in the image. Did something go awry saving/loading the tar archive?

Doesn't seem like it, a rerun failed in the exact same way: test-provision-debian11: #584

I can reproduce this locally, I just haven't been able to dig into it yet.

Actions #5

Updated by Peter Amstutz 4 months ago

root@05ac4094679e:/var/lib/arvados-arvbox/.cache/arvados/docker# tar tf './sha256:b3d596985843bccad4c72223479cb9b8159cf6483fd0bfbe732ef152c10e7fcd.tar'
blobs/
blobs/sha256/
blobs/sha256/12903f2a4aeda35394dc110df288a667dad7276f5dcceb595ba0a202f3441c84
blobs/sha256/2694fb8f3a9db9e88d1d79fe2690d011385264be9ccea4430079e7c52e60f778
blobs/sha256/2988be0dc4d0d29d9d0bc466524cd83cacfaf80c5d1189396d71ac3b7a931ed9
blobs/sha256/2fa2daa972a9827f3f601e6ea83f721451b87bf1ee0a2c6f8c8bb8963fbb2b9e
blobs/sha256/5b51bb2fc9b4a1d792a3957e67d1ad786d827a512a998005be7abc741ce7ed9d
blobs/sha256/87795966100c75db8a99441a415f004e5f7b2f037ec19d77370a0a42bcb0848d
blobs/sha256/9d0f57cd0d488542afca7714d11d8b25ecb09b16dbd18857e2ddc03f0a757388
blobs/sha256/b3d596985843bccad4c72223479cb9b8159cf6483fd0bfbe732ef152c10e7fcd
blobs/sha256/cabf5b2ebcdc289c7757efc0fbcf983da6f23426d5b24e38edb9880ceac73c75
blobs/sha256/e02922dd07706d31258ea0743d75ee0d1f9aa59d9c618c9ab957d072c4628dcc
blobs/sha256/e6cd39d4cea4051114c9bf1c61995efdb7b33a54c7b46b33b221f8371ae8c2b7
blobs/sha256/f88b3b7e08fcd32afc11571aee4747b7a9a36e3ea655fd8d729f14217eeeace3
blobs/sha256/fbfef1f52bf8a87e2b66659f24263926701e4eda4bfeb4aa2698344e50ea80d6
blobs/sha256/fc745bd5939c6cd55a206d0f45f99351501c3b874c2c0fe5fec245ff2233bfa1
index.json
manifest.json
oci-layout

root@05ac4094679e:/var/lib/arvados-arvbox/.cache/arvados/docker# docker --version
Docker version 25.0.0, build e758fe5
Actions #6

Updated by Peter Amstutz 4 months ago

https://docs.docker.com/engine/release-notes/25.0/

The docker image save tarball output is now OCI compliant. moby/moby#44598

The format inside the tarball changed. Arv-keepdocker looks inside the tarball to get the "created" timestamp. This is added to the Arvados metadata and used to resolve conflicts when there are two images with the same name, by choosing the one created more recently (independent of when the images were actually uploaded).

Either Keepdocker needs to be updated to handle OCI images, or we should remove the image inspection behavior altogether (I believe it already falls back to using the Arvados collection timestamp to resolve conflicts if there's no image_timestamp).

Image handling in general is long overdue to be revisited, but that's out of scope for this ticket.

Actions #7

Updated by Tom Clegg 3 months ago

  • Related to Bug #21417: Stop trying to read image timestamp from docker metadata in arv-keepdocker added
Actions #8

Updated by Brett Smith 3 months ago

  • Target version changed from To be scheduled to Development 2024-02-14 sprint
  • Status changed from New to Resolved
Actions

Also available in: Atom PDF