Project

General

Profile

Actions

Bug #21417

closed

Stop trying to read image timestamp from docker metadata in arv-keepdocker

Added by Tom Clegg 3 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Story points:
0.5
Release relationship:
Auto

Description

This part of source:sdk/python/arvados/commands/keepdocker.py should go away so it doesn't crash on new image tarball formats:

        json_file = image_tar.extractfile(image_tar.getmember(json_filename))
        image_metadata = json.loads(json_file.read().decode('utf-8'))
        json_file.close()
        image_tar.close()
        link_base = {'head_uuid': coll_uuid, 'properties': {}}
        if 'created' in image_metadata:
            link_base['properties']['image_timestamp'] = image_metadata['created']

See #21408 for example.

(Tom & Peter discussed offline, came to the conclusion that saving the image timestamp is not important enough to justify maintaining the code.)


Subtasks 1 (0 open1 closed)

Task #21439: Review 21417-keepdocker-oci-layoutResolvedPeter Amstutz02/02/2024Actions

Related issues

Related to Arvados - Idea #21408: test-provision-debian11 fails loading workflow Docker imageResolvedActions
Related to Arvados - Bug #21431: Figure out compatibility issues with recently released Docker 25ResolvedBrett SmithActions
Related to Arvados - Bug #21657: `arvados-client diagnostics` fails trying to introspect OCIv2 Docker imageResolvedTom CleggActions
Blocks Arvados - Support #21394: update CWL conformance tests for 1.2.1ResolvedPeter AmstutzActions
Actions #1

Updated by Tom Clegg 3 months ago

  • Related to Idea #21408: test-provision-debian11 fails loading workflow Docker image added
Actions #2

Updated by Tom Clegg 3 months ago

  • Description updated (diff)
Actions #3

Updated by Brett Smith 3 months ago

Tom Clegg wrote:

(Tom & Peter discussed offline, came to the conclusion that saving the image timestamp is not important enough to justify maintaining the code.)

This sounds fine, but does it cause any user-visible changes that are significant enough to warrant documenting?

Actions #4

Updated by Peter Amstutz 3 months ago

  • Related to Bug #21431: Figure out compatibility issues with recently released Docker 25 added
Actions #5

Updated by Peter Amstutz 3 months ago

  • Related to Support #21394: update CWL conformance tests for 1.2.1 added
Actions #6

Updated by Peter Amstutz 3 months ago

  • Related to deleted (Support #21394: update CWL conformance tests for 1.2.1)
Actions #7

Updated by Peter Amstutz 3 months ago

Actions #8

Updated by Brett Smith 3 months ago

  • Assigned To set to Brett Smith
Actions #9

Updated by Tom Clegg 3 months ago

Brett Smith wrote in #note-3:

any user-visible changes that are significant enough to warrant documenting?

As far as either of us could remember, the timestamp is used to sort entries in the "arv keep docker" list, and by railsapi to choose from multiple matching images when submitting a CR. So if we remove it, the most recently uploaded image would be displayed first / selected, instead of the most recently built. We didn't get as far as checking whether the existing behavior is documented. Updating/adding that documentation seems like a good idea to me.

I'm not actually sure whether it affects how cwl-runner selects an image. That's probably worth checking.

Actions #10

Updated by Peter Amstutz 3 months ago

arvados-cwl-runner calls keepdocker.list_images_in_arv which returns the images in "descending order of preference" takes the first item in the list.

The order of preference is sorted using a tuple of (image_timestamp, created_at).

If the image timestamp is not found, it gets a default value and it only sorts on created_at.

Actions #11

Updated by Peter Amstutz 3 months ago

Also somewhat relevant, arvados-cwl-runner starts by looking in the project that it is running in, and only expands the search to the whole system if it doesn't find it. This avoids surprises when someone uploads an image with the same name to a different project. So the arvados-cwl-runner behavior doesn't exactly match the API server's behavior any more.

Actions #12

Updated by Brett Smith 3 months ago

  • Status changed from New to In Progress

Not tested, but here's the basic gist of how to update the code to correctly, simultaneously support both the old and new formats:

diff --git a/sdk/python/arvados/commands/keepdocker.py b/sdk/python/arvados/commands/keepdocker.py
index 7b7367080f..05153c45ee 100644
--- a/sdk/python/arvados/commands/keepdocker.py
+++ b/sdk/python/arvados/commands/keepdocker.py
@@ -537,16 +537,11 @@ def main(arguments=None, stdout=sys.stdout, install_sig_handlers=True, api=None)

         # Read the image metadata and make Arvados links from it.
         image_file.seek(0)
-        image_tar = tarfile.open(fileobj=image_file)
-        image_hash_type, _, raw_image_hash = image_hash.rpartition(':')
-        if image_hash_type:
-            json_filename = raw_image_hash + '.json'
-        else:
-            json_filename = raw_image_hash + '/json'
-        json_file = image_tar.extractfile(image_tar.getmember(json_filename))
-        image_metadata = json.loads(json_file.read().decode('utf-8'))
-        json_file.close()
-        image_tar.close()
+        with tarfile.open(fileobj=image_file) as image_tar:
+            with image_tar.extractfile('manifest.json') as manifest_file:
+                image_manifest = json.load(manifest_file)
+            with image_tar.extractfile(image_manifest['Config']) as config_file:
+                image_metadata = json.load(config_file)
         link_base = {'head_uuid': coll_uuid, 'properties': {}}
         if 'created' in image_metadata:
             link_base['properties']['image_timestamp'] = image_metadata['created']

I don't disagree that saving the image timestamp is more trouble than it's worth. But given that we're already doing it, I think this change is small and straightforward enough (less code!) that it's easier to do this than it is to worry about how multiple clients are going to react when we have images uploaded with a mix of having and not having timestamps. Especially since we have a mix of old reader clients (e.g., workflows registered with previous versions of a-c-r) with new keepdocker.

Any objections?

Actions #13

Updated by Brett Smith 3 months ago

Brett Smith wrote in #note-12:

I don't disagree that saving the image timestamp is more trouble than it's worth. But given that we're already doing it, I think this change is small and straightforward enough (less code!) that it's easier to do this than it is to worry about how multiple clients are going to react when we have images uploaded with a mix of having and not having timestamps. Especially since we have a mix of old reader clients (e.g., workflows registered with previous versions of a-c-r) with new keepdocker.

Any objections?

No objections at stand-up.

Actions #14

Updated by Brett Smith 3 months ago

21417-keepdocker-oci-layout @ 5f1b2148d042323bd1a6c963aae1bdbe2ef8e73e - developer-run-tests: #4016 - Workbench 2 failing because of #21427.

  • All agreed upon points are implemented / addressed.
    • Yes, per discussion above
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See above. This branch adds tests to verify the code correctly reads archives with both the old and new layouts.
  • Documentation has been updated.
    • N/A, this is a pure "works as before with newer versions" change
  • Behaves appropriately at the intended scale (describe intended scale).
    • No scale change
  • Considered backwards and forwards compatibility issues between client and server.
    • Code still works with all versions of Docker we currenly support, per test comment above
  • Follows our coding standards and GUI style guidelines.
    • Yes
Actions #15

Updated by Peter Amstutz 3 months ago

Brett Smith wrote in #note-14:

21417-keepdocker-oci-layout @ 5f1b2148d042323bd1a6c963aae1bdbe2ef8e73e - developer-run-tests: #4016 - Workbench 2 failing because of #21427.

  • All agreed upon points are implemented / addressed.
    • Yes, per discussion above
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See above. This branch adds tests to verify the code correctly reads archives with both the old and new layouts.
  • Documentation has been updated.
    • N/A, this is a pure "works as before with newer versions" change
  • Behaves appropriately at the intended scale (describe intended scale).
    • No scale change
  • Considered backwards and forwards compatibility issues between client and server.
    • Code still works with all versions of Docker we currenly support, per test comment above
  • Follows our coding standards and GUI style guidelines.
    • Yes

This LGMTM.

Actions #16

Updated by Brett Smith 3 months ago

  • Status changed from In Progress to Resolved
Actions #17

Updated by Brett Smith 20 days ago

  • Related to Bug #21657: `arvados-client diagnostics` fails trying to introspect OCIv2 Docker image added
Actions

Also available in: Atom PDF