Bug #21417
closedStop trying to read image timestamp from docker metadata in arv-keepdocker
Description
This part of source:sdk/python/arvados/commands/keepdocker.py should go away so it doesn't crash on new image tarball formats:
json_file = image_tar.extractfile(image_tar.getmember(json_filename)) image_metadata = json.loads(json_file.read().decode('utf-8')) json_file.close() image_tar.close() link_base = {'head_uuid': coll_uuid, 'properties': {}} if 'created' in image_metadata: link_base['properties']['image_timestamp'] = image_metadata['created']
See #21408 for example.
(Tom & Peter discussed offline, came to the conclusion that saving the image timestamp is not important enough to justify maintaining the code.)
Updated by Tom Clegg 11 months ago
- Related to Idea #21408: test-provision-debian11 fails loading workflow Docker image added
Updated by Brett Smith 11 months ago
Tom Clegg wrote:
(Tom & Peter discussed offline, came to the conclusion that saving the image timestamp is not important enough to justify maintaining the code.)
This sounds fine, but does it cause any user-visible changes that are significant enough to warrant documenting?
Updated by Peter Amstutz 11 months ago
- Related to Bug #21431: Figure out compatibility issues with recently released Docker 25 added
Updated by Peter Amstutz 11 months ago
- Related to Support #21394: update CWL conformance tests for 1.2.1 added
Updated by Peter Amstutz 11 months ago
- Related to deleted (Support #21394: update CWL conformance tests for 1.2.1)
Updated by Peter Amstutz 11 months ago
- Blocks Support #21394: update CWL conformance tests for 1.2.1 added
Updated by Tom Clegg 11 months ago
Brett Smith wrote in #note-3:
any user-visible changes that are significant enough to warrant documenting?
As far as either of us could remember, the timestamp is used to sort entries in the "arv keep docker" list, and by railsapi to choose from multiple matching images when submitting a CR. So if we remove it, the most recently uploaded image would be displayed first / selected, instead of the most recently built. We didn't get as far as checking whether the existing behavior is documented. Updating/adding that documentation seems like a good idea to me.
I'm not actually sure whether it affects how cwl-runner selects an image. That's probably worth checking.
Updated by Peter Amstutz 11 months ago
arvados-cwl-runner calls keepdocker.list_images_in_arv
which returns the images in "descending order of preference" takes the first item in the list.
The order of preference is sorted using a tuple of (image_timestamp, created_at).
If the image timestamp is not found, it gets a default value and it only sorts on created_at
.
Updated by Peter Amstutz 11 months ago
Also somewhat relevant, arvados-cwl-runner starts by looking in the project that it is running in, and only expands the search to the whole system if it doesn't find it. This avoids surprises when someone uploads an image with the same name to a different project. So the arvados-cwl-runner behavior doesn't exactly match the API server's behavior any more.
Updated by Brett Smith 11 months ago
- Status changed from New to In Progress
Not tested, but here's the basic gist of how to update the code to correctly, simultaneously support both the old and new formats:
diff --git a/sdk/python/arvados/commands/keepdocker.py b/sdk/python/arvados/commands/keepdocker.py
index 7b7367080f..05153c45ee 100644
--- a/sdk/python/arvados/commands/keepdocker.py
+++ b/sdk/python/arvados/commands/keepdocker.py
@@ -537,16 +537,11 @@ def main(arguments=None, stdout=sys.stdout, install_sig_handlers=True, api=None)
# Read the image metadata and make Arvados links from it.
image_file.seek(0)
- image_tar = tarfile.open(fileobj=image_file)
- image_hash_type, _, raw_image_hash = image_hash.rpartition(':')
- if image_hash_type:
- json_filename = raw_image_hash + '.json'
- else:
- json_filename = raw_image_hash + '/json'
- json_file = image_tar.extractfile(image_tar.getmember(json_filename))
- image_metadata = json.loads(json_file.read().decode('utf-8'))
- json_file.close()
- image_tar.close()
+ with tarfile.open(fileobj=image_file) as image_tar:
+ with image_tar.extractfile('manifest.json') as manifest_file:
+ image_manifest = json.load(manifest_file)
+ with image_tar.extractfile(image_manifest['Config']) as config_file:
+ image_metadata = json.load(config_file)
link_base = {'head_uuid': coll_uuid, 'properties': {}}
if 'created' in image_metadata:
link_base['properties']['image_timestamp'] = image_metadata['created']
I don't disagree that saving the image timestamp is more trouble than it's worth. But given that we're already doing it, I think this change is small and straightforward enough (less code!) that it's easier to do this than it is to worry about how multiple clients are going to react when we have images uploaded with a mix of having and not having timestamps. Especially since we have a mix of old reader clients (e.g., workflows registered with previous versions of a-c-r) with new keepdocker.
Any objections?
Updated by Brett Smith 11 months ago
Brett Smith wrote in #note-12:
I don't disagree that saving the image timestamp is more trouble than it's worth. But given that we're already doing it, I think this change is small and straightforward enough (less code!) that it's easier to do this than it is to worry about how multiple clients are going to react when we have images uploaded with a mix of having and not having timestamps. Especially since we have a mix of old reader clients (e.g., workflows registered with previous versions of a-c-r) with new keepdocker.
Any objections?
No objections at stand-up.
Updated by Brett Smith 11 months ago
21417-keepdocker-oci-layout @ 5f1b2148d042323bd1a6c963aae1bdbe2ef8e73e - developer-run-tests: #4016 - Workbench 2 failing because of #21427.
- All agreed upon points are implemented / addressed.
- Yes, per discussion above
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described
- See above. This branch adds tests to verify the code correctly reads archives with both the old and new layouts.
- Documentation has been updated.
- N/A, this is a pure "works as before with newer versions" change
- Behaves appropriately at the intended scale (describe intended scale).
- No scale change
- Considered backwards and forwards compatibility issues between client and server.
- Code still works with all versions of Docker we currenly support, per test comment above
- Follows our coding standards and GUI style guidelines.
- Yes
Updated by Peter Amstutz 11 months ago
Brett Smith wrote in #note-14:
21417-keepdocker-oci-layout @ 5f1b2148d042323bd1a6c963aae1bdbe2ef8e73e - developer-run-tests: #4016 - Workbench 2 failing because of #21427.
- All agreed upon points are implemented / addressed.
- Yes, per discussion above
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described
- See above. This branch adds tests to verify the code correctly reads archives with both the old and new layouts.
- Documentation has been updated.
- N/A, this is a pure "works as before with newer versions" change
- Behaves appropriately at the intended scale (describe intended scale).
- No scale change
- Considered backwards and forwards compatibility issues between client and server.
- Code still works with all versions of Docker we currenly support, per test comment above
- Follows our coding standards and GUI style guidelines.
- Yes
This LGMTM.
Updated by Brett Smith 11 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|d178cf35bcf22fc8eacafd56750110b93c254bc7.
Updated by Brett Smith 9 months ago
- Related to Bug #21657: `arvados-client diagnostics` fails trying to introspect OCIv2 Docker image added