Bug #7154
closed
[Various] Various copy operations lose Docker image metadata
Description
The Arvados API server and clients identify Docker images by docker_image_hash and docker_image_repo+tag links associated with collections. If no such links point to a collection, it's undiscoverable: jobs can't refer to the image by this metadata, and arv keep docker won't list it. (You can still use it in a job by specifying the collection content address as your docker_image.)
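To make the failure mode concrete, here is a minimal in-memory sketch in Python of the lookup described above. It does not call the Arvados API. The dictionary fields (link_class, name, head_uuid) and the UUID strings are assumptions chosen for illustration.

```python
# Illustrative model only: links are plain dicts, not real API records.
# The field names link_class, name, and head_uuid are assumptions.
DOCKER_LINK_CLASSES = {"docker_image_hash", "docker_image_repo+tag"}


def docker_links_for(collection_uuid, links):
    """Return the Docker metadata links that point at the given collection."""
    return [
        link
        for link in links
        if link["link_class"] in DOCKER_LINK_CLASSES
        and link["head_uuid"] == collection_uuid
    ]


def is_discoverable(collection_uuid, links):
    """A collection is discoverable as an image only if some link points at it."""
    return bool(docker_links_for(collection_uuid, links))


# Hypothetical data: the original collection has both links; a copy made
# without its metadata has none, even though the content is identical.
links = [
    {"link_class": "docker_image_hash", "name": "abc123",
     "head_uuid": "original-collection"},
    {"link_class": "docker_image_repo+tag", "name": "example/image:latest",
     "head_uuid": "original-collection"},
]
```

With this data, is_discoverable("original-collection", links) is true and is_discoverable("copied-collection", links) is false, which is the situation the copy tools below end up in.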
At an API level, we can't prevent users from copying Docker image collections without copying the associated metadata. However, a few high-level copy tools also lose the metadata, which is not what users expect:
- Basically all of the Workbench copy mechanisms:
  - "Copy to project…" from the collection page.
  - Selecting the collection in a project and copying it from the pulldown menu.
  - On the collection page, selecting the single Docker image file in it and creating a new collection from that (although maybe users don't/shouldn't expect this to work).
- arv-copy by collection content address: In this case, arv-copy searches for all collections with that content address, and chooses one to copy based on which one is likely to have the best name. If it chooses a copy that's already lost the metadata links, it won't create any links on the destination either. It should probably copy over metadata from any collection with a matching content address.
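The suggestion in the arv-copy item (copy metadata from any collection with a matching content address) could look roughly like the following Python sketch. It works on plain dicts rather than the real API, and it is not arv-copy's actual code. The function name copy_docker_links, the field names (uuid, portable_data_hash, link_class, name, head_uuid), and all sample values are assumptions for illustration.

```python
# Illustrative model only; not arv-copy's implementation.
DOCKER_LINK_CLASSES = ("docker_image_hash", "docker_image_repo+tag")


def copy_docker_links(content_address, src_collections, src_links, dst_uuid):
    """Build destination links from the Docker links on ANY source collection
    with the given content address, not just the one chosen for copying."""
    # Every source collection that shares the content address is a candidate.
    src_uuids = {
        c["uuid"] for c in src_collections
        if c["portable_data_hash"] == content_address
    }
    seen = set()
    new_links = []
    for link in src_links:
        if link["link_class"] not in DOCKER_LINK_CLASSES:
            continue
        if link["head_uuid"] not in src_uuids:
            continue
        key = (link["link_class"], link["name"])
        if key in seen:
            continue  # skip duplicates found on several source copies
        seen.add(key)
        new_links.append({
            "link_class": link["link_class"],
            "name": link["name"],
            "head_uuid": dst_uuid,  # re-point the link at the destination
        })
    return new_links


# Hypothetical data: two collections share a content address, but only one
# of them still has its Docker links.
src_collections = [
    {"uuid": "coll-with-links", "portable_data_hash": "pdh-example"},
    {"uuid": "coll-without-links", "portable_data_hash": "pdh-example"},
    {"uuid": "unrelated", "portable_data_hash": "pdh-other"},
]
src_links = [
    {"link_class": "docker_image_hash", "name": "abc123",
     "head_uuid": "coll-with-links"},
    {"link_class": "docker_image_repo+tag", "name": "example/image:latest",
     "head_uuid": "coll-with-links"},
    {"link_class": "docker_image_repo+tag", "name": "other/image:latest",
     "head_uuid": "unrelated"},
]
new_links = copy_docker_links(
    "pdh-example", src_collections, src_links, "dst-collection")
```

Because the links are gathered by content address, the result does not depend on which source copy the tool picked for its name: even if coll-without-links is the one copied, the destination still receives both links.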
Maybe we should just fix all these individual bugs in the copy tools. But I wanted to raise the question: should we be identifying Docker images in a different way that's less likely to be lost? Maybe through properties on the collection, or something like that?