Feature #4223

[SDKs] arv keep docker --download to get image from Arvados

Added by Peter Amstutz over 6 years ago. Updated about 4 years ago.

In Progress
Assigned To:
Target version:
Start date:
Due date:
% Done:


Estimated time:
(Total: 0.00 h)
Story points:


Add "arv keep docker --download" to fetch a Docker image from Arvados and load it locally.


Task #5689: Review 4223-arv-keepdocker-downloadIn ProgressPeter Amstutz

Related issues

Related to Arvados - Bug #5752: [SDKs] Fix unreliable image timestamp recording in arv keep dockerResolved04/16/2015

Related to Arvados - Feature #7706: [SDK] arv keep docker can pull a docker image from arvados to your local repositoryDuplicate11/02/2015


#1 Updated by Peter Amstutz over 6 years ago

  • Subject changed from [SDKs] Docker --pull should pull from Arvados, not Docker registry. to [SDKs] arv keep docker --pull should pull from Arvados
  • Description updated (diff)

#2 Updated by Brett Smith over 6 years ago

  • Category set to SDKs
  • Target version set to Arvados Future Sprints

#3 Updated by Peter Amstutz about 6 years ago

  • Subject changed from [SDKs] arv keep docker --pull should pull from Arvados to [SDKs] arv keep docker --download to get image from Arvados
  • Assigned To set to Peter Amstutz
  • Target version changed from Arvados Future Sprints to 2015-04-29 sprint
  • Story points set to 0.5

#4 Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)

Add --download option to arv-keepdocker to fetch a Docker image from Keep and load it locally. Refactor keepdocker to enable SDK users to better access upload/download functionality. Fix API requests to use list_all() so that all results are returned.

#5 Updated by Peter Amstutz about 6 years ago

  • Status changed from New to In Progress

#6 Updated by Tom Clegg about 6 years ago

Docs and interface

Not sure of the background wrt prioritizing this with thin requirements/expectations, but for starters, the story should be updated to say why "arv keep docker --download" is important even though "docker load < ~/keep/by_id/collectionuuid/dockerimage.tar" exists.

The branch needs some documentation for users, too, to explain when/how this is meant to be used. "Fetch docker image from Arvados and load locally" in the usage message is the extent of the docs I can find, and it's a bit terse. I think I understand what this is for, but even if I'm right, we need to tell our users about it, so they know it's there when they need it. Is there somewhere in /doc where it would be appropriate to mention this?

Given that "arv keep docker" now downloads from two different things (docker and keep) and uploads to one (keep), I'm not sure "upload" and "download" alone are sufficient to explain a given operating mode. Presumably, we'll eventually add more modes like "push image from keep to docker" too, and then we'll really like having a good vocabulary.
  • Should we consider switching to the terms "get" and "put" here -- matching "arv put" and "arv get" -- instead of "download" and "upload", which tend to sound (at least to me) like they do things out there on the internet with non-Arvados entities?
  • Or perhaps "arv keep docker load" instead of --download, since this is essentially "docker load" using Keep instead of POSIX?
The new code at 743f59e seems to support three modes:
  • --pull --upload
  • --no-pull --upload
  • --download
It might be good to address some of the other combinations a user can type:
  • --pull --download (probably throw a usage error instead of just ignoring --pull and fetching from Keep?)
  • --no-pull --download (usage error? or "of course --no-pull, ignoring that part"?)
  • --pull --no-upload (I assume this would be easy, and we might as well allow it, so people can use the same tool for this use case too, instead of switching to regular "docker pull". OTOH I'm not keen on --pull --no-upload and switching to "docker pull" isn't that hard, so maybe leave this alone?)
  • --pull --download (usage error, presumably? Can't think of an obvious meaning for this off hand.)
  • --download --force (usage error? "force" seems to apply only on upload. Tangent: "force" looks like it was a bad choice here, since it's far from obvious which of the various conflicts will be "forced" through...)
The guesswork in main(), where --image foo is interpreted as various different things depending on whether any results turn up for previous searches, seems sketchy to me. I think it would be better to do something like this:
  • Leave the --image argument so it still just refers to a docker image
  • Let the --tag argument continue to specify a tag to search for
  • Allow specifying a collection with --collection X (we can reliably distinguish UUID from PDH here, but --uuid and --portable-data-hash might be worth considering as alternatives/synonyms)
  • Avoid doing different searches depending on whether previous searches turned up any results. (If there's a use case for "I don't know what kind of ID this is but I do expect it to turn up a docker image" we should consider that feature, but surely it shouldn't be the only way to specify an image.)


Seems like we should be able to add something...


Couple of initial observations:

The new list_images_in_arv() function signature is rather unwieldy. It accepts three image arguments (image_name, image_tag, and image_hash) but image_hash is ignored if image_name is provided, etc. The "if image_name or image_hash" part might be a clue this function is trying to do too many different things.

64000 and 1024*1024 should be named constants.

_group.add_argument('--upload') should probably say action='store_false', default=False, dest='download' if that's the way we're using the args.download flag. Surely the current code works because of the mutually exclusive group, but still, creating an args.upload flag that is either True or False but never gets read seems like asking for trouble/confusion later.

The usage messages for --tag and --image should be updated now that they're used for --download mode as well.

It looks like arv keep docker --download {collection_uuid} will fail "Docker image '%s' not found" if the collection isn't tagged as a docker image. It seems like it would be better to use it anyway, and either [a] treat the missing tag as a warning instead of an error, or [b] skip the tag lookup entirely, since the collection UUID already specifies exactly one (or zero) available image. This would also permit downloading a docker image using a portable_data_hash, which seems like a really useful feature for careful scripts/people to use.

It looks like arv keep docker --download --image X --tag Y looks in Arvados for an image called X (with any tag), downloads it, and then -- only if X was a collection uuid -- pushes tag Y to the upstream docker repository. Am I following correctly, and is this intentional? I would have guessed this command would restrict its search to images that are tagged with Y, which is rather different, and it seems especially surprising for the tagging behavior to change based on whether X was a collection UUID or a PDH or a docker image id.

#7 Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2015-04-29 sprint to 2015-05-20 sprint

#8 Updated by Tom Clegg almost 6 years ago

  • Target version changed from 2015-05-20 sprint to Arvados Future Sprints

#9 Updated by Peter Amstutz almost 5 years ago

  • Assigned To deleted (Peter Amstutz)

#10 Updated by Peter Amstutz about 4 years ago

I still find myself wanting this feature occasionally, although we may want to abandon this particular branch.

Also available in: Atom PDF