Bug #9429


[API] Docker image name resolution should not disqualify a well-formed tag link merely because an older badly-formed link exists with the same name and target

Added by Tom Clegg about 6 years ago. Updated over 2 years ago.

The docker image name resolution code in source:services/api/app/models/collection.rb tries to find the latest tag link for the given name.

"Latest" is defined as:
  • The one with the most recent properties["image_timestamp"], if any matching tag links have a parseable timestamp stored there
  • Otherwise, the one with the most recent created_at.

(This is the definition of "latest" used by the arv-keepdocker tool to sort results (see source:sdk/python/arvados/commands/), and it seems pragmatic and intuitively correct.)
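
As a minimal sketch, this definition can be written as a sort key. The link shape (a dict with "properties" and "created_at" holding ISO 8601 strings) and the names parse_ts, latest_key and latest are assumptions made for illustration; they are not taken from arv-keepdocker or the API server.

```python
from datetime import datetime, timezone

# Sentinel for "no parseable timestamp"; assumed older than any real timestamp.
NO_TIMESTAMP = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_ts(value):
    """Return an aware datetime, or None if value is missing or unparseable."""
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except (AttributeError, TypeError, ValueError):
        return None
    # Treat naive timestamps as UTC so that all values are comparable.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def latest_key(link):
    """Sort key: image_timestamp first (if parseable), then created_at."""
    properties = link.get("properties") or {}
    image_ts = parse_ts(properties.get("image_timestamp"))
    created = parse_ts(link.get("created_at"))
    return (image_ts or NO_TIMESTAMP, created or NO_TIMESTAMP)


def latest(links):
    """Return the link that is "latest" under the definition above."""
    return max(links, key=latest_key)
```

Because the image timestamp is the first tuple element, any link with a parseable image_timestamp sorts ahead of every link without one, and created_at only breaks ties.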

The API implementation goes something like this:
  • Get tag links from the database, newest first.
  • Interpret each tag as "at time X, {head_uuid} was the most recent target for tag {name}" (where X is a tuple of timestamps, [image_timestamp, tag_creation_timestamp]).
  • Update a hash ("uuid_timestamps") to reflect "timestamp for target {head_uuid} is X".
  • After processing all tag links this way, sort uuid_timestamps by timestamp-tuple, and take the most recent one.
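
The steps above can be sketched roughly as follows. This is a Python paraphrase of the description, not the actual Ruby code in collection.rb; the function name resolve_overwriting, the flat dict shape, and the use of plain integers as timestamps are assumptions made to keep the example short.

```python
def resolve_overwriting(tag_links):
    """Return the head_uuid chosen by the procedure described above.

    Each link is a dict with "head_uuid", "created_at" (int) and an
    optional "image_timestamp" (int or None). Hypothetical shape.
    """
    uuid_timestamps = {}
    # Process tag links newest first, as the database query does.
    newest_first = sorted(tag_links, key=lambda link: link["created_at"],
                          reverse=True)
    for link in newest_first:
        stamp = (link.get("image_timestamp") or 0, link["created_at"])
        # Unconditional assignment: a later (older) link for the same
        # head_uuid replaces whatever was recorded before.
        uuid_timestamps[link["head_uuid"]] = stamp
    if not uuid_timestamps:
        return None
    # Pick the head_uuid with the most recent timestamp tuple.
    return max(uuid_timestamps, key=uuid_timestamps.get)
```

The unconditional assignment inside the loop is the detail that matters once one head_uuid is the target of more than one tag link.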

This works as long as no single collection appears as head_uuid in multiple tags.

However, if the same head_uuid does appear in multiple tags:
  • Currently, uuid_timestamps ends up with the timestamp-tuple from the oldest tag, where "oldest" is determined solely by tag creation time (because we process the tags in "created_at desc" order and overwrite existing values).
  • Instead, uuid_timestamps should end up with the most recent timestamp-tuple among all matching tags. This would agree with the definition of "latest" given above.

Examples of where this bug would cause bad behavior:
  • A tag name is attached to collection A, then an incomplete tag link is created (perhaps manually) pointing to B, then a complete tag link is created (using arv-keepdocker) pointing to B. arv-keepdocker will correctly report that the latest is B, but the API will incorrectly choose A, because the oldest tag pointing to B is incomplete.
  • A tag name is attached first to collection A, then to collection B, then to collection A. The correct answer is A, but the API will choose B, because the oldest tag indicating A is older than the oldest tag indicating B. (Note this scenario is rather contrived.)

The fix seems simple: don't overwrite an entry in the uuid_timestamps hash with a timestamp tuple that's older than the existing entry.

This should make the API's choice of "latest" correspond to arv-keepdocker's sort order.
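
A minimal sketch of the proposed fix, under the same illustrative assumptions as the rest of this discussion: links are plain dicts, timestamps are integers, and resolve_latest is a hypothetical name. It is not the actual patch to collection.rb.

```python
def resolve_latest(tag_links):
    """Return the head_uuid with the most recent timestamp tuple.

    Each link is a dict with "head_uuid", "created_at" (int) and an
    optional "image_timestamp" (int or None). Hypothetical shape.
    """
    uuid_timestamps = {}
    newest_first = sorted(tag_links, key=lambda link: link["created_at"],
                          reverse=True)
    for link in newest_first:
        head = link["head_uuid"]
        stamp = (link.get("image_timestamp") or 0, link["created_at"])
        # Only record the tuple if it is newer than the existing entry.
        if head not in uuid_timestamps or stamp > uuid_timestamps[head]:
            uuid_timestamps[head] = stamp
    if not uuid_timestamps:
        return None
    return max(uuid_timestamps, key=uuid_timestamps.get)


# First example: A, then an incomplete link to B, then a complete link to B.
incomplete_then_complete = [
    {"head_uuid": "A", "created_at": 1, "image_timestamp": 10},
    {"head_uuid": "B", "created_at": 2, "image_timestamp": None},
    {"head_uuid": "B", "created_at": 3, "image_timestamp": 20},
]
print(resolve_latest(incomplete_then_complete))  # B

# Second example: A, then B, then A again (no image timestamps).
a_b_a = [
    {"head_uuid": "A", "created_at": 1, "image_timestamp": None},
    {"head_uuid": "B", "created_at": 2, "image_timestamp": None},
    {"head_uuid": "A", "created_at": 3, "image_timestamp": None},
]
print(resolve_latest(a_b_a))  # A
```

With the comparison in place, the order in which links are processed no longer affects the result, so the "created_at desc" ordering becomes an optimization rather than a correctness requirement.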

Related issues

Related to Arvados - Feature #4543: [SDKs] it probably shouldn't be possible to have many docker images in Keep with the same docker repository name and tag (Closed)

Actions #1

Updated by Tom Clegg about 6 years ago

  • Description updated
Actions #2

Updated by Peter Amstutz over 2 years ago

  • Status changed from New to Closed
