Bug #9429

Updated by Tom Clegg almost 8 years ago

The docker image name resolution code in source:services/api/app/models/collection.rb tries to find the latest tag link for the given name. 

 "Latest" is defined as: 
 * The one with the most recent @properties["image_timestamp"]@, if any matching tag links have a parseable timestamp stored there 
 * Otherwise, the one with the most recent @created_at@. 

 (This is the definition of "latest" used by the arv-keepdocker tool to sort results (see source:sdk/python/arvados/commands/keepdocker.py) and seems pragmatic/intuitively correct.) 
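
 As a rough illustration of that definition (not the actual arv-keepdocker or API code), the ordering can be written as a sort key. The helper names @parse_time@ and @latest_key@, the dict layout of a link, and the choice of 0 as the stand-in for a missing or unparseable image timestamp are all assumptions made for this sketch.

```python
from datetime import datetime, timezone


def parse_time(value):
    """Return a POSIX timestamp for an ISO 8601 string, or None if unparseable.

    Hypothetical helper for this sketch only.
    """
    try:
        parsed = datetime.fromisoformat(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.timestamp()


def latest_key(link):
    """Sort key: (image_timestamp, created_at), both as POSIX timestamps.

    A link without a parseable image_timestamp gets 0 in the first slot
    (an assumption), so it sorts below any link that has one and is then
    ordered by created_at alone. created_at is assumed to be always valid.
    """
    properties = link.get("properties") or {}
    image_ts = parse_time(properties.get("image_timestamp"))
    created_ts = parse_time(link["created_at"])
    return (image_ts if image_ts is not None else 0.0, created_ts)
```

 With a key like this, @max(links, key=latest_key)@ would pick the "latest" tag link from a list of matching links.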

 The API implementation goes something like this: 
 * Get tag links from the database, newest first. 
 * Interpret each tag as "at time X, {head_uuid} was the most recent target for tag {name}" (where X is a tuple of timestamps, [image_timestamp, tag_creation_timestamp]). 
 * Update a hash ("uuid_timestamps") to reflect "timestamp for target {head_uuid} is X". 
 * After processing all tag links this way, sort uuid_timestamps by timestamp-tuple, and take the most recent one. 
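
 The steps above can be paraphrased in a few lines of Python. This is a sketch of the described behavior, not the Ruby code in collection.rb: the function name @latest_head_uuid_as_described@ is hypothetical, timestamps are assumed to be already parsed into numbers, and a missing image timestamp is assumed to be represented as 0.

```python
def latest_head_uuid_as_described(tag_links):
    """Paraphrase of the described API behavior (hypothetical sketch).

    Each link is a dict with 'head_uuid', 'image_timestamp' (number or
    None) and 'created_at' (number).
    """
    uuid_timestamps = {}
    # Process tag links newest first, by tag creation time.
    newest_first = sorted(tag_links, key=lambda l: l["created_at"], reverse=True)
    for link in newest_first:
        stamp = (link["image_timestamp"] or 0, link["created_at"])
        # Unconditional assignment: whatever is processed last for a
        # given head_uuid replaces any value stored earlier.
        uuid_timestamps[link["head_uuid"]] = stamp
    if not uuid_timestamps:
        return None
    # Sort by timestamp tuple and take the most recent target.
    return max(uuid_timestamps, key=uuid_timestamps.get)
```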

 This works as long as no single collection appears as head_uuid in multiple tags. 

 However, if the same head_uuid _does_ appear in multiple tags: 
 * uuid_timestamps ends up with the timestamp-tuple that comes from the _oldest tag_, where "oldest" is determined solely by tag creation time (because we process the tags in "created_at desc" order, and each assignment overwrites the existing value).
 * uuid_timestamps _should_ end up with the _most recent timestamp-tuple_ from all matching tags. This would agree with the definition of "latest" given above. 

 Examples of where this bug would cause bad behavior: 
 * A tag name is attached to collection A, then an incomplete tag link is created (perhaps manually) pointing to B, then a complete tag link is created (using arv-keepdocker) pointing to B. arv-keepdocker will correctly report B as the latest, but the API will incorrectly choose A because the oldest tag pointing to B is incomplete. 
 * A tag name is attached first to collection A, then to collection B, then to collection A again. The correct answer is A, but the API will choose B, because the oldest tag indicating A is older than the oldest tag indicating B. (Note this scenario is rather contrived.) 

 The fix seems simple: don't overwrite an entry in the uuid_timestamps hash with a timestamp tuple that's older than the existing entry. 
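
 A minimal sketch of that fix, in Python rather than the API's Ruby, might look like the following. The function name @latest_head_uuid@ is hypothetical, timestamps are assumed to be already parsed into numbers, and an "incomplete" tag link is modeled (as an assumption) as one whose image timestamp is missing and therefore counted as 0.

```python
def latest_head_uuid(tag_links):
    """Pick the head_uuid with the most recent timestamp tuple.

    Hypothetical sketch: each link is a dict with 'head_uuid',
    'image_timestamp' (number or None) and 'created_at' (number).
    """
    uuid_timestamps = {}
    newest_first = sorted(tag_links, key=lambda l: l["created_at"], reverse=True)
    for link in newest_first:
        stamp = (link["image_timestamp"] or 0, link["created_at"])
        head = link["head_uuid"]
        # Only store the tuple if it is newer than what is already there.
        if head not in uuid_timestamps or stamp > uuid_timestamps[head]:
            uuid_timestamps[head] = stamp
    if not uuid_timestamps:
        return None
    return max(uuid_timestamps, key=uuid_timestamps.get)


# Scenario 1: A (complete), then B (incomplete), then B (complete).
incomplete_then_complete = [
    {"head_uuid": "A", "image_timestamp": 100, "created_at": 1},
    {"head_uuid": "B", "image_timestamp": None, "created_at": 2},
    {"head_uuid": "B", "image_timestamp": 200, "created_at": 3},
]
print(latest_head_uuid(incomplete_then_complete))  # prints B

# Scenario 2: tag attached to A, then B, then A again.
a_b_a = [
    {"head_uuid": "A", "image_timestamp": None, "created_at": 1},
    {"head_uuid": "B", "image_timestamp": None, "created_at": 2},
    {"head_uuid": "A", "image_timestamp": None, "created_at": 3},
]
print(latest_head_uuid(a_b_a))  # prints A
```

 Because the comparison keeps the larger tuple regardless of processing order, the result no longer depends on the tags being read in "created_at desc" order.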

 This should make the API's choice of "latest" correspond to arv-keepdocker's sort order.