Bug #11097

closed

[API] Reuse containers even when multiple matching containers exist with differing outputs

Added by Tom Clegg about 7 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: API
Target version:
Story points: 0.5

Description

Background

Sometimes, running the same container twice on the same inputs can result in two successes with two different outputs. This can mean a number of things, including:
  • an undetected failure in one or both cases, perhaps resulting in bogus output
  • both outputs are correct, but have non-meaningful differences (like an "output produced at {timestamp}" comment in an output file)

The second case is common in practice.

Currently, the API server disables the container re-use logic entirely when it detects that two re-use candidates produced different outputs. This causes the following undesirable pattern:
  1. Run container "X" as part of a workflow w1
  2. Re-use container "X" automatically in subsequent workflows w2..w5, saving time
  3. Run workflow w4 with re-use disabled, e.g., to get runtime stats or verify reproducibility -- this runs container "X1" which is identical to "X" but produces different (but still correct) output
  4. Run workflows w5..w9 with re-use enabled
  5. Oops, even when re-running workflow w5, container "X" is not eligible for reuse ever again, because "X1" exists.
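
A toy model of the current rule makes the pattern above concrete. This is only a sketch: `PastRun` and `reuse_current` are hypothetical names rather than code from container.rb, and choosing the oldest run when all outputs agree is an assumption.

```ruby
# Hypothetical stand-in for a finished container record.
PastRun = Struct.new(:uuid, :finished_at, :output)

# Toy model of the current rule: reuse is refused whenever the
# candidates do not all share a single output.
def reuse_current(runs)
  return nil unless runs.map(&:output).uniq.length == 1
  runs.min_by(&:finished_at) # assumption: oldest run wins when outputs agree
end

x  = PastRun.new("X",  1, "out1")
x1 = PastRun.new("X1", 2, "out2") # identical container, harmlessly different output

reuse_current([x])     # returns x: reuse works while only "X" exists
reuse_current([x, x1]) # returns nil: the existence of "X1" blocks reuse of "X"
```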

Desired behavior

Use the oldest matching container whose output and log collections exist, aren't trashed, and are readable by the current user.
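
The rule above could be sketched as follows. `Candidate`, `pick_reusable`, and the `readable` list are hypothetical: `readable` stands in for whatever check establishes that a collection exists, is not trashed, and is readable by the current user, and ordering by `finished_at` is an assumption about what "oldest" means.

```ruby
# Hypothetical stand-in for a completed container with its output and log.
Candidate = Struct.new(:uuid, :finished_at, :output, :log)

# readable: identifiers of collections that exist, are not trashed, and are
# readable by the current user (assumed to be computed elsewhere).
def pick_reusable(candidates, readable)
  candidates
    .select { |c| readable.include?(c.output) && readable.include?(c.log) }
    .min_by(&:finished_at) # oldest usable candidate, or nil if there is none
end

x  = Candidate.new("X",  1, "out1", "log1")
x1 = Candidate.new("X1", 2, "out2", "log2")

pick_reusable([x1, x], %w[out1 log1 out2 log2]) # returns x, the oldest match
pick_reusable([x1, x], %w[log1 out2 log2])      # returns x1: out1 is not usable
```

Differing outputs no longer matter here: only the usability of each candidate's own output and log is checked.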

If we used the newest matching container, we would have the following problem:
  1. Run container X, producing out1
  2. Run workflows w1..w9 that reuse X and do a lot of downstream work on out1
  3. Re-run workflows w1..w9 → lots of reused containers
  4. Re-run container X1, producing out2
  5. Re-run workflows w1..w9 → arvados chooses X1 now, so all downstream work has to be redone

Using the oldest matching container fixes the problems given above, while admitting the converse problem:
  1. Run container "X"
  2. Notice that container "X" exited 0 but produced bogus output because of a bug in the container process or Arvados itself
  3. Run container again with re-use disabled: "X1" produces correct output
  4. Run a workflow that makes use of this container
  5. Oops, the workflow gets the bogus "X" output instead of the newer "X1" output

This is the lesser evil in that re-running the same container -- i.e., without fixing the underlying problem that allowed it to exit 0 with bogus output -- is not a viable solution anyway.
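
The trade-off between the two policies can be illustrated with a small sketch. `Finished`, `pick_oldest`, and `pick_newest` are hypothetical names, and readability checks are left out for brevity.

```ruby
# Hypothetical stand-in for a completed container.
Finished = Struct.new(:uuid, :finished_at, :output)

def pick_oldest(runs)
  runs.min_by(&:finished_at)
end

def pick_newest(runs)
  runs.max_by(&:finished_at)
end

x  = Finished.new("X",  1, "out1")
x1 = Finished.new("X1", 2, "out2")

# Oldest-first is stable: adding X1 does not change the chosen output,
# so downstream work built on out1 stays reusable.
pick_oldest([x]).output     # "out1"
pick_oldest([x, x1]).output # "out1"

# Newest-first switches to out2 once X1 exists, so downstream work is redone.
pick_newest([x, x1]).output # "out2"
```

The same stability is what produces the converse problem: if out1 is bogus, oldest-first keeps returning it even after X1 has produced correct output.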

Implementation

Disable this check in source:services/api/app/models/container.rb

    if outputs.count.count != 1
      Rails.logger.debug("Found #{outputs.count.length} different outputs")

Subtasks: 2 (0 open, 2 closed)

Task #11140: Update tests (Resolved, Tom Clegg, 02/13/2017)
Task #11111: Review 11097-reuse-impure (Resolved, Radhika Chippada, 02/13/2017)