Bug #11097
[API] Reuse containers even when multiple matching containers exist with differing outputs (closed)
Description
Background
Sometimes, running the same container twice on the same inputs can result in two successes with two different outputs. This can mean a number of things, including:
- undetected failure in one or both cases, perhaps resulting in bogus output
- both outputs are correct, but have non-meaningful differences (like an "output produced at {timestamp}" comment in an output file)
The second case is common in practice.
Currently, the API server disables the container re-use logic entirely when it detects that two re-use candidates produced different outputs. This causes the following undesirable pattern:
- Run container "X" as part of a workflow w1
- Re-use container "X" automatically in subsequent workflows w2..w3, saving time
- Run workflow w4 with re-use disabled, e.g., to get runtime stats or verify reproducibility. This runs container "X1", which is identical to "X" but produces different (but still correct) output
- Run workflows w5..w9 with re-use enabled
- Oops: even when re-running workflow w5, container "X" is never eligible for reuse again, because "X1" exists.
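To make the failure concrete, here is a minimal Ruby sketch of the current rule as described above. The `Candidate` struct and `old_reuse_choice` method are hypothetical illustrations, not code from container.rb, and small integers stand in for real creation timestamps.

```ruby
# Hypothetical, simplified record of a finished container; not the real
# Arvados Container model.
Candidate = Struct.new(:uuid, :created_at, :output)

# Old rule (sketch): when the matching candidates disagree about their
# output, give up on reuse entirely.
def old_reuse_choice(candidates)
  distinct_outputs = candidates.map(&:output).uniq
  return nil if distinct_outputs.length != 1

  # All candidates agree, so any one will do; the oldest is used here.
  candidates.min_by(&:created_at)
end

x  = Candidate.new("X",  1, "out1")
x1 = Candidate.new("X1", 2, "out2")  # same container, re-run with reuse disabled

old_reuse_choice([x]).uuid   # => "X"
old_reuse_choice([x, x1])    # => nil, and stays nil as long as X1 exists
```

Nothing in this rule ever removes "X1" from the candidate list, which is why a single non-reproducible re-run disables reuse permanently.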
Desired behavior
Use the oldest matching container whose output and log collections exist, aren't trashed, and are readable by the current user.
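A minimal Ruby sketch of this rule, under simplifying assumptions: `FinishedContainer`, `CollectionState`, `usable?` and `reuse_choice` are hypothetical names; collections are keyed by an identifier such as a portable data hash; a missing key means the collection does not exist; and permission checks are reduced to a `readable` flag.

```ruby
# Hypothetical stand-ins for a finished container and for the state of a
# collection as seen by the current user.
FinishedContainer = Struct.new(:uuid, :created_at, :output, :log)
CollectionState   = Struct.new(:trashed, :readable)

# A collection qualifies if it exists (has an entry), is not trashed, and is
# readable by the current user.
def usable?(pdh, collections)
  state = collections[pdh]
  !state.nil? && !state.trashed && state.readable
end

# Desired rule (sketch): differing outputs no longer block reuse; take the
# oldest candidate whose output and log collections both qualify.
def reuse_choice(candidates, collections)
  candidates
    .sort_by(&:created_at)
    .find { |c| usable?(c.output, collections) && usable?(c.log, collections) }
end

collections = {
  "out1" => CollectionState.new(false, true),
  "log1" => CollectionState.new(false, true),
  "out2" => CollectionState.new(false, true),
  "log2" => CollectionState.new(false, true),
}
x  = FinishedContainer.new("X",  1, "out1", "log1")
x1 = FinishedContainer.new("X1", 2, "out2", "log2")

reuse_choice([x1, x], collections).uuid  # => "X"
```

Sorting by creation time before filtering means a newer container only takes over once every older candidate has lost a usable output or log collection.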
If we used the newest matching container, we would have the following problem:
- Run container X, producing out1
- Run workflows w1..w9 that reuse X and do a lot of downstream work on out1
- Re-run workflows w1..w9 → lots of reused containers
- Run the same container again as X1, producing out2
- Re-run workflows w1..w9 → Arvados now chooses X1, so all downstream work has to be redone
Using the oldest matching container has a different problem:
- Run container "X"
- Notice that container "X" exited 0 but produced bogus output because of a bug in the container process or in Arvados itself
- Run the container again with re-use disabled: "X1" produces correct output
- Run a workflow that makes use of this container
- Oops, the workflow gets the bogus "X" output instead of the newer "X1" output
Choosing the oldest container is the lesser evil, because re-running the same container without fixing the underlying problem that allowed it to exit 0 with bogus output is not a viable solution anyway.
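The trade-off between the two scenarios comes down to stability: with "newest wins", every new run of the same container changes the answer, while "oldest wins" keeps returning the same result. A minimal Ruby sketch, where `Run`, `pick_newest` and `pick_oldest` are hypothetical names used only for illustration:

```ruby
# Hypothetical minimal record of a finished run of the same container.
Run = Struct.new(:uuid, :created_at, :output)

# The two candidate policies discussed above.
def pick_newest(runs)
  runs.max_by(&:created_at)
end

def pick_oldest(runs)
  runs.min_by(&:created_at)
end

before = [Run.new("X", 1, "out1")]
after  = before + [Run.new("X1", 2, "out2")]  # X1: re-run with reuse disabled

# Newest: the answer changes once X1 exists, so downstream steps that
# consumed out1 no longer match and must be recomputed.
pick_newest(before).output  # => "out1"
pick_newest(after).output   # => "out2"

# Oldest: the answer is stable; adding X1 changes nothing.
pick_oldest(before).output  # => "out1"
pick_oldest(after).output   # => "out1"
```

The flip side of that stability is the second scenario: if out1 is bogus, "oldest wins" keeps choosing it until X is made ineligible some other way.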
Implementation
Disable this check in source:services/api/app/models/container.rb
```ruby
if outputs.count.count != 1
  Rails.logger.debug("Found #{outputs.count.length} different outputs")
  # ... (rest of the check elided)
end
```
Updated by Tom Clegg almost 8 years ago
11097-reuse-impure @ 264ffa31bae106bb6c36643e13186289b6cd0e18
...fails a few tests -- but perhaps only because it changes the behavior as intended.
Updated by Tom Clegg almost 8 years ago
- Target version changed from 2017-02-15 sprint to Arvados Future Sprints
Updated by Tom Clegg almost 8 years ago
- Target version changed from Arvados Future Sprints to 2017-03-01 sprint
Updated by Tom Clegg almost 8 years ago
4f0e07d462b7860bb10686c27fac16970220377f with updated test case.
Updated by Radhika Chippada almost 8 years ago
- I think moving “select_readable_pdh” to the line above the declaration of “candidates” at line 85 would help improve readability since the rest of the clauses are building on "candidates"
- http://doc.arvados.org/api/methods/container_requests.html#container_reuse needs to be updated regarding the reuse of a “completed” container. Currently, it still says there won’t be any reuse when multiple completed jobs are found
- We talked about potentially removing output or log on the oldest completed container, if it is not desirable that it be reused. However, it appears that the output or log on a container in completed state can no longer be updated. So how can this be done? Do you mean that one of these should be removed from Keep? Do we need to add a blurb about this also in the above documentation?
Updated by Tom Clegg almost 8 years ago
Radhika Chippada wrote:
- I think moving “select_readable_pdh” to the line above the declaration of “candidates” at line 85 would help improve readability since the rest of the clauses are building on "candidates"
Indeed, rearranged this.
- http://doc.arvados.org/api/methods/container_requests.html#container_reuse needs to be updated regarding the reuse of a “completed” container. Currently, it still says there won’t be any reuse when multiple completed jobs are found
Updated, thanks.
- We talked about potentially removing output or log on the oldest completed container, if it is not desirable that it be reused. However, it appears that the output or log on a container in completed state can no longer be updated. So how can this be done? Do you mean that one of these should be removed from Keep? Do we need to add a blurb about this also in the above documentation?
Yes, trashing the output or log collection would accomplish this. I added to the docs "...whose log and output collection are still available". Documenting the "poking re-use in the eye" procedure seems worthwhile too but it's more of a workflow trick than API documentation -- e.g., you could make use of that information even if you only use Workbench and don't know what an API is. Wiki?
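Trashing works because the reuse rule skips any candidate whose output or log collection is no longer available. A minimal Ruby sketch, where `Candidate`, `pick` and the list of trashed collection identifiers are hypothetical simplifications rather than Arvados API calls (actually trashing a collection is done through the API or Workbench):

```ruby
# Hypothetical simplified candidate record.
Candidate = Struct.new(:uuid, :created_at, :output, :log)

# Sketch of the selection rule with trashing modelled as a list of trashed
# collection identifiers; everything else is assumed available and readable.
def pick(candidates, trashed)
  candidates
    .sort_by(&:created_at)
    .find { |c| !trashed.include?(c.output) && !trashed.include?(c.log) }
end

x  = Candidate.new("X",  1, "bogus_out", "log_x")   # exited 0, bogus output
x1 = Candidate.new("X1", 2, "good_out",  "log_x1")  # re-run, correct output

pick([x, x1], []).uuid             # => "X"  (the bogus result is still reused)
pick([x, x1], ["bogus_out"]).uuid  # => "X1" (trashing X's output steers reuse to X1)
pick([x, x1], ["log_x"]).uuid      # => "X1" (trashing X's log works too)
```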
Updated by Radhika Chippada almost 8 years ago
Yes, trashing the output or log collection would accomplish this ... "poking re-use in the eye" procedure seems worthwhile too but it's more of a workflow trick than API documentation -- e.g., you could make use of that information even if you only use Workbench and don't know what an API is. Wiki?
I'd imagine someone would ask how to do this in no time. So, please add a note wherever you think appropriate. Thanks.
LGTM
Updated by Tom Clegg almost 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:0c529ed05805507b4d2c903b9587e9b61cec5ee6.