Idea #9277
closed[Crunch2] System-owned container outputs should be garbage-collected
Description
Background¶
When a container finishes, crunch-run creates an output collection and records its PDH in the container record.
On the API server, the container update triggers a hook that creates one copy of the output collection for each of N container requests that have been waiting on this container.
The original output collection is owned by root and there is no cleanup process has trash_at set to now+defaultTrashLifetime.
Proposed fix¶
In crunch-run, create the output collection with trash_at=now.
In the API server hook that creates a collection for each relevant container request, when looking up the container output manifest, make sure to include trashed collections in the search.
Updated by Peter Amstutz about 8 years ago
- Subject changed from [Crunch2] Behavior for output collections to [Crunch2] Intermediate output collections have expiry set
- Description updated (diff)
Updated by Tom Clegg almost 8 years ago
- Description updated (diff)
- Category set to API
Updated by Tom Clegg almost 8 years ago
- Subject changed from [Crunch2] Intermediate output collections have expiry set to [Crunch2] System-owned container outputs should be garbage-collected
Updated by Tom Morris almost 8 years ago
- Target version set to 2017-03-01 sprint
- Story points set to 0.5
Updated by Peter Amstutz almost 8 years ago
Even better than trash_at=now
, it looks like we can just set is_trashed: true
9277-trash-container-outputs @ 72873affa7f249faa16d5d21200e935d27aea911
Updated by Peter Amstutz almost 8 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz almost 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:21598295f38998d8028aaa117f192de6b5758808.
Updated by Peter Amstutz almost 8 years ago
- Status changed from Resolved to Feedback
def validate_output # Output must exist and be readable by the current user. This is so # that a container cannot "claim" a collection that it doesn't otherwise # have access to just by setting the output field to the collection PDH. if output_changed? c = Collection. readable_by(current_user). where(portable_data_hash: self.output). first if !c errors.add :output, "collection must exist and be readable by current user." end end end
This also needs to be unscoped.
Updated by Peter Amstutz almost 8 years ago
Additional fixes on 9277-container-output
Tests:
Updated by Radhika Chippada almost 8 years ago
It would be nice (if it doesn't already exist) to add a test such as "error setting output on running container if trashed collection is not readable". Thanks.
LGTM
Updated by Peter Amstutz almost 8 years ago
- Status changed from Feedback to Resolved
- % Done changed from 50 to 100
Applied in changeset arvados|commit:cca78a112d6a2aa4f76c1956cfe8ec2d43a68759.