Feature #11100

closed

[CWL] Intermediary collection handling can be specified

Added by Tom Morris about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 3.0

Description

Background

Workflows produce a lot of intermediate collections. For production workflows that are rarely re-run, the benefits of job reuse are minimal; the intermediate collections are just clutter and take up storage space that the user would rather not pay for. Trashing them is also necessary to support a roll-in/roll-out use case, where a cluster only has enough storage for a few complete runs and input and output data are transferred from and to somewhere else.

Requirements

Users should be able to specify a default behavior (retain or trash) and override it for the output of specific steps.

The final output is always retained. Input should be unaffected.

Intermediate collections need to live as long as they are in use by downstream steps. When intermediate collections are no longer needed by downstream steps, they should be trashed.
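
The lifetime rule above can be sketched as a small check over the workflow graph. This is only an illustration and not the arvados-cwl-runner implementation; the function name `trashable_outputs` and the dictionary layout (`consumers` maps each step to the downstream steps that read its output) are assumptions made for the example.

```python
def trashable_outputs(consumers, completed, final_outputs=frozenset()):
    """Return the steps whose intermediate output is no longer needed.

    consumers:     dict mapping a step name to the set of downstream
                   steps that read its output (hypothetical layout).
    completed:     set of step names that have finished.
    final_outputs: steps whose output is a final workflow output;
                   these are always retained.
    """
    trashable = set()
    for step, downstream in consumers.items():
        if step in final_outputs:
            continue  # the final output is always retained
        if step not in completed:
            continue  # no output collection exists yet
        # Trash only once every downstream consumer has finished.
        if all(d in completed for d in downstream):
            trashable.add(step)
    return trashable


# Example: align -> sort -> call, where "call" produces the final output.
consumers = {"align": {"sort"}, "sort": {"call"}, "call": set()}
partial = trashable_outputs(consumers, {"align", "sort"}, {"call"})
finished = trashable_outputs(consumers, {"align", "sort", "call"}, {"call"})
```

In the example, only the output of "align" can be trashed while "call" is still pending, because "sort" is still needed by "call". Once every step has completed, both "align" and "sort" are trashable and "call" is retained.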

Design

arvados-cwl-runner submits container requests; when a container completes, a collection is created and reported in output_uuid. arvados-cwl-runner can then set the trash_at field on that collection.

  • API server
    • Add an "output_ttl" field to container request. This value is in seconds. When the output collection is created for the container request, it should have trash_at and delete_at set to now + output_ttl (assume that tokens are issued with expiry times less than trash_at). A value <= 0 means don't set trash_at.
    • Add tests.
    • Update documentation
  • CWL runner
    • When "intermediate output TTL" is provided, container requests are submitted with output_ttl set
    • By default, output_ttl is None or 0 (consistent with current behavior).
    • When workflow completes successfully, everything marked as intermediate should be trashed immediately. Do not do this on workflow failure.
    • Provide a command line option to indicate that intermediate collections shouldn't be deleted immediately
    • Custom Arvados CWL hint to specify treatment of individual step outputs
    • Update documentation
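
The output_ttl rule described for the API server can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual API server code: the function name `output_expiry` and the injectable `now` parameter are hypothetical, and only the rule itself (trash_at and delete_at set to now + output_ttl, nothing set when the value is None or <= 0) comes from the design above.

```python
from datetime import datetime, timedelta, timezone


def output_expiry(output_ttl, now=None):
    """Return (trash_at, delete_at) for a new output collection.

    output_ttl: lifetime in seconds; None or a value <= 0 means the
                collection gets no expiry (current default behavior).
    now:        reference time, injectable for testing (hypothetical
                parameter); defaults to the current UTC time.
    """
    if output_ttl is None or output_ttl <= 0:
        return None, None
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = now + timedelta(seconds=output_ttl)
    # Both fields are set to the same instant, per the design above.
    return expiry, expiry


# Example: a one-hour TTL measured from a fixed reference time.
reference = datetime(2017, 4, 4, 12, 0, tzinfo=timezone.utc)
trash_at, delete_at = output_expiry(3600, now=reference)
```

Treating None, 0, and negative values the same way keeps container requests that do not set output_ttl on the existing behavior, where the output collection has no trash_at.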

Subtasks 3 (0 open, 3 closed)

Task #11370: Update cwl runner after API feature is merged (Resolved, Peter Amstutz, 05/22/2017)
Task #11389: Review 11100-cwl-set-output-ttl (Resolved, Lucas Di Pentima, 05/22/2017)
Task #11388: Review 11100-cr-output-ttl (Resolved, Peter Amstutz, 04/04/2017)

Related issues

Related to Arvados - Idea #9277: [Crunch2] System-owned container outputs should be garbage-collected (Resolved, Peter Amstutz, 02/16/2017)
Related to Arvados - Idea #9589: [Workbench] Update collection interface for collections with non-nil trash_at (Closed, Radhika Chippada, 07/13/2016)