Feature #11100

[CWL] Intermediary collection handling can be specified

Added by Tom Morris 2 months ago. Updated 2 days ago.

Status: In Progress
Start date: 03/30/2017
Priority: Normal
Due date:
Assignee: Peter Amstutz
% Done: 50%
Category: -
Target version: 2017-05-10 sprint
Story points: 3.0
Remaining (hours): 0.00 hour
Velocity based estimate: -

Description

Background

Workflows produce a lot of intermediate collections. For production workflows that are rarely re-run, the job reuse benefits are minimal; the intermediate collections are just clutter and take up storage space that the user would rather not pay for. This feature is also necessary to support a roll-in/roll-out use case where a cluster only has enough storage for a few complete runs, and input and output data are transferred from/to somewhere else.

Requirements

Users should be able to specify the default behavior (retain or trash) and override it for the output of specific steps.

The final output is always retained. Input should be unaffected.

Intermediate collections need to live as long as they are in use by downstream steps. When intermediate collections are no longer needed by downstream steps, they should be trashed.
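
The lifetime rule above can be sketched as a "last consumer" calculation. This is only an illustration, not arvados-cwl-runner code: the function name `trash_schedule`, the tuple layout of `steps`, and the assumption that steps run in the listed order are all hypothetical.

```python
def trash_schedule(steps, final_outputs):
    """Decide after which step each intermediate collection can be trashed.

    steps: list of (step_name, input_collections, output_collection),
           assumed (hypothetically) to run in the listed order.
    final_outputs: set of collections that are always retained.
    Workflow inputs are never scheduled, because only step outputs are considered.
    """
    # Record the last step that consumes each collection.
    last_use = {}
    for name, inputs, _ in steps:
        for collection in inputs:
            last_use[collection] = name

    schedule = {name: [] for name, _, _ in steps}
    for name, _, output in steps:
        if output in final_outputs:
            continue  # the final output is always retained
        # Trash after the last consumer; if nothing consumes it,
        # it can go as soon as its producer finishes.
        schedule[last_use.get(output, name)].append(output)
    return schedule


steps = [
    ("align", ["reads"], "bam"),
    ("sort", ["bam"], "sorted"),
    ("call", ["sorted", "bam"], "vcf"),
]
schedule = trash_schedule(steps, final_outputs={"vcf"})
```

Here "bam" and "sorted" both become trashable only after "call" completes, "vcf" is retained, and the input "reads" is never touched.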

Design

arvados-cwl-runner submits container requests; when a container completes, a collection is created and reported in output_uuid. arvados-cwl-runner can then set the trash_at field on the collection.

  • API server
    • Add an "output_ttl" field to container request. This value is in seconds. When the output collection is created for the container request, it should have trash_at and delete_at set to now + output_ttl (assume that tokens are issued with expiry times less than trash_at). A value of <= 0 means don't set trash_at.
    • Add tests.
    • Update documentation
  • CWL runner
    • When "intermediate output TTL" is provided, container requests are submitted with output_ttl set
    • Default behavior is for output_ttl to be None or 0 (consistent with current behavior).
    • When workflow completes successfully, everything marked as intermediate should be trashed immediately. Do not do this on workflow failure.
    • Provide a command line option to indicate that things shouldn't be deleted immediately
    • Custom Arvados CWL hint to specify treatment of individual step outputs
    • Update documentation
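
A minimal sketch of the two halves of this design, under stated assumptions. The helper names `output_expiry` and `container_request_body` are hypothetical and are not part of the Arvados API server or arvados-cwl-runner; only the `output_ttl`, `trash_at`, and `delete_at` field names come from the description above.

```python
from datetime import datetime, timedelta, timezone


def output_expiry(output_ttl, now=None):
    """API server side: value for trash_at and delete_at, or None.

    output_ttl is in seconds; None or a value <= 0 means no expiry is set.
    """
    if output_ttl is None or output_ttl <= 0:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return now + timedelta(seconds=output_ttl)


def container_request_body(command, intermediate_output_ttl=None):
    """CWL runner side: include output_ttl only when a positive TTL is given.

    Omitting the field keeps the current behavior (outputs are retained).
    """
    body = {"command": command}
    if intermediate_output_ttl is not None and intermediate_output_ttl > 0:
        body["output_ttl"] = intermediate_output_ttl
    return body


start = datetime(2017, 3, 30, tzinfo=timezone.utc)
expiry = output_expiry(3600, now=start)
request = container_request_body(["echo", "hello"], intermediate_output_ttl=3600)
```

With a one-hour TTL, the output collection would get trash_at and delete_at one hour after the moment it is created; with no TTL, neither field is set and the request body carries no output_ttl.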

Subtasks

Task #11389: Review a-c-r changes (New, Tom Clegg)

Task #11370: Update cwl runner after API feature is merged (New, Peter Amstutz)

Task #11388: Review 11100-cr-output-ttl (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Story #9277: [Crunch2] System-owned container outputs should be garbag... Resolved 02/16/2017
Related to Arvados - Story #9589: [Workbench] Update collection interface for collections w... New 07/13/2016

Associated revisions

Revision ff8d14ac
Added by Tom Clegg 29 days ago

Merge branch '7709-api-rails4' (partial)

refs #7709
refs #11100

Revision 77f5a84c
Added by Tom Clegg 24 days ago

Merge branch '11100-cr-output-ttl'

refs #11100

History

#1 Updated by Peter Amstutz 2 months ago

  • Description updated (diff)

#2 Updated by Tom Morris about 1 month ago

  • Assignee set to Peter Amstutz

#3 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#4 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#5 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)
  • Add an "output_ttl" on container request which means output will have trash_at and delete_at set to now + output_ttl (assume that tokens are issued with expiry times less than trash_at)
  • Intermediate outputs can be used/reused before they are trashed
  • CWL runner intermediate steps are submitted with output_ttl
  • CWL options to control default treatment of step outputs

#6 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#7 Updated by Tom Morris about 1 month ago

  • Assignee deleted (Peter Amstutz)
  • Target version changed from Arvados Future Sprints to 2017-04-12 sprint
  • Story points set to 3.0

#8 Updated by Peter Amstutz about 1 month ago

  • Description updated (diff)

#9 Updated by Peter Amstutz about 1 month ago

  • Project changed from Arvados Private to Arvados

#10 Updated by Tom Clegg about 1 month ago

  • Assignee set to Tom Clegg

#11 Updated by Tom Clegg 29 days ago

Currently it's an error to set delete_at to a time earlier than blob_signature_ttl. Some time passes between when the container request chooses a delete_at and when the collection validation checks whether delete_at is valid. This means the container request code has to add a few seconds of grace period to avoid failing during slow times. Perhaps it would be better to change the rule so, if a client sets delete_at to a value too soon to achieve safely, it automatically gets extended to the earliest possible time?

This behavior doesn't seem to be documented in either of the likely places.

#12 Updated by Tom Clegg 29 days ago

#13 Updated by Peter Amstutz 29 days ago

Tom Clegg wrote:

Currently it's an error to set delete_at to a time earlier than blob_signature_ttl. Some time passes between when the container request chooses a delete_at and when the collection validation checks whether delete_at is valid. This means the container request code has to add a few seconds of grace period to avoid failing during slow times. Perhaps it would be better to change the rule so, if a client sets delete_at to a value too soon to achieve safely, it automatically gets extended to the earliest possible time?

So instead of a validation error, it would just adjust it to a valid value. Yes.
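
The adjust-instead-of-reject rule could look roughly like the sketch below. It is written in Python purely for illustration and is not the Rails code on the branch; the function name `adjust_delete_at` is hypothetical, and it assumes the earliest safe delete_at is now + blob_signature_ttl (in seconds).

```python
from datetime import datetime, timedelta, timezone


def adjust_delete_at(requested, now, blob_signature_ttl):
    """Extend a too-early delete_at to the earliest safe time.

    Assumption: the earliest safe time is now + blob_signature_ttl seconds.
    A request at or after that time is returned unchanged.
    """
    earliest = now + timedelta(seconds=blob_signature_ttl)
    return max(requested, earliest)


now = datetime(2017, 4, 5, 12, 0, tzinfo=timezone.utc)
too_soon = adjust_delete_at(now + timedelta(seconds=10), now, 1209600)
far_out = adjust_delete_at(now + timedelta(days=30), now, 1209600)
```

Because the value is clamped rather than validated, the few seconds that pass between choosing delete_at and checking it no longer cause a failure.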

#14 Updated by Tom Clegg 29 days ago

  • Status changed from New to In Progress

#15 Updated by Tom Clegg 29 days ago

With "earliest possible delete_at":

11100-cr-output-ttl @ 65121f8db54a1ed15207d050e1f48c5fc26d646b

#16 Updated by Peter Amstutz 28 days ago

Documentation of the output_ttl should specify units.

Should the code to adjust delete_at to the earliest valid time be performed before_validation so that the actual validation doesn't modify the record? (Not sure if rails is opinionated about validations being pure). (default_trash_interval does something similar and is in before_validation).

Rest LGTM

#17 Updated by Tom Clegg 28 days ago

Peter Amstutz wrote:

Documentation of the output_ttl should specify units.

Indeed. Got it in Containers API but missed it in docs. Fixed.

Should the code to adjust delete_at to the earliest valid time be performed before_validation so that the actual validation doesn't modify the record? (Not sure if rails is opinionated about validations being pure). (default_trash_interval does something similar and is in before_validation).

Not sure about Rails either, but yes, that seems neater. Fixed.

Also added a bit to acknowledge that no tokens will expire before the previous (hence previously validated) value of delete_at.

11100-cr-output-ttl @ ff3bb22d4b2bff5666907a6eeb6cd68cd3cbe22b

#18 Updated by Tom Clegg 24 days ago

  • Assignee changed from Tom Clegg to Peter Amstutz

#19 Updated by Peter Amstutz 16 days ago

  • Target version changed from 2017-04-12 sprint to 2017-04-26 sprint

#20 Updated by Peter Amstutz 2 days ago

  • Target version changed from 2017-04-26 sprint to 2017-05-10 sprint
