Feature #11100

Updated by Peter Amstutz about 7 years ago

h1. Background 

 Workflows produce a lot of intermediate collections. For production workflows that are rarely re-run, the job reuse benefits are minimal; the intermediates are just clutter and take up storage space that the user would rather not pay for. Trashing them is also necessary to support a roll-in/roll-out use case where a cluster only has enough storage for a few complete runs, and input and output data are transferred from/to somewhere else. 

 h1. Requirements 

 Users should be able to specify the default behavior (retain or trash) and override it for the output of specific steps. 

 The final output is always retained.    Input should be unaffected. 

 Intermediate collections need to live as long as they are in use by downstream steps.    When intermediate collections are no longer needed by downstream steps, they should be trashed. 
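
 The lifetime rule above can be sketched as a simple check over the workflow graph. This is only an illustration: the function name, the @consumers@ mapping, and the @protected@ set (final output and inputs) are hypothetical and not part of any existing API.

```python
from typing import Dict, List, Set


def trashable_collections(consumers: Dict[str, Set[str]],
                          finished_steps: Set[str],
                          protected: Set[str]) -> List[str]:
    """Return intermediate collections that no downstream step still needs.

    consumers maps each already-produced collection to the steps that read it.
    protected holds collections that must never be trashed (final output, inputs).
    """
    return sorted(
        name
        for name, steps in consumers.items()
        # Trash only when every consuming step has finished.
        if name not in protected and steps <= finished_steps
    )


if __name__ == "__main__":
    graph = {"a": {"s2", "s3"}, "b": {"s3"}, "out": set()}
    print(trashable_collections(graph, {"s2"}, {"out"}))
    print(trashable_collections(graph, {"s2", "s3"}, {"out"}))
```

 While step s3 is still pending, nothing is eligible; once both consumers have finished, "a" and "b" become eligible and the protected final output never does.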

 h1. Design 

 arvados-cwl-runner submits container requests; when the container completes, a collection is created and reported in output_uuid. arvados-cwl-runner can then set the trash_at field on the collection. 

 * API server 
 ** Add an "output_ttl" field to container request. This value is in seconds. When the output collection is created for the container request, it should have trash_at and delete_at set to now + output_ttl (assume that tokens are issued with expiry times less than trash_at). A value of null or 0 means don't set trash_at. 
 ** Add tests. 
 ** Update documentation 
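
 The output_ttl rule described for the API server could look roughly like the sketch below. It is written in Python for illustration only and is not the API server's code; the function name is hypothetical, and rejecting negative values is an assumption, since the design only covers null, 0, and positive values.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def output_expiry(output_ttl: Optional[int], now: datetime) -> Optional[datetime]:
    """Return the time to store in both trash_at and delete_at, or None.

    None or 0 means "don't set trash_at", so the collection is kept.
    """
    if not output_ttl:
        return None
    if output_ttl < 0:
        # Assumption: negative TTLs are treated as invalid input.
        raise ValueError("output_ttl must be a non-negative number of seconds")
    return now + timedelta(seconds=output_ttl)


if __name__ == "__main__":
    created = datetime(2017, 2, 10, 12, 0, 0, tzinfo=timezone.utc)
    print(output_expiry(3600, created))
    print(output_expiry(0, created))
```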

 * CWL runner 
 ** Container requests are submitted with output_ttl set when "delete intermediate steps" is enabled 
 ** Default behavior is to not delete intermediates, to be consistent with current behavior (output_ttl is None or 0) 
 ** When workflow completes successfully, everything marked as intermediate should be trashed immediately. Do not do this on workflow failure. 
 ** Command line options to indicate desired behavior for workflow 
 ** Custom Arvados CWL hint to specify treatment of individual step outputs 
 ** Update documentation 
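
 One way the CWL runner side might choose output_ttl per step, and decide what to trash at the end of a run, is sketched below. All function and parameter names are hypothetical; in particular, @step_output_ttl@ stands in for whatever value the per-step hint would carry, and @default_output_ttl@ for the value taken from the command line.

```python
from typing import Any, Dict, List, Optional


def container_request_body(step_name: str,
                           default_output_ttl: Optional[int] = None,
                           step_output_ttl: Optional[int] = None,
                           is_final_output: bool = False) -> Dict[str, Any]:
    """Build the part of a container request that controls output lifetime."""
    if is_final_output:
        # The final output is always retained.
        ttl = 0
    elif step_output_ttl is not None:
        # A per-step hint overrides the workflow-wide default.
        ttl = step_output_ttl
    else:
        # Default: None or 0, meaning intermediates are not deleted.
        ttl = default_output_ttl or 0
    return {"name": step_name, "output_ttl": ttl}


def intermediates_to_trash(workflow_succeeded: bool,
                           intermediate_uuids: List[str]) -> List[str]:
    """On success, every intermediate is trashed immediately; on failure, none."""
    return list(intermediate_uuids) if workflow_succeeded else []


if __name__ == "__main__":
    print(container_request_body("align", default_output_ttl=3600))
    print(container_request_body("report", 3600, is_final_output=True))
    print(intermediates_to_trash(False, ["uuid-1", "uuid-2"]))
```

 Keeping the failure case empty leaves intermediates in place until their TTL expires, so a failed run can still be inspected or resumed.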
