
Feature #11100

Updated by Peter Amstutz almost 8 years ago

h1. Background 

Workflows produce a lot of intermediate collections. For production workflows that are rarely re-run, the job reuse benefits are minimal; instead, these collections are just clutter and take up storage space that the user would rather not pay for. This is also necessary to support a roll-in/roll-out use case where a cluster only has sufficient storage for a few complete runs and input and output data are transferred from/to somewhere else.

h1. Requirements

It should be possible to specify a default behavior (retain or trash) but override that behavior for the output of specific steps.

The final output is always retained. Input should be unaffected.

Intermediate collections need to live as long as they are in use by downstream steps. When intermediate collections are no longer needed by downstream steps, they should be trashed.

h1. Design

arvados-cwl-runner submits container requests; when a container completes, a collection is created and reported in output_uuid. arvados-cwl-runner can then set the trash_at field on the collection.
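
As an illustration of that mechanism, a minimal sketch using the Arvados Python SDK: look up a finalized container request and set trash_at on its output collection. The helper name, the two-week TTL, and the example UUID are assumptions for illustration only.

<pre><code class="python">
import datetime
import arvados

def trash_intermediate_output(api, container_request_uuid, ttl=datetime.timedelta(weeks=2)):
    """Sketch: mark the output collection of a finished container request as future trash."""
    cr = api.container_requests().get(uuid=container_request_uuid).execute()
    output_uuid = cr.get("output_uuid")
    if not output_uuid:
        return  # request not finalized yet, or it produced no output collection
    trash_at = (datetime.datetime.utcnow() + ttl).strftime("%Y-%m-%dT%H:%M:%SZ")
    api.collections().update(
        uuid=output_uuid,
        body={"collection": {"trash_at": trash_at}},
    ).execute()

# Usage (hypothetical UUID):
# api = arvados.api("v1")
# trash_intermediate_output(api, "zzzzz-xvhdp-0123456789abcde")
</code></pre>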

* A simple approach is for arvados-cwl-runner to immediately set the trash_at time to now + 2 weeks (or some configurable time that is longer than the runtime of the workflow). This ensures that the collection remains accessible to downstream steps (because it is not yet trashed) but still gets deleted eventually. This is the easiest solution to implement, but has the drawback that intermediate outputs hang around for much longer than necessary. There is also a small race condition between finalizing the container request and marking the output collection as future trash; if cwl-runner is terminated abruptly, it won't have a chance to mark the collection as future trash and it will linger.

* A second approach is to record all the output collections and trash them in a batch at the end. This reduces the time that collections hang around. However, if cwl-runner is terminated abruptly it won't have a chance to clean up. This could be combined with the previous approach.

* A third approach is to track collection lifetimes inside the workflow engine: collections are trashed once there are no more running containers or pending downstream steps which reference them (a sketch follows this list). This minimizes the size of the working set but is more complex to implement, and has the same problem if cwl-runner is terminated abruptly. Also, if there is significant time between trashing the collection and actually deleting the underlying blocks (e.g. 2 weeks), this effectively degenerates to the previous case.

* A fourth approach is to move responsibility for cleaning up to the API server. Container requests have a "requested by container" field. When a parent container terminates, all container requests initiated by that container are terminated. This could be extended to include trashing the output collections of these container requests. It requires a new flag on container requests to indicate whether the output is temporary or not (alternately, Tom suggested overloading the semantics of "output_name", so that an empty output_name indicates temporary output and a provided output_name indicates the output is retained). This has the benefit that cleanup happens regardless of whether cwl-runner was able to terminate gracefully or not.
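
A minimal sketch of the bookkeeping implied by the second and third approaches: the workflow engine counts pending downstream consumers of each intermediate collection, trashes a collection as soon as its count reaches zero, and sweeps anything left over when the workflow finishes. The class and method names are illustrative, not an existing arvados-cwl-runner API.

<pre><code class="python">
import collections

class IntermediateTracker:
    """Sketch: reference-counted cleanup of intermediate output collections."""

    def __init__(self, trash_func):
        self._trash = trash_func               # e.g. a function that sets trash_at via the API
        self._refs = collections.Counter()     # collection UUID -> pending downstream consumers

    def register(self, collection_uuid, consumer_count):
        # Called when a step produces an output collection.
        if consumer_count == 0:
            self._trash(collection_uuid)       # nothing downstream will read it
        else:
            self._refs[collection_uuid] = consumer_count

    def release(self, collection_uuid):
        # Called each time a downstream step finishes reading the collection.
        self._refs[collection_uuid] -= 1
        if self._refs[collection_uuid] <= 0:
            del self._refs[collection_uuid]
            self._trash(collection_uuid)

    def finalize(self):
        # Batch cleanup at workflow completion (the second approach): anything
        # still tracked here was never released by a downstream step.
        for uuid in list(self._refs):
            del self._refs[uuid]
            self._trash(uuid)
</code></pre>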

This may interact badly with container request retries. A cwl-runner run might terminate because of node failure; a new container is automatically submitted which relies on container reuse to be able to pick up where it left off. If the previous container's outputs were automatically cleaned up, it may be unable to resume the workflow.

Proposed implementation:

* API server
** Add an "output_ttl" field to container requests, in seconds. When the output collection is created for the container request, it should have trash_at and delete_at set to now + output_ttl (assume tokens are issued with expiry times less than trash_at). A value of null or 0 means don't delete. Add tests. (A sketch follows this list.)
* CWL runner
** Container requests are submitted with output_ttl set when "delete intermediate steps" is enabled.
** Default behavior is to not delete intermediates, to be consistent with current behavior (output_ttl is None or 0).
** Command line options to indicate the desired behavior for the workflow.
** A custom Arvados CWL hint to specify the treatment of individual step outputs.
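
A minimal sketch of how arvados-cwl-runner might set the proposed output_ttl field when submitting a step's container request. output_ttl does not exist yet in this design, and the other request attributes are placeholder values, not taken from a real workflow step.

<pre><code class="python">
import arvados

api = arvados.api("v1")

# Hypothetical step: keep its output for 86400 seconds (one day) after the
# container request is finalized, then let the server trash it.
container_request = api.container_requests().create(body={
    "container_request": {
        "name": "intermediate step (example)",
        "command": ["echo", "hello"],
        "container_image": "arvados/jobs",                        # placeholder image
        "cwd": "/var/spool/cwl",
        "output_path": "/var/spool/cwl",
        "mounts": {
            "/var/spool/cwl": {"kind": "tmp", "capacity": 1 << 30},
        },
        "runtime_constraints": {"vcpus": 1, "ram": 256 << 20},
        "output_ttl": 86400,                                       # proposed field, in seconds
        "state": "Committed",
    },
}).execute()
</code></pre>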
