Idea #9397

closed

[Crunch2] Support prepopulating the output directory - CWL InitialWorkDirRequirement

Added by Brett Smith almost 8 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 01/31/2017
Due date:
Story points: 1.0
Release:
Release relationship: Auto

Description

This is needed to support the CWL InitialWorkDirRequirement, which is required for full CWL compliance and Crunch1 feature parity.

Minimum required functionality

When a container's output_path is a tmp mount backed by local disk, this output directory can be pre-populated with content from existing collections.
  • Initial content is specified in the container request by mounting collections at mount points that are subdirectories of output_path.
  • Mount points underneath output_path must not have "writable":true -- if any of them do, the API refuses to create/update the container request, and (just in case the API does not catch this problem) crunch-run fails the container.
  • When the container starts, these existing collections and files are readable at the specified mount points.
  • When the container finishes, the mounted collections/files are included in the output collection at the specified mount points. In other words, the container's output is equal to what the container sees in output_path just before it exits. Exception: if a mount has "exclude_from_output":true, it is omitted from the container's output collection.
  • If a process in the container tries to modify, remove, or rename these mount points or anything underneath them, the operation fails and the container output is unaffected (as are the underlying collections used to pre-populate).
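The "must not be writable" rule above could be checked along these lines. This is a minimal sketch, not the actual API server or crunch-run code: the Mount struct is a simplified stand-in for the real mount record, and isBelow and checkOutputMounts are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// Mount is a simplified, hypothetical stand-in for one entry of a
// container request's "mounts" hash. Only the fields used here are modeled.
type Mount struct {
	Kind              string
	Writable          bool
	ExcludeFromOutput bool
}

// isBelow reports whether path is a strict subdirectory of parent.
// "/out/ref" is below "/out"; "/out" itself and "/output2" are not.
func isBelow(path, parent string) bool {
	path = strings.TrimSuffix(path, "/")
	parent = strings.TrimSuffix(parent, "/")
	return strings.HasPrefix(path, parent+"/")
}

// checkOutputMounts returns an error if any mount point underneath
// outputPath has Writable set, mirroring the rule in the description.
func checkOutputMounts(outputPath string, mounts map[string]Mount) error {
	for target, m := range mounts {
		if isBelow(target, outputPath) && m.Writable {
			return fmt.Errorf("mount %q is below output_path %q and must not be writable",
				target, outputPath)
		}
	}
	return nil
}
```

With output_path "/out", a writable tmp mount at "/out" plus a read-only collection at "/out/ref" passes the check; marking "/out/ref" writable makes checkOutputMounts return an error.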

Implementation

  1. In crunchrun.go SetupMounts(), sort the keys in "runner.Container.Mounts" such that parents are processed before children (e.g., alphanumerically or by length).
  2. The inconsistency between crunch-run and the spec noted in #note-10 needs to be fixed so that crunch-run follows the spec.
  3. In crunchrun.go CaptureOutput(), after getting manifestText, go through runner.Container.Mounts and search for mount points beginning with runner.Container.OutputPath that do not have "exclude_from_output":true.
  4. For each such file and directory, add the relevant manifest fragment to the container output manifest, modifying stream/file names as needed.

The last step may be the most complicated part of the ticket, because there is much less infrastructure for manipulating collections in Go than in the Python SDK.


Subtasks 5 (0 open, 5 closed)

Task #11018: Add support for InitialWorkDirRequirement to arvados-cwl-runner | Resolved | Peter Amstutz | 01/31/2017
Task #11051: Review 9397-cwl-initialworkdir-crunchv2 | Resolved | Peter Amstutz | 01/31/2017
Task #11028: Update documentation | Resolved | Radhika Chippada | 01/31/2017
Task #10817: Review 9397-prepopulate-output-directory-pa | Resolved | Radhika Chippada | 02/01/2017
Task #11072: Review 9397-go-manifest | Resolved | Tom Clegg | 01/31/2017

Related issues

Related to Arvados - Bug #9674: [CWL] InitialWorkDirRequirement not working as expected | Resolved | Peter Amstutz | 07/27/2016
Related to Arvados - Idea #7582: [CWL] binary run-command shim for CWL | Resolved | Peter Amstutz | 10/16/2015