[CWL] Limit expansion of Directory inputs
Status: In Progress
Start date: 03/08/2017
Assignee: Peter Amstutz
% Done:
Target version: 2017-04-12 sprint
Story points: 0.5
Remaining (hours): 0.00 hour
Velocity based estimate: -
Currently, the default behavior of cwltool and arvados-cwl-runner is to recursively expand directory listings to enumerate all files. For directory trees with hundreds of thousands of files, this is very expensive in terms of both time and memory consumption.
Rework cwltool behavior to accommodate directories that are not expanded by default. Allow the user to request explicitly, via a hint, whether or not directory listings are expanded.
Submit a PR to the CWL v1.1 spec to standardize a feature that lets the user specify whether and how to expand directory listings.
#10 Updated by Peter Amstutz 2 days ago
- Bump cwltool version and add support for the cwltool:LoadListing hint, which controls expansion of directory listings (no_listing, shallow_listing, deep_listing). Still defaults to deep_listing behavior for compatibility
- Because this changes how directories are handled (a directory object is no longer assumed to have a fully enumerated list of files), this required updates to how directory uploads and path mappings are handled
- Also explicitly checks that everything it tries to upload is a file URI; fixes #11257
- Rework how Arvados-specific CWL extensions are declared, so that document validation is properly aware of the Arvados extensions (this fixes the phone-home bug #11333)
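
To illustrate the three listing modes named above, here is a minimal sketch of how a directory might be expanded under each one. It is not the cwltool implementation: the function name `load_listing`, the use of plain local paths as `location`, and the reading of shallow_listing as "immediate children only" are assumptions made for this example.

```python
import os
from typing import Any, Dict

# Mode names taken from the update above; everything else here is hypothetical.
LISTING_MODES = ("no_listing", "shallow_listing", "deep_listing")


def load_listing(path: str, mode: str = "deep_listing") -> Dict[str, Any]:
    """Build a CWL-style Directory object for `path`, expanded according to `mode`."""
    if mode not in LISTING_MODES:
        raise ValueError(f"unknown listing mode: {mode!r}")

    directory: Dict[str, Any] = {
        "class": "Directory",
        "location": path,
        "basename": os.path.basename(path.rstrip(os.sep)),
    }
    if mode == "no_listing":
        # Do not touch the directory contents at all.
        return directory

    # Subdirectories are expanded further only in deep_listing mode.
    child_mode = "deep_listing" if mode == "deep_listing" else "no_listing"
    listing = []
    with os.scandir(path) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir():
                listing.append(load_listing(entry.path, child_mode))
            else:
                listing.append(
                    {"class": "File", "location": entry.path, "basename": entry.name}
                )
    directory["listing"] = listing
    return directory
```

With no_listing the cost is constant regardless of tree size, shallow_listing scales with the number of immediate children, and only deep_listing walks the whole tree, which is the expensive case described in this ticket.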