Idea #10401
closed[CWL] Limit expansion of Directory inputs
Description
Currently, the default behavior of cwltool and arvados-cwl-runner is to recursively expand directory listings to enumerate all files. For directory trees with hundreds of thousands of files, this is very expensive in terms of both time and memory consumption.
Rework cwltool behavior to accommodate directories which are not expanded by default. Allow the user to explicitly request by a hint whether or not to expand directory listings.
Submit PR to CWL v1.1 spec to standardize feature enabling user to specify whether and how to expand directory listings.
Updated by Peter Amstutz about 8 years ago
- Subject changed from [CWL] Limit expansion of Directory objects to [CWL] Limit expansion of Directory inputs
Updated by Peter Amstutz almost 8 years ago
- Description updated (diff)
- Story points set to 2.0
Updated by Peter Amstutz almost 8 years ago
- Target version set to Arvados Future Sprints
Updated by Peter Amstutz almost 8 years ago
- Status changed from New to In Progress
Updated by Tom Morris almost 8 years ago
- Target version changed from Arvados Future Sprints to 2017-03-15 sprint
Updated by Peter Amstutz almost 8 years ago
- Target version changed from 2017-03-15 sprint to 2017-03-29 sprint
Updated by Peter Amstutz almost 8 years ago
10401-run-upload-dir
Add support for uploading Directories in the "uploadfiles" method of arv-run (used by a-c-r)
Updated by Peter Amstutz almost 8 years ago
10401-limit-dir-expansion
- Bump cwltool version, add support for cwltool:LoadListing hint which controls expansion of directory listings (no_listing, shallow_listing, deep_listing), still defaults to deep_listing behavio for compatibility
- Because this changes how directories are handled (no longer assumes that a directory object has a fully enumerated list of files) this required updates to how directory uploads & path mappings are handled
- Also explicitly checks that the things it tries to upload are file URIs, fixes #11257
- Rework how Arvados-specific CWL extensions are declared, so that document validation is properly aware of the Arvados extensions (this fixes the phone-home bug #11333)
Updated by Peter Amstutz almost 8 years ago
- Target version changed from 2017-03-29 sprint to 2017-04-12 sprint
Updated by Peter Amstutz almost 8 years ago
- Story points changed from 2.0 to 0.5
Updated by Tom Clegg almost 8 years ago
Not sure what to make of this. Master (b9236fbe81426446e1b541a45e219bbe513fe8d0) passes.
====================================================================== ERROR: test_run (tests.test_job.TestWorkflow) ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched return func(*args, **keywargs) File "/home/tom/src/arvados/sdk/cwl/tests/test_job.py", line 262, in test_run tool, metadata = document_loader.resolve_ref("tests/wf/scatter2.cwl") File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 504, in resolve_ref doc if doc else obj, doc_url, checklinks=checklinks) File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 828, in resolve_all self.validate_links(document, u"", all_doc_ids) File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 988, in validate_links raise errors[0] ValidationException: tests/wf/scatter2.cwl:18:1: checking field `steps` tests/wf/scatter2.cwl:19:3: checking object `tests/wf/scatter2.cwl#scatterstep` tests/wf/scatter2.cwl:35:7: checking field `steps` tests/wf/scatter2.cwl:36:9: checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1` tests/wf/scatter2.cwl:37:11: checking field `in` tests/wf/scatter2.cwl:38:13: checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` tests/wf/scatter2.cwl:48:15: object id `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` previously defined tests/wf/scatter2.cwl:26:5: checking object `tests/wf/scatter2.cwl#scatterstep/mysub` tests/wf/scatter2.cwl:35:7: checking field `steps` tests/wf/scatter2.cwl:36:9: checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1` tests/wf/scatter2.cwl:37:11: checking field `in` tests/wf/scatter2.cwl:38:13: checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` tests/wf/scatter2.cwl:48:15: object id `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` previously defined ====================================================================== FAIL: test_statfile (tests.test_pathmapper.TestPathmap) Test pathmapper handling ArvFile references. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched return func(*args, **keywargs) File "/home/tom/src/arvados/sdk/cwl/tests/test_pathmapper.py", line 97, in test_statfile p._pathmap) AssertionError: {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw. [truncated]... != {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='Fi [truncated]... - {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw.py', target='/test/99999999999999999999999999999991+99/hw.py', type='File', staged=True)} + {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='File', staged=True)} ====================================================================== FAIL: test_upload (tests.test_pathmapper.TestPathmap) Test pathmapper uploading files. ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched return func(*args, **keywargs) File "/home/tom/src/arvados/sdk/cwl/tests/test_pathmapper.py", line 57, in test_upload p._pathmap) AssertionError: {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw. [truncated]... != {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='Fi [truncated]... - {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw.py', target='/test/99999999999999999999999999999991+99/hw.py', type='File', staged=True)} + {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='File', staged=True)} ----------------------------------------------------------------------
Updated by Peter Amstutz over 7 years ago
Merged master / tests fixed, now @ 76e3d80bcc1548daba7bca97711684e5d68c1624
Updated by Tom Clegg over 7 years ago
Code LGTM.
(from discussion offline)- should add a test.
- cwl spec PR should still happen (but not necessarily before merging the feature).
Updated by Peter Amstutz over 7 years ago
Tom Clegg wrote:
Code LGTM.
(from discussion offline)
- should add a test.
- cwl spec PR should still happen (but not necessarily before merging the feature).
Tests added.
Arvados CWL extensions document updated
Now @ 9ea73e2a66dcb1311732d43ea052412546c625fe
PR for CWL v1.1 will come later.
Updated by Peter Amstutz over 7 years ago
- Status changed from In Progress to Resolved
- % Done changed from 67 to 100
Applied in changeset arvados|commit:49a841a5179307f0c8d84f647a71f44fb2b4b26d.