Story #10401

[CWL] Limit expansion of Directory inputs

Added by Peter Amstutz 10 months ago. Updated 4 months ago.

Status:ResolvedStart date:03/08/2017
Priority:NormalDue date:
Assignee:Peter Amstutz% Done:

100%

Category:-
Target version:2017-04-12 sprint
Story points0.5Remaining (hours)0.00 hour
Velocity based estimate-

Description

Currently, the default behavior of cwltool and arvados-cwl-runner is to recursively expand directory listings to enumerate all files. For directory trees with hundreds of thousands of files, this is very expensive in terms of both time and memory consumption.

Rework cwltool behavior to accommodate directories which are not expanded by default. Allow the user to explicitly request by a hint whether or not to expand directory listings.

Submit PR to CWL v1.1 spec to standardize feature enabling user to specify whether and how to expand directory listings.


Subtasks

Task #11194: Review 10401-limit-dir-expansionResolvedPeter Amstutz

Task #11225: Update cwltool ResolvedPeter Amstutz

Task #11340: Review 10401-run-upload-dirResolvedTom Clegg

Associated revisions

Revision ca19a29f
Added by Peter Amstutz 5 months ago

Merge branch '10401-run-upload-dir' refs #10401

Revision 922e79be
Added by Peter Amstutz 5 months ago

Update cwl test_submit refs #10401

Revision 49a841a5
Added by Peter Amstutz 4 months ago

Merge branch '10401-limit-dir-expansion' closes #10401

Revision defb299d
Added by Peter Amstutz 4 months ago

Add missing test file. refs #10401

Revision 2001423a
Added by Peter Amstutz 4 months ago

Fix regression, restore a line that shouldn't have been removed. refs #10401

History

#1 Updated by Peter Amstutz 10 months ago

  • Subject changed from [CWL] Limit expansion of Directory objects to [CWL] Limit expansion of Directory inputs

#2 Updated by Peter Amstutz 6 months ago

  • Description updated (diff)
  • Story points set to 2.0

#3 Updated by Peter Amstutz 6 months ago

  • Target version set to Arvados Future Sprints

#4 Updated by Peter Amstutz 6 months ago

  • Status changed from New to In Progress

#5 Updated by Tom Morris 6 months ago

  • Target version changed from Arvados Future Sprints to 2017-03-15 sprint

#6 Updated by Peter Amstutz 6 months ago

  • Assignee set to Peter Amstutz

#7 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2017-03-15 sprint to 2017-03-29 sprint

#8 Updated by Peter Amstutz 5 months ago

10401-run-upload-dir

Add support for uploading Directories in the "uploadfiles" method of arv-run (used by a-c-r)

#9 Updated by Tom Clegg 5 months ago

10401-run-upload-dir @ 7e9156a LGTM

#10 Updated by Peter Amstutz 5 months ago

10401-limit-dir-expansion

  • Bump cwltool version, add support for cwltool:LoadListing hint which controls expansion of directory listings (no_listing, shallow_listing, deep_listing), still defaults to deep_listing behavio for compatibility
  • Because this changes how directories are handled (no longer assumes that a directory object has a fully enumerated list of files) this required updates to how directory uploads & path mappings are handled
  • Also explicitly checks that the things it tries to upload are file URIs, fixes #11257
  • Rework how Arvados-specific CWL extensions are declared, so that document validation is properly aware of the Arvados extensions (this fixes the phone-home bug #11333)

#11 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2017-03-29 sprint to 2017-04-12 sprint

#12 Updated by Peter Amstutz 5 months ago

  • Story points changed from 2.0 to 0.5

#13 Updated by Tom Clegg 5 months ago

Not sure what to make of this. Master (b9236fbe81426446e1b541a45e219bbe513fe8d0) passes.

======================================================================
ERROR: test_run (tests.test_job.TestWorkflow)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/home/tom/src/arvados/sdk/cwl/tests/test_job.py", line 262, in test_run
    tool, metadata = document_loader.resolve_ref("tests/wf/scatter2.cwl")
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 504, in resolve_ref
    doc if doc else obj, doc_url, checklinks=checklinks)
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 828, in resolve_all
    self.validate_links(document, u"", all_doc_ids)
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 988, in validate_links
    raise errors[0]
ValidationException: tests/wf/scatter2.cwl:18:1: checking field `steps`
tests/wf/scatter2.cwl:19:3:   checking object `tests/wf/scatter2.cwl#scatterstep`
tests/wf/scatter2.cwl:35:7:     checking field `steps`
tests/wf/scatter2.cwl:36:9:       checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1`
tests/wf/scatter2.cwl:37:11:         checking field `in`
tests/wf/scatter2.cwl:38:13:           checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime`
tests/wf/scatter2.cwl:48:15:             object id `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` previously defined
tests/wf/scatter2.cwl:26:5:     checking object `tests/wf/scatter2.cwl#scatterstep/mysub`
tests/wf/scatter2.cwl:35:7:       checking field `steps`
tests/wf/scatter2.cwl:36:9:         checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1`
tests/wf/scatter2.cwl:37:11:           checking field `in`
tests/wf/scatter2.cwl:38:13:             checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime`
tests/wf/scatter2.cwl:48:15:               object id `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` previously defined

======================================================================
FAIL: test_statfile (tests.test_pathmapper.TestPathmap)
Test pathmapper handling ArvFile references.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/home/tom/src/arvados/sdk/cwl/tests/test_pathmapper.py", line 97, in test_statfile
    p._pathmap)
AssertionError: {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw. [truncated]... != {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='Fi [truncated]...
- {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw.py', target='/test/99999999999999999999999999999991+99/hw.py', type='File', staged=True)}
+ {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='File', staged=True)}

======================================================================
FAIL: test_upload (tests.test_pathmapper.TestPathmap)
Test pathmapper uploading files.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/home/tom/src/arvados/sdk/cwl/tests/test_pathmapper.py", line 57, in test_upload
    p._pathmap)
AssertionError: {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw. [truncated]... != {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='Fi [truncated]...
- {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw.py', target='/test/99999999999999999999999999999991+99/hw.py', type='File', staged=True)}
+ {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='File', staged=True)}

----------------------------------------------------------------------

#14 Updated by Peter Amstutz 4 months ago

Merged master / tests fixed, now @ 76e3d80bcc1548daba7bca97711684e5d68c1624

#15 Updated by Tom Clegg 4 months ago

Code LGTM.

(from discussion offline)
  • should add a test.
  • cwl spec PR should still happen (but not necessarily before merging the feature).

#16 Updated by Peter Amstutz 4 months ago

Tom Clegg wrote:

Code LGTM.

(from discussion offline)
  • should add a test.
  • cwl spec PR should still happen (but not necessarily before merging the feature).

Tests added.

Arvados CWL extensions document updated

Now @ 9ea73e2a66dcb1311732d43ea052412546c625fe

PR for CWL v1.1 will come later.

#17 Updated by Tom Clegg 4 months ago

LGTM

#18 Updated by Peter Amstutz 4 months ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:49a841a5179307f0c8d84f647a71f44fb2b4b26d.

Also available in: Atom PDF