Project

General

Profile

Actions

Idea #10401

closed

[CWL] Limit expansion of Directory inputs

Added by Peter Amstutz about 8 years ago. Updated over 7 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
03/08/2017
Due date:
Story points:
0.5

Description

Currently, the default behavior of cwltool and arvados-cwl-runner is to recursively expand directory listings to enumerate all files. For directory trees with hundreds of thousands of files, this is very expensive in terms of both time and memory consumption.

Rework cwltool behavior to accommodate directories which are not expanded by default. Allow the user to explicitly request by a hint whether or not to expand directory listings.

Submit PR to CWL v1.1 spec to standardize feature enabling user to specify whether and how to expand directory listings.


Subtasks 3 (0 open3 closed)

Task #11194: Review 10401-limit-dir-expansionResolvedPeter Amstutz03/31/2017Actions
Task #11225: Update cwltool ResolvedPeter Amstutz03/08/2017Actions
Task #11340: Review 10401-run-upload-dirResolvedTom Clegg03/08/2017Actions
Actions #1

Updated by Peter Amstutz about 8 years ago

  • Subject changed from [CWL] Limit expansion of Directory objects to [CWL] Limit expansion of Directory inputs
Actions #2

Updated by Peter Amstutz almost 8 years ago

  • Description updated (diff)
  • Story points set to 2.0
Actions #3

Updated by Peter Amstutz almost 8 years ago

  • Target version set to Arvados Future Sprints
Actions #4

Updated by Peter Amstutz almost 8 years ago

  • Status changed from New to In Progress
Actions #5

Updated by Tom Morris almost 8 years ago

  • Target version changed from Arvados Future Sprints to 2017-03-15 sprint
Actions #6

Updated by Peter Amstutz almost 8 years ago

  • Assigned To set to Peter Amstutz
Actions #7

Updated by Peter Amstutz almost 8 years ago

  • Target version changed from 2017-03-15 sprint to 2017-03-29 sprint
Actions #8

Updated by Peter Amstutz almost 8 years ago

10401-run-upload-dir

Add support for uploading Directories in the "uploadfiles" method of arv-run (used by a-c-r)

Actions #9

Updated by Tom Clegg almost 8 years ago

10401-run-upload-dir @ 7e9156a LGTM

Actions #10

Updated by Peter Amstutz almost 8 years ago

10401-limit-dir-expansion

  • Bump cwltool version, add support for cwltool:LoadListing hint which controls expansion of directory listings (no_listing, shallow_listing, deep_listing), still defaults to deep_listing behavio for compatibility
  • Because this changes how directories are handled (no longer assumes that a directory object has a fully enumerated list of files) this required updates to how directory uploads & path mappings are handled
  • Also explicitly checks that the things it tries to upload are file URIs, fixes #11257
  • Rework how Arvados-specific CWL extensions are declared, so that document validation is properly aware of the Arvados extensions (this fixes the phone-home bug #11333)
Actions #11

Updated by Peter Amstutz almost 8 years ago

  • Target version changed from 2017-03-29 sprint to 2017-04-12 sprint
Actions #12

Updated by Peter Amstutz almost 8 years ago

  • Story points changed from 2.0 to 0.5
Actions #13

Updated by Tom Clegg almost 8 years ago

Not sure what to make of this. Master (b9236fbe81426446e1b541a45e219bbe513fe8d0) passes.

======================================================================
ERROR: test_run (tests.test_job.TestWorkflow)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/home/tom/src/arvados/sdk/cwl/tests/test_job.py", line 262, in test_run
    tool, metadata = document_loader.resolve_ref("tests/wf/scatter2.cwl")
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 504, in resolve_ref
    doc if doc else obj, doc_url, checklinks=checklinks)
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 828, in resolve_all
    self.validate_links(document, u"", all_doc_ids)
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/schema_salad/ref_resolver.py", line 988, in validate_links
    raise errors[0]
ValidationException: tests/wf/scatter2.cwl:18:1: checking field `steps`
tests/wf/scatter2.cwl:19:3:   checking object `tests/wf/scatter2.cwl#scatterstep`
tests/wf/scatter2.cwl:35:7:     checking field `steps`
tests/wf/scatter2.cwl:36:9:       checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1`
tests/wf/scatter2.cwl:37:11:         checking field `in`
tests/wf/scatter2.cwl:38:13:           checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime`
tests/wf/scatter2.cwl:48:15:             object id `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` previously defined
tests/wf/scatter2.cwl:26:5:     checking object `tests/wf/scatter2.cwl#scatterstep/mysub`
tests/wf/scatter2.cwl:35:7:       checking field `steps`
tests/wf/scatter2.cwl:36:9:         checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1`
tests/wf/scatter2.cwl:37:11:           checking field `in`
tests/wf/scatter2.cwl:38:13:             checking object `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime`
tests/wf/scatter2.cwl:48:15:               object id `tests/wf/scatter2.cwl#scatterstep/mysub/sleep1/sleeptime` previously defined

======================================================================
FAIL: test_statfile (tests.test_pathmapper.TestPathmap)
Test pathmapper handling ArvFile references.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/home/tom/src/arvados/sdk/cwl/tests/test_pathmapper.py", line 97, in test_statfile
    p._pathmap)
AssertionError: {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw. [truncated]... != {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='Fi [truncated]...
- {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw.py', target='/test/99999999999999999999999999999991+99/hw.py', type='File', staged=True)}
+ {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='File', staged=True)}

======================================================================
FAIL: test_upload (tests.test_pathmapper.TestPathmap)
Test pathmapper uploading files.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tom/src/arvados/tmp/VENVDIR/local/lib/python2.7/site-packages/mock/mock.py", line 1305, in patched
    return func(*args, **keywargs)
  File "/home/tom/src/arvados/sdk/cwl/tests/test_pathmapper.py", line 57, in test_upload
    p._pathmap)
AssertionError: {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw. [truncated]... != {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='Fi [truncated]...
- {'tests/hw.py': MapperEnt(resolved='keep:99999999999999999999999999999991+99/hw.py', target='/test/99999999999999999999999999999991+99/hw.py', type='File', staged=True)}
+ {'tests/hw.py': MapperEnt(resolved='tests/hw.py', target='tests/hw.py', type='File', staged=True)}

----------------------------------------------------------------------
Actions #14

Updated by Peter Amstutz over 7 years ago

Merged master / tests fixed, now @ 76e3d80bcc1548daba7bca97711684e5d68c1624

Actions #15

Updated by Tom Clegg over 7 years ago

Code LGTM.

(from discussion offline)
  • should add a test.
  • cwl spec PR should still happen (but not necessarily before merging the feature).
Actions #16

Updated by Peter Amstutz over 7 years ago

Tom Clegg wrote:

Code LGTM.

(from discussion offline)
  • should add a test.
  • cwl spec PR should still happen (but not necessarily before merging the feature).

Tests added.

Arvados CWL extensions document updated

Now @ 9ea73e2a66dcb1311732d43ea052412546c625fe

PR for CWL v1.1 will come later.

Actions #17

Updated by Tom Clegg over 7 years ago

LGTM

Actions #18

Updated by Peter Amstutz over 7 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 67 to 100

Applied in changeset arvados|commit:49a841a5179307f0c8d84f647a71f44fb2b4b26d.

Actions

Also available in: Atom PDF