Feature #14322

closed

[CWL] Accept collection uuid in input

Added by Peter Amstutz about 6 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 2.0
Release relationship: Auto

Description

Arvados-cwl-runner should allow users to provide collection UUIDs in input documents. For example:

{
  "file1": {
    "class": "File",
    "location": "keep:zzzzz-4zz18-zzzzzzzzzzzzz/file1.txt" 
  }
}

Arvados-cwl-runner should replace the value in "location" with the portable data hash and record the UUID in the "arv:collection" field. This applies to both File and Directory objects. Implementation note: this should probably happen somewhere in upload_workflow_deps() which is responsible for updating File references for uploaded files.

{
  "file1": {
    "class": "File",
    "location": "keep:abc+123/file1.txt",
    "arv:collection": "zzzzz-4zz18-zzzzzzzzzzzzz/file1.txt" 
  }
}
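A minimal sketch of the rewrite described above, for a single File or Directory object. Assumptions: collection UUIDs match the pattern `xxxxx-4zz18-` plus 15 characters, and `uuid_to_pdh` is a hypothetical dict standing in for the API lookup that upload_workflow_deps() would actually perform. Following the prose, it stores only the UUID (not the file path) in "arv:collection".

```python
import re

# Assumed shape of a collection UUID: 5-char cluster id, "4zz18", 15 chars.
COLLECTION_UUID_RE = re.compile(r"^[0-9a-z]{5}-4zz18-[0-9a-z]{15}$")


def rewrite_location(obj, uuid_to_pdh):
    """Turn keep:<uuid>/path into keep:<pdh>/path and record the UUID.

    uuid_to_pdh is a hypothetical stand-in for an API lookup
    (collection UUID -> portable data hash); an unknown UUID raises KeyError.
    """
    loc = obj.get("location", "")
    if obj.get("class") not in ("File", "Directory") or not loc.startswith("keep:"):
        return obj
    # Split "keep:<ident>/<path>" into the identifier and the in-collection path.
    ident, sep, rest = loc[len("keep:"):].partition("/")
    if COLLECTION_UUID_RE.match(ident):
        obj["location"] = "keep:" + uuid_to_pdh[ident] + sep + rest
        obj["arv:collection"] = ident
    # Locations already given by PDH are left untouched.
    return obj
```

Applied to the first example with `{"zzzzz-4zz18-zzzzzzzzzzzzzzz": "abc+123"}` as the lookup, the location becomes `keep:abc+123/file1.txt`.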

Users may provide both "location" (by PDH) and "arv:collection" in the input. If both are present, the portable data hash takes precedence. Print a warning (or an error?) if the collection UUID is not readable or does not match the PDH that was provided.
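A sketch of the precedence rule and the consistency check, assuming a warning rather than an error. `uuid_to_pdh` is again a hypothetical lookup dict; a missing entry models "collection not readable". The location is never modified, so the PDH always wins.

```python
import logging

logger = logging.getLogger("uuid_check")  # hypothetical logger name


def check_collection_uuid(obj, uuid_to_pdh):
    """Return True if arv:collection is absent or agrees with the PDH in location.

    The PDH in "location" takes precedence; this only warns on problems.
    """
    uuid = obj.get("arv:collection")
    loc = obj.get("location", "")
    if not uuid or not loc.startswith("keep:"):
        return True
    pdh = loc[len("keep:"):].split("/", 1)[0]
    actual = uuid_to_pdh.get(uuid)  # None models "not readable"
    if actual is None:
        logger.warning("collection %s is not readable", uuid)
        return False
    if actual != pdh:
        logger.warning("collection %s has PDH %s, but input says %s", uuid, actual, pdh)
        return False
    return True
```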

When constructing container requests, if "arv:collection" is known for a File or Directory object, include it in the mount object alongside the portable data hash.
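A sketch of building a collection mount entry from a File or Directory object. The field names (`kind`, `portable_data_hash`, `uuid`, `path`) follow the Arvados container mount format as I understand it; sending `uuid` together with `portable_data_hash` is what related issue #14323 is about. The helper name is hypothetical.

```python
def collection_mount(obj):
    """Build a container-request mount for a keep:<pdh>[/path] File/Directory."""
    pdh, _, path = obj["location"][len("keep:"):].partition("/")
    mount = {"kind": "collection", "portable_data_hash": pdh}
    if path:
        # Path within the collection, when the object is not the collection root.
        mount["path"] = path
    if "arv:collection" in obj:
        # Include the UUID alongside the PDH when it is known.
        mount["uuid"] = obj["arv:collection"]
    return mount
```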

Implementation note: code paths that test startswith("keep:") will need to be updated to distinguish Keep identifiers given as a UUID from those given as a PDH. (Alternatively, we could use a different URI scheme for UUIDs.)
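One way to replace a bare startswith("keep:") test is to classify the identifier after the scheme. This sketch assumes a PDH is exactly a 32-hex-digit MD5 followed by `+size` (no hints), and uses the same assumed UUID pattern as above. Note that placeholder values such as "abc+123" will not match the strict PDH pattern.

```python
import re

PDH_RE = re.compile(r"^[0-9a-f]{32}\+\d+$")                 # assumed PDH shape
UUID_RE = re.compile(r"^[0-9a-z]{5}-4zz18-[0-9a-z]{15}$")    # assumed UUID shape


def classify_keep_ref(location):
    """Return ("pdh" or "uuid", identifier, path) for a keep: URI, else None."""
    if not location.startswith("keep:"):
        return None
    ident, _, path = location[len("keep:"):].partition("/")
    if PDH_RE.match(ident):
        return ("pdh", ident, path)
    if UUID_RE.match(ident):
        return ("uuid", ident, path)
    return None
```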


Subtasks: 1 (0 open, 1 closed)

Task #14893: Review 14322-cwl-uuid-input (Resolved, Peter Amstutz, 03/13/2019)

Related issues: 2 (1 open, 1 closed)

Related to Arvados - Feature #11442: [CWL] Resolve symbolic names to collections (New)
Related to Arvados - Feature #14323: [API] Accept container mounts that specify both uuid and portable_data_hash (Resolved, Lucas Di Pentima, 11/07/2018)
#1

Updated by Peter Amstutz about 6 years ago

  • Status changed from New to In Progress
#2

Updated by Peter Amstutz about 6 years ago

  • Subject changed from [CWL] Accept collection uuid to [CWL] Accept collection uuid
  • Status changed from In Progress to New
#3

Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)
#4

Updated by Peter Amstutz about 6 years ago

  • Subject changed from [CWL] Accept collection uuid to [CWL] Accept collection uuid in input
#5

Updated by Tom Clegg about 6 years ago

  • Related to Feature #11442: [CWL] Resolve symbolic names to collections added
#6

Updated by Tom Clegg about 6 years ago

  • Related to Feature #14323: [API] Accept container mounts that specify both uuid and portable_data_hash added
#7

Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)
#8

Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)
#9

Updated by Tom Morris about 6 years ago

  • Target version changed from To Be Groomed to Arvados Future Sprints
  • Story points set to 2.0
#10

Updated by Tom Morris almost 6 years ago

  • Target version changed from Arvados Future Sprints to 2019-03-13 Sprint
#11

Updated by Peter Amstutz almost 6 years ago

  • Assigned To set to Peter Amstutz
#12

Updated by Tom Morris almost 6 years ago

  • Release set to 15
#13

Updated by Peter Amstutz almost 6 years ago

  • Status changed from New to In Progress
#14

Updated by Peter Amstutz almost 6 years ago

  • Target version changed from 2019-03-13 Sprint to 2019-03-27 Sprint
#15

Updated by Peter Amstutz almost 6 years ago

14322-cwl-uuid-input @ 3e9cb56544f3acecf6aa2bf967263600abf0c584

  • Accept 'location: keep:zzzzz-4zz18-zzzzzzzzzzzzzzz' and convert it to portable data hash
  • Record uuid in arv:collectionUUID field
  • Input can also provide both 'location: keep:PDH' and 'collectionUUID: zzz'; in that case, it checks that the collectionUUID has the expected PDH.

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1116/

Looking at it this morning I realized I should probably update the documentation as well.

#16

Updated by Peter Amstutz almost 6 years ago

14322-cwl-uuid-input @ b65b36691117322a34170f28ae1997073f2829f0

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1121/

  • Accept 'location: keep:zzzzz-4zz18-zzzzzzzzzzzzzzz' and convert it to portable data hash
  • Record uuid in arv:collectionUUID field
  • Input can also provide both 'location: keep:PDH' and 'collectionUUID: zzz'; in that case, it checks that the collectionUUID has the expected PDH.
  • Update docs
  • CollectionFsAccess accepts keep:uuid URIs
  • Workbench updated to use collectionUUID field
#17

Updated by Eric Biagiotti almost 6 years ago

Actions #18

Updated by Peter Amstutz almost 6 years ago

Eric Biagiotti wrote:

  • Looks like the python 2 versions of test_submit_mismatched_uuid_inputs and test_submit_unknown_uuid_inputs have unicode problems.

Fixed.

For a large number of fetch_uuids, the API server may limit the response size, so we need to keep fetching until the API server has nothing more to give us.
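A sketch of that "keep fetching" loop. `list_page(uuids, offset, limit)` is a hypothetical stand-in for an API list call that may return fewer items than requested; the loop advances by the number of items actually received and stops only on an empty page, so a server-side cap on response size does not truncate the results.

```python
def fetch_all(list_page, uuids, page_size=1000):
    """Collect every record for uuids, tolerating short (capped) pages."""
    results = []
    offset = 0
    while True:
        page = list_page(uuids, offset, page_size)
        if not page:
            # Server has nothing more to give us.
            break
        results.extend(page)
        # Advance by what was actually returned, not by page_size.
        offset += len(page)
    return results
```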

  • Were the conformance and integration tests run?

https://ci.curoverse.com/view/CWL/job/arvados-cwl-conformance-tests/60/

  • Anything we want to add to workbench 2, like we did for workbench?

Workbench2 lacks the same display of workflow input/output annotated with links to collections. I don't think workflow running in workbench2 understands uuids, either. Added #14322

14322-cwl-uuid-input @ 45974ce224baf26d0a4c445dd1e9322193f1f64f

https://ci.curoverse.com/view/Developer/job/developer-run-tests/1130/

#19

Updated by Eric Biagiotti almost 6 years ago

https://ci.curoverse.com/view/CWL/job/arvados-cwl-conformance-tests/60/

Python 3 conformance test 186 timed out. Not sure if this is related. Maybe a flaky test?

Workbench2 lacks the same display of workflow input/output annotated with links to collections. I don't think workflow running in workbench2 understands uuids, either. Added #14322

I think you meant to link to 14974 here.

Below is the output from running the example we discussed. Sorry for the confusing virtualenv name, but a-c-r and the arvados-python-client should be running the latest.

(arvmount-test-env) eric@ubuntu:~/.arvbox/arvbox/arvados/doc/user/cwl/bwa-mem$ /home/eric/arvmount-test-env/bin/arvados-cwl-runner --create-workflow bwa-mem.cwl bwa-mem-template.yml
2019-03-15 15:27:19 cwltool INFO: /home/eric/arvmount-test-env/bin/arvados-cwl-runner 1.3.1.20190315153329, arvados-python-client 1.3.1.20190313174948, cwltool 1.0.20181217162649
2019-03-15 15:27:19 cwltool INFO: Resolved 'bwa-mem.cwl' to 'file:///home/eric/.arvbox/arvbox/arvados/doc/user/cwl/bwa-mem/bwa-mem.cwl'
Traceback (most recent call last):
  File "/home/eric/arvmount-test-env/bin/arvados-cwl-runner", line 4, in <module>
    __import__('pkg_resources').run_script('arvados-cwl-runner==1.3.1.20190315153329', 'arvados-cwl-runner')
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/arvados_cwl_runner-1.3.1.20190315153329-py2.7.egg/EGG-INFO/scripts/arvados-cwl-runner", line 10, in <module>
  File "build/bdist.linux-x86_64/egg/arvados_cwl/__init__.py", line 327, in main
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/main.py", line 785, in main
    secret_store=runtimeContext.secret_store)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/main.py", line 368, in init_job_order
    visit_class(job_order_object, ("File",), functools.partial(add_sizes, make_fs_access(input_basedir)))
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/utils.py", line 214, in visit_class
    visit_class(rec[d], cls, op)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/utils.py", line 212, in visit_class
    op(rec)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/process.py", line 349, in add_sizes
    obj["size"] = fsaccess.size(obj["location"])
  File "build/bdist.linux-x86_64/egg/arvados_cwl/fsaccess.py", line 160, in size
  File "build/bdist.linux-x86_64/egg/arvados_cwl/fsaccess.py", line 103, in get_collection
  File "build/bdist.linux-x86_64/egg/arvados_cwl/fsaccess.py", line 78, in get
IOError: [Errno 2] Could not access collection '2463fa9efeb75e099685528b3b9071e0+438': Not Found
#20

Updated by Eric Biagiotti almost 6 years ago

I was able to run the bwa-mem example with the different input files from the CLI and see the results on Workbench. I was also able to run arvados-cwl-runner --create-workflow bwa-mem.cwl bwa-mem-template.yml and observe the correct reference file being populated in the container request. Under the status tab, the cwl.input.json text is correctly populated with working links for location and http://arvados.org/cwl#collectionUUID.

One last thing: I would update the following comment: https://dev.arvados.org/projects/arvados/repository/revisions/14322-cwl-uuid-input/entry/apps/workbench/app/helpers/application_helper.rb#L681. Other than that, LGTM.

#21

Updated by Peter Amstutz almost 6 years ago

  • Status changed from In Progress to Resolved