Feature #14322
Closed
[CWL] Accept collection uuid in input
Description
Arvados-cwl-runner should allow users to provide uuids in input documents. For example:
{ "file1": { "class": "File", "location": "keep:zzzzz-4zz18-zzzzzzzzzzzzz/file1.txt" } }
Arvados-cwl-runner should replace the value in "location" with the portable data hash and record the UUID in the "arv:collection" field. This applies to both File and Directory objects. Implementation note: this should probably happen somewhere in upload_workflow_deps(), which is responsible for updating File references for uploaded files.
{ "file1": { "class": "File", "location": "keep:abc+123/file1.txt", "arv:collection": "zzzzz-4zz18-zzzzzzzzzzzzz/file1.txt" } }
Users may provide both "location" by PDH and "arv:collection" in the input. If both portable data hash and arv:collection are present, the portable data hash will take precedence. Print a warning (or error???) if the collection uuid is not readable or does not match the PDH that was provided.
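The rewrite and the consistency check described above could look roughly like the sketch below. It is an illustration only, not the arvados-cwl-runner code: `normalize_location`, the `uuid_to_pdh` dictionary (standing in for a lookup of readable collections) and the UUID regular expression are hypothetical names introduced here. The field name "arv:collection" is taken from the description, and since warning versus error is left open, the sketch logs a warning.

```python
import logging
import re

logger = logging.getLogger("uuid-input-sketch")

# Assumed UUID shape: 5-char cluster id, "4zz18" type code, 15-char suffix.
UUID_RE = re.compile(r"^[a-z0-9]{5}-4zz18-[a-z0-9]{15}$")


def normalize_location(obj, uuid_to_pdh):
    """Rewrite a File/Directory object in place so "location" uses the PDH.

    uuid_to_pdh is a hypothetical dict of readable collections (UUID -> PDH).
    """
    if obj.get("class") not in ("File", "Directory"):
        return obj
    loc = obj.get("location", "")
    if not loc.startswith("keep:"):
        return obj
    ident, sep, path = loc[len("keep:"):].partition("/")
    if UUID_RE.match(ident):
        # Location given by UUID: swap in the PDH and remember the UUID.
        if ident not in uuid_to_pdh:
            logger.warning("collection %s is not readable", ident)
            return obj
        obj["location"] = "keep:" + uuid_to_pdh[ident] + sep + path
        obj["arv:collection"] = ident
    elif "arv:collection" in obj:
        # Both supplied: the PDH in "location" wins; only check consistency.
        uuid = obj["arv:collection"]
        if uuid_to_pdh.get(uuid) != ident:
            logger.warning("collection %s is unreadable or does not match %s",
                           uuid, ident)
    return obj


# Example with made-up identifiers.
example_uuid = "zzzzz-4zz18-" + "z" * 15
known = {example_uuid: "abc+123"}

by_uuid = {"class": "File", "location": "keep:" + example_uuid + "/file1.txt"}
normalize_location(by_uuid, known)

mismatch = {"class": "File", "location": "keep:def+456/x",
            "arv:collection": example_uuid}
normalize_location(mismatch, known)  # logs a warning, leaves location alone
```

In the mismatch case the sketch keeps the PDH from "location" untouched, in line with the precedence rule stated above.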
When constructing container requests, if "arv:collection" is known for a File or Directory object, include it in the mount object alongside the portable data hash.
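As a rough sketch of what such a mount object might look like: the helper name `collection_mount` is hypothetical, and the keys "kind", "portable_data_hash", "uuid" and "path" are assumptions about the collection mount format. Whether the API accepts both identifiers in one mount is the subject of the related ticket #14323.

```python
def collection_mount(pdh, path="", uuid=None):
    """Build a collection mount dict; the key names are assumptions."""
    mount = {"kind": "collection", "portable_data_hash": pdh}
    if path:
        mount["path"] = path
    if uuid is not None:
        # Only include the UUID when it is known for this File/Directory.
        mount["uuid"] = uuid
    return mount


with_uuid = collection_mount("abc+123", "file1.txt",
                             "zzzzz-4zz18-" + "z" * 15)
without_uuid = collection_mount("abc+123")
```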
Implementation note: code paths that test startswith("keep:") will need to be updated to distinguish keep identifiers by UUID or PDH. (Alternatively, we could use a different URI scheme for UUIDs.)
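One way to tell the two kinds of keep: identifier apart, without a separate URI scheme, is to match the part after "keep:" against a pattern for each. The sketch below assumes Python 3 and assumes the usual shapes (a 32-digit hex hash plus "+size" for a PDH, and a "4zz18" type code for a collection UUID). `classify_keep_ref` and both patterns are hypothetical names, not taken from the runner.

```python
import re

# Assumed identifier shapes; defined locally for illustration.
UUID_RE = re.compile(r"[a-z0-9]{5}-4zz18-[a-z0-9]{15}")
PDH_RE = re.compile(r"[0-9a-f]{32}\+\d+")


def classify_keep_ref(location):
    """Return (kind, identifier, path); kind is "uuid", "pdh", or None."""
    if not location.startswith("keep:"):
        return (None, None, None)
    ident, _, path = location[len("keep:"):].partition("/")
    if UUID_RE.fullmatch(ident):
        return ("uuid", ident, path)
    if PDH_RE.fullmatch(ident):
        return ("pdh", ident, path)
    return (None, ident, path)
```

A caller that previously branched on startswith("keep:") alone could branch on the returned kind instead.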
Updated by Peter Amstutz about 6 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 6 years ago
- Subject changed from [CWL] Accept collection uuid to [CWL] Accept collection uuid
- Status changed from In Progress to New
Updated by Peter Amstutz about 6 years ago
- Subject changed from [CWL] Accept collection uuid to [CWL] Accept collection uuid in input
Updated by Tom Clegg about 6 years ago
- Related to Feature #11442: [CWL] Resolve symbolic names to collections added
Updated by Tom Clegg about 6 years ago
- Related to Feature #14323: [API] Accept container mounts that specify both uuid and portable_data_hash added
Updated by Tom Morris about 6 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 2.0
Updated by Tom Morris almost 6 years ago
- Target version changed from Arvados Future Sprints to 2019-03-13 Sprint
Updated by Peter Amstutz almost 6 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz almost 6 years ago
- Target version changed from 2019-03-13 Sprint to 2019-03-27 Sprint
Updated by Peter Amstutz almost 6 years ago
14322-cwl-uuid-input 3e9cb56544f3acecf6aa2bf967263600abf0c584
- Accept 'location: keep:zzzzz-4zz18-zzzzzzzzzzzzzzz' and convert it to portable data hash
- Record uuid in arv:collectionUUID field
- Input can also provide both 'location: keep:PDH' and 'collectionUUID: zzz'; in that case this will check that the collectionUUID has the expected PDH.
https://ci.curoverse.com/view/Developer/job/developer-run-tests/1116/
Looking at it this morning I realized I should probably update the documentation as well.
Updated by Peter Amstutz almost 6 years ago
14322-cwl-uuid-input @ b65b36691117322a34170f28ae1997073f2829f0
https://ci.curoverse.com/view/Developer/job/developer-run-tests/1121/
- Accept 'location: keep:zzzzz-4zz18-zzzzzzzzzzzzzzz' and convert it to portable data hash
- Record uuid in arv:collectionUUID field
- Input can also provide both 'location: keep:PDH' and 'collectionUUID: zzz'; in that case this will check that the collectionUUID has the expected PDH.
- Update docs
- CollectionFsAccess accepts keep:uuid URIs
- Workbench updated to use collectionUUID field
Updated by Eric Biagiotti almost 6 years ago
- Looks like the python 2 versions of test_submit_mismatched_uuid_inputs and test_submit_unknown_uuid_inputs have unicode problems.
- Not sure why the following needs a while loop. Why doesn't arvrunner.api.collections().list() return all the collections we need? https://dev.arvados.org/projects/arvados/repository/revisions/14322-cwl-uuid-input/entry/sdk/cwl/arvados_cwl/runner.py#L176. Maybe add a comment here?
- Were the conformance and integration tests run?
- Anything we want to add to workbench 2, like we did for workbench?
Updated by Peter Amstutz almost 6 years ago
Eric Biagiotti wrote:
- Looks like the python 2 versions of test_submit_mismatched_uuid_inputs and test_submit_unknown_uuid_inputs have unicode problems.
Fixed.
- Not sure why the following needs a while loop. Why doesn't arvrunner.api.collections().list() return all the collections we need? https://dev.arvados.org/projects/arvados/repository/revisions/14322-cwl-uuid-input/entry/sdk/cwl/arvados_cwl/runner.py#L176. Maybe add a comment here?
For a large number of fetch_uuids, the API server may limit the response size, so we need to keep fetching until the API server has nothing more to give us.
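The shape of such a loop can be sketched as follows. This is not the code in runner.py: `fetch_all` and `list_page` are hypothetical names, and `list_page` stands in for the real API call, returning a possibly truncated list of records for the UUIDs still outstanding.

```python
def fetch_all(list_page, uuids):
    """Collect {uuid: pdh} for the requested UUIDs, one page at a time."""
    found = {}
    remaining = set(uuids)
    while remaining:
        before = len(remaining)
        page = list_page(sorted(remaining))
        for item in page:
            found[item["uuid"]] = item["portable_data_hash"]
            remaining.discard(item["uuid"])
        if len(remaining) == before:
            # No progress: the server has nothing more to give us.
            break
    return found


def fake_list_page(wanted, limit=2):
    """Stand-in server that returns at most `limit` records per call."""
    records = {"u1": "p1", "u2": "p2", "u3": "p3"}
    hits = [{"uuid": u, "portable_data_hash": records[u]}
            for u in wanted if u in records]
    return hits[:limit]


result = fetch_all(fake_list_page, ["u1", "u2", "u3", "missing"])
```

With the two-record cap in the stand-in server, a single call would miss "u3"; the loop asks again for whatever is still outstanding and stops once a call makes no progress, which also covers UUIDs that do not exist or are not readable.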
- Were the conformance and integration tests run?
https://ci.curoverse.com/view/CWL/job/arvados-cwl-conformance-tests/60/
- Anything we want to add to workbench 2, like we did for workbench?
Workbench2 lacks the same display of workflow input/output annotated with links to collections. I don't think workflow running in workbench2 understands uuids, either. Added #14322
14322-cwl-uuid-input @ 45974ce224baf26d0a4c445dd1e9322193f1f64f
https://ci.curoverse.com/view/Developer/job/developer-run-tests/1130/
Updated by Eric Biagiotti almost 6 years ago
https://ci.curoverse.com/view/CWL/job/arvados-cwl-conformance-tests/60/
Python 3 conformance test 186 timed out. Not sure if this is related. Maybe a flaky test?
Workbench2 lacks the same display of workflow input/output annotated with links to collections. I don't think workflow running in workbench2 understands uuids, either. Added #14322
I think you meant to link to 14974 here.
Below is the output from running the example we discussed. Sorry for the confusing virtualenv name, but a-c-r and the arvados-python-client should be running the latest.
(arvmount-test-env) eric@ubuntu:~/.arvbox/arvbox/arvados/doc/user/cwl/bwa-mem$ /home/eric/arvmount-test-env/bin/arvados-cwl-runner --create-workflow bwa-mem.cwl bwa-mem-template.yml
2019-03-15 15:27:19 cwltool INFO: /home/eric/arvmount-test-env/bin/arvados-cwl-runner 1.3.1.20190315153329, arvados-python-client 1.3.1.20190313174948, cwltool 1.0.20181217162649
2019-03-15 15:27:19 cwltool INFO: Resolved 'bwa-mem.cwl' to 'file:///home/eric/.arvbox/arvbox/arvados/doc/user/cwl/bwa-mem/bwa-mem.cwl'
Traceback (most recent call last):
  File "/home/eric/arvmount-test-env/bin/arvados-cwl-runner", line 4, in <module>
    __import__('pkg_resources').run_script('arvados-cwl-runner==1.3.1.20190315153329', 'arvados-cwl-runner')
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/arvados_cwl_runner-1.3.1.20190315153329-py2.7.egg/EGG-INFO/scripts/arvados-cwl-runner", line 10, in <module>
  File "build/bdist.linux-x86_64/egg/arvados_cwl/__init__.py", line 327, in main
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/main.py", line 785, in main
    secret_store=runtimeContext.secret_store)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/main.py", line 368, in init_job_order
    visit_class(job_order_object, ("File",), functools.partial(add_sizes, make_fs_access(input_basedir)))
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/utils.py", line 214, in visit_class
    visit_class(rec[d], cls, op)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/utils.py", line 212, in visit_class
    op(rec)
  File "/home/eric/arvmount-test-env/local/lib/python2.7/site-packages/cwltool-1.0.20181217162649-py2.7.egg/cwltool/process.py", line 349, in add_sizes
    obj["size"] = fsaccess.size(obj["location"])
  File "build/bdist.linux-x86_64/egg/arvados_cwl/fsaccess.py", line 160, in size
  File "build/bdist.linux-x86_64/egg/arvados_cwl/fsaccess.py", line 103, in get_collection
  File "build/bdist.linux-x86_64/egg/arvados_cwl/fsaccess.py", line 78, in get
IOError: [Errno 2] Could not access collection '2463fa9efeb75e099685528b3b9071e0+438': Not Found
Updated by Eric Biagiotti almost 6 years ago
I was able to run the bwa-mem example with the different input files from the CLI and see the results on workbench. I was also able to run arvados-cwl-runner --create-workflow bwa-mem.cwl bwa-mem-template.yml and observe the correct reference file being populated in the container request, and under the status tab, the cwl.input.json text is correctly populated with working links for location and http://arvados.org/cwl#collectionUUID.
Last thing, I would update the following comment: https://dev.arvados.org/projects/arvados/repository/revisions/14322-cwl-uuid-input/entry/apps/workbench/app/helpers/application_helper.rb#L681. Other than that, LGTM.
Updated by Peter Amstutz almost 6 years ago
- Status changed from In Progress to Resolved