Bug #19571
closed
arvados-cwl-runner scattering bug
Added by Tom Schoonjans over 2 years ago.
Updated about 2 years ago.
Description
Hi all,
The attached simple CWL workflow and inputs file returns the following error when run with arvados-cwl-runner:
$ arvados-cwl-runner workflow-fixed.cwl cwl.inputs.json
INFO /usr/bin/arvados-cwl-runner 2.4.2, arvados-python-client 2.4.2, cwltool 3.1.20220623174452
INFO Resolved 'workflow-fixed.cwl' to 'file:///home/tom/temp/crunch-failure/workflow.cwl'
INFO Using cluster xxxxx (https://xxxxx.yyyyy.com/)
ERROR Input object failed validation:
identifier field '['string one', 'second string', 'three three three']' must be a string
This was surprising given that this workflow works just fine with cwltool (3.1.20220802125926).
We suspect this may be related to calling the same scattered step twice, but we are unsure exactly why.
- Target version set to 2022-10-12 sprint
- Target version changed from 2022-10-12 sprint to 2022-10-26 sprint
- Assigned To set to Peter Amstutz
- Target version changed from 2022-10-26 sprint to 2022-11-09 sprint
- Related to Bug #19678: arvados-cwl-runner: id name must be a string added
- Related to deleted (Bug #19678: arvados-cwl-runner: id name must be a string)
- Has duplicate Bug #19678: arvados-cwl-runner: id name must be a string added
I think this is actually the same bug that was reported again in #19678: having an input parameter named 'name' runs into trouble.
Having an input parameter with an id of 'name' does work as long as the type is "string" - the problem occurs when the type is anything other than "string".
Note also that this does not only apply to input parameters but to field names as well.
i.e. this is valid:
'''
type:
  type: record
  fields:
    - name: name
      type: string
'''
but this is rejected:
'''
type:
  type: record
  fields:
    - name: name
      type: boolean
'''
We have confirmed that simply removing the two places in schema_salad where it explicitly throws a ValidationException when it finds a value for a name field that is not of type str solves this problem, and arvados-cwl-runner still works for all of our test cases. We have not yet investigated why this problem appears to be specific to arvados-cwl-runner and is not an issue for cwltool. That seems a bit surprising since they both use schema_salad, but perhaps they are invoking it in a different way?
For our use-cases the attached patch (simply removing the two raise statements) completely solves this issue.
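To illustrate why the declared type matters, here is a minimal sketch of the kind of check described above. It is not schema_salad's actual code: `ValidationException`, `check_identifier`, and `accepts` are hypothetical stand-ins, and the assumption is that a loader treats the key `name` as an identifier field and rejects any non-string value it finds there.

```python
class ValidationException(Exception):
    """Hypothetical stand-in for a loader's validation error."""


def check_identifier(doc, id_field="name"):
    """Raise if the identifier field is present but is not a string.

    Assumption: the loader treats `id_field` as an identifier, so any
    ordinary data stored under that key is subject to this check.
    """
    if id_field in doc and not isinstance(doc[id_field], str):
        raise ValidationException(
            f"identifier field '{doc[id_field]}' must be a string"
        )


def accepts(doc):
    """Return True if the document passes the identifier check."""
    try:
        check_identifier(doc)
        return True
    except ValidationException:
        return False


# An input whose 'name' value is a string passes by coincidence.
string_input = {"name": "sample-1"}
# Any other type under the same key is rejected.
boolean_input = {"name": True}
array_input = {"name": ["string one", "second string", "three three three"]}

results = [accepts(string_input), accepts(boolean_input), accepts(array_input)]
print(results)
```

Under these assumptions, a parameter or field called 'name' only passes when its value happens to be a string, which matches the behaviour reported here for "string" versus "boolean" and array types.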
Hey Josh! Good to hear from you.
The way that arvados-cwl-runner reads the workflow, packs it, and re-reads the packed version has occasionally turned up bugs on the second trip through the sausage machine (parsing and loading). I'll give this a look.
19678-job-loader @ e2267bd99209651c61425f335230e515421b2ef4
- Fix for parameters called 'name'
- Also fix regression involving default file references appearing in nested processes (inline declaration of a tool within a workflow).
- Also fixed some dependency issues preventing arvados/jobs developer image from working.
developer-run-tests: #3347
- Status changed from New to In Progress
- Status changed from In Progress to Resolved