Bug #13201
closedarvados-cwl-runner workflow checks are expensive and repeated twice when run with --submit
Description
When run with `--submit`, arvados-cwl-runner (a-c-r) first performs a validation ("workflow checker") step, which in the case of complex workflows can be costly. On one of our current workflows we have to wait close to 40m before the runner container request is even created:
$ arvados-cwl-runner --eval-timeout 10 --submit-runner-ram 16384 --submit-runner-image mercury/arvados-jobs:sandbox jsdebug21 --submit --no-wait --api=containers workflows/gatk-4.0.0.0-genotype-gvcf.cwl 20180303-1100-a-c-r-submit-no-wait-21.gvcf-output.vcf_input.json 2018-03-08 23:08:31 cwltool INFO: /usr/bin/arvados-cwl-runner 1.0.20180216164101, arvados-python-client 0.1.20171211211613, cwltool 1.0.20180130110340 2018-03-08 23:08:31 cwltool INFO: Resolved 'workflows/gatk-4.0.0.0-genotype-gvcf.cwl' to 'file:///home/mercury/checkouts/arvados-pipelines/cwl/workflows/gatk-4.0.0.0-genotype-g vcf.cwl' 2018-03-08 23:09:39 cwltool WARNING: Workflow checker warning: workflows/gatk-4.0.0.0-genotype-gvcf.cwl:90:9: Source 'variant-index' of type {"items": ["null", "File"], "type": "array"} is partially incompatible workflows/gatk-4.0.0.0-genotype-gvcf.cwl:99:7: with sink 'secondary_files' of type {"items": ["File", {"type": "array", "items": "File"}], "type": "array"} 2018-03-08 23:09:43 cwltool WARNING: Workflow checker warning: workflows/gatk-4.0.0.0-genotype-gvcf.cwl:90:9: Source 'variant-index' of type {"items": ["null", "File"], "type": "array"} is partially incompatible workflows/gatk-4.0.0.0-genotype-gvcf.cwl:99:7: with sink 'secondary_files' of type {"items": ["File", {"type": "array", "items": "File"}], "type": "array"} 2018-03-08 23:45:14 arvados.cwl-runner INFO: [container gatk-4.0.0.0-genotype-gvcf.cwl] submitted container ncucu-xvhdp-as06624s5xo96gn ncucu-xvhdp-as06624s5xo96gn 2018-03-08 23:45:14 cwltool INFO: Final process status is success
Once a-c-r on the RunnerContainer starts, it performs those same workflow checks again. It might make sense to do that if these checks are cheap, as I guess the submitted workflow may have been slightly modified (e.g. to map inputs to keep locators, etc) and this could catch errors. However, since they apparently are not that cheap, perhaps an optimisation could be made so that a-c-r in the runnercontainer trusts that the workflow has already been validated before submission?
In the case of the above example, the runner container stderr from a-c-r begins with:
2018-03-08T23:46:18.223467681Z cwltool INFO: /usr/bin/arvados-cwl-runner 1.0.20180221170604, arvados-python-client 0.1.20180221170604, cwltool 1.0.20180130110340 2018-03-08T23:46:18.228709270Z cwltool INFO: Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main' 2018-03-08T23:47:17.319055335Z cwltool WARNING: Workflow checker warning: 2018-03-08T23:47:17.319055335Z ../../lib/cwl/workflow.json:1:59914: Source 'variant-index' of type {"items": ["null", "File"], "type": "array"} is partially incompatible 2018-03-08T23:47:17.319055335Z ../../lib/cwl/workflow.json:1:60174: with sink 'secondary_files' of type {"items": ["File", {"items": "File", "type": "array"}], "type": "array"} 2018-03-08T23:47:27.223296520Z cwltool WARNING: Workflow checker warning: 2018-03-08T23:47:27.223296520Z ../../lib/cwl/workflow.json:1:59914: Source 'variant-index' of type {"items": ["null", "File"], "type": "array"} is partially incompatible 2018-03-08T23:47:27.223296520Z ../../lib/cwl/workflow.json:1:60174: with sink 'secondary_files' of type {"items": ["File", {"items": "File", "type": "array"}], "type": "array"} 2018-03-09T00:23:17.752082746Z cwltool INFO: [workflow workflow.json#main] start