Feature #16957
Closed
cwltool/acr checks for circular dependencies
Added by Peter Amstutz about 4 years ago. Updated about 3 years ago.
Description
Extend the cwltool workflow checker to detect whether the workflow has a circular dependency (i.e. a step's inputs depend, directly or indirectly, on that same step's outputs). This should be a fatal error. Merge the changes into cwltool, see that a new cwltool is released, and update arvados-cwl-runner to use cwltool with the upgraded checker.
- Create a 3-step workflow in which the output of the last step is used as an input to the first step, starting from the cwl-hasher workflow we use for testing clusters.
- Try to run it in Arvados.
- Change cwltool accordingly.
- Also catch the case where a step has an input field that depends on one of its own outputs
Related issues
Updated by Peter Amstutz about 4 years ago
- Related to Idea #16011: CWL support, docs, training, website added
Updated by Peter Amstutz over 3 years ago
- Target version set to 2021-03-31 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint
Updated by Peter Amstutz over 3 years ago
- Target version deleted (2021-04-28 bughunt sprint)
Updated by Peter Amstutz over 3 years ago
- Target version set to 2021-06-09 sprint
- Assigned To changed from Jiayong Li to Nico César
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-06-09 sprint to 2021-06-23 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-06-23 sprint to 2021-07-07 sprint
Updated by Peter Amstutz over 3 years ago
- Related to Idea #17848: CWL runner improvements added
Updated by Peter Amstutz over 3 years ago
- Related to deleted (Idea #16011: CWL support, docs, training, website)
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-07-07 sprint to 2021-07-21 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-07-21 sprint to 2021-08-04 sprint
Updated by Peter Amstutz over 3 years ago
- Assigned To deleted (Nico César)
- Subject changed from cwltool/acr checks for circular dependencies to cwltool/acr checks for circular dependencies
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-08-04 sprint to 2021-08-18 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-08-18 sprint to 2021-09-01 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Updated by Jiayong Li about 3 years ago
About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126
Is there any way to find out whether a step is a command line tool or workflow?
For example,
ordereddict([('run', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl'), ('in', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt')])]), ('out', ['file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt']), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf'), ('inputs', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt'), ('type', 'File'), ('_tool_entry', ordereddict([('type', 'File'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#txt')]))])]), ('outputs', [ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt'), ('_tool_entry', ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wctxt')]))])])])
It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)
Updated by Peter Amstutz about 3 years ago
Jiayong Li wrote:
About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126
Is there any way to find out whether a step is a command line tool or workflow?
For example,
[...]
It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)
Possibly an easier way to do this is to use the visit() method.
You pass in a function, and it will call that function with each tool or subworkflow that occurs in the workflow.
For each workflow, you just need to check for circular references in the steps. This should be straightforward: starting from the input parameters of the workflow, find what steps refer to those parameters, then find the output parameters, then find the steps that refer to the output parameters, and so forth. Each step you visit, you push it on to a "visited" list. If you find that you are re-visiting a step, that means you have found a cycle, and should raise an error.
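The check described above could be sketched roughly as follows. This is only an illustration, not the code that was merged into cwltool: the function name check_circular_dependencies, the CircularDependencyError class, and the simplified step dictionaries (an "id", an "in" list whose entries carry "source", and an "out" list) are all assumptions made for the example. It also differs from the description in one respect: instead of a single "visited" list, it tracks which steps are on the path currently being explored, so that a step reached along two independent branches (a diamond) is not misreported as a cycle.

```python
from typing import Any, Dict, List, Mapping, Sequence


class CircularDependencyError(Exception):
    """Hypothetical error type, raised when a step depends on itself."""


def _as_list(value: Any) -> List[Any]:
    """Normalize None, a scalar, or a list/tuple into a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def check_circular_dependencies(steps: Sequence[Mapping[str, Any]]) -> None:
    """Raise CircularDependencyError if any step depends on its own output.

    Each step is assumed (for this sketch) to look like
    {"id": ..., "in": [{"source": ...}, ...], "out": [...]}.
    """
    # Map every output id to the id of the step that produces it.
    producer: Dict[str, str] = {}
    for step in steps:
        for out in _as_list(step.get("out")):
            out_id = out["id"] if isinstance(out, Mapping) else out
            producer[out_id] = step["id"]

    # depends_on[s] lists the steps whose outputs feed an input of s.
    # Sources that no step produces are workflow inputs and are ignored.
    depends_on: Dict[str, List[str]] = {}
    for step in steps:
        deps: List[str] = []
        for inp in _as_list(step.get("in")):
            if not isinstance(inp, Mapping):
                continue
            for src in _as_list(inp.get("source")):
                if src in producer:
                    deps.append(producer[src])
        depends_on[step["id"]] = deps

    IN_PROGRESS, DONE = 1, 2
    state: Dict[str, int] = {}  # absent means "not visited yet"

    def walk(step_id: str, path: List[str]) -> None:
        if state.get(step_id) == DONE:
            return  # already fully checked; reaching it again is fine
        if state.get(step_id) == IN_PROGRESS:
            # We came back to a step that is still on the current path.
            cycle = path[path.index(step_id):] + [step_id]
            raise CircularDependencyError(" -> ".join(cycle))
        state[step_id] = IN_PROGRESS
        for dep in depends_on[step_id]:
            walk(dep, path + [step_id])
        state[step_id] = DONE

    for step_id in depends_on:
        walk(step_id, [])


# Usage: a linear cat -> ls -> wc chain passes; feeding wc's output
# back into cat is reported as an error.
acyclic = [
    {"id": "cat", "in": [{"source": "txt"}], "out": ["cat/cattxt"]},
    {"id": "ls", "in": [{"source": "cat/cattxt"}], "out": ["ls/lstxt"]},
    {"id": "wc", "in": [{"source": "ls/lstxt"}], "out": ["wc/wctxt"]},
]
check_circular_dependencies(acyclic)

cyclic = [dict(acyclic[0], **{"in": [{"source": "wc/wctxt"}]})] + acyclic[1:]
try:
    check_circular_dependencies(cyclic)
except CircularDependencyError as err:
    print("circular dependency:", err)
```

The same function also covers the case from the description where a single step has an input that depends on one of its own outputs, since that is just a cycle of length one.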
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-09-15 sprint to 2021-09-29 sprint
Updated by Peter Amstutz about 3 years ago
- Status changed from New to In Progress
Updated by Jiayong Li about 3 years ago
I tested adding the following line before L135 of workflow.py (https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L135):
print(self.steps)
My test workflow is as follows:
foo-wf.cwl
cwlVersion: v1.1
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}
inputs:
  txt:
    type: File
outputs:
  wctxt:
    type: File
    outputSource: subworkflow-foo-wf/wctxt
steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  subworkflow-foo-wf:
    run: subworkflow-foo-wf.cwl
    in:
      txt: cat/cattxt
    out: [wctxt]
subworkflow-foo-wf.cwl
cwlVersion: v1.1
class: Workflow
inputs:
  txt:
    type: File
outputs:
  wctxt:
    type: File
    outputSource: wc/wctxt
steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  ls:
    run: ls.cwl
    in:
      intxt: cat/cattxt
    out: [lstxt]
  wc:
    run: wc.cwl
    in:
      intxt: ls/lstxt
    out: [wctxt]
Running this workflow with the added print(self.steps) yields
[<cwltool.workflow.WorkflowStep object at 0x7fa6afa07390>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa2d16a0>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa7206a0>]
[<cwltool.workflow.WorkflowStep object at 0x7fa6aa253a58>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa209048>]
This means print(self.steps) is called twice, once for subworkflow-foo-wf (three steps) and once for foo-wf (two steps). My question: are both of these print calls run before any of the workflows is executed? If we ran the circular dependency check on subworkflow-foo-wf, executed it, and only then ran the check on foo-wf and executed that, it would defeat the purpose of checking before running.
Updated by Peter Amstutz about 3 years ago
The static checks happen at loading time, not at execution time, so they happen before anything runs. At this point it is only loading the workflow description.
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-09-29 sprint to 2021-10-13 sprint
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
Updated by Jiayong Li about 3 years ago
- Status changed from In Progress to Feedback
Updated by Peter Amstutz about 3 years ago
Just need to update the cwltool version used by a-c-r to get the new check.
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-11-10 sprint to 2021-11-24 sprint
Updated by Peter Amstutz about 3 years ago
- Status changed from Feedback to Resolved