Feature #16957

cwltool/acr checks for circular dependencies

Added by Peter Amstutz 12 months ago. Updated 3 days ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: CWL
Target version:
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: -

Description

Extend the cwltool workflow checker to detect whether the workflow has a circular dependency (i.e., a step's inputs depend, directly or indirectly, on that same step's outputs). This should be a fatal error. Merge the changes into cwltool, make sure a new cwltool version is released, and update arvados-cwl-runner to use the cwltool release with the upgraded checker.

Tasks:
  • Create a 3-step workflow in which the output of the last step is used as an input to the first step, starting from the cwl-hasher workflow we use for testing clusters.
  • Try to run it in Arvados.
  • Change cwltool accordingly.
  • Also catch the case where a step has an input field that depends on one of its own outputs.
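
The last task (a step whose input depends on one of its own outputs) can be sketched on its own. This is a minimal illustration, not cwltool code: the function name self_dependent_steps and the simplified step layout (an "in" mapping from input name to a single "step/output" source string, plus an "out" list) are assumptions made for the example. Real CWL documents also allow lists of sources and expanded input objects, which this sketch does not handle.

```python
from typing import Dict, List


def self_dependent_steps(steps: Dict[str, dict]) -> List[str]:
    """Return the names of steps that take one of their own outputs as an input.

    Assumes a simplified layout (hypothetical, not cwltool's internal form):
    each step has "in" mapping input names to one "step/output" source string,
    and "out" listing its output names.
    """
    offenders = []
    for name, step in steps.items():
        # Fully qualified names of this step's own outputs, e.g. "loop/looptxt".
        own_outputs = {f"{name}/{out}" for out in step.get("out", [])}
        sources = step.get("in", {}).values()
        if any(src in own_outputs for src in sources):
            offenders.append(name)
    return offenders


steps = {
    "cat": {"in": {"intxt": "txt"}, "out": ["cattxt"]},
    # "loop" feeds its own output back into its input.
    "loop": {"in": {"intxt": "loop/looptxt"}, "out": ["looptxt"]},
}
print(self_dependent_steps(steps))
```

A general cycle check would also catch this case, since a self-reference is a cycle of length one; a separate check like this mainly allows a more specific error message.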

Subtasks

Task #17455: Review (New, Peter Amstutz)


Related issues

Related to Arvados Epics - Story #17848: Improve a-c-r usability (In Progress, 07/01/2021 to 09/30/2021)

History

#1 Updated by Peter Amstutz 12 months ago

  • Description updated (diff)

#2 Updated by Peter Amstutz 12 months ago

  • Description updated (diff)

#3 Updated by Peter Amstutz 12 months ago

  • Description updated (diff)

#4 Updated by Peter Amstutz 12 months ago

  • Related to Story #16011: CWL support, docs, training, website added

#5 Updated by Jiayong Li 11 months ago

  • Assigned To set to Jiayong Li

#6 Updated by Peter Amstutz 7 months ago

  • Target version set to 2021-03-31 sprint

#7 Updated by Peter Amstutz 6 months ago

  • Target version changed from 2021-03-31 sprint to 2021-04-14 sprint

#8 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint

#9 Updated by Peter Amstutz 5 months ago

  • Target version deleted (2021-04-28 bughunt sprint)

#10 Updated by Peter Amstutz 4 months ago

  • Target version set to 2021-06-09 sprint
  • Assigned To changed from Jiayong Li to Nico César

#11 Updated by Nico César 4 months ago

  • Description updated (diff)

#12 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2021-06-09 sprint to 2021-06-23 sprint

#13 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-06-23 sprint to 2021-07-07 sprint

#14 Updated by Peter Amstutz 3 months ago

#15 Updated by Peter Amstutz 3 months ago

  • Related to deleted (Story #16011: CWL support, docs, training, website)

#16 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-07-07 sprint to 2021-07-21 sprint

#17 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-07-21 sprint to 2021-08-04 sprint

#18 Updated by Peter Amstutz 2 months ago

  • Assigned To deleted (Nico César)
  • Subject changed from cwltool/acr checks for circular dependencies to cwltool/acr checks for circular dependencies

#19 Updated by Peter Amstutz 2 months ago

  • Target version changed from 2021-08-04 sprint to 2021-08-18 sprint

#20 Updated by Ward Vandewege about 2 months ago

  • Description updated (diff)

#21 Updated by Peter Amstutz about 2 months ago

  • Target version changed from 2021-08-18 sprint to 2021-09-01 sprint

#22 Updated by Peter Amstutz about 1 month ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint

#23 Updated by Jiayong Li about 1 month ago

  • Assigned To set to Jiayong Li

#24 Updated by Jiayong Li 24 days ago

  • Description updated (diff)

#25 Updated by Jiayong Li 20 days ago

About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126

Is there any way to find out whether a step is a command line tool or workflow?

For example,

ordereddict([('run', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl'), ('in', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt')])]), ('out', ['file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt']), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf'), ('inputs', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt'), ('type', 'File'), ('_tool_entry', ordereddict([('type', 'File'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#txt')]))])]), ('outputs', [ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt'), ('_tool_entry', ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wctxt')]))])])]) 

It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)

#26 Updated by Peter Amstutz 20 days ago

Jiayong Li wrote:

About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126

Is there any way to find out whether a step is a command line tool or workflow?

For example,
[...]

It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)

Possibly an easier way to do this is to use the visit() method

https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L175

You pass in a function, and it will call that function with each tool or subworkflow that occurs in the workflow.

For each workflow, you just need to check for circular references in the steps. This should be straightforward: starting from the input parameters of the workflow, find what steps refer to those parameters, then find the output parameters, then find the steps that refer to the output parameters, and so forth. Each step you visit, you push it on to a "visited" list. If you find that you are re-visiting a step, that means you have found a cycle, and should raise an error.
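
A rough sketch of the per-workflow check, under stated assumptions. It is not cwltool code: the helper names step_dependencies and find_cycle are hypothetical, and steps are given in a simplified form where "in" maps each input name to a single "step/output" source string. It also differs from the description above in two ways. It walks backwards from each step to the steps it depends on, rather than forwards from the workflow inputs. And it tracks the steps on the current search path instead of keeping one global "visited" list, because a step that is legitimately reachable through two branches (a diamond) would otherwise be reported as a cycle.

```python
from typing import Dict, List, Optional


def step_dependencies(steps: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each step name to the names of the steps its inputs come from."""
    deps: Dict[str, List[str]] = {}
    for name, step in steps.items():
        deps[name] = []
        for src in step.get("in", {}).values():
            # "cat/cattxt" refers to step "cat"; a bare "txt" is a workflow input.
            if "/" in src:
                upstream = src.split("/", 1)[0]
                if upstream in steps:
                    deps[name].append(upstream)
    return deps


def find_cycle(steps: Dict[str, dict]) -> Optional[List[str]]:
    """Return the step names along one dependency cycle, or None if there is none."""
    deps = step_dependencies(steps)
    done = set()           # steps fully explored and known to be cycle-free
    path: List[str] = []   # steps on the current depth-first search path
    on_path = set()

    def visit(name: str) -> Optional[List[str]]:
        if name in done:
            return None
        if name in on_path:
            # Reached a step that is already on the current path: a cycle.
            return path[path.index(name):] + [name]
        path.append(name)
        on_path.add(name)
        for upstream in deps[name]:
            cycle = visit(upstream)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(name)
        done.add(name)
        return None

    for name in steps:
        cycle = visit(name)
        if cycle:
            return cycle
    return None


# Three steps where the last step's output feeds the first step.
circular = {
    "cat": {"in": {"intxt": "wc/wctxt"}, "out": ["cattxt"]},
    "ls": {"in": {"intxt": "cat/cattxt"}, "out": ["lstxt"]},
    "wc": {"in": {"intxt": "ls/lstxt"}, "out": ["wctxt"]},
}

# A diamond: "d" depends on "b" and "c", which both depend on "a". No cycle.
diamond = {
    "a": {"in": {"intxt": "txt"}, "out": ["x"]},
    "b": {"in": {"intxt": "a/x"}, "out": ["y"]},
    "c": {"in": {"intxt": "a/x"}, "out": ["z"]},
    "d": {"in": {"left": "b/y", "right": "c/z"}, "out": ["w"]},
}

print(find_cycle(circular))
print(find_cycle(diamond))
```

In a real checker the returned list could be turned into the fatal error message, naming the steps involved in the cycle.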

#27 Updated by Peter Amstutz 8 days ago

  • Target version changed from 2021-09-15 sprint to 2021-09-29 sprint

#28 Updated by Peter Amstutz 8 days ago

  • Status changed from New to In Progress

#29 Updated by Jiayong Li 7 days ago

I tested adding a line before L135 https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L135

print(self.steps)

My test workflow is as follows:
foo-wf.cwl

cwlVersion: v1.1
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}

inputs:
  txt:
    type: File

outputs:
  wctxt:
    type: File
    outputSource: subworkflow-foo-wf/wctxt

steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  subworkflow-foo-wf:
    run: subworkflow-foo-wf.cwl
    in:
      txt: cat/cattxt
    out: [wctxt]

subworkflow-foo-wf.cwl

cwlVersion: v1.1
class: Workflow

inputs:
  txt:
    type: File

outputs:
  wctxt:
    type: File
    outputSource: wc/wctxt

steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  ls:
    run: ls.cwl
    in:
      intxt: cat/cattxt
    out: [lstxt]
  wc:
    run: wc.cwl
    in:
      intxt: ls/lstxt
    out: [wctxt]

Running this workflow with the added print(self.steps) yields

[<cwltool.workflow.WorkflowStep object at 0x7fa6afa07390>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa2d16a0>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa7206a0>]
[<cwltool.workflow.WorkflowStep object at 0x7fa6aa253a58>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa209048>]

This means print(self.steps) is called twice, once for subworkflow-foo-wf and once for foo-wf. My question: do both of these print calls run before any workflow is executed? If we ran the circular dependency check on subworkflow-foo-wf, then executed it, and only then ran the check on foo-wf and executed it, that would defeat the purpose of checking before running.

#30 Updated by Peter Amstutz 3 days ago

The static checks happen at loading time, not at execution time, so they happen before anything runs. At this point it is only loading the workflow description.
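
To illustrate the ordering, here is a toy model of load-time checking. It is only a sketch and not cwltool's implementation: ToyWorkflow, load, DESCRIPTIONS, and load_log are all hypothetical names. Constructing a workflow object first loads every step (which recursively constructs any subworkflow) and then reaches the point where a static check would run. Under that assumption, the nested workflow is checked first and the outer one second, and both happen before anything is executed.

```python
from typing import Dict, List

load_log: List[str] = []

# Hypothetical stand-ins for the parsed CWL files from the test above.
DESCRIPTIONS: Dict[str, dict] = {
    "foo-wf.cwl": {
        "class": "Workflow",
        "steps": {"cat": "cat.cwl", "subworkflow-foo-wf": "subworkflow-foo-wf.cwl"},
    },
    "subworkflow-foo-wf.cwl": {
        "class": "Workflow",
        "steps": {"cat": "cat.cwl", "ls": "ls.cwl", "wc": "wc.cwl"},
    },
    "cat.cwl": {"class": "CommandLineTool"},
    "ls.cwl": {"class": "CommandLineTool"},
    "wc.cwl": {"class": "CommandLineTool"},
}


def load(name: str):
    """Load a description; workflows become ToyWorkflow objects."""
    if DESCRIPTIONS[name]["class"] == "Workflow":
        return ToyWorkflow(name)
    return name  # stand-in for a loaded command line tool


class ToyWorkflow:
    def __init__(self, name: str) -> None:
        self.name = name
        # Loading each step loads its embedded tool, so nested workflows
        # are fully constructed (and checked) before this line finishes.
        self.steps = {
            step: load(run) for step, run in DESCRIPTIONS[name]["steps"].items()
        }
        # A static check such as cycle detection would go here.
        load_log.append(f"checked {name} ({len(self.steps)} steps)")

    def run(self) -> None:
        load_log.append(f"ran {self.name}")


workflow = load("foo-wf.cwl")  # all checks happen during this call
workflow.run()                 # execution starts only afterwards
print(load_log)
```

The log shows the 3-step subworkflow being checked before the 2-step outer workflow, which matches the order of the two printed step lists above, with the run entry last.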
