Feature #16957

cwltool/acr checks for circular dependencies

Added by Peter Amstutz over 1 year ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
CWL
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:
Story points:
-
Release relationship:
Auto

Description

Extend the cwltool workflow checker to detect if the workflow has a circular dependency (i.e. a step's inputs somehow depends on that same step's outputs). This should be a fatal error. Merge the changes into cwltool, see that a new cwltool is released, and update arvados-cwl-runner to use cwltool with the upgraded checker.

Tasks:
  • Create a 3 step workflow that the output of the last step is included as an input to the first step, starting with cwl-hasher workflow we use for testing clusters
  • Try to run in arvados.
  • Change cwltool accordingly.
  • Also catch the case where a step has an input field that depends on one of its own outputs

Related issues

Related to Arvados Epics - Story #17848: Improve a-c-r usabilityIn Progress07/01/202103/31/2022

History

#1 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#2 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#3 Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)

#4 Updated by Peter Amstutz over 1 year ago

  • Related to Story #16011: CWL support, docs, training, website added

#5 Updated by Jiayong Li about 1 year ago

  • Assigned To set to Jiayong Li

#6 Updated by Peter Amstutz 11 months ago

  • Target version set to 2021-03-31 sprint

#7 Updated by Peter Amstutz 10 months ago

  • Target version changed from 2021-03-31 sprint to 2021-04-14 sprint

#8 Updated by Peter Amstutz 10 months ago

  • Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint

#9 Updated by Peter Amstutz 10 months ago

  • Target version deleted (2021-04-28 bughunt sprint)

#10 Updated by Peter Amstutz 8 months ago

  • Target version set to 2021-06-09 sprint
  • Assigned To changed from Jiayong Li to Nico César

#11 Updated by Nico César 8 months ago

  • Description updated (diff)

#12 Updated by Peter Amstutz 8 months ago

  • Target version changed from 2021-06-09 sprint to 2021-06-23 sprint

#13 Updated by Peter Amstutz 7 months ago

  • Target version changed from 2021-06-23 sprint to 2021-07-07 sprint

#14 Updated by Peter Amstutz 7 months ago

#15 Updated by Peter Amstutz 7 months ago

  • Related to deleted (Story #16011: CWL support, docs, training, website)

#16 Updated by Peter Amstutz 7 months ago

  • Target version changed from 2021-07-07 sprint to 2021-07-21 sprint

#17 Updated by Peter Amstutz 7 months ago

  • Target version changed from 2021-07-21 sprint to 2021-08-04 sprint

#18 Updated by Peter Amstutz 6 months ago

  • Assigned To deleted (Nico César)
  • Subject changed from cwltool/acr checks for circular dependencies to cwltool/acr checks for circular dependencies

#19 Updated by Peter Amstutz 6 months ago

  • Target version changed from 2021-08-04 sprint to 2021-08-18 sprint

#20 Updated by Ward Vandewege 6 months ago

  • Description updated (diff)

#21 Updated by Peter Amstutz 6 months ago

  • Target version changed from 2021-08-18 sprint to 2021-09-01 sprint

#22 Updated by Peter Amstutz 6 months ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint

#23 Updated by Jiayong Li 6 months ago

  • Assigned To set to Jiayong Li

#24 Updated by Jiayong Li 5 months ago

  • Description updated (diff)

#25 Updated by Jiayong Li 5 months ago

About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126

Is there any way to find out whether a step is a command line tool or workflow?

For example,

ordereddict([('run', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl'), ('in', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt')])]), ('out', ['file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt']), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf'), ('inputs', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt'), ('type', 'File'), ('_tool_entry', ordereddict([('type', 'File'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#txt')]))])]), ('outputs', [ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt'), ('_tool_entry', ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wctxt')]))])])]) 

It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)

#26 Updated by Peter Amstutz 5 months ago

Jiayong Li wrote:

About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126

Is there any way to find out whether a step is a command line tool or workflow?

For example,
[...]

It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)

Possibly an easier way to do this is to use the visit() method

https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L175

You pass in a function, it will call that function with each tool or subworkflow that occurs in the workflow.

For each workflow, you just need to check for circular references in the steps. This should be straightforward: starting from the input parameters of the workflow, find what steps refer to those parameters, then find the output parameters, then find the steps that refer to the output parameters, and so forth. Each step you visit, you push it on to a "visited" list. If you find that you are re-visiting a step, that means you have found a cycle, and should raise an error.

#27 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2021-09-15 sprint to 2021-09-29 sprint

#28 Updated by Peter Amstutz 5 months ago

  • Status changed from New to In Progress

#29 Updated by Jiayong Li 4 months ago

I tested adding a line before L135 https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L135

print(self.steps)

My test workflow is as follows
foo-wf.cwl

cwlVersion: v1.1
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}

inputs:
  txt:
    type: File

outputs:
  wctxt:
    type: File
    outputSource: subworkflow-foo-wf/wctxt

steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  subworkflow-foo-wf:
    run: subworkflow-foo-wf.cwl
    in:
      txt: cat/cattxt
    out: [wctxt]

subworkflow-foo-wf.cwl

cwlVersion: v1.1
class: Workflow

inputs:
  txt:
    type: File

outputs:
  wctxt:
    type: File
    outputSource: wc/wctxt

steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  ls:
    run: ls.cwl
    in:
      intxt: cat/cattxt
    out: [lstxt]
  wc:
    run: wc.cwl
    in:
      intxt: ls/lstxt
    out: [wctxt]

Running this workflow with the added print(self.steps) yields

[<cwltool.workflow.WorkflowStep object at 0x7fa6afa07390>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa2d16a0>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa7206a0>]
[<cwltool.workflow.WorkflowStep object at 0x7fa6aa253a58>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa209048>]

This means print(self.steps) is called twice, once for subworkflow-foo-wf and once for foo-wf. My questions: are both of these two print commands run before executing the workflows? If we run circular dependency check on subworkflow-foo-wf, then execute it, and then run check on foo-wf and execute, this would defeat the purpose of check before run.

#30 Updated by Peter Amstutz 4 months ago

The static checks happen at loading time, not at execution time, so they happen before running, at this point it is only loading the description.

#31 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2021-09-29 sprint to 2021-10-13 sprint

#32 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2021-10-13 sprint to 2021-10-27 sprint

#33 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-10-27 sprint to 2021-11-10 sprint

#34 Updated by Jiayong Li 3 months ago

  • Status changed from In Progress to Feedback

#35 Updated by Peter Amstutz 3 months ago

Just need to update the cwltool version used by a-c-r to get the new check.

#36 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-11-10 sprint to 2021-11-24 sprint

#37 Updated by Peter Amstutz 2 months ago

  • Status changed from Feedback to Resolved

#38 Updated by Peter Amstutz 2 months ago

  • Release set to 45

Also available in: Atom PDF