Project

General

Profile

Actions

Feature #16957

closed

cwltool/acr checks for circular dependencies

Added by Peter Amstutz over 3 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Jiayong Li
Category:
CWL
Target version:
Story points:
-
Release relationship:
Auto

Description

Extend the cwltool workflow checker to detect if the workflow has a circular dependency (i.e. a step's inputs somehow depends on that same step's outputs). This should be a fatal error. Merge the changes into cwltool, see that a new cwltool is released, and update arvados-cwl-runner to use cwltool with the upgraded checker.

Tasks:
  • Create a 3 step workflow that the output of the last step is included as an input to the first step, starting with cwl-hasher workflow we use for testing clusters
  • Try to run in arvados.
  • Change cwltool accordingly.
  • Also catch the case where a step has an input field that depends on one of its own outputs

Related issues

Related to Arvados Epics - Idea #17848: CWL runner improvementsResolved07/01/202103/29/2023Actions
Actions #1

Updated by Peter Amstutz over 3 years ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz over 3 years ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz over 3 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz over 3 years ago

  • Related to Idea #16011: CWL support, docs, training, website added
Actions #5

Updated by Jiayong Li over 3 years ago

  • Assigned To set to Jiayong Li
Actions #6

Updated by Peter Amstutz about 3 years ago

  • Target version set to 2021-03-31 sprint
Actions #7

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Actions #8

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint
Actions #9

Updated by Peter Amstutz about 3 years ago

  • Target version deleted (2021-04-28 bughunt sprint)
Actions #10

Updated by Peter Amstutz almost 3 years ago

  • Target version set to 2021-06-09 sprint
  • Assigned To changed from Jiayong Li to Nico César
Actions #11

Updated by Nico César almost 3 years ago

  • Description updated (diff)
Actions #12

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-06-09 sprint to 2021-06-23 sprint
Actions #13

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-06-23 sprint to 2021-07-07 sprint
Actions #14

Updated by Peter Amstutz almost 3 years ago

  • Related to Idea #17848: CWL runner improvements added
Actions #15

Updated by Peter Amstutz almost 3 years ago

  • Related to deleted (Idea #16011: CWL support, docs, training, website)
Actions #16

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-07-07 sprint to 2021-07-21 sprint
Actions #17

Updated by Peter Amstutz almost 3 years ago

  • Target version changed from 2021-07-21 sprint to 2021-08-04 sprint
Actions #18

Updated by Peter Amstutz over 2 years ago

  • Assigned To deleted (Nico César)
  • Subject changed from cwltool/acr checks for circular dependencies to cwltool/acr checks for circular dependencies
Actions #19

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-08-04 sprint to 2021-08-18 sprint
Actions #20

Updated by Ward Vandewege over 2 years ago

  • Description updated (diff)
Actions #21

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-08-18 sprint to 2021-09-01 sprint
Actions #22

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Actions #23

Updated by Jiayong Li over 2 years ago

  • Assigned To set to Jiayong Li
Actions #24

Updated by Jiayong Li over 2 years ago

  • Description updated (diff)
Actions #25

Updated by Jiayong Li over 2 years ago

About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126

Is there any way to find out whether a step is a command line tool or workflow?

For example,

ordereddict([('run', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl'), ('in', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt')])]), ('out', ['file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt']), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf'), ('inputs', [ordereddict([('source', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#cat/cattxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/txt'), ('type', 'File'), ('_tool_entry', ordereddict([('type', 'File'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#txt')]))])]), ('outputs', [ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/subworkflow-ls-wf.cwl#ls-wf/wctxt'), ('_tool_entry', ordereddict([('type', 'File'), ('outputSource', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wc/wctxt'), ('id', 'file:///home/jiayong/Code/work_scripts/cwl/depdency/ls-wf.cwl#wctxt')]))])])]) 

It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)

Actions #26

Updated by Peter Amstutz over 2 years ago

Jiayong Li wrote:

About step in self.steps cf. https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L126

Is there any way to find out whether a step is a command line tool or workflow?

For example,
[...]

It's not directly clear whether the above step is a workflow or not. ("outputSource" is an indicator but it's not very direct.)

Possibly an easier way to do this is to use the visit() method

https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L175

You pass in a function, it will call that function with each tool or subworkflow that occurs in the workflow.

For each workflow, you just need to check for circular references in the steps. This should be straightforward: starting from the input parameters of the workflow, find what steps refer to those parameters, then find the output parameters, then find the steps that refer to the output parameters, and so forth. Each step you visit, you push it on to a "visited" list. If you find that you are re-visiting a step, that means you have found a cycle, and should raise an error.

Actions #27

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-09-15 sprint to 2021-09-29 sprint
Actions #28

Updated by Peter Amstutz over 2 years ago

  • Status changed from New to In Progress
Actions #29

Updated by Jiayong Li over 2 years ago

I tested adding a line before L135 https://github.com/common-workflow-language/cwltool/blob/e37134a90ac7a7c18254e30cff16da590b45c6d7/cwltool/workflow.py#L135

print(self.steps)

My test workflow is as follows
foo-wf.cwl

cwlVersion: v1.1
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}

inputs:
  txt:
    type: File

outputs:
  wctxt:
    type: File
    outputSource: subworkflow-foo-wf/wctxt

steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  subworkflow-foo-wf:
    run: subworkflow-foo-wf.cwl
    in:
      txt: cat/cattxt
    out: [wctxt]

subworkflow-foo-wf.cwl

cwlVersion: v1.1
class: Workflow

inputs:
  txt:
    type: File

outputs:
  wctxt:
    type: File
    outputSource: wc/wctxt

steps:
  cat:
    run: cat.cwl
    in:
      intxt: txt
    out: [cattxt]
  ls:
    run: ls.cwl
    in:
      intxt: cat/cattxt
    out: [lstxt]
  wc:
    run: wc.cwl
    in:
      intxt: ls/lstxt
    out: [wctxt]

Running this workflow with the added print(self.steps) yields

[<cwltool.workflow.WorkflowStep object at 0x7fa6afa07390>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa2d16a0>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa7206a0>]
[<cwltool.workflow.WorkflowStep object at 0x7fa6aa253a58>, <cwltool.workflow.WorkflowStep object at 0x7fa6aa209048>]

This means print(self.steps) is called twice, once for subworkflow-foo-wf and once for foo-wf. My questions: are both of these two print commands run before executing the workflows? If we run circular dependency check on subworkflow-foo-wf, then execute it, and then run check on foo-wf and execute, this would defeat the purpose of check before run.

Actions #30

Updated by Peter Amstutz over 2 years ago

The static checks happen at loading time, not at execution time, so they happen before running, at this point it is only loading the description.

Actions #31

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-09-29 sprint to 2021-10-13 sprint
Actions #32

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
Actions #33

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
Actions #34

Updated by Jiayong Li over 2 years ago

  • Status changed from In Progress to Feedback
Actions #35

Updated by Peter Amstutz over 2 years ago

Just need to update the cwltool version used by a-c-r to get the new check.

Actions #36

Updated by Peter Amstutz over 2 years ago

  • Target version changed from 2021-11-10 sprint to 2021-11-24 sprint
Actions #37

Updated by Peter Amstutz over 2 years ago

  • Status changed from Feedback to Resolved
Actions #38

Updated by Peter Amstutz over 2 years ago

  • Release set to 45
Actions

Also available in: Atom PDF