Feature #17054

Custom naming for scatter steps

Added by Peter Amstutz 11 months ago. Updated 18 days ago.

Status:
In Progress
Priority:
Normal
Assigned To:
Category:
CWL
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

Add an extension to cwltool that lets the user provide an expression that determines the runtime name of a workflow step or scatter step. When a new cwltool is released, update the cwltool dependency in arvados-cwl-runner.

Suggested approach

1. Add a new process requirement to cwltool/extensions-v1.1.yml

- name: StepNameHint
  type: record
  inVocab: false
  extends: cwl:ProcessRequirement
  doc: |
    Provide a hint for naming the runtime workflow step in logs or user interface.
  fields:
    - name: class
      type: string
      doc: "Always 'StepNameHint'" 
      jsonldPredicate:
        "_id": "@type" 
        "_type": "@vocab" 
    - name: stepname
      type: [string, Expression]
      doc: |
        A string or expression returning a string with the preferred name for the step.  
        If it is an expression, it is evaluated after the input object has been completely determined.

2. Update supportedProcessRequirements

Add "http://commonwl.org/cwltool#StepNameHint" to supportedProcessRequirements in process.py

3. Update setup_schema() in main.py

use_custom_schema("v1.2", "http://commonwl.org/cwltool", ext11)

You should also add this to the "else" branch:
use_standard_schema("v1.2")

4. Update WorkflowJobStep in workflow_job.py

Add code to the job() method that:

  1. Checks whether the current workflow step has "http://commonwl.org/cwltool#StepNameHint" in "hints" or "requirements".
  2. If so, gets the value of "stepname".
  3. Sets self.name = expression.do_eval(stepname).

5. Add tests

Write a workflow that uses the new hint with an expression that takes something from the input to set the name of the workflow step.

Write a test case that calls cwltool --enable-ext and checks that the log output uses the custom name.

errormsg.txt (97.9 KB) Jiayong Li, 06/08/2021 08:42 PM

Subtasks

Task #17456: Review (New, Peter Amstutz)


Related issues

Related to Arvados Epics - Story #17848: Improve a-c-r usability (In Progress, 07/01/2021 to 09/30/2021)

History

#1 Updated by Peter Amstutz 11 months ago

  • Related to Story #16011: CWL support, docs, training, website added

#2 Updated by Nico César 11 months ago

  • Related to Feature #16462: Expand arvados-controller to expose forecast features added

#3 Updated by Peter Amstutz 7 months ago

  • Target version set to 2021-03-31 sprint
  • Assigned To set to Jiayong Li

#4 Updated by Peter Amstutz 7 months ago

  • Description updated (diff)

#5 Updated by Peter Amstutz 6 months ago

  • Description updated (diff)

#6 Updated by Peter Amstutz 6 months ago

  • Target version changed from 2021-03-31 sprint to 2021-04-14 sprint

#7 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint

#8 Updated by Peter Amstutz 5 months ago

  • Target version deleted (2021-04-28 bughunt sprint)

#9 Updated by Peter Amstutz 5 months ago

  • Target version set to 2021-04-28 bughunt sprint

#10 Updated by Jiayong Li 5 months ago

  • Status changed from New to In Progress

#11 Updated by Peter Amstutz 5 months ago

  • Target version changed from 2021-04-28 bughunt sprint to 2021-05-12 sprint

#12 Updated by Jiayong Li 4 months ago

Working plan for changes

    def job(
        self,
        joborder: CWLObjectType,
        output_callback: Optional[OutputCallbackType],
        runtimeContext: RuntimeContext,
    ) -> JobsGeneratorType:
        runtimeContext = runtimeContext.copy()
        runtimeContext.part_of = self.name

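        # Use the custom name from StepNameHint when present, else the short step id.
        # The "else" of a for loop runs only when the loop ends without "break",
        # so break on a match to keep the custom name.
        # do_eval() needs more arguments than shown here (see the call in the follow-up comment).
        for hint in self.step["hints"]:
            if hint["class"] == "http://commonwl.org/cwltool#StepNameHint":
                runtimeContext.name = expression.do_eval(hint["stepname"])
                break
        else:
            runtimeContext.name = shortname(self.id)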

        _logger.info("[%s] start", self.name)

        yield from self.step.job(joborder, output_callback, runtimeContext)
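
A note on the for/else construct used in this plan: in Python the "else" branch of a for loop runs whenever the loop finishes without hitting "break", not only when the loop body never matched. A minimal standalone sketch of the intended fallback logic follows; pick_name and its arguments are hypothetical names for illustration and are not part of cwltool.

```python
STEP_NAME_HINT = "http://commonwl.org/cwltool#StepNameHint"


def pick_name(hints, default):
    """Return the 'stepname' of the first StepNameHint, or the default name."""
    for hint in hints:
        if hint.get("class") == STEP_NAME_HINT:
            name = hint["stepname"]
            break  # leaving via break skips the else branch below
    else:
        # Reached only when the loop ran to completion without a break.
        name = default
    return name


custom = pick_name([{"class": STEP_NAME_HINT, "stepname": "custom"}], "echo")
fallback = pick_name([{"class": "SomethingElse"}], "echo")
```

Without the break, the else branch would run after every loop and the default name would always win.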

#13 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2021-05-12 sprint to 2021-05-26 sprint

#14 Updated by Peter Amstutz 4 months ago

  • Target version changed from 2021-05-26 sprint to 2021-06-09 sprint

#15 Updated by Jiayong Li 4 months ago

My test command line tool, workflow, and input yml are

echo.cwl

cwlVersion: v1.1
class: CommandLineTool
inputs:
  text:
    type: string
    inputBinding: {}
outputs: []
baseCommand: echo

scatter-echo-wf.cwl

$namespaces:
  cwltool: "http://commonwl.org/cwltool#" 
cwlVersion: v1.1
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
  InlineJavascriptRequirement: {}

inputs:
  texts:
    type: string[]

outputs: []

steps:
  echo:
    run: echo.cwl
    scatter: text
    hints:
      cwltool:StepNameHint:
        stepname: $(inputs.text)
    in:
      text: texts
    out: []

scatter-echo-wf.yml

texts: ["a", "b", "c"]

The code I replaced in L72 of workflow_job.py is

runtimeContext.name = expression.do_eval(hint["stepname"], joborder, self.step.requirements, None, None, {})

I'm getting an error when I run "cwltool scatter-echo-wf.cwl scatter-echo-wf.yml".

Traceback (most recent call last):
  File "/home/jiayong/Code/env/cwltool/lib/python3.6/site-packages/cwltool/sandboxjs.py", line 332, in execjs
    return cast(CWLOutputType, json.loads(stdout))
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jiayong/Code/env/cwltool/lib/python3.6/site-packages/cwltool/expression.py", line 414, in do_eval
    else 2,
  File "/home/jiayong/Code/env/cwltool/lib/python3.6/site-packages/cwltool/expression.py", line 308, in interpolate
    js_console=js_console,
  File "/home/jiayong/Code/env/cwltool/lib/python3.6/site-packages/cwltool/expression.py", line 243, in evaluator
    js_console=js_console,
  File "/home/jiayong/Code/env/cwltool/lib/python3.6/site-packages/cwltool/sandboxjs.py", line 338, in execjs
    ) from err
cwltool.sandboxjs.JavascriptException: Expecting value: line 1 column 1 (char 0)
script was:
01 "use strict";
02 var inputs = {
03     "file:///home/jiayong/Code/work_scripts/cwl/echo/scatter-echo-wf.cwl#echo/text": "a" 
04 };
05 var self = null;
06 var runtime = {
07     "tmpdir": null,
08     "outdir": null
09 };
10 (function(){return ((inputs.text));})()
stdout was: 'undefined'
stderr was: ''

The three key arguments hint["stepname"], joborder, self.step.requirements are as follows

hint["stepname"]: $(inputs.text)
joborder: {'file:///home/jiayong/Code/work_scripts/cwl/echo/scatter-echo-wf.cwl#echo/text': 'a'}
self.step.requirements: [ordereddict([('class', 'InlineJavascriptRequirement')]), ordereddict([('class', 'ScatterFeatureRequirement')])]

Any idea what went wrong there?
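
A likely cause, judging from the generated script above: "inputs" is keyed by the full step-input URI, while the expression reads inputs.text, so the JavaScript result is undefined and cannot be parsed as JSON. One possible approach is to re-key the job order by short names before passing it to do_eval. The sketch below uses a hypothetical short_key helper as a stand-in; cwltool has its own shortname function, which may behave differently.

```python
def short_key(uri: str) -> str:
    """Hypothetical helper: keep the last path component of the URI fragment."""
    fragment = uri.split("#", 1)[-1]
    return fragment.split("/")[-1]


def shorten_joborder(joborder: dict) -> dict:
    """Return a copy of the job order keyed by short input names."""
    return {short_key(key): value for key, value in joborder.items()}


joborder = {
    "file:///home/jiayong/Code/work_scripts/cwl/echo/scatter-echo-wf.cwl#echo/text": "a"
}
short = shorten_joborder(joborder)  # {"text": "a"}
```

With the job order re-keyed this way, an expression such as $(inputs.text) would have a "text" field to read.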

#16 Updated by Jiayong Li 3 months ago

Now the problem is that cwltool runs without --enable-ext, but errors out when the flag is turned on.

echo.cwl

cwlVersion: v1.1
class: CommandLineTool
inputs:
  text:
    type: string
    inputBinding: {}
outputs: []
baseCommand: echo

scatter-echo-wf.cwl

$namespaces:
  cwltool: "http://commonwl.org/cwltool#" 
cwlVersion: v1.1
class: Workflow
requirements:
  ScatterFeatureRequirement: {}
  InlineJavascriptRequirement: {}

inputs:
  texts:
    type: string[]

outputs: []

steps:
  echo:
    run: echo.cwl
    scatter: text
    hints:
      cwltool:StepNameHint:
        stepname: $("test_" + inputs.text.split('.')[0])
    in:
      text: texts
    out: []

scatter-echo-wf.yml

texts: ["a.vcf", "b.vcf", "c.vcf"]
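
For reference, this is a Python equivalent of the stepname expression above, applied to these inputs. It only mirrors the string logic of the JavaScript expression; step_name is a hypothetical name used for illustration.

```python
texts = ["a.vcf", "b.vcf", "c.vcf"]


def step_name(text: str) -> str:
    """Mirror of "test_" + inputs.text.split('.')[0] from the hint."""
    return "test_" + text.split(".")[0]


names = [step_name(t) for t in texts]  # ["test_a", "test_b", "test_c"]
```

So the three scattered steps would be expected to be named test_a, test_b and test_c.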

Error message:

$ cwltool --enable-ext scatter-echo-wf.cwl scatter-echo-wf.yml 
INFO /home/jiayong/Code/env/cwltool-jiayong/bin/cwltool 3.1.20210511185845
INFO Resolved 'scatter-echo-wf.cwl' to 'file:///home/jiayong/Code/work_scripts/cwl/echo/scatter-echo-wf.cwl'
ERROR Tool definition failed validation:
http://commonwl.org/cwltool:68:3: checking object `http://commonwl.org/cwltool#StepNameHint`
http://commonwl.org/cwltool:74:3:   checking field `fields`
http://commonwl.org/cwltool:81:7:     checking object
                                      `http://commonwl.org/cwltool#StepNameHint/stepname`
                                        Field `type` references unknown identifier `Expression`,
                                        tried http://commonwl.org/cwltool#Expression

I figured out the failed validation is coming from the appended section in extensions-v1.1.yml

- name: StepNameHint
  type: record
  inVocab: false
  extends: cwl:ProcessRequirement
  doc: |
    Provide a hint for naming the runtime workflow step in logs or user interface.
  fields:
    - name: class
      type: string
      doc: "Always 'StepNameHint'" 
      jsonldPredicate:
        "_id": "@type" 
        "_type": "@vocab" 
    - name: stepname
      type: [string, Expression]
      doc: |
        A string or expression returning a string with the preferred name for the step.
        If it is an expression, it is evaluated after the input object has been completely determined.

I'm not sure how I should write the hint differently so the tool definition validation will pass.

#17 Updated by Jiayong Li 3 months ago

Attached error message from running cwltool --enable-ext scatter-echo-wf.cwl scatter-echo-wf.yml with the following changes to extensions-v1.1.yml

- name: StepNameHint
  type: record
  inVocab: false
  extends: cwl:ProcessRequirement
  doc: |
    Provide a hint for naming the runtime workflow step in logs or user interface.
  fields:
    - name: class
      type: string
      doc: "Always 'StepNameHint'" 
      jsonldPredicate:
        "_id": "@type" 
        "_type": "@vocab" 
    - name: stepname
      type: [string, cwl:Expression]
      doc: |
        A string or expression returning a string with the preferred name for the step.
        If it is an expression, it is evaluated after the input object has been completely determined.

#18 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-06-09 sprint to 2021-06-23 sprint

#19 Updated by Peter Amstutz 3 months ago

  • Target version changed from 2021-06-23 sprint to 2021-07-07 sprint

#20 Updated by Peter Amstutz 3 months ago

Jiayong,

If you merge with the latest cwltool, the reference to cwl:Expression in extensions-v1.1.yml should no longer produce an error.

#21 Updated by Peter Amstutz 3 months ago

#22 Updated by Peter Amstutz 3 months ago

  • Related to deleted (Feature #16462: Expand arvados-controller to expose forecast features)

#23 Updated by Peter Amstutz 3 months ago

  • Related to deleted (Story #16011: CWL support, docs, training, website)

#24 Updated by Peter Amstutz 2 months ago

  • Target version changed from 2021-07-07 sprint to 2021-07-21 sprint

#25 Updated by Peter Amstutz 2 months ago

  • Target version changed from 2021-07-21 sprint to 2021-08-04 sprint

#26 Updated by Jiayong Li about 2 months ago

1. In https://dev.arvados.org/issues/17054#4-Update-WorkflowJobStep-in-workflow_jobpy, you mentioned "checks if the current workflow step has "http://commonwl.org/cwltool#StepNameHint" in "hints" or "requirements"". Right now I'm only checking this under "hints", since it's called "StepNameHint", also the doc field says "provide a hint". Should I expect this to appear under "requirements" as well?

2. I wrote a unit test for custom naming, and it passed. However, some other tests have failed even though I made no changes for them.

=========================== short test summary info ============================
FAILED tests/test_context.py::test_replace_default_stdout_stderr - cwltool.er...
FAILED tests/test_examples.py::test_factory - cwltool.errors.WorkflowExceptio...
FAILED tests/test_load_tool.py::test_check_version - cwltool.errors.WorkflowE...
====== 3 failed, 376 passed, 132 skipped, 2 warnings in 184.91s (0:03:04) ======

#27 Updated by Peter Amstutz about 2 months ago

Jiayong Li wrote:

1. In https://dev.arvados.org/issues/17054#4-Update-WorkflowJobStep-in-workflow_jobpy, you mentioned "checks if the current workflow step has "http://commonwl.org/cwltool#StepNameHint" in "hints" or "requirements"". Right now I'm only checking this under "hints", since it's called "StepNameHint", also the doc field says "provide a hint". Should I expect this to appear under "requirements" as well?

In general, those extensions are supposed to be available under either hints or requirements, so it is good to accept them in both places for consistency.

You want to be using the method "get_requirement" which searches for a given process requirement (in both "hints" and "requirements") with the correct precedence rules.
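
To illustrate the precedence idea, here is a minimal standalone sketch of such a lookup. It assumes "requirements" win over "hints" and that the last matching entry in each list wins. find_requirement is a hypothetical stand-in, so check the actual get_requirement signature and return value in cwltool before relying on it.

```python
from typing import Any, Dict, List, Optional, Tuple

Requirement = Dict[str, Any]


def find_requirement(
    requirements: List[Requirement],
    hints: List[Requirement],
    feature: str,
) -> Tuple[Optional[Requirement], Optional[bool]]:
    """Return (entry, is_required); requirements take precedence over hints."""
    for item in reversed(requirements):
        if item["class"] == feature:
            return item, True
    for item in reversed(hints):
        if item["class"] == feature:
            return item, False
    return None, None


FEATURE = "http://commonwl.org/cwltool#StepNameHint"
hints = [{"class": FEATURE, "stepname": "from_hint"}]
reqs = [{"class": FEATURE, "stepname": "from_req"}]

found, required = find_requirement(reqs, hints, FEATURE)
hint_only, hint_required = find_requirement([], hints, FEATURE)
```

Looking the hint up through a single function like this keeps job() from having to inspect "hints" and "requirements" separately.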

2. I wrote a unit test for custom naming, and it passed. However, some other tests have failed even though I made no changes for them.
[...]

Where is your branch so I can review it?

As I said in standup, you should create a pull request for cwltool. In addition to the unit tests, there are other code quality tools that are very picky, so you will probably need to make further changes to make them happy.

#28 Updated by Peter Amstutz about 2 months ago

  • Target version changed from 2021-08-04 sprint to 2021-08-18 sprint

#30 Updated by Peter Amstutz about 1 month ago

  • Target version changed from 2021-08-18 sprint to 2021-09-01 sprint

#31 Updated by Peter Amstutz 19 days ago

  • Target version changed from 2021-09-01 sprint to 2021-09-15 sprint

#32 Updated by Peter Amstutz 18 days ago

  • Target version deleted (2021-09-15 sprint)
