Bug #11850

[CWL] arv:RunInSingleContainer should take max() of ResourceRequirements of substeps

Added by Peter Amstutz 11 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
06/21/2017
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

When creating a arv:RunInSingleContainer container, arvados-cwl-runner should look at the substeps to determine the maximum expected resource requirements to run the container.


Subtasks

Task #11879: Review 11850-singlecontainer-max-requirementsResolvedPeter Amstutz

Associated revisions

Revision f848ccbe
Added by jiayong2 about 2 months ago

Merge branch '11850-singlecontainer-max-requirements'

closes #11850

Arvados-DCO-1.1-Signed-off-by: Jiayong Li <>

History

#1 Updated by Peter Amstutz 11 months ago

  • Tracker changed from Story to Bug
  • Description updated (diff)

#2 Updated by Tom Morris 11 months ago

  • Target version set to 2017-07-05 sprint

#3 Updated by Jiayong Li 10 months ago

  • Assigned To set to Jiayong Li

#4 Updated by Jiayong Li 10 months ago

  • Status changed from New to In Progress
  • Target version changed from 2017-07-05 sprint to 2017-07-19 sprint

#5 Updated by Jiayong Li 10 months ago

Approach: modify the arvados-cwl-runner code and install it on a virtualenv, then try it on a test cluster with the --local option

#6 Updated by Jiayong Li 10 months ago

Discussion with Peter concludes that in the context of "arv:RunInSingleContainer", when ResourceRequirement has javascript expressions it should print "couldn't evaluate". (The rationale here is that we need to run the job first in order to evaluate the resource expression, but to run the job we need to assign resource requirement first. This creates a circular dependency.)

#7 Updated by Jiayong Li 10 months ago

Jiayong Li wrote:

Discussion with Peter concludes that in the context of "arv:RunInSingleContainer", when ResourceRequirement has javascript expressions it should print "couldn't evaluate". (The rationale here is that we need to run the job first in order to evaluate the resource expression, but to run the job we need to assign resource requirement first. This creates a circular dependency.)

In order to implement this, what's the best (the most precise) way to determine if ResourceRequirement has expressions in it?

For example,

requirements:
  arv:RunInSingleContainer: {}
  InlineJavascriptRequirement: {}
  ResourceRequirement:
    coresMin: |
      ${
        return 0 + 1
      }

The packed document has

('requirements', [CommentedMap([('class', 'InlineJavascriptRequirement')]), CommentedMap([('coresMin', u'${\n  return 0 + 1\n}\n'), ('class', 'ResourceRequirement')])])

#8 Updated by Jiayong Li 9 months ago

  • Target version changed from 2017-07-19 sprint to 2017-08-02 sprint

#9 Updated by Jiayong Li 9 months ago

(4:06:18 PM) tom: jiayong: the cwl code base has a few instances of this test: "$(" in sf or "${" in sf
(4:06:38 PM) jiayong: I see
(4:06:56 PM) tom: jiayong: ok. so perhaps 'if "${" in blah' is the right test?

#10 Updated by Jiayong Li 9 months ago

After much deliberation, I think https://dev.arvados.org/issues/11850#note-6 needs to be reconsidered. I think it's reasonable to have expression (parameter or javascript) in the top level workflow--there's no circular dependency in this case. In fact, the current myGenome workflow has such an instance.

    hints: 
      - class: arv:RunInSingleContainer
      - class: ResourceRequirement
        coresMin: 2
        ramMin: |
                  ${
                    if (inputs.samtoolsviewinput) {
                      var file = inputs.samtoolsviewinput.basename;
                      if (file) {
                        var groups = file.match(/^(.+)(chr[1-9])(.+)$/);
                        if (groups) {
                          return 41000;
                        } else {
                          return 10000;
                        }
                      } else {
                        return 10000;
                      }
                    } else {
                      return 10000;
                    }
                  }
        tmpdirMin: 50000

We just need to figure out how to evaluate the expression from the packed document (or workflowobj["requirements"] or workflowobj["hints"]).

#11 Updated by Jiayong Li 9 months ago

Taking into account https://dev.arvados.org/issues/11850#note-10, I think the rule should be rephrased as follows.

In the context of "arv:RunInSingleContainer", when ResourceRequirement has expressions (javascript or parameter) anywhere except on the top-level, it should print "couldn't evaluate".

How should I implement this? This will requirement taking max with potential expressions on the top-level. After looking through cwltool, it looks like expression.py has do_eval function. Any specific hint for using expression.do_eval?

#12 Updated by Peter Amstutz 9 months ago

Suggested behavior:

  • The toplevel workflow may have a resource requirement which may be an expression or a constant.
  • If any sub-tool has a resource requirement with an expression, it is an error
  • The ResourceRequirement applied to the subworkflow is the max() of the ResourceRequirement on the toplevel workflow and on each sub-tool.

To evaluate resource requirement for the toplevel workflow, do this in ArvadosWorkflow.job():

  1. create a "Builder" object
  2. assign "builder.job = joborder"
  3. assign "builder.requirements = self.requirements"
  4. call self.evalResources(builder, kwargs)

#13 Updated by Tom Morris 9 months ago

  • Target version changed from 2017-08-02 sprint to 2017-08-16 sprint

#14 Updated by Jiayong Li 8 months ago

Also need to add

builder.resources = {}

#15 Updated by Jiayong Li 8 months ago

  • Target version changed from 2017-08-16 sprint to 2017-08-30 Sprint

#16 Updated by Jiayong Li 8 months ago

  • Target version changed from 2017-08-30 Sprint to 2017-09-13 Sprint

#17 Updated by Jiayong Li 7 months ago

  • Target version changed from 2017-09-13 Sprint to 2017-09-27 Sprint

#18 Updated by Jiayong Li 7 months ago

  • Target version changed from 2017-09-27 Sprint to 2017-10-11 Sprint

#19 Updated by Jiayong Li 7 months ago

  • Target version changed from 2017-10-11 Sprint to 2017-10-25 Sprint

#20 Updated by Tom Morris 6 months ago

  • Target version changed from 2017-10-25 Sprint to 2017-11-08 Sprint

#21 Updated by Jiayong Li 6 months ago

  • Target version changed from 2017-11-08 Sprint to 2017-11-22 Sprint

#22 Updated by Jiayong Li 5 months ago

  • Target version changed from 2017-11-22 Sprint to 2017-12-06 Sprint

#23 Updated by Jiayong Li 5 months ago

  • Target version changed from 2017-12-06 Sprint to 2017-12-20 Sprint

#24 Updated by Jiayong Li 4 months ago

  • Target version changed from 2017-12-20 Sprint to 2018-01-17 Sprint

#25 Updated by Jiayong Li 3 months ago

  • Target version changed from 2018-01-17 Sprint to 2018-01-31 Sprint

#26 Updated by Jiayong Li 3 months ago

I'm looking to write a simple workflow that runs in single container that can test my code easily. More specifically, a workflow that prints the workflow requirement as the output. I'm having trouble with it, any suggestions?

I've tried the following, running with my local modified arvados-cwl-runner.

Command

arvados-cwl-runner --local --disable-reuse echo-wf.cwl

echo-wf.cwl

cwlVersion: v1.0
class: Workflow
$namespaces:
  arv: "http://arvados.org/cwl#" 
requirements:
  arv:RunInSingleContainer: {}
  StepInputExpressionRequirement: {}
  ResourceRequirement:
    coresMin: 2

inputs: []

outputs:
  echoout:
    type: string
    outputSource: echo/echoout

steps:
  echo:
    run: echo.cwl
    in:
      echoin:
        valueFrom: $(runtime.cores)
    out: [echoout]

echo.cwl

cwlVersion: v1.0
class: CommandLineTool
requirements:
  ResourceRequirement:
    coresMin: 3
inputs:
  echoin:
    type: string
outputs:
  echoout:
    type: string
    outputBinding:
      glob: out.txt
      loadContents: true
      outputEval: $(self[0].contents)
baseCommand: echo
arguments:
  - $(inputs.echoin)
stdout: out.txt

Pipeline
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-4thjvc6lbmf565c

Note that the log says

runtime_constraints {"min_scratch_mb_per_node":2048,"docker_image":"arvados/jobs","max_tasks_per_node":0,"min_cores_per_node":3,"min_ram_mb_per_node":1024}

so min cores was the max of 2 and 3, the max code is working.

But the run failed with the error

[step echo] Cannot make job: Expression evaluation error:
runtime does not contain key 'cores'

My guess is that $(runtime.cores) can only be used in a command line tool, but not in a workflow. How do I reference the workflow level cores?

#27 Updated by Peter Amstutz 3 months ago

Jiayong Li wrote:

I'm looking to write a simple workflow that runs in single container that can test my code easily. More specifically, a workflow that prints the workflow requirement as the output. I'm having trouble with it, any suggestions?

I've tried the following, running with my local modified arvados-cwl-runner.

Command
[...]

echo-wf.cwl
[...]

echo.cwl
[...]

Pipeline
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-4thjvc6lbmf565c

Note that the log says
[...]
so min cores was the max of 2 and 3, the max code is working.

But the run failed with the error
[...]
My guess is that $(runtime.cores) can only be used in a command line tool, but not in a workflow. How do I reference the workflow level cores?

You can't. Cores only make sense for CommandLineTool. You also can't access the Requirements that are in effect at the point of evaluation (although that might be a good future feature).

To test this feature, I suggest doing something like this:

  • Create a two step workflow, the first step requests 1 core, the second step requests 2 cores. Each step returns the value of $(runtime.cores) which are connected to the output.
  • Make that workflow a subworkflow within another workflow.
  • Run the workflow without arv:RunInSingleContainer, it should return 1 and 2.
  • Run the workflow with arv:RunInSingleContainer, it should return 2 and 2.

#28 Updated by Jiayong Li 3 months ago

I wrote the workflow following your instruction. It didn't produce the desired with arv:RunInSingleContainer. What am I doing wrong?

Here echo_a requires 2 cores and echo_b requires 3 cores.

Output:

2018-01-25 21:41:29 arvados.cwl-runner INFO: Final output collection b7bbabacb3973afe84a0be6973220e38+59 "Output of top-wf.cwl" (qr1hi-4zz18-62is0ej2dek2dg4)
{
    "echoout_a": "2\n", 
    "echoout_b": "3\n" 
}

We are expecting both to be 3.

Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-o7qiwozkq25b8zt

Log for runtime of top-wf (at least this is correct):

"min_cores_per_node":3

top-wf.cwl

cwlVersion: v1.0
class: Workflow
$namespaces:
  arv: "http://arvados.org/cwl#" 
requirements:
  arv:RunInSingleContainer: {}
  SubworkflowFeatureRequirement: {}

inputs: []

outputs:
  echoout_a:
    type: string
    outputSource: echo-wf/echoout_a
  echoout_b:
    type: string
    outputSource: echo-wf/echoout_b

steps:
  echo-wf:
    run: echo-wf.cwl
    in: []
    out: [echoout_a, echoout_b]

echo-wf.cwl

cwlVersion: v1.0
class: Workflow
requirements:
  ResourceRequirement:
    coresMin: 1

inputs: []

outputs:
  echoout_a:
    type: string
    outputSource: echo_a/echoout
  echoout_b:
    type: string
    outputSource: echo_b/echoout

steps:
  echo_a:
    run: echo_a.cwl
    in: []
    out: [echoout]
  echo_b:
    run: echo_b.cwl
    in: []
    out: [echoout]

echo_a.cwl

cwlVersion: v1.0
class: CommandLineTool
requirements:
  ResourceRequirement:
    coresMin: 2
inputs: []
outputs:
  echoout:
    type: string
    outputBinding:
      glob: out.txt
      loadContents: true
      outputEval: $(self[0].contents)
baseCommand: echo
arguments:
  - $(runtime.cores)
stdout: out.txt

echo_b.cwl

cwlVersion: v1.0
class: CommandLineTool
requirements:
  ResourceRequirement:
    coresMin: 3
inputs: []
outputs:
  echoout:
    type: string
    outputBinding:
      glob: out.txt
      loadContents: true
      outputEval: $(self[0].contents)
baseCommand: echo
arguments:
  - $(runtime.cores)
stdout: out.txt

#29 Updated by Jiayong Li 3 months ago

Tried a different version, same result unfortunately.

Output:

2018-01-25 22:00:43 arvados.cwl-runner INFO: Final output collection b7bbabacb3973afe84a0be6973220e38+59 "Output of top-wf.cwl (2018-01-25T22:00:43.812Z)" (qr1hi-4zz18-df153ukorndxc44)
{
    "echoout_a": "2\n", 
    "echoout_b": "3\n" 
}

Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-kntye861jxl9fi2

top-wf.cwl

cwlVersion: v1.0
class: Workflow
requirements:
  SubworkflowFeatureRequirement: {}

inputs: []

outputs:
  echoout_a:
    type: string
    outputSource: echo-wf/echoout_a
  echoout_b:
    type: string
    outputSource: echo-wf/echoout_b

steps:
  echo-wf:
    run: echo-wf.cwl
    in: []
    out: [echoout_a, echoout_b]

echo-wf.cwl

cwlVersion: v1.0
class: Workflow
$namespaces:
  arv: "http://arvados.org/cwl#" 
requirements:
  arv:RunInSingleContainer: {}
  ResourceRequirement:
    coresMin: 1

inputs: []

outputs:
  echoout_a:
    type: string
    outputSource: echo_a/echoout
  echoout_b:
    type: string
    outputSource: echo_b/echoout

steps:
  echo_a:
    run: echo_a.cwl
    in: []
    out: [echoout]
  echo_b:
    run: echo_b.cwl
    in: []
    out: [echoout]

#30 Updated by Peter Amstutz 3 months ago

Jiayong Li wrote:

Tried a different version, same result unfortunately.

Output:
[...]

Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-kntye861jxl9fi2

top-wf.cwl
[...]

echo-wf.cwl
[...]

The requirement needs to be directly on the step itself, I think:

steps:
  echo-wf:
    requirements:
      arv:RunInSingleContainer: {}
    run: echo-wf.cwl
    in: []
    out: [echoout_a, echoout_b]

#31 Updated by Jiayong Li 3 months ago

Modified using the above lines, still the same result unfortunately.

Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-zodk3bfxbq2k2y1

2018-01-26 00:23:23 arvados.cwl-runner INFO: Final output collection b7bbabacb3973afe84a0be6973220e38+59 "Output of top-wf.cwl (2018-01-26T00:23:23.610Z)" (qr1hi-4zz18-1jzut4956p46x0x)
{
    "echoout_a": "2\n", 
    "echoout_b": "3\n" 
}

#32 Updated by Jiayong Li 3 months ago

  • Target version changed from 2018-01-31 Sprint to 2018-02-14 Sprint

#33 Updated by Jiayong Li 3 months ago

  • Status changed from In Progress to Feedback

Ready for review

branch 11850-singlecontainer-max-requirements, commit 54e5ad11af9ee6804c908e49249edf87de7b35dd

#34 Updated by Jiayong Li 2 months ago

branch 11850-singlecontainer-max-requirements, commit 8a2035547ad8bf6abad6a4a03bb0b59211a00932

Tests passed in arvbox.

#35 Updated by Peter Amstutz 2 months ago


  •                     builder.requirements = self.requirements
                        builder.hints = self.hints
    

This should be:

                    builder.requirements = workflowobj["requirements"]
                    builder.hints = workflowobj["hints"]
  • These fields are listed out twice:

("coresMin", "coresMax", "ramMin", "ramMax", "tmpdirMin", "tmpdirMax", "outdirMin", "outdirMax")

You should assign it to a constant to avoid duplication.

  • new_spec won't be defined when self.wf_pdh is not None:
                "requirements": new_spec["requirements"]+[
                    {
...

However, new_spec["requirements"] is a reference to the same list as self.requirements, so this code has the side effect of modifying self.requirements:

                    new_spec = {"requirements": self.requirements, "hints": self.hints}
                    for t in ("requirements", "hints"):
                        for req in new_spec[t]:
                            if req["class"] == "ResourceRequirement":
                                new_spec[t].remove(req)
                        if max_res_req[t]:
                            new_spec[t].append(max_res_req[t])

Which means you can revert to this (and the same for hints):

                "requirements": self.requirements+[
                    {
...

#36 Updated by Peter Amstutz 2 months ago

One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?

#37 Updated by Jiayong Li 2 months ago

  • Target version changed from 2018-02-14 Sprint to 2018-02-28 Sprint

Peter Amstutz wrote:

One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?

Do you think the sum applies to tmpdirMin and tmpdirMax as well?

#38 Updated by Peter Amstutz 2 months ago

Jiayong Li wrote:

Peter Amstutz wrote:

One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?

Do you think the sum applies to tmpdirMin and tmpdirMax as well?

No, for tmpdir taking max() is fine because the tmpdir is deleted between jobs.

#39 Updated by Jiayong Li 2 months ago

Peter Amstutz wrote:

Jiayong Li wrote:

Peter Amstutz wrote:

One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?

Do you think the sum applies to tmpdirMin and tmpdirMax as well?

No, for tmpdir taking max() is fine because the tmpdir is deleted between jobs.

Hmmm, even if it's in a single container? That's good to know.

#40 Updated by Jiayong Li about 2 months ago

branch 11850-singlecontainer-max-requirements, commit 88a29cd091468feb98e5cd541c560f4d35bca716

Addressed the above comments, tests passed in arvbox.

#41 Updated by Peter Amstutz about 2 months ago

Jiayong Li wrote:

branch 11850-singlecontainer-max-requirements, commit 88a29cd091468feb98e5cd541c560f4d35bca716

Addressed the above comments, tests passed in arvbox.

This LGTM, please merge.

#42 Updated by Jiayong Li about 2 months ago

  • Status changed from Feedback to Resolved

Thanks for the review, the branch is now merged.

Also available in: Atom PDF