Bug #11850
closed[CWL] arv:RunInSingleContainer should take max() of ResourceRequirements of substeps
Added by Peter Amstutz over 7 years ago. Updated over 6 years ago.
Description
When creating a arv:RunInSingleContainer container, arvados-cwl-runner should look at the substeps to determine the maximum expected resource requirements to run the container.
Updated by Peter Amstutz over 7 years ago
- Tracker changed from Idea to Bug
- Description updated (diff)
Updated by Jiayong Li over 7 years ago
- Status changed from New to In Progress
- Target version changed from 2017-07-05 sprint to 2017-07-19 sprint
Updated by Jiayong Li over 7 years ago
Approach: modify the arvados-cwl-runner code and install it on a virtualenv, then try it on a test cluster with the --local option
Updated by Jiayong Li over 7 years ago
Discussion with Peter concludes that in the context of "arv:RunInSingleContainer", when ResourceRequirement has javascript expressions it should print "couldn't evaluate". (The rationale here is that we need to run the job first in order to evaluate the resource expression, but to run the job we need to assign resource requirement first. This creates a circular dependency.)
Updated by Jiayong Li over 7 years ago
Jiayong Li wrote:
Discussion with Peter concludes that in the context of "arv:RunInSingleContainer", when ResourceRequirement has javascript expressions it should print "couldn't evaluate". (The rationale here is that we need to run the job first in order to evaluate the resource expression, but to run the job we need to assign resource requirement first. This creates a circular dependency.)
In order to implement this, what's the best (the most precise) way to determine if ResourceRequirement has expressions in it?
For example,
requirements: arv:RunInSingleContainer: {} InlineJavascriptRequirement: {} ResourceRequirement: coresMin: | ${ return 0 + 1 }
The packed document has
('requirements', [CommentedMap([('class', 'InlineJavascriptRequirement')]), CommentedMap([('coresMin', u'${\n return 0 + 1\n}\n'), ('class', 'ResourceRequirement')])])
Updated by Jiayong Li over 7 years ago
- Target version changed from 2017-07-19 sprint to 2017-08-02 sprint
Updated by Jiayong Li over 7 years ago
(4:06:18 PM) tom: jiayong: the cwl code base has a few instances of this test: "$(" in sf or "${" in sf
(4:06:38 PM) jiayong: I see
(4:06:56 PM) tom: jiayong: ok. so perhaps 'if "${" in blah' is the right test?
Updated by Jiayong Li over 7 years ago
After much deliberation, I think https://dev.arvados.org/issues/11850#note-6 needs to be reconsidered. I think it's reasonable to have expression (parameter or javascript) in the top level workflow--there's no circular dependency in this case. In fact, the current myGenome workflow has such an instance.
hints: - class: arv:RunInSingleContainer - class: ResourceRequirement coresMin: 2 ramMin: | ${ if (inputs.samtoolsviewinput) { var file = inputs.samtoolsviewinput.basename; if (file) { var groups = file.match(/^(.+)(chr[1-9])(.+)$/); if (groups) { return 41000; } else { return 10000; } } else { return 10000; } } else { return 10000; } } tmpdirMin: 50000
We just need to figure out how to evaluate the expression from the packed document (or workflowobj["requirements"] or workflowobj["hints"]).
Updated by Jiayong Li over 7 years ago
Taking into account https://dev.arvados.org/issues/11850#note-10, I think the rule should be rephrased as follows.
In the context of "arv:RunInSingleContainer", when ResourceRequirement has expressions (javascript or parameter) anywhere except on the top-level, it should print "couldn't evaluate".
How should I implement this? This will requirement taking max with potential expressions on the top-level. After looking through cwltool, it looks like expression.py has do_eval function. Any specific hint for using expression.do_eval?
Updated by Peter Amstutz over 7 years ago
Suggested behavior:
- The toplevel workflow may have a resource requirement which may be an expression or a constant.
- If any sub-tool has a resource requirement with an expression, it is an error
- The ResourceRequirement applied to the subworkflow is the max() of the ResourceRequirement on the toplevel workflow and on each sub-tool.
To evaluate resource requirement for the toplevel workflow, do this in ArvadosWorkflow.job():
- create a "Builder" object
- assign "builder.job = joborder"
- assign "builder.requirements = self.requirements"
- call self.evalResources(builder, kwargs)
Updated by Tom Morris over 7 years ago
- Target version changed from 2017-08-02 sprint to 2017-08-16 sprint
Updated by Jiayong Li over 7 years ago
- Target version changed from 2017-08-16 sprint to 2017-08-30 Sprint
Updated by Jiayong Li about 7 years ago
- Target version changed from 2017-08-30 Sprint to 2017-09-13 Sprint
Updated by Jiayong Li about 7 years ago
- Target version changed from 2017-09-13 Sprint to 2017-09-27 Sprint
Updated by Jiayong Li about 7 years ago
- Target version changed from 2017-09-27 Sprint to 2017-10-11 Sprint
Updated by Jiayong Li about 7 years ago
- Target version changed from 2017-10-11 Sprint to 2017-10-25 Sprint
Updated by Tom Morris about 7 years ago
- Target version changed from 2017-10-25 Sprint to 2017-11-08 Sprint
Updated by Jiayong Li about 7 years ago
- Target version changed from 2017-11-08 Sprint to 2017-11-22 Sprint
Updated by Jiayong Li almost 7 years ago
- Target version changed from 2017-11-22 Sprint to 2017-12-06 Sprint
Updated by Jiayong Li almost 7 years ago
- Target version changed from 2017-12-06 Sprint to 2017-12-20 Sprint
Updated by Jiayong Li almost 7 years ago
- Target version changed from 2017-12-20 Sprint to 2018-01-17 Sprint
Updated by Jiayong Li almost 7 years ago
- Target version changed from 2018-01-17 Sprint to 2018-01-31 Sprint
Updated by Jiayong Li almost 7 years ago
I'm looking to write a simple workflow that runs in single container that can test my code easily. More specifically, a workflow that prints the workflow requirement as the output. I'm having trouble with it, any suggestions?
I've tried the following, running with my local modified arvados-cwl-runner.
Command
arvados-cwl-runner --local --disable-reuse echo-wf.cwl
echo-wf.cwl
cwlVersion: v1.0 class: Workflow $namespaces: arv: "http://arvados.org/cwl#" requirements: arv:RunInSingleContainer: {} StepInputExpressionRequirement: {} ResourceRequirement: coresMin: 2 inputs: [] outputs: echoout: type: string outputSource: echo/echoout steps: echo: run: echo.cwl in: echoin: valueFrom: $(runtime.cores) out: [echoout]
echo.cwl
cwlVersion: v1.0 class: CommandLineTool requirements: ResourceRequirement: coresMin: 3 inputs: echoin: type: string outputs: echoout: type: string outputBinding: glob: out.txt loadContents: true outputEval: $(self[0].contents) baseCommand: echo arguments: - $(inputs.echoin) stdout: out.txt
Pipeline
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-4thjvc6lbmf565c
Note that the log says
runtime_constraints {"min_scratch_mb_per_node":2048,"docker_image":"arvados/jobs","max_tasks_per_node":0,"min_cores_per_node":3,"min_ram_mb_per_node":1024}
so min cores was the max of 2 and 3, the max code is working.
But the run failed with the error
[step echo] Cannot make job: Expression evaluation error: runtime does not contain key 'cores'
My guess is that $(runtime.cores) can only be used in a command line tool, but not in a workflow. How do I reference the workflow level cores?
Updated by Peter Amstutz almost 7 years ago
Jiayong Li wrote:
I'm looking to write a simple workflow that runs in single container that can test my code easily. More specifically, a workflow that prints the workflow requirement as the output. I'm having trouble with it, any suggestions?
I've tried the following, running with my local modified arvados-cwl-runner.
Command
[...]echo-wf.cwl
[...]echo.cwl
[...]Pipeline
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-4thjvc6lbmf565cNote that the log says
[...]
so min cores was the max of 2 and 3, the max code is working.But the run failed with the error
[...]
My guess is that $(runtime.cores) can only be used in a command line tool, but not in a workflow. How do I reference the workflow level cores?
You can't. Cores only make sense for CommandLineTool. You also can't access the Requirements that are in effect at the point of evaluation (although that might be a good future feature).
To test this feature, I suggest doing something like this:
- Create a two step workflow, the first step requests 1 core, the second step requests 2 cores. Each step returns the value of $(runtime.cores) which are connected to the output.
- Make that workflow a subworkflow within another workflow.
- Run the workflow without arv:RunInSingleContainer, it should return 1 and 2.
- Run the workflow with arv:RunInSingleContainer, it should return 2 and 2.
Updated by Jiayong Li almost 7 years ago
I wrote the workflow following your instruction. It didn't produce the desired with arv:RunInSingleContainer. What am I doing wrong?
Here echo_a requires 2 cores and echo_b requires 3 cores.
Output:
2018-01-25 21:41:29 arvados.cwl-runner INFO: Final output collection b7bbabacb3973afe84a0be6973220e38+59 "Output of top-wf.cwl" (qr1hi-4zz18-62is0ej2dek2dg4) { "echoout_a": "2\n", "echoout_b": "3\n" }
We are expecting both to be 3.
Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-o7qiwozkq25b8zt
Log for runtime of top-wf (at least this is correct):
"min_cores_per_node":3
top-wf.cwl
cwlVersion: v1.0 class: Workflow $namespaces: arv: "http://arvados.org/cwl#" requirements: arv:RunInSingleContainer: {} SubworkflowFeatureRequirement: {} inputs: [] outputs: echoout_a: type: string outputSource: echo-wf/echoout_a echoout_b: type: string outputSource: echo-wf/echoout_b steps: echo-wf: run: echo-wf.cwl in: [] out: [echoout_a, echoout_b]
echo-wf.cwl
cwlVersion: v1.0 class: Workflow requirements: ResourceRequirement: coresMin: 1 inputs: [] outputs: echoout_a: type: string outputSource: echo_a/echoout echoout_b: type: string outputSource: echo_b/echoout steps: echo_a: run: echo_a.cwl in: [] out: [echoout] echo_b: run: echo_b.cwl in: [] out: [echoout]
echo_a.cwl
cwlVersion: v1.0 class: CommandLineTool requirements: ResourceRequirement: coresMin: 2 inputs: [] outputs: echoout: type: string outputBinding: glob: out.txt loadContents: true outputEval: $(self[0].contents) baseCommand: echo arguments: - $(runtime.cores) stdout: out.txt
echo_b.cwl
cwlVersion: v1.0 class: CommandLineTool requirements: ResourceRequirement: coresMin: 3 inputs: [] outputs: echoout: type: string outputBinding: glob: out.txt loadContents: true outputEval: $(self[0].contents) baseCommand: echo arguments: - $(runtime.cores) stdout: out.txt
Updated by Jiayong Li almost 7 years ago
Tried a different version, same result unfortunately.
Output:
2018-01-25 22:00:43 arvados.cwl-runner INFO: Final output collection b7bbabacb3973afe84a0be6973220e38+59 "Output of top-wf.cwl (2018-01-25T22:00:43.812Z)" (qr1hi-4zz18-df153ukorndxc44) { "echoout_a": "2\n", "echoout_b": "3\n" }
Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-kntye861jxl9fi2
top-wf.cwl
cwlVersion: v1.0 class: Workflow requirements: SubworkflowFeatureRequirement: {} inputs: [] outputs: echoout_a: type: string outputSource: echo-wf/echoout_a echoout_b: type: string outputSource: echo-wf/echoout_b steps: echo-wf: run: echo-wf.cwl in: [] out: [echoout_a, echoout_b]
echo-wf.cwl
cwlVersion: v1.0 class: Workflow $namespaces: arv: "http://arvados.org/cwl#" requirements: arv:RunInSingleContainer: {} ResourceRequirement: coresMin: 1 inputs: [] outputs: echoout_a: type: string outputSource: echo_a/echoout echoout_b: type: string outputSource: echo_b/echoout steps: echo_a: run: echo_a.cwl in: [] out: [echoout] echo_b: run: echo_b.cwl in: [] out: [echoout]
Updated by Peter Amstutz almost 7 years ago
Jiayong Li wrote:
Tried a different version, same result unfortunately.
Output:
[...]Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-kntye861jxl9fi2top-wf.cwl
[...]echo-wf.cwl
[...]
The requirement needs to be directly on the step itself, I think:
steps: echo-wf: requirements: arv:RunInSingleContainer: {} run: echo-wf.cwl in: [] out: [echoout_a, echoout_b]
Updated by Jiayong Li almost 7 years ago
Modified using the above lines, still the same result unfortunately.
Pipeline:
https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-zodk3bfxbq2k2y1
2018-01-26 00:23:23 arvados.cwl-runner INFO: Final output collection b7bbabacb3973afe84a0be6973220e38+59 "Output of top-wf.cwl (2018-01-26T00:23:23.610Z)" (qr1hi-4zz18-1jzut4956p46x0x) { "echoout_a": "2\n", "echoout_b": "3\n" }
Updated by Jiayong Li almost 7 years ago
- Target version changed from 2018-01-31 Sprint to 2018-02-14 Sprint
Updated by Jiayong Li almost 7 years ago
- Status changed from In Progress to Feedback
Ready for review
branch 11850-singlecontainer-max-requirements, commit 54e5ad11af9ee6804c908e49249edf87de7b35dd
Updated by Jiayong Li almost 7 years ago
branch 11850-singlecontainer-max-requirements, commit 8a2035547ad8bf6abad6a4a03bb0b59211a00932
Tests passed in arvbox.
Updated by Peter Amstutz almost 7 years ago
builder.requirements = self.requirements builder.hints = self.hints
This should be:
builder.requirements = workflowobj["requirements"] builder.hints = workflowobj["hints"]
- These fields are listed out twice:
("coresMin", "coresMax", "ramMin", "ramMax", "tmpdirMin", "tmpdirMax", "outdirMin", "outdirMax")
You should assign it to a constant to avoid duplication.
new_spec
won't be defined whenself.wf_pdh is not None
:
"requirements": new_spec["requirements"]+[ { ...
However, new_spec["requirements"] is a reference to the same list as self.requirements, so this code has the side effect of modifying self.requirements:
new_spec = {"requirements": self.requirements, "hints": self.hints} for t in ("requirements", "hints"): for req in new_spec[t]: if req["class"] == "ResourceRequirement": new_spec[t].remove(req) if max_res_req[t]: new_spec[t].append(max_res_req[t])
Which means you can revert to this (and the same for hints):
"requirements": self.requirements+[ { ...
Updated by Peter Amstutz almost 7 years ago
One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?
Updated by Jiayong Li almost 7 years ago
- Target version changed from 2018-02-14 Sprint to 2018-02-28 Sprint
Peter Amstutz wrote:
One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?
Do you think the sum applies to tmpdirMin and tmpdirMax as well?
Updated by Peter Amstutz almost 7 years ago
Jiayong Li wrote:
Peter Amstutz wrote:
One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?
Do you think the sum applies to tmpdirMin and tmpdirMax as well?
No, for tmpdir taking max() is fine because the tmpdir is deleted between jobs.
Updated by Jiayong Li almost 7 years ago
Peter Amstutz wrote:
Jiayong Li wrote:
Peter Amstutz wrote:
One more thought, because outputs are cumulative, perhaps the total outDirMin should be the sum of outDirMin of all steps instead of max of any one step?
Do you think the sum applies to tmpdirMin and tmpdirMax as well?
No, for tmpdir taking max() is fine because the tmpdir is deleted between jobs.
Hmmm, even if it's in a single container? That's good to know.
Updated by Jiayong Li over 6 years ago
branch 11850-singlecontainer-max-requirements, commit 88a29cd091468feb98e5cd541c560f4d35bca716
Addressed the above comments, tests passed in arvbox.
Updated by Peter Amstutz over 6 years ago
Jiayong Li wrote:
branch 11850-singlecontainer-max-requirements, commit 88a29cd091468feb98e5cd541c560f4d35bca716
Addressed the above comments, tests passed in arvbox.
This LGTM, please merge.
Updated by Jiayong Li over 6 years ago
- Status changed from Feedback to Resolved
Thanks for the review, the branch is now merged.