Bug #11850

[CWL] arv:RunInSingleContainer should take max() of ResourceRequirements of substeps

Added by Peter Amstutz about 1 month ago. Updated about 6 hours ago.

Status: In Progress
Start date: 06/21/2017
Priority: Normal
Due date:
Assignee: Jiayong Li
% Done: 0%
Category: -
Target version: 2017-08-02 sprint
Story points: 0.5
Remaining (hours): 0.00 hour
Velocity based estimate: -

Description

When creating an arv:RunInSingleContainer container, arvados-cwl-runner should look at the substeps and use the maximum of their resource requirements as the expected resource requirements for the container.
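
A minimal sketch of the intended behavior, not the arvados-cwl-runner implementation. It assumes each substep's ResourceRequirement has already been reduced to a plain dict of numeric values; the helper name `max_resource_requirements` is hypothetical, and only the standard CWL ResourceRequirement field names are taken as given.

```python
# Hypothetical helper for illustration; not part of arvados-cwl-runner.
RESOURCE_FIELDS = ("coresMin", "coresMax", "ramMin", "ramMax",
                   "tmpdirMin", "tmpdirMax", "outdirMin", "outdirMax")


def max_resource_requirements(requirements):
    """Return the per-field maximum over a list of ResourceRequirement dicts.

    Assumes every value is already a number (no expressions).
    """
    overall = {}
    for req in requirements:
        for field in RESOURCE_FIELDS:
            if field in req:
                # Keep the larger of the value seen so far and this substep's value.
                overall[field] = max(overall.get(field, req[field]), req[field])
    return overall


# Two made-up substeps: the container should get the largest value of each field.
substeps = [
    {"class": "ResourceRequirement", "coresMin": 2, "ramMin": 10000},
    {"class": "ResourceRequirement", "coresMin": 1, "ramMin": 41000,
     "tmpdirMin": 50000},
]
print(max_resource_requirements(substeps))
```

Fields that no substep sets are left out of the result, so the runner's defaults would still apply to them.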


Subtasks

Task #11879: Review (Status: New, Assignee: Peter Amstutz)

History

#1 Updated by Peter Amstutz about 1 month ago

  • Tracker changed from Story to Bug
  • Description updated (diff)

#2 Updated by Tom Morris about 1 month ago

  • Target version set to 2017-07-05 sprint

#3 Updated by Jiayong Li about 1 month ago

  • Assignee set to Jiayong Li

#4 Updated by Jiayong Li 19 days ago

  • Status changed from New to In Progress
  • Target version changed from 2017-07-05 sprint to 2017-07-19 sprint

#5 Updated by Jiayong Li 11 days ago

Approach: modify the arvados-cwl-runner code and install it in a virtualenv, then try it on a test cluster with the --local option.

#6 Updated by Jiayong Li 11 days ago

Discussion with Peter concludes that in the context of "arv:RunInSingleContainer", when ResourceRequirement has javascript expressions it should print "couldn't evaluate". (The rationale here is that we need to run the job first in order to evaluate the resource expression, but to run the job we need to assign resource requirement first. This creates a circular dependency.)

#7 Updated by Jiayong Li 11 days ago

Jiayong Li wrote:

Discussion with Peter concludes that in the context of "arv:RunInSingleContainer", when ResourceRequirement has javascript expressions it should print "couldn't evaluate". (The rationale here is that we need to run the job first in order to evaluate the resource expression, but to run the job we need to assign resource requirement first. This creates a circular dependency.)

In order to implement this, what's the best (the most precise) way to determine if ResourceRequirement has expressions in it?

For example,

requirements:
  arv:RunInSingleContainer: {}
  InlineJavascriptRequirement: {}
  ResourceRequirement:
    coresMin: |
      ${
        return 0 + 1
      }

The packed document has

('requirements', [CommentedMap([('class', 'InlineJavascriptRequirement')]), CommentedMap([('coresMin', u'${\n  return 0 + 1\n}\n'), ('class', 'ResourceRequirement')])])

#8 Updated by Jiayong Li 5 days ago

  • Target version changed from 2017-07-19 sprint to 2017-08-02 sprint

#9 Updated by Jiayong Li 5 days ago

(4:06:18 PM) tom: jiayong: the cwl code base has a few instances of this test: "$(" in sf or "${" in sf
(4:06:38 PM) jiayong: I see
(4:06:56 PM) tom: jiayong: ok. so perhaps 'if "${" in blah' is the right test?
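
The substring test suggested in the chat above could be sketched as follows. This is only an illustration under assumptions: the function names `has_expression` and `fields_with_expressions` are hypothetical, and the check is the same heuristic as in the chat (a string containing "$(" or "${"), not a full CWL expression parser. On Python 2 the type check would need to cover unicode strings as well.

```python
def has_expression(value):
    """Return True if a requirement value looks like a CWL expression.

    Heuristic only: a string containing "$(" (parameter reference) or
    "${" (JavaScript body). Numbers are always treated as literals.
    """
    return isinstance(value, str) and ("$(" in value or "${" in value)


def fields_with_expressions(requirement):
    """List the fields of a requirement dict whose values look like expressions."""
    return sorted(field for field, value in requirement.items()
                  if field != "class" and has_expression(value))


# Shaped like the packed ResourceRequirement shown in note 7, plus a literal field.
packed = {"class": "ResourceRequirement",
          "coresMin": "${\n  return 0 + 1\n}\n",
          "ramMin": 1024}
print(fields_with_expressions(packed))
```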

#10 Updated by Jiayong Li 5 days ago

After much deliberation, I think https://dev.arvados.org/issues/11850#note-6 needs to be reconsidered. I think it's reasonable to have expressions (parameter or javascript) in the top-level workflow; there's no circular dependency in this case. In fact, the current myGenome workflow has such an instance.

    hints: 
      - class: arv:RunInSingleContainer
      - class: ResourceRequirement
        coresMin: 2
        ramMin: |
                  ${
                    if (inputs.samtoolsviewinput) {
                      var file = inputs.samtoolsviewinput.basename;
                      if (file) {
                        var groups = file.match(/^(.+)(chr[1-9])(.+)$/);
                        if (groups) {
                          return 41000;
                        } else {
                          return 10000;
                        }
                      } else {
                        return 10000;
                      }
                    } else {
                      return 10000;
                    }
                  }
        tmpdirMin: 50000

We just need to figure out how to evaluate the expression from the packed document (or workflowobj["requirements"] or workflowobj["hints"]).

#11 Updated by Jiayong Li about 6 hours ago

Taking into account https://dev.arvados.org/issues/11850#note-10, I think the rule should be rephrased as follows.

In the context of "arv:RunInSingleContainer", when ResourceRequirement has expressions (javascript or parameter) anywhere except at the top level, it should print "couldn't evaluate".

How should I implement this? It will require taking the max while allowing for potential expressions at the top level. After looking through cwltool, it looks like expression.py has a do_eval function. Any specific hints for using expression.do_eval?
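
The substep half of the rule can be sketched without calling cwltool's evaluator. This is a rough illustration, not a proposed patch: `CouldNotEvaluate`, `looks_like_expression` and `check_substep_requirements` are hypothetical names, the input is assumed to be a mapping from step id to that step's ResourceRequirement dict, and evaluating top-level expressions (the expression.do_eval question) is deliberately left out.

```python
class CouldNotEvaluate(Exception):
    """Hypothetical error type for expressions found below the top level."""


def looks_like_expression(value):
    # Same heuristic as the "$(" / "${" substring test discussed in note 9.
    return isinstance(value, str) and ("$(" in value or "${" in value)


def check_substep_requirements(substep_requirements):
    """Raise CouldNotEvaluate if any substep ResourceRequirement has an expression.

    substep_requirements: dict mapping a step id to its ResourceRequirement dict.
    Top-level requirements are not passed in here; they may contain expressions.
    """
    for step_id, requirement in substep_requirements.items():
        for field, value in requirement.items():
            if field != "class" and looks_like_expression(value):
                raise CouldNotEvaluate(
                    "couldn't evaluate %s in step %s" % (field, step_id))


# Made-up substeps: the second one uses a parameter reference, so it is rejected.
substeps = {
    "align": {"class": "ResourceRequirement", "coresMin": 2},
    "sort": {"class": "ResourceRequirement", "coresMin": "$(inputs.threads)"},
}
try:
    check_substep_requirements(substeps)
except CouldNotEvaluate as err:
    print(err)
```

Once this check passes, every substep value is a plain number, so taking the max across substeps only has to deal with expressions in the top-level requirement.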
