Bug #15497

[ACR] arv:ReuseRequirement enableReuse: false doesn't work

Added by Stephen McLaughlin 4 months ago. Updated 17 days ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 10/24/2019
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

I was interested in using arv:ReuseRequirement in one step of my workflows and I noticed it wasn't working like I expected.

To demonstrate, I made an isolated test with one Workflow that calls a CommandLineTool twice.

container-reuse-debug-wf.cwl:

$namespaces:
  arv: "http://arvados.org/cwl#" 
  cwltool: "http://commonwl.org/cwltool#" 

cwlVersion: v1.0
class: Workflow

inputs:
  a: File
  b: File
  c: File
  d: File
  e: File
  f: File

outputs:
  abc:
    type: File
    outputSource: concat_abc/out
  def:
    type: File
    outputSource: concat_def/out

steps:
  concat_abc:
    run: concat-files.cwl
    in:
      a: a
      b: b
      c: c
    out: [out]
  concat_def:
    run: concat-files.cwl
    in:
      a: d
      b: e
      c: f
    out: [out]

concat-files.cwl:

$namespaces:
  arv: "http://arvados.org/cwl#" 
  cwltool: "http://commonwl.org/cwltool#" 

cwlVersion: v1.0
class: CommandLineTool
hints:
  arv:APIRequirement: {}
  arv:ReuseRequirement:
    enableReuse: false

inputs:
  a: File
  b: File
  c: File

stdout: abc.txt

arguments:
  - cat
  - $(inputs.a)
  - $(inputs.b)
  - $(inputs.c)

outputs:
  out:
    type: stdout

And the call to arvados-cwl-runner:

arvados-cwl-runner --submit --no-wait --submit-runner-ram 2500 --thread-count=1 --project-uuid=e51c5-j7d0g-5rbacnqvduwtgvw --api=containers --name "concatfiles-container-reuse-test" container-reuse-debug-wf.cwl -a a.txt -b b.txt -c c.txt -d d.txt -e e.txt -f f.txt

This runs on Arvados as expected, creating two output files. But when I rerun arvados-cwl-runner the same way, I still get container reuse. arv:ReuseRequirement doesn't seem to be working in this scenario, but I'm not sure whether I'm misusing it or there is a bug.

test-workflow.jpg (36.6 KB): wf process calls. Lucas Di Pentima, 10/23/2019 07:22 PM
cwl-reuse-test.tgz (72.4 KB): workflow files. Lucas Di Pentima, 10/23/2019 07:22 PM

Subtasks

Task #15748: Review 15497-reuse-fix (Resolved, Eric Biagiotti)

Associated revisions

Revision aa4b9b1d
Added by Lucas Di Pentima about 1 month ago

Merge branch '15497-reuse-fix'
Closes #15497

Revision bd6922ea (diff)
Added by Lucas Di Pentima 29 days ago

Merge branch '15497-reuse-fix'
Closes #15497

Revision 0e0b561d (diff)
Added by Lucas Di Pentima 29 days ago

Merge branch '15497-reuse-fix'
Closes #15497

History

#1 Updated by Stephen McLaughlin 4 months ago

  • Tracker changed from Support to Bug

#2 Updated by Lucas Di Pentima about 2 months ago

  • Status changed from New to In Progress

Sorry for the super late reply on this.

I think the dominant reuse behavior comes from the root workflow, whose reuse is set to "true" by a-c-r's default setting. Only the sub jobs have reuse explicitly disabled, but they never get evaluated because the root job is reused from a previous run.

On arvbox I manually tried both passing --disable-reuse to a-c-r and adding the hint to the root workflow. In both cases the behavior was correct: it didn't reuse previous containers.
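For reference, the second workaround (the hint on the root workflow) might look like the sketch below. It is container-reuse-debug-wf.cwl from the description with a hints block added; that block is the only change, and its placement is an assumption about how the hint was added in the arvbox test. The first workaround needs no file change, only --disable-reuse added to the arvados-cwl-runner command line.

```yaml
# Sketch: the workflow from the description with the reuse hint at the root.
$namespaces:
  arv: "http://arvados.org/cwl#"

cwlVersion: v1.0
class: Workflow

# Assumed addition: disable reuse for the workflow as a whole.
hints:
  arv:ReuseRequirement:
    enableReuse: false

inputs:
  a: File
  b: File
  c: File
  d: File
  e: File
  f: File

outputs:
  abc:
    type: File
    outputSource: concat_abc/out
  def:
    type: File
    outputSource: concat_def/out

steps:
  concat_abc:
    run: concat-files.cwl
    in:
      a: a
      b: b
      c: c
    out: [out]
  concat_def:
    run: concat-files.cwl
    in:
      a: d
      b: e
      c: f
    out: [out]
```

A workflow-level hint applies to every step that does not override it, so this turns off reuse for both concat steps, not just for the runner.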

#3 Updated by Peter Amstutz about 1 month ago

If the toplevel arvados-cwl-runner process is reused, then it doesn't execute and thus doesn't have a chance to decide whether the child steps should be re-run.

Either the arvados-cwl-runner client should be more clever and traverse the workflow to look for steps that may need to be re-run (with WorkReuse: false), or the arvados-cwl-runner client should never enable reuse for the top-level workflow runner process (child steps can still be reused).

#4 Updated by Lucas Di Pentima about 1 month ago

  • Target version set to 2019-10-23 Sprint

#5 Updated by Lucas Di Pentima about 1 month ago

  • Target version changed from 2019-10-23 Sprint to 2019-11-06 Sprint

#6 Updated by Lucas Di Pentima about 1 month ago

  • Subject changed from arv:ReuseRequirement question possible bug (?) to [ACR] arv:ReuseRequirement question possible bug (?)
  • Project changed from Curoverse Support to Arvados

#7 Updated by Lucas Di Pentima about 1 month ago

  • Release set to 27

#9 Updated by Lucas Di Pentima about 1 month ago

I've been doing some tests on arvbox to fully understand what the issue is about.

I've attached my own test files and a drawing of how it's all assembled: there's a super-wf workflow that just calls another workflow, hash-output-wf, which in turn calls a workflow and a couple of command line tools. One of those tools has the reuse=false hint to simulate an external resource that changes every time it is read.

When super-wf is submitted with reuse=true, the whole job tree gets reused, as this bug report describes. Following Peter's comment about submitting the root wf with reuse=false, I did just that on super-wf, and the result was that nothing in the job tree was reused, not even count-grepped-lines-wf and its child steps.

This is happening whether or not I run arvados-cwl-runner with --submit.

For testing purposes, you can reproduce what I did by running:

$ arvados-cwl-runner super-wf.cwl args.yml

It seems that a root step with reuse=false prevents its children from being reused. Is this behavior expected?

#10 Updated by Peter Amstutz about 1 month ago

Lucas Di Pentima wrote:

I've been doing some tests on arvbox to fully understand what the issue is about.

I've attached my own test files and a drawing of how it's all assembled: there's a super-wf workflow that just calls another workflow, hash-output-wf, which in turn calls a workflow and a couple of command line tools. One of those tools has the reuse=false hint to simulate an external resource that changes every time it is read.

When super-wf is submitted with reuse=true, the whole job tree gets reused, as this bug report describes. Following Peter's comment about submitting the root wf with reuse=false, I did just that on super-wf, and the result was that nothing in the job tree was reused, not even count-grepped-lines-wf and its child steps.

This is happening whether or not I run arvados-cwl-runner with --submit.

For testing purposes, you can reproduce what I did by running:

[...]

It seems that a root step with reuse=false prevents its children from being reused. Is this behavior expected?

Yes, that's expected: workflow-level requirements are inherited by the steps of the workflow, so it would only reuse steps that were explicitly marked with reuse=true.
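A minimal sketch of that inheritance, assuming a hypothetical two-step workflow. The file and step names (fetch-remote.cwl, count-lines.cwl, fetch, count) and the input names are made up for illustration and are not the files in cwl-reuse-test.tgz. The workflow-level hint turns reuse off for every step, and the step-level hint on count opts that one step back in, relying on the CWL rule that a hint on a step takes precedence over the same hint on the enclosing workflow. This has not been verified on a cluster.

```yaml
# Hypothetical example: names below are illustrative, not from the attached test files.
$namespaces:
  arv: "http://arvados.org/cwl#"

cwlVersion: v1.0
class: Workflow

# Inherited by every step that does not override it.
hints:
  arv:ReuseRequirement:
    enableReuse: false

inputs:
  url: string

outputs:
  line_count:
    type: File
    outputSource: count/out

steps:
  fetch:
    # No step-level hint: inherits enableReuse: false from the workflow.
    run: fetch-remote.cwl
    in:
      url: url
    out: [out]
  count:
    run: count-lines.cwl
    # Step-level hint overrides the workflow-level one for this step only.
    hints:
      arv:ReuseRequirement:
        enableReuse: true
    in:
      infile: fetch/out
    out: [out]
```

With this layout, fetch would run every time, while count could still be reused whenever fetch produces identical output.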

#11 Updated by Lucas Di Pentima about 1 month ago

Updates at 36994ebb9 - branch 15497-reuse-fix
Test runs:

Never reuse runner containers so that individual reuse=false hints can be evaluated on the workflow.

#12 Updated by Eric Biagiotti about 1 month ago

Lucas Di Pentima wrote:

Updates at 36994ebb9 - branch 15497-reuse-fix
Test runs:

Never reuse runner containers so that individual reuse=false hints can be evaluated on the workflow.

So setting arv:ReuseRequirement to false in a workflow step never worked? Is there a way to write some integration tests for this? It also might be worth mentioning the bug number in the comment in arvcontainer.py.

Otherwise, this LGTM.

#13 Updated by Lucas Di Pentima about 1 month ago

The issue arises when the user submits the wf with --submit: a-c-r creates a "runner container" that in turn runs a-c-r inside the cluster. That runner container is set with reuse=true by default, so it gets reused after the first run and the workflow it contains is never evaluated, whatever reuse settings its steps declare.

  • Added the issue # to the comment (was relying on git blame for that).
  • I think the current test suite is enough because the expected containers still have the --enable-reuse in their command key, but better to ask Peter for confirmation.

#14 Updated by Peter Amstutz about 1 month ago

I think this is fine. The more complicated solution would be to peer into the workflow and determine if it has non-reusable steps, but in my experience re-running the workflow runner is almost always what you want.

#15 Updated by Lucas Di Pentima about 1 month ago

  • Status changed from In Progress to Resolved

#16 Updated by Tom Morris 17 days ago

  • Subject changed from [ACR] arv:ReuseRequirement question possible bug (?) to [ACR] arv:ReuseRequirement enableReuse: false doesn't work
