Story #16602

Workbench 2 uses correct version of arvados/jobs when submitting a workflow, not "latest"

Added by Peter Amstutz over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Start date: 08/26/2020
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

Define a new container image field for the arv:WorkflowRunnerResources hint.

The upload_workflow() function should set the container image, using the same arvados_jobs_image logic with "self.jobs_image" that would be used to submit the workflow to run.

Workbench (both 1 and 2) should look for arv:WorkflowRunnerResources and apply coresMin/ramMin/keep_cache/container image to the container request.
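
A minimal sketch of the Workbench side, written in Python for brevity. It assumes the hint appears as an entry in a "hints" list, that ramMin and keep_cache are in MiB, and that the image field is named acrContainerImage (the name used later in this thread). The fallback values, the function names, and the way ram is computed are illustrative assumptions, not the actual Workbench code.

```python
# Hint class named in this story; everything else here is illustrative.
RUNNER_HINT = "http://arvados.org/cwl#WorkflowRunnerResources"

# Assumed fallbacks for workflows that carry no hint.
DEFAULTS = {
    "coresMin": 1,
    "ramMin": 1024,       # MiB (assumed unit)
    "keep_cache": 256,    # MiB (assumed unit)
    "acrContainerImage": "arvados/jobs:latest",
}


def runner_resources(workflow):
    """Return runner settings from the hint, falling back to DEFAULTS."""
    settings = dict(DEFAULTS)
    for hint in workflow.get("hints", []):
        if hint.get("class") == RUNNER_HINT:
            for key in DEFAULTS:
                if key in hint:
                    settings[key] = hint[key]
    return settings


def build_container_request(workflow):
    """Build the runner-related part of a container request (hypothetical shape)."""
    res = runner_resources(workflow)
    mib = 1024 * 1024
    return {
        "container_image": res["acrContainerImage"],
        "runtime_constraints": {
            "vcpus": res["coresMin"],
            # Assumption: the runner's RAM covers ramMin plus the Keep cache.
            "ram": (res["ramMin"] + res["keep_cache"]) * mib,
            "API": True,
        },
    }
```

With this shape, a workflow registered without the hint still falls back to arvados/jobs:latest, while one registered by a newer arvados-cwl-runner carries its own image.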


Subtasks

Task #16693: Review 16602-wb-acr-version (Resolved, Peter Amstutz)

Task #16770: Review 16602-wb2-acr-version (Resolved, Peter Amstutz)


Related issues

Related to Arvados - Bug #16565: Don't upload development images as arvados/jobs:latest to Docker hub (Resolved, 08/19/2020)

Associated revisions

Revision 9168f531
Added by Peter Amstutz about 1 year ago

Merge branch '16602-wb2-acr-version' refs #16602

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

Revision 8991b439
Added by Peter Amstutz about 1 year ago

Merge branch '16602-wb-acr-version' refs #16602

Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <>

History

#1 Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2020-08-12 Sprint to 2020-08-26 Sprint

#2 Updated by Peter Amstutz about 1 year ago

  • Assigned To set to Peter Amstutz

#3 Updated by Peter Amstutz about 1 year ago

  • Related to Bug #16565: Don't upload development images as arvados/jobs:latest to Docker hub added

#4 Updated by Peter Amstutz about 1 year ago

  • Status changed from New to In Progress

#5 Updated by Peter Amstutz about 1 year ago

The hard part isn't changing which version it submits; it is using the right version.

Proposed solution:

Add a new "container_image" field to the workflow table.

When Workbench submits a container request, it uses "container_image" instead of using arvados/jobs:latest.

This ensures that the runner version that will be used to run the workflow corresponds to the one used to pack the workflow in the first place.

If necessary, the docker image of the runner is uploaded to the same project-uuid as the workflow.

#6 Updated by Peter Amstutz about 1 year ago

This could also be put in arvados-specific metadata in the workflow definition, with the drawback that you have to parse the yaml to find it and can't query filter on it.

#7 Updated by Peter Amstutz about 1 year ago

On further research, Workbench already looks for a hint, http://arvados.org/cwl#WorkflowRunnerResources, which influences the container request. This hint gets set on the fly by arvados-cwl-runner based on the --submit-runner-ram flag. So the logical thing to do is to add the container image to that hint and teach both Workbenches to use it.
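
A rough sketch of the arvados-cwl-runner side of that idea, assuming the workflow document is a plain dict with a "hints" list and that the image is stored under acrContainerImage (the field name reported later in this thread). The function name set_runner_image is hypothetical and is not the actual upload_workflow() code.

```python
RUNNER_HINT = "http://arvados.org/cwl#WorkflowRunnerResources"


def set_runner_image(workflow, jobs_image):
    """Record jobs_image in the runner hint, creating the hint if absent."""
    hints = workflow.setdefault("hints", [])
    for hint in hints:
        if hint.get("class") == RUNNER_HINT:
            # Keep existing fields such as ramMin; only set the image.
            hint["acrContainerImage"] = jobs_image
            return workflow
    hints.append({"class": RUNNER_HINT, "acrContainerImage": jobs_image})
    return workflow
```

Storing the image in the hint keeps it inside the workflow definition, so no new database column is needed; the drawback noted in the previous comment (it cannot be used in a query filter) still applies.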

#8 Updated by Peter Amstutz about 1 year ago

  • Description updated (diff)

#9 Updated by Peter Amstutz about 1 year ago

  • Target version changed from 2020-08-26 Sprint to 2020-09-09 Sprint

#10 Updated by Peter Amstutz about 1 year ago

16602-wb-acr-version @ 4ddeddd8d58e502dd471213198f421c842e7a5c7

  • arvados-cwl-runner sets acrContainerImage and Workbench 1 uses it.

Next up, add support in Workbench 2.

https://ci.arvados.org/view/Developer/job/developer-run-tests/2040/

#12 Updated by Lucas Di Pentima about 1 year ago

Some comments and questions:

  • ACR/WB1 branch: 16602-wb-acr-version
    • Is it worth adding a wb1 test for this?
    • Tried to test using arvbox but had an error when trying to register a workflow. Not sure if this is related to the current update:
      root@663c6c4b8811:/usr/src/arvados/sdk/cwl/tests/wf/revsort# arvados-cwl-runner --create-workflow revsort.cwl
      INFO /usr/local/bin/arvados-cwl-runner 2.1.0.dev20200826212244, arvados-python-client 2.1.0.dev20200814195416, cwltool 3.0.20200807132242
      INFO Resolved 'revsort.cwl' to 'file:///usr/src/arvados/sdk/cwl/tests/wf/revsort/revsort.cwl'
      INFO Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
      INFO ['docker', 'pull', 'arvados/jobs']
      Using default tag: latest
      latest: Pulling from arvados/jobs
      8559a31e96f4: Pull complete
      017576171510: Pull complete
      8b47f0d30222: Pull complete
      c73cfde5c474: Pull complete
      67c8c916cf1a: Pull complete
      be6affc9cc07: Pull complete
      da4b1effc0b3: Pull complete
      db23a5ceaaee: Pull complete
      6a212e3ef088: Pull complete
      63feb2687d75: Pull complete
      0c0175320747: Pull complete
      Digest: sha256:5ad1019f39bb3284c56e2c084455b582f9a2c2899e56891efdbcedf360a9120e
      Status: Downloaded newer image for arvados/jobs:latest
      INFO Uploading Docker image arvados/jobs:latest
      2020-08-27 16:01:17 arvados.arv_put[22172] INFO: Creating new cache file at /root/.cache/arvados/arv-put/5ac6901257143f2f080f2bdfea7632f7
      281M / 281M 100.0% 2020-08-27 16:01:19 arvados.arv_put[22172] INFO:
      
      2020-08-27 16:01:19 arvados.arv_put[22172] INFO: Collection saved as 'Docker image arvados jobs:latest sha256:e67b8'
      x1yo2-4zz18-xrl86w0pdj7vjs2
      2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
      2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
      2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
      2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
      2020-08-27 16:01:20 cwltool[22172] INFO: ['docker', 'pull', 'arvados/jobs:2.1.0.dev20200826212244']
      Error response from daemon: manifest for arvados/jobs:2.1.0.dev20200826212244 not found
      2020-08-27 16:01:23 cwltool[22172] ERROR: Unhandled error, try again with --debug for more information:
        Docker image arvados/jobs:2.1.0.dev20200826212244 is not available
      Command '['docker', 'pull', 'arvados/jobs:2.1.0.dev20200826212244']' returned non-zero exit status 1 (X-Request-Id: req-9b02obgqb4fi0zwmg0um)
      
  • WB2 branch: 16602-wb2-acr-version (testing against ce8i5)
    • API hint was removed, is it not used anymore?
    • I think it would be useful to have a test for this feature, at least at the action level (in case you don’t think it is worth it at the UI level because it’ll be revamped soon)
    • At WF running time, if the user enters an invalid value in the Runner field, the JS console logs a 422 error, but the UI shows a popup window with a generic “not found” message. This is probably because the API error includes the “not found” string. Not sure if we should tackle this if we’re going to redo the WF UI.
    • Test suite failed on Jenkins and also when I ran it locally

#13 Updated by Peter Amstutz about 1 year ago

Lucas Di Pentima wrote:

Some comments and questions:

  • ACR/WB1 branch: 16602-wb-acr-version
    • Is it worth adding a wb1 test for this?

Took longer than I would have liked but I did manage to add a wb1 test.

  • Tried to test using arvbox but had an error when trying to register a workflow. Not sure if this is related to the current update:
    [...]

That's expected. You need to run arvados/build/build-dev-docker-jobs-image.sh to build a development image. The behavior for setting the workflow runner image is now the same as if you submitted it at the command line.
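
To illustrate why the pull fails: in the log above, the runner reports version 2.1.0.dev20200826212244 and then tries to pull arvados/jobs with that same string as the tag. A minimal sketch, assuming the default image tag is simply the runner's own version (the function name is hypothetical):

```python
def default_jobs_image(runner_version):
    """Assumed rule: the runner image tag matches the runner's version."""
    return "arvados/jobs:" + runner_version


# Version string taken from the log above.
image = default_jobs_image("2.1.0.dev20200826212244")
print(image)
```

A development version has no matching image on Docker Hub, so the image has to be built locally first.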

  • WB2 branch: 16602-wb2-acr-version (testing against ce8i5)
    • API hint was removed, is it not used anymore?

For some reason, they had both "api" and "API" in runtimeConstraints. The lowercase "api" doesn't do anything. Also there's no reason you would want to disable API access for the workflow runner because it needs it to work.

  • I think it would be useful to have a test for this feature, at least at the action level (in case you don’t think it is worth it at the UI level because it’ll be revamped soon)
  • Test suite failed on Jenkins and also when I ran it locally

I updated the test and it includes coverage of using WorkflowRunnerResources in the container request.

  • At WF running time, if the user enters an invalid value in the Runner field, the JS console logs a 422 error, but the UI shows a popup window with a generic “not found” message. This is probably because the API error includes the “not found” string. Not sure if we should tackle this if we’re going to redo the WF UI.

Yeah, I don't want to get into that here. The user shouldn't mess with "advanced" fields if they don't know what to expect.

16602-wb-acr-version @ 940ca546c8150200c64e44f0ecbc30c9d4e59bc5

https://ci.arvados.org/view/Developer/job/developer-run-tests/2047/

16602-wb2-acr-version @ arvados-workbench2|3883f8b6c488e6db4be01b57f57f9635ebe0a993

https://ci.arvados.org/view/Developer/job/developer-tests-workbench2/83/

#15 Updated by Lucas Di Pentima about 1 year ago

  • Tests are failing, but not sure if the failures are related to these branches.
  • WB2 fails to work because of a missing semicolon at src/store/run-process-panel/run-process-panel-actions.ts line 95

Other than that, LGTM!

#18 Updated by Peter Amstutz about 1 year ago

  • Status changed from In Progress to Resolved

#19 Updated by Peter Amstutz about 1 year ago

  • Release set to 25
