Idea #16602 (closed)
Workbench 2 uses correct version of arvados/jobs when submitting a workflow, not "latest"
Description
Define a new container image field for the arv:WorkflowRunnerResources hint.
The upload_workflow() function should set the container image, using the same arvados_jobs_image logic with "self.jobs_image" that would be used to submit the workflow to run.
Workbench (both 1 and 2) should look for arv:WorkflowRunnerResources and apply coresMin/ramMin/keep_cache/container image to the container request.
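A minimal sketch of the Workbench-side behaviour described above, written in Python for illustration only (Workbench itself is not Python). The helper names, the list form of `hints`, the MiB units for `ramMin` and `keep_cache`, and the `runtime_constraints` keys are assumptions; the `acrContainerImage` field name is the one introduced later in this ticket.

```python
from copy import deepcopy

# Hint class URI as quoted in this ticket; the "arv:" prefix is its short form.
RUNNER_HINT = "http://arvados.org/cwl#WorkflowRunnerResources"
DEFAULT_IMAGE = "arvados/jobs:latest"  # previous behaviour, used as fallback
MIB = 1024 * 1024  # assumption: ramMin and keep_cache are given in MiB


def find_runner_hint(workflow):
    """Return the WorkflowRunnerResources hint, or {} if there is none.

    Assumes hints are stored in list form on the workflow document.
    """
    for hint in workflow.get("hints", []):
        if hint.get("class") in (RUNNER_HINT, "arv:WorkflowRunnerResources"):
            return hint
    return {}


def apply_runner_hint(workflow, container_request):
    """Copy the request and apply cores, RAM, keep cache and image from the hint."""
    cr = deepcopy(container_request)
    hint = find_runner_hint(workflow)
    rc = cr.setdefault("runtime_constraints", {})
    if "coresMin" in hint:
        rc["vcpus"] = hint["coresMin"]
    if "ramMin" in hint:
        rc["ram"] = hint["ramMin"] * MIB
    if "keep_cache" in hint:
        rc["keep_cache_ram"] = hint["keep_cache"] * MIB
    # Use the image recorded with the workflow; fall back to "latest" otherwise.
    cr["container_image"] = hint.get("acrContainerImage", DEFAULT_IMAGE)
    return cr
```

With no hint present, the request keeps the old `arvados/jobs:latest` default, so workflows registered before this change would behave as before.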
Updated by Peter Amstutz over 4 years ago
- Target version changed from 2020-08-12 Sprint to 2020-08-26 Sprint
Updated by Peter Amstutz over 4 years ago
- Related to Bug #16565: Don't upload development images as arvados/jobs:latest to Docker hub added
Updated by Peter Amstutz over 4 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 4 years ago
The hard part isn't changing what version it submits; it is using the right version.
Proposed solution:
Add a new "container_image" field to the workflow table.
When Workbench submits a container request, it uses "container_image" instead of using arvados/jobs:latest.
This ensures that the runner version that will be used to run the workflow corresponds to the one used to pack the workflow in the first place.
If necessary, the Docker image of the runner is uploaded to the same project as the workflow.
Updated by Peter Amstutz over 4 years ago
This could also be put in Arvados-specific metadata in the workflow definition, with the drawback that you have to parse the YAML to find it and can't filter queries on it.
Updated by Peter Amstutz over 4 years ago
On further research, Workbench already looks for a hint, http://arvados.org/cwl#WorkflowRunnerResources, which influences the container request. This hint gets set on the fly by arvados-cwl-runner based on the --submit-runner-ram flag. So the logical thing to do is to add the container image to that hint and teach both workbenches to use it.
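The producer side of this idea could be sketched as follows: when the workflow is packed, record the runner image in the same hint that already carries the runner RAM. This is a hedged illustration, not the arvados-cwl-runner code; `set_runner_hint` and its parameters are hypothetical names, the list form of `hints` is assumed, and `acrContainerImage` is the field name used later in this ticket.

```python
RUNNER_HINT = "http://arvados.org/cwl#WorkflowRunnerResources"


def set_runner_hint(workflow, jobs_image, submit_runner_ram=None):
    """Record the runner image (and optionally RAM) in the workflow's hints.

    Hypothetical helper: reuses an existing WorkflowRunnerResources hint if
    one is present, otherwise appends a new one.
    """
    hints = workflow.setdefault("hints", [])
    hint = next((h for h in hints if h.get("class") == RUNNER_HINT), None)
    if hint is None:
        hint = {"class": RUNNER_HINT}
        hints.append(hint)
    # Pin the image that corresponds to the runner packing the workflow.
    hint["acrContainerImage"] = jobs_image
    if submit_runner_ram is not None:
        hint["ramMin"] = submit_runner_ram
    return workflow
```

Keeping the image in the existing hint means no schema change to the workflow table is needed, at the cost of having to parse the definition to read it.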
Updated by Peter Amstutz over 4 years ago
- Target version changed from 2020-08-26 Sprint to 2020-09-09 Sprint
Updated by Peter Amstutz over 4 years ago
16602-wb-acr-version @ 4ddeddd8d58e502dd471213198f421c842e7a5c7
- arvados-cwl-runner sets acrContainerImage and workbench 1 uses it.
Next up, add support in workbench 2.
Updated by Peter Amstutz over 4 years ago
16602-wb2-acr-version @ arvados-workbench2|3e14ac8582fb8f73fd1807fa0a3b10c88cc89921
Updated by Lucas Di Pentima over 4 years ago
Some comments and questions:
- ACR/WB1 branch: 16602-wb-acr-version
- Is it worth adding a wb1 test for this?
- Tried to test using arvbox but had an error when trying to register a workflow. Not sure if this is related to the current update:
root@663c6c4b8811:/usr/src/arvados/sdk/cwl/tests/wf/revsort# arvados-cwl-runner --create-workflow revsort.cwl
INFO /usr/local/bin/arvados-cwl-runner 2.1.0.dev20200826212244, arvados-python-client 2.1.0.dev20200814195416, cwltool 3.0.20200807132242
INFO Resolved 'revsort.cwl' to 'file:///usr/src/arvados/sdk/cwl/tests/wf/revsort/revsort.cwl'
INFO Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
INFO ['docker', 'pull', 'arvados/jobs']
Using default tag: latest
latest: Pulling from arvados/jobs
8559a31e96f4: Pull complete
017576171510: Pull complete
8b47f0d30222: Pull complete
c73cfde5c474: Pull complete
67c8c916cf1a: Pull complete
be6affc9cc07: Pull complete
da4b1effc0b3: Pull complete
db23a5ceaaee: Pull complete
6a212e3ef088: Pull complete
63feb2687d75: Pull complete
0c0175320747: Pull complete
Digest: sha256:5ad1019f39bb3284c56e2c084455b582f9a2c2899e56891efdbcedf360a9120e
Status: Downloaded newer image for arvados/jobs:latest
INFO Uploading Docker image arvados/jobs:latest
2020-08-27 16:01:17 arvados.arv_put[22172] INFO: Creating new cache file at /root/.cache/arvados/arv-put/5ac6901257143f2f080f2bdfea7632f7
281M / 281M 100.0%
2020-08-27 16:01:19 arvados.arv_put[22172] INFO:
2020-08-27 16:01:19 arvados.arv_put[22172] INFO: Collection saved as 'Docker image arvados jobs:latest sha256:e67b8'
x1yo2-4zz18-xrl86w0pdj7vjs2
2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
2020-08-27 16:01:20 arvados.arv-run[22172] INFO: Using empty collection d41d8cd98f00b204e9800998ecf8427e+0
2020-08-27 16:01:20 cwltool[22172] INFO: ['docker', 'pull', 'arvados/jobs:2.1.0.dev20200826212244']
Error response from daemon: manifest for arvados/jobs:2.1.0.dev20200826212244 not found
2020-08-27 16:01:23 cwltool[22172] ERROR: Unhandled error, try again with --debug for more information:
Docker image arvados/jobs:2.1.0.dev20200826212244 is not available
Command '['docker', 'pull', 'arvados/jobs:2.1.0.dev20200826212244']' returned non-zero exit status 1 (X-Request-Id: req-9b02obgqb4fi0zwmg0um)
- WB2 branch: 16602-wb2-acr-version (testing against ce8i5)
- API hint was removed, is it not used anymore?
- I think it would be useful to have a test for this feature, at least at the action level (in case you don’t think it is worth it at the UI level because it’ll be revamped soon)
- At WF running time, if the user enters something wrong in the Runner field, the js console registers a 422 error, but the UI shows a popup window with a generic “not found” message. This is probably because the API error includes the “not found” string. Not sure if we should tackle this if we’re going to re-do the wf UI.
- Test suite failed on Jenkins and also when I ran it locally
Updated by Peter Amstutz over 4 years ago
Lucas Di Pentima wrote:
Some comments and questions:
- ACR/WB1 branch: 16602-wb-acr-version
- Is it worth adding a wb1 test for this?
Took longer than I would have liked but I did manage to add a wb1 test.
- Tried to test using arvbox but had an error when trying to register a workflow. Not sure if this is related to the current update:
[...]
That's expected. You need to run arvados/build/build-dev-docker-jobs-image.sh to build a development image. The behavior for setting the workflow runner image is now the same as if you submitted it at the command line.
- WB2 branch: 16602-wb2-acr-version (testing against ce8i5)
- API hint was removed, is it not used anymore?
For some reason, they had both "api" and "API" in runtimeConstraints. The lowercase "api" doesn't do anything. Also, there's no reason you would want to disable API access for the workflow runner, because it needs that access to work.
- I think it would be useful to have a test for this feature, at least at the action level (in case you don’t think it is worth it at the UI level because it’ll be revamped soon)
- Test suite failed on Jenkins and also when I ran it locally
I updated the test and it includes coverage of using WorkflowRunnerResources in the container request.
- At WF running time, if the user enters something wrong in the Runner field, the js console registers a 422 error, but the UI shows a popup window with a generic “not found” message. This is probably because the API error includes the “not found” string. Not sure if we should tackle this if we’re going to re-do the wf UI.
Yea, I don't want to get into that here. The user shouldn't mess with "advanced" fields if they don't know what to expect.
16602-wb-acr-version @ 940ca546c8150200c64e44f0ecbc30c9d4e59bc5
16602-wb2-acr-version @ arvados-workbench2|3883f8b6c488e6db4be01b57f57f9635ebe0a993
Updated by Lucas Di Pentima over 4 years ago
- Tests are failing, but not sure if the failures are related to these branches.
- WB2 fails to work because of a missing semicolon at src/store/run-process-panel/run-process-panel-actions.ts line 95
Other than that, LGTM!
Updated by Peter Amstutz over 4 years ago
16602-wb-acr-version @ b36ffab0228e53226614f7d33e4a8e3921d0256f
16602-wb2-acr-version @ arvados-workbench2|5d95075cdfdc2ca21f262f23355320f8aa96b25e
Updated by Peter Amstutz over 4 years ago
- Status changed from In Progress to Resolved