Project

General

Profile

Actions

Bug #11068

closed

[Arvados-CWL-runner] need better error message when there are issues getting the docker image

Added by Ward Vandewege about 7 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-

Description

The 9tee4 diagnostics job (e.g. 9tee4-xvhdp-yuv7hvb80fbb3lk) failed with this message:

2017-02-07T16:40:16.675203241Z Executing container '9tee4-dz642-g8msiiezegv6dye'
2017-02-07T16:40:16.675306842Z Executing on host 'compute0.9tee4.arvadosapi.com'
2017-02-07T16:40:16.701872067Z Fetching Docker image from collection 'bb5e399246e3ce463f9dca24cbe5457a+335'
2017-02-07T16:40:16.732104491Z Using Docker image id 'b273175cddfb2672685b7701653490c100884c64e65a3444db9a0c70e5d02f83'
2017-02-07T16:40:16.735103990Z Docker image is available
2017-02-07T16:40:17.093587136Z Creating Docker container
2017-02-07T16:40:17.795483495Z Attaching container streams
2017-02-07T16:40:17.870947173Z Starting Docker container id 'fde1821f59d63968fa3a27d63313d826aace767d1b1adc5d43045d94535f8375'
2017-02-07T16:40:18.134036546Z Waiting for container to finish
2017-02-07T16:40:18.097797521Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/fde1821f59d63968fa3a27d63313d826aace767d1b1adc5d43045d94535f8375/cgroup.procs
2017-02-07T16:40:18.097886671Z notice: reading stats from /sys/fs/cgroup/memory/docker/fde1821f59d63968fa3a27d63313d826aace767d1b1adc5d43045d94535f8375/memory.stat
2017-02-07T16:40:18.098155788Z mem 0 cache 0 swap 0 pgmajfault 876544 rss
2017-02-07T16:40:18.098202650Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/fde1821f59d63968fa3a27d63313d826aace767d1b1adc5d43045d94535f8375/cpuacct.stat
2017-02-07T16:40:18.098299730Z notice: reading stats from /sys/fs/cgroup/cpuset/docker/fde1821f59d63968fa3a27d63313d826aace767d1b1adc5d43045d94535f8375/cpuset.cpus
2017-02-07T16:40:18.098341572Z cpu 0.0100 user 0.0000 sys 20 cpus
2017-02-07T16:40:18.098373191Z notice: reading stats from /sys/fs/cgroup/blkio/docker/fde1821f59d63968fa3a27d63313d826aace767d1b1adc5d43045d94535f8375/blkio.io_service_bytes
2017-02-07T16:40:18.098549373Z net:eth0 0 tx 0 rx
2017-02-07T16:40:20.001184707Z 2017-02-07 16:40:20 cwltool INFO: /usr/bin/arvados-cwl-runner aaa45b09de0e9437743fce53d7c0bf8165074b5e 1.0.20170203223449, arvados-python-client 0.1.20170203223449, cwltool 1.0.20170119234115
2017-02-07T16:40:20.001544310Z 2017-02-07 16:40:20 cwltool INFO: Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main'
2017-02-07T16:40:20.292931320Z 2017-02-07 16:40:20 cwltool ERROR: Workflow error, try again with --debug for more information:
2017-02-07T16:40:20.292931320Z ../../lib/cwl/workflow.json:1:264: [Errno 2] No such file or directory
2017-02-07T16:40:21.890447683Z Complete

This wasn't exactly clear.

We re-ran the container request with --debug (as 9tee4-xvhdp-bfykiehawkcduh9), which showed the error path:

2017-02-07T17:48:39.192771836Z Executing container '9tee4-dz642-prcd58gh6c69s2c'
2017-02-07T17:48:39.192906547Z Executing on host 'compute0.9tee4.arvadosapi.com'
2017-02-07T17:48:39.219819476Z Fetching Docker image from collection 'bb5e399246e3ce463f9dca24cbe5457a+335'
2017-02-07T17:48:39.257291374Z Using Docker image id 'b273175cddfb2672685b7701653490c100884c64e65a3444db9a0c70e5d02f83'
2017-02-07T17:48:39.260113555Z Docker image is available
2017-02-07T17:48:39.610759818Z Creating Docker container
2017-02-07T17:48:40.411530002Z Attaching container streams
2017-02-07T17:48:40.484833355Z Starting Docker container id '3f672e5be9356276d596d07123752d2376aa6ac495c4e936b6dc7c2042e56742'
2017-02-07T17:48:40.737958719Z Waiting for container to finish
2017-02-07T17:48:40.713845501Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/3f672e5be9356276d596d07123752d2376aa6ac495c4e936b6dc7c2042e56742/cgroup.procs
2017-02-07T17:48:40.713913821Z notice: reading stats from /sys/fs/cgroup/memory/docker/3f672e5be9356276d596d07123752d2376aa6ac495c4e936b6dc7c2042e56742/memory.stat
2017-02-07T17:48:40.714180103Z mem 0 cache 0 swap 0 pgmajfault 3338240 rss
2017-02-07T17:48:40.714226835Z notice: reading stats from /sys/fs/cgroup/cpuacct/docker/3f672e5be9356276d596d07123752d2376aa6ac495c4e936b6dc7c2042e56742/cpuacct.stat
2017-02-07T17:48:40.714315071Z notice: reading stats from /sys/fs/cgroup/cpuset/docker/3f672e5be9356276d596d07123752d2376aa6ac495c4e936b6dc7c2042e56742/cpuset.cpus
2017-02-07T17:48:40.714354801Z cpu 0.0100 user 0.0100 sys 20 cpus
2017-02-07T17:48:40.714381461Z notice: reading stats from /sys/fs/cgroup/blkio/docker/3f672e5be9356276d596d07123752d2376aa6ac495c4e936b6dc7c2042e56742/blkio.io_service_bytes
2017-02-07T17:48:40.714531430Z net:eth0 90 tx 90 rx
2017-02-07T17:48:42.677454263Z 2017-02-07 17:48:42 cwltool INFO: /usr/bin/arvados-cwl-runner aaa45b09de0e9437743fce53d7c0bf8165074b5e 1.0.20170203223449, arvados-python-client 0.1.20170203223449, cwltool 1.0.20170119234115
2017-02-07T17:48:42.677946190Z 2017-02-07 17:48:42 cwltool INFO: Resolved '/var/lib/cwl/workflow.json#main' to 'file:///var/lib/cwl/workflow.json#main'
2017-02-07T17:48:42.975179073Z 2017-02-07 17:48:42 cwltool ERROR: Workflow error, try again with --debug for more information:
2017-02-07T17:48:42.975179073Z ../../lib/cwl/workflow.json:1:264: [Errno 2] No such file or directory
2017-02-07T17:48:42.975179073Z Traceback (most recent call last):
2017-02-07T17:48:42.975179073Z   File "/usr/lib/python2.7/dist-packages/cwltool/main.py", line 717, in main
2017-02-07T17:48:42.975179073Z     **vars(args))
2017-02-07T17:48:42.975179073Z   File "/usr/lib/python2.7/dist-packages/arvados_cwl/__init__.py", line 347, in arv_executor
2017-02-07T17:48:42.975179073Z     upload_workflow_deps(self, tool)
2017-02-07T17:48:42.975179073Z   File "/usr/lib/python2.7/dist-packages/arvados_cwl/runner.py", line 197, in upload_workflow_deps
2017-02-07T17:48:42.975179073Z     upload_docker(arvrunner, tool)
2017-02-07T17:48:42.975179073Z   File "/usr/lib/python2.7/dist-packages/arvados_cwl/runner.py", line 135, in upload_docker
2017-02-07T17:48:42.975179073Z     arv_docker_get_image(arvrunner.api, docker_req, True, arvrunner.project_uuid)
2017-02-07T17:48:42.975179073Z   File "/usr/lib/python2.7/dist-packages/arvados_cwl/arvdocker.py", line 71, in arv_docker_get_image
2017-02-07T17:48:42.975179073Z     return pdh
2017-02-07T17:48:42.975179073Z   File "/usr/lib/python2.7/dist-packages/schema_salad/sourceline.py", line 146, in __exit__
2017-02-07T17:48:42.975179073Z     raise self.makeError(unicode(exc_value))
2017-02-07T17:48:42.975179073Z WorkflowException: ../../lib/cwl/workflow.json:1:264: [Errno 2] No such file or directory
2017-02-07T17:48:44.525795385Z Complete

I will fix the permissions problem, but the error message should be improved so re-running with --debug is not necessary.


Subtasks 1 (0 open1 closed)

Task #12200: Review 11068-cwl-missing-dockerResolvedPeter Amstutz09/21/2017Actions
Actions #1

Updated by Ward Vandewege about 7 years ago

  • Description updated (diff)
Actions #2

Updated by Tom Morris over 6 years ago

  • Target version set to Arvados Future Sprints
Actions #3

Updated by Tom Morris over 6 years ago

  • Target version changed from Arvados Future Sprints to 2017-09-13 Sprint
Actions #4

Updated by Peter Amstutz over 6 years ago

  • Assigned To set to Peter Amstutz
Actions #5

Updated by Peter Amstutz over 6 years ago

  • Target version changed from 2017-09-13 Sprint to 2017-09-27 Sprint
Actions #6

Updated by Peter Amstutz over 6 years ago

Improved error handling/reporting: 11068-cwl-missing-docker

Actions #7

Updated by Lucas Di Pentima over 6 years ago

This LGTM.

I think it would be nice to ask the cwltool project to enhance its error reporting when executing external commands on cwltool.docker.get_image() to avoid this kind of issues in the future.

Actions #8

Updated by Peter Amstutz over 6 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:58c6f3aa42f4f30fc4a764ca56ab1a198754b69b.

Actions

Also available in: Atom PDF