Bug #12355
closedrun-arvados-cwl-conformance-tests really slow
Description
It is taking hours to complete. For some reason the "sudo" command to run crunch-job is taking 75 seconds to start.
2017-09-27_18:41:15.94921 dispatch: sudo -E -u crunch LD_LIBRARY_PATH= PATH=/var/lib/gems/ruby/2.3.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/gems/bin:/usr/local /go/bin PERLLIB=/usr/src/arvados/sdk/perl/lib PYTHONPATH= RUBYLIB=/var/lib/gems/gems/bundler-1.15.4/lib GEM_PATH= /usr/src/arvados/sdk/cli/bin/crunch-job --job-api-token 2e6f0zp3ralzjqscwu38czli w6353bgziuat739i74sy12vaav --job mqkj9-8i9sb-0p5a2vn0evw4oa9 --git-dir /var/lib/arvados/internal.git --docker-bin docker --docker-run-args --net=host 2017-09-27_18:41:15.94968 dispatch: job mqkj9-8i9sb-0p5a2vn0evw4oa9 2017-09-27_18:41:15.94970 dispatch: child 28624 start 2017-09-27_18:41:15 2017-09-27_18:41:17.91463 2017-09-27_18:41:17.91467 2017-09-27 18:41:17 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:17.91467 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:41:23.72848 2017-09-27_18:41:23.72852 2017-09-27 18:41:23 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:23.72852 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:41:29.56375 2017-09-27_18:41:29.56379 2017-09-27 18:41:29 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:29.56379 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:41:35.34087 2017-09-27_18:41:35.34090 2017-09-27 18:41:35 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:35.34090 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:41:41.18088 2017-09-27_18:41:41.18092 2017-09-27 18:41:41 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:41.18092 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:41:46.98528 2017-09-27_18:41:46.98531 2017-09-27 18:41:46 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:46.98531 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:41:52.81456 2017-09-27_18:41:52.81458 2017-09-27 18:41:52 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:52.81459 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:41:58.58492 2017-09-27_18:41:58.58495 2017-09-27 18:41:58 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:41:58.58495 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:42:04.37249 2017-09-27_18:42:04.37252 2017-09-27 18:42:04 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:42:04.37252 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:42:10.17346 2017-09-27_18:42:10.17349 2017-09-27 18:42:10 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:42:10.17350 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:42:15.96954 2017-09-27_18:42:15.96957 2017-09-27 18:42:15 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:42:15.96958 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:42:21.76766 2017-09-27_18:42:21.76769 2017-09-27 18:42:21 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:42:21.76769 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:42:27.53407 2017-09-27_18:42:27.53409 2017-09-27 18:42:27 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:42:27.53410 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:42:33.32951 2017-09-27_18:42:33.32953 2017-09-27 18:42:33 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c 2017-09-27_18:42:33.32954 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0} 2017-09-27_18:42:36.65370 mqkj9-8i9sb-0p5a2vn0evw4oa9 ! 28626 Sanity check is `docker ps -q`
Updated by Peter Amstutz over 7 years ago
- Description updated (diff)
- Target version deleted (
2017-10-11 Sprint)
Updated by Peter Amstutz over 7 years ago
- Target version set to 2017-10-11 Sprint
Updated by Ward Vandewege about 7 years ago
We run these tests in arvbox in localdemo mode, which generates lots of docker volumes which it leaves lying around.
ci:~# docker volume list |wc 485 971 41172 ci:~# docker volume prune ... Total reclaimed space: 99.51GB ci:~# ci:~# ci:~# ci:~# docker volume list |wc 6 13 457
Updated by Ward Vandewege about 7 years ago
- Status changed from New to In Progress
Updated by Ward Vandewege about 7 years ago
Here's an example of a particularly slow test job (this doesn't happen to all the tests):
root@8bc7b8700ff1:/etc/service# grep 3opjy-8i9sb-8l1vnvznj0hrl4x */log/main/current crunch-dispatch0/log/main/current:2017-10-01_23:24:35.21536 dispatch: ["git", "--git-dir=/var/lib/arvados/internal.git", "tag", "3opjy-8i9sb-8l1vnvznj0hrl4x", "a08e3bb86caa758df7d33a3df3f6b8c333e47838"] crunch-dispatch0/log/main/current:2017-10-01_23:24:35.21908 dispatch: sudo -E -u crunch LD_LIBRARY_PATH= PATH=/var/lib/gems/ruby/2.3.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/gems/bin:/usr/local/go/bin PERLLIB=/usr/src/arvados/sdk/perl/lib PYTHONPATH= RUBYLIB=/var/lib/gems/gems/bundler-1.15.4/lib GEM_PATH= /usr/src/arvados/sdk/cli/bin/crunch-job --job-api-token 4hja3jfqayqm5x5m1vu669ivuak5bk6zsenyndkdtg14ibgs3z --job 3opjy-8i9sb-8l1vnvznj0hrl4x --git-dir /var/lib/arvados/internal.git --docker-bin docker --docker-run-args --net=host crunch-dispatch0/log/main/current:2017-10-01_23:24:35.21964 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x crunch-dispatch0/log/main/current:2017-10-01_23:24:38.05689 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:24:43.89373 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:24:49.77193 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:24:55.65407 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:01.49044 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:07.33950 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:13.26439 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:19.11267 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:24.95535 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:30.80978 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:36.66150 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:42.51133 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:48.41501 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:25:54.29049 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:00.17030 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:06.02121 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:11.85058 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:17.69224 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:23.59006 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:29.50650 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:35.35775 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:41.16927 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:47.00715 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:52.89632 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:26:58.75690 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:04.59749 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:10.44902 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:16.30613 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:22.19203 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:28.09981 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:33.95345 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:39.79504 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:45.64178 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z crunch-dispatch0/log/main/current:2017-10-01_23:27:49.55444 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6388 Sanity check is `docker ps -q` crunch-dispatch0/log/main/current:2017-10-01_23:27:49.55447 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6388 sanity check: start crunch-dispatch0/log/main/current:2017-10-01_23:27:49.56418 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6388 stderr starting: ['docker','ps','-q'] crunch-dispatch0/log/main/current:2017-10-01_23:27:49.67802 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6388 sanity check: exit 0 crunch-dispatch0/log/main/current:2017-10-01_23:27:49.67806 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6388 Sanity check OK crunch-dispatch0/log/main/current:2017-10-01_23:27:50.19404 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 running from /usr/src/arvados/sdk/cli/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170817171636 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.07831 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x {:done=>0, :running=>0, :failed=>0, :todo=>1} crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08984 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 check slurm allocation crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08991 3opjy-8i9sb-8l1vnvznj0hrl4x ! 2017-10-01 23:27:50 arvados.arv_put[12396] INFO: No cache usage requested for this run. crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08993 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 node localhost - 8 slots crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08995 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 start crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08997 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 clean work dirs: start crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08999 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr starting: ['bash','-ec',' arv-mount --unmount-timeout 10 --unmount-all ${CRUNCH_TMP} rm -rf ${JOB_WORK} ${CRUNCH_INSTALL} ${CRUNCH_TMP}/task ${CRUNCH_TMP}/src* ${CRUNCH_TMP}/*.cid '] crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09001 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 clean work dirs: exit 0 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09003 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 Install docker image 89f452e02664fbf685bb8b518200ff48+342 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09005 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 docker image hash is sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09007 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 load docker image: start crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09011 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr starting: ['/bin/bash','-o','pipefail','-ec',' loaded() { id=$(docker inspect --format="{{.ID}}" sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345) || return 1 echo "image ID is $id" [[ ${id} = sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 ]] } if loaded >&2 2>/dev/null; then echo >&2 "image is already present" exit 0 fi echo >&2 "docker image is not present; loading" arv-get 89f452e02664fbf685bb8b518200ff48\\+342\\/sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345\\.tar | docker load if ! loaded >&2; then echo >&2 "`docker load` exited 0, but image is not found (!)" exit 1 fi echo >&2 "image loaded successfully" '] crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09013 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr image ID is sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.10189 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr image is already present crunch-dispatch0/log/main/current:2017-10-01_23:27:51.10452 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 load docker image: exit 0 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.11504 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 check --memory-swap feature: start crunch-dispatch0/log/main/current:2017-10-01_23:27:51.12729 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr starting: ['docker','run','--help'] crunch-dispatch0/log/main/current:2017-10-01_23:27:51.25062 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 check --memory-swap feature: exit 0 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.26089 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 check whether default user is UID 0: start crunch-dispatch0/log/main/current:2017-10-01_23:27:51.27326 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr starting: ['/bin/sh','-ec','docker run --net=host sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 id --user'] crunch-dispatch0/log/main/current:2017-10-01_23:27:51.71863 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 check whether default user is UID 0: exit 0 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.72874 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 Container will run with default user crunch-dispatch0/log/main/current:2017-10-01_23:27:51.73907 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 Looking for version a08e3bb86caa758df7d33a3df3f6b8c333e47838 from repository arvados crunch-dispatch0/log/main/current:2017-10-01_23:27:51.74958 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 Using local repository '/var/lib/arvados/internal.git' crunch-dispatch0/log/main/current:2017-10-01_23:27:51.76441 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 Version a08e3bb86caa758df7d33a3df3f6b8c333e47838 is commit a08e3bb86caa758df7d33a3df3f6b8c333e47838 crunch-dispatch0/log/main/current:2017-10-01_23:27:51.99798 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 run install script on all workers: start crunch-dispatch0/log/main/current:2017-10-01_23:27:52.01147 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr starting: ['sh','-c','mkdir -p /tmp/crunch0/opt && cd /tmp/crunch0 && perl -'] crunch-dispatch0/log/main/current:2017-10-01_23:27:52.04300 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 stderr Extracting archive: c2e1effc24b27bfc978c443705532885 crunch-dispatch0/log/main/current:2017-10-01_23:27:52.22986 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 run install script on all workers: exit 0 crunch-dispatch0/log/main/current:2017-10-01_23:27:52.26353 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 script cwl-runner crunch-dispatch0/log/main/current:2017-10-01_23:27:52.26356 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 script_version a08e3bb86caa758df7d33a3df3f6b8c333e47838 crunch-dispatch0/log/main/current:2017-10-01_23:27:52.26627 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 script_parameters {"arv:enable_reuse":true,"file1":null,"cwl:tool":"3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl#main","arv:on_error":"continue"} crunch-dispatch0/log/main/current:2017-10-01_23:27:52.27680 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 runtime_constraints {"max_tasks_per_node":0,"min_ram_mb_per_node":1024,"docker_image":"arvados/jobs:a08e3bb86caa758df7d33a3df3f6b8c333e47838"} crunch-dispatch0/log/main/current:2017-10-01_23:27:52.32018 3opjy-8i9sb-8l1vnvznj0hrl4x ! crunch-job have 1 free slots for 1 initial tasks at this level, 1 nodes, and 8 slots crunch-dispatch0/log/main/current:2017-10-01_23:27:52.32020 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 start level 0 with 1 slots crunch-dispatch0/log/main/current:2017-10-01_23:27:52.40196 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 status: 0 done, 0 running, 1 todo crunch-dispatch0/log/main/current:2017-10-01_23:27:52.41401 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 job_task 3opjy-ot0gb-n0nnqshooftstyr crunch-dispatch0/log/main/current:2017-10-01_23:27:52.42421 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 child 12610 started on localhost.1 crunch-dispatch0/log/main/current:2017-10-01_23:27:52.47857 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr starting: ['bash','-c','if [ -e \\/tmp\\/crunch0\\/task\\/localhost\\.1 ]; then rm -rf \\/tmp\\/crunch0\\/task\\/localhost\\.1; fi; mkdir -p \\/tmp\\/crunch0 \\/tmp\\/crunch0\\/work \\/tmp\\/crunch0\\/task\\/localhost\\.1 \\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep && cd \\/tmp\\/crunch0 && MEM=$(awk \'($1 == "MemTotal:"){print $2}\' </proc/meminfo) && SWAP=$(awk \'($1 == "SwapTotal:"){print $2}\' </proc/meminfo) && MEMLIMIT=$(( ($MEM * 95) / (1 * 100) )) && let SWAPLIMIT=$MEMLIMIT+$SWAP && declare -a VOLUMES=() && if which crunchrunner >/dev/null ; then VOLUMES+=("--volume=$(which crunchrunner):/usr/local/bin/crunchrunner:ro") ; fi && if test -f /etc/ssl/certs/ca-certificates.crt ; then VOLUMES+=("--volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt:ro") ; elif test -f /etc/pki/tls/certs/ca-bundle.crt ; then VOLUMES+=("--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/arvados/ca-certificates.crt:ro") ; fi && exec arv-mount --read-write --mount-by-pdh=by_pdh --mount-tmp=tmp --crunchstat-interval=10 --allow-other \\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep --exec crunchstat -cgroup-root=\\/sys\\/fs\\/cgroup -cgroup-parent=docker -cgroup-cid=/tmp/crunch0/3opjy-ot0gb-n0nnqshooftstyr-0.cid -poll=10000 docker run --net=host --name=3opjy-ot0gb-n0nnqshooftstyr-0 --attach=stdout --attach=stderr --attach=stdin -i --cidfile=/tmp/crunch0/3opjy-ot0gb-n0nnqshooftstyr-0.cid --sig-proxy --memory=${MEMLIMIT}k --memory-swap=${SWAPLIMIT}k --volume=\\/tmp\\/crunch0\\/src\\:\\/tmp\\/crunch0\\/src\\:ro --volume=\\/tmp\\/crunch0\\/opt\\:\\/tmp\\/crunch0\\/opt\\:ro --volume=\\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep\\/by_pdh\\:\\/keep\\:ro --volume=\\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep\\/tmp\\:\\/keep_tmp --volume=/tmp "${VOLUMES[@]}" --env=CRUNCH_SRC\\=\\/tmp\\/crunch0\\/src --env=CRUNCH_JOB_BIN\\=\\/usr\\/src\\/arvados\\/sdk\\/cli\\/bin\\/crunch\\-job --env=CRUNCH_GIT_ARCHIVE_HASH\\=c2e1effc24b27bfc978c443705532885 --env=CRUNCH_JOB_UUID\\=3opjy\\-8i9sb\\-8l1vnvznj0hrl4x --env=CRUNCH_SRC_URL\\=\\/var\\/lib\\/arvados\\/internal\\.git --env=JOB_PARAMETER_ARV\\:ON_ERROR\\=continue --env=JOB_UUID\\=3opjy\\-8i9sb\\-8l1vnvznj0hrl4x --env=TASK_WORK\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 --env=TASK_SEQUENCE\\=0 --env=CRUNCH_JOB_DOCKER_RUN_ARGS\\=\\-\\-net\\=host --env=CRUNCH_JOB_DOCKER_BIN\\=docker --env=CRUNCH_TMP\\=\\/tmp\\/crunch0 --env=TASK_SLOT_NODE\\=localhost --env=TASK_KEEPMOUNT_TMP\\=\\/keep_tmp --env=TASK_SLOT_NUMBER\\=1 --env=ARVADOS_API_TOKEN\\=[...] --env=TASK_TMPDIR\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 --env=ARVADOS_API_HOST_INSECURE\\=1 --env=ARVADOS_API_HOST\\=172\\.17\\.0\\.2\\:8000 --env=JOB_PARAMETER_CWL\\:TOOL\\=3fd0716d9e4ef16a4fe2b04c53240fb6\\+60\\/workflow\\.cwl\\#main --env=CRUNCH_REFRESH_TRIGGER\\=\\/tmp\\/crunch_refresh_trigger --env=CRUNCH_INSTALL\\=\\/tmp\\/crunch0\\/opt --env=CRUNCH_WORK\\=\\/tmp\\/crunch0\\/work --env=TASK_KEEPMOUNT\\=\\/keep --env=CRUNCH_SRC_COMMIT\\=a08e3bb86caa758df7d33a3df3f6b8c333e47838 --env=JOB_WORK\\=\\/tmp\\/crunch\\-job\\-work --env=JOB_SCRIPT\\=cwl\\-runner --env=JOB_PARAMETER_ARV\\:ENABLE_REUSE\\=1 --env=CRUNCH_NODE_SLOTS\\=1 --env=TASK_QSEQUENCE\\=0 --env=JOB_PARAMETER_FILE1\\= --env=TASK_UUID\\=3opjy\\-ot0gb\\-n0nnqshooftstyr --env=HOME\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 /bin/sh -c \'python -c "from pkg_resources import get_distribution as get; print \\"Using Arvados SDK version\\", get(\\"arvados-python-client\\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/localhost.1" && if which stdbuf >/dev/null ; then exec stdbuf --output=0 --error=0 \\/tmp\\/crunch0\\/src\\/crunch_scripts\\/cwl\\-runner ; else exec \\/tmp\\/crunch0\\/src\\/crunch_scripts\\/cwl\\-runner ; fi\''] crunch-dispatch0/log/main/current:2017-10-01_23:27:52.54833 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 status: 0 done, 1 running, 0 todo crunch-dispatch0/log/main/current:2017-10-01_23:27:52.95995 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: Running [docker run --net=host --name=3opjy-ot0gb-n0nnqshooftstyr-0 --attach=stdout --attach=stderr --attach=stdin -i --cidfile=/tmp/crunch0/3opjy-ot0gb-n0nnqshooftstyr-0.cid --sig-proxy --memory=29342775k --memory-swap=29342775k --volume=/tmp/crunch0/src:/tmp/crunch0/src:ro --volume=/tmp/crunch0/opt:/tmp/crunch0/opt:ro --volume=/tmp/crunch0/task/localhost.1.keep/by_pdh:/keep:ro --volume=/tmp/crunch0/task/localhost.1.keep/tmp:/keep_tmp --volume=/tmp --volume=/usr/local/bin/crunchrunner:/usr/local/bin/crunchrunner:ro --volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt:ro --env=CRUNCH_SRC=/tmp/crunch0/src --env=CRUNCH_JOB_BIN=/usr/src/arvados/sdk/cli/bin/crunch-job --env=CRUNCH_GIT_ARCHIVE_HASH=c2e1effc24b27bfc978c443705532885 --env=CRUNCH_JOB_UUID=3opjy-8i9sb-8l1vnvznj0hrl4x --env=CRUNCH_SRC_URL=/var/lib/arvados/internal.git --env=JOB_PARAMETER_ARV:ON_ERROR=continue --env=JOB_UUID=3opjy-8i9sb-8l1vnvznj0hrl4x --env=TASK_WORK=/tmp/crunch-job-task-work/localhost.1 --env=TASK_SEQUENCE=0 --env=CRUNCH_JOB_DOCKER_RUN_ARGS=--net=host --env=CRUNCH_JOB_DOCKER_BIN=docker --env=CRUNCH_TMP=/tmp/crunch0 --env=TASK_SLOT_NODE=localhost --env=TASK_KEEPMOUNT_TMP=/keep_tmp --env=TASK_SLOT_NUMBER=1 --env=ARVADOS_API_TOKEN=4hja3jfqayqm5x5m1vu669ivuak5bk6zsenyndkdtg14ibgs3z --env=TASK_TMPDIR=/tmp/crunch-job-task-work/localhost.1 --env=ARVADOS_API_HOST_INSECURE=1 --env=ARVADOS_API_HOST=172.17.0.2:8000 --env=JOB_PARAMETER_CWL:TOOL=3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl#main --env=CRUNCH_REFRESH_TRIGGER=/tmp/crunch_refresh_trigger --env=CRUNCH_INSTALL=/tmp/crunch0/opt --env=CRUNCH_WORK=/tmp/crunch0/work --env=TASK_KEEPMOUNT=/keep --env=CRUNCH_SRC_COMMIT=a08e3bb86caa758df7d33a3df3f6b8c333e47838 --env=JOB_WORK=/tmp/crunch-job-work --env=JOB_SCRIPT=cwl-runner --env=JOB_PARAMETER_ARV:ENABLE_REUSE=1 --env=CRUNCH_NODE_SLOTS=1 --env=TASK_QSEQUENCE=0 --env=JOB_PARAMETER_FILE1= --env=TASK_UUID=3opjy-ot0gb-n0nnqshooftstyr --env=HOME=/tmp/crunch-job-task-work/localhost.1 sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 /bin/sh -c python -c "from pkg_resources import get_distribution as get; print \"Using Arvados SDK version\", get(\"arvados-python-client\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/localhost.1" && if which stdbuf >/dev/null ; then exec stdbuf --output=0 --error=0 \/tmp\/crunch0\/src\/crunch_scripts\/cwl\-runner ; else exec \/tmp\/crunch0\/src\/crunch_scripts\/cwl\-runner ; fi] crunch-dispatch0/log/main/current:2017-10-01_23:27:53.17098 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. crunch-dispatch0/log/main/current:2017-10-01_23:27:53.19502 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/cpuacct/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/cgroup.procs crunch-dispatch0/log/main/current:2017-10-01_23:27:53.20549 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/memory/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/memory.stat crunch-dispatch0/log/main/current:2017-10-01_23:27:53.21576 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: mem 0 cache 0 pgmajfault 1503232 rss crunch-dispatch0/log/main/current:2017-10-01_23:27:53.22603 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/cpuacct/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/cpuacct.stat crunch-dispatch0/log/main/current:2017-10-01_23:27:53.23634 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/cpuset/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/cpuset.cpus crunch-dispatch0/log/main/current:2017-10-01_23:27:53.24647 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: cpu 0.0200 user 0.0100 sys 8 cpus crunch-dispatch0/log/main/current:2017-10-01_23:27:53.25694 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/blkio/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/blkio.io_service_bytes crunch-dispatch0/log/main/current:2017-10-01_23:27:53.26788 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: net:docker0 0 tx 0 rx crunch-dispatch0/log/main/current:2017-10-01_23:27:53.27754 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: net:eth0 854023 tx 401647313 rx crunch-dispatch0/log/main/current:2017-10-01_23:27:53.48866 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr Using Arvados SDK version 0.1.20170818194607 crunch-dispatch0/log/main/current:2017-10-01_23:27:54.32346 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: /tmp/crunch0/src/crunch_scripts/cwl-runner a08e3bb86caa758df7d33a3df3f6b8c333e47838 1.0.20170929203319, arvados-python-client 0.1.20170818194607, cwltool 1.0.20170928192020 crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19649 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x {:todo=>0, :failed=>0, :running=>1, :done=>0} crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19664 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr Workflow checker warning: crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19668 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:9:11: Source 'file1' of type ["null", "File"] is partially incompatible crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19674 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:26:13: with sink 'file1' of type "File" crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19675 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr Workflow checker warning: crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19676 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:9:11: Source 'file1' of type ["null", "File"] is partially incompatible crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19679 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:26:13: with sink 'file1' of type "File" crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19681 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [workflow workflow.cwl#main] start crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19683 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step1] start crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19686 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: [job step1] reused job 3opjy-8i9sb-0xq1n2f22q4u7aq crunch-dispatch0/log/main/current:2017-10-01_23:27:56.34585 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step1] completed success crunch-dispatch0/log/main/current:2017-10-01_23:27:56.35637 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step2] start crunch-dispatch0/log/main/current:2017-10-01_23:27:56.56731 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step2] completed success crunch-dispatch0/log/main/current:2017-10-01_23:27:56.57754 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [workflow workflow.cwl#main] completed success crunch-dispatch0/log/main/current:2017-10-01_23:27:56.58777 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: Overall process status is success crunch-dispatch0/log/main/current:2017-10-01_23:27:56.60485 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: Final output collection b17d5f1b3929239388d8e5b7038931c6+59 "Output of main (2017-10-01T23:27:56.584Z)" (3opjy-4zz18-b3fi6roye3wivbc) crunch-dispatch0/log/main/current:2017-10-01_23:27:56.93023 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 child 12610 on localhost.1 exit 0 success=1 crunch-dispatch0/log/main/current:2017-10-01_23:27:56.94047 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 success in 4 seconds crunch-dispatch0/log/main/current:2017-10-01_23:27:56.99571 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 task output (35 bytes): b17d5f1b3929239388d8e5b7038931c6+59 crunch-dispatch0/log/main/current:2017-10-01_23:27:57.00600 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 1 tasks exited (1 succeeded), checking for new tasks from API server. crunch-dispatch0/log/main/current:2017-10-01_23:27:57.04712 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 Got 0 new tasks from API server. crunch-dispatch0/log/main/current:2017-10-01_23:27:57.12886 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 status: 1 done, 0 running, 0 todo crunch-dispatch0/log/main/current:2017-10-01_23:27:57.13918 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 wait for last 0 children to finish crunch-dispatch0/log/main/current:2017-10-01_23:27:57.14937 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 Freeze not implemented crunch-dispatch0/log/main/current:2017-10-01_23:27:57.15969 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 collate crunch-dispatch0/log/main/current:2017-10-01_23:27:57.60841 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 collated output manifest text to send to API server is 110 bytes with access tokens crunch-dispatch0/log/main/current:2017-10-01_23:27:57.69536 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 job output b17d5f1b3929239388d8e5b7038931c6+59 crunch-dispatch0/log/main/current:2017-10-01_23:27:57.76883 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 finish crunch-dispatch0/log/main/current:2017-10-01_23:27:57.93659 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 log collection is 2f64a912eb5b5d2958e5d1e059928021+85 crunch-dispatch0/log/main/current:2017-10-01_23:27:58.08029 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x end crunch-dispatch0/log/main/current:2017-10-01_23:28:02.03382 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x b17d5f1b3929239388d8e5b7038931c6+59 crunch-dispatch1/log/main/current:2017-10-01_23:24:35.57990 dispatch: ["git", "--git-dir=/var/lib/arvados/internal.git", "tag", "3opjy-8i9sb-8l1vnvznj0hrl4x", "a08e3bb86caa758df7d33a3df3f6b8c333e47838"] crunch-dispatch1/log/main/current:2017-10-01_23:24:35.58508 dispatch: sudo -E -u crunch LD_LIBRARY_PATH= PATH=/var/lib/gems/ruby/2.3.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/gems/bin:/usr/local/go/bin PERLLIB=/usr/src/arvados/sdk/perl/lib PYTHONPATH= RUBYLIB=/var/lib/gems/gems/bundler-1.15.4/lib GEM_PATH= /usr/src/arvados/sdk/cli/bin/crunch-job --job-api-token 5bg6t2ybjggzim7xude15gmfvk1wk528h0la8dftpkn4580lpn --job 3opjy-8i9sb-8l1vnvznj0hrl4x --git-dir /var/lib/arvados/internal.git --docker-bin docker --docker-run-args --net=host crunch-dispatch1/log/main/current:2017-10-01_23:24:35.58560 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x crunch-dispatch1/log/main/current:2017-10-01_23:28:06.87840 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6394 Sanity check is `docker ps -q` crunch-dispatch1/log/main/current:2017-10-01_23:28:06.87843 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6394 sanity check: start crunch-dispatch1/log/main/current:2017-10-01_23:28:06.88624 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6394 stderr starting: ['docker','ps','-q'] crunch-dispatch1/log/main/current:2017-10-01_23:28:07.00297 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6394 sanity check: exit 0 crunch-dispatch1/log/main/current:2017-10-01_23:28:07.00299 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6394 Sanity check OK crunch-dispatch1/log/main/current:2017-10-01_23:28:07.06402 6394 API method jobs/lock failed: API call /jobs/3opjy-8i9sb-8l1vnvznj0hrl4x/lock failed: 422 Unprocessable Entity {"errors":["#<ArvadosModel::AlreadyLockedError: ArvadosModel::AlreadyLockedError>"],"error_token":"1506900487+40fbf247"}. Retrying at 2017-10-01T23:28:08Z. crunch-dispatch1/log/main/current:2017-10-01_23:28:08.10061 6394 API method jobs/lock failed: API call /jobs/3opjy-8i9sb-8l1vnvznj0hrl4x/lock failed: 422 Unprocessable Entity {"errors":["#<ArvadosModel::AlreadyLockedError: ArvadosModel::AlreadyLockedError>"],"error_token":"1506900488+e0945fc5"}. Retrying at 2017-10-01T23:28:10Z. crunch-dispatch1/log/main/current:2017-10-01_23:28:10.13425 6394 API method jobs/lock failed: API call /jobs/3opjy-8i9sb-8l1vnvznj0hrl4x/lock failed: 422 Unprocessable Entity {"errors":["#<ArvadosModel::AlreadyLockedError: ArvadosModel::AlreadyLockedError>"],"error_token":"1506900490+76758d4c"}. Retrying at 2017-10-01T23:28:14Z. crunch-dispatch1/log/main/current:2017-10-01_23:28:14.16847 3opjy-8i9sb-8l1vnvznj0hrl4x ! 6394 Error while locking job, exiting 75 crunch-dispatch1/log/main/current:2017-10-01_23:28:14.17735 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x end
Updated by Ward Vandewege about 7 years ago
I instrumented a copy of arvbox to print timestamps, and then parsed those out. This is the result (in seconds). It's missing the timing for test 111 (that's because I'm not printing a timestamp at the end of the run).
jenkins@ci:~/slow-arvbox-test/arvados$ cat screenlog.0 |~/calculate-timings.rb Test 1: 43 Test 2: 32 Test 3: 32 Test 4: 77 Test 5: 407 Test 6: 413 Test 7: 422 Test 8: 408 Test 9: 423 Test 10: 604 Test 11: 603 Test 12: 604 Test 13: 619 Test 14: 332 Test 15: 438 Test 16: 287 Test 17: 468 Test 18: 693 Test 19: 288 Test 20: 708 Test 21: 243 Test 22: 438 Test 23: 768 Test 24: 258 Test 25: 633 Test 26: 438 Test 27: 634 Test 28: 152 Test 29: 813 Test 30: 1130 Test 31: 829 Test 32: 62 Test 33: 347 Test 34: 228 Test 35: 287 Test 36: 348 Test 37: 423 Test 38: 227 Test 39: 529 Test 40: 332 Test 41: 513 Test 42: 663 Test 43: 438 Test 44: 347 Test 45: 363 Test 46: 543 Test 47: 152 Test 48: 62 Test 49: 17 Test 50: 108 Test 51: 62 Test 52: 242 Test 53: 693 Test 54: 408 Test 55: 648 Test 56: 258 Test 57: 814 Test 58: 497 Test 59: 624 Test 60: 113 Test 61: 167 Test 62: 273 Test 63: 482 Test 64: 303 Test 65: 678 Test 66: 603 Test 67: 543 Test 68: 558 Test 69: 768 Test 70: 1159 Test 71: 303 Test 72: 362 Test 73: 1054 Test 74: 769 Test 75: 1039 Test 76: 648 Test 77: 498 Test 78: 603 Test 79: 618 Test 80: 618 Test 81: 1 Test 82: 573 Test 83: 435 Test 84: 498 Test 85: 273 Test 86: 587 Test 87: 333 Test 88: 588 Test 89: 603 Test 90: 573 Test 91: 302 Test 92: 679 Test 93: 332 Test 94: 423 Test 95: 1 Test 96: 363 Test 97: 829 Test 98: 633 Test 99: 1 Test 100: 468 Test 101: 92 Test 102: 2 Test 103: 498 Test 104: 303 Test 105: 182 Test 106: 182 Test 107: 392 Test 108: 318 Test 109: 1 Test 110: 303
Here is a comparison with a run on another machine (which is about 50% slower than ci in general):
Test 1: 65 Test 2: 33 Test 3: 33 Test 4: 33 Test 5: 33 Test 6: 41 Test 7: 63 Test 8: 63 Test 9: 78 Test 10: 78 Test 11: 78 Test 12: 78 Test 13: 78 Test 14: 18 Test 15: 62 Test 16: 63 Test 17: 78 Test 18: 78 Test 19: 18 Test 20: 123 Test 21: 18 Test 22: 79 Test 23: 138 Test 24: 18 Test 25: 138 Test 26: 109 Test 27: 78 Test 28: 18 Test 29: 123 Test 30: 153 Test 31: 138 Test 32: 18 Test 33: 94 Test 34: 17 Test 35: 78 Test 36: 79 Test 37: 48 Test 38: 63 Test 39: 93 Test 40: 33 Test 41: 48 Test 42: 138 Test 43: 18 Test 44: 33 Test 45: 94 Test 46: 33 Test 47: 123 Test 48: 78 Test 49: 18 Test 50: 139 Test 51: 63 Test 52: 18 Test 53: 124 Test 54: 18 Test 55: 108 Test 56: 18 Test 57: 124 Test 58: 63 Test 59: 116 Test 60: 42 Test 61: 18 Test 62: 63 Test 63: 108 Test 64: 18 Test 65: 123 Test 66: 63 Test 67: 78 Test 68: 78 Test 69: 93 Test 70: 154 Test 71: 18 Test 72: 63 Test 73: 138 Test 74: 78 Test 75: 94 Test 76: 63 Test 77: 48 Test 78: 63 Test 79: 63 Test 80: 63 Test 81: 2 Test 82: 63 Test 83: 76 Test 84: 33 Test 85: 18 Test 86: 48 Test 87: 63 Test 88: 78 Test 89: 63 Test 90: 63 Test 91: 18 Test 92: 109 Test 93: 17 Test 94: 48 Test 95: 3 Test 96: 48 Test 97: 123 Test 98: 63 Test 99: 2 Test 100: 63 Test 101: 78 Test 102: 4 Test 103: 93 Test 104: 63 Test 105: 18 Test 106: 63 Test 107: 109 Test 108: 63 Test 109: 2 Test 110: 78
Updated by Ward Vandewege about 7 years ago
Switching to using the containers API makes a huge improvement, it takes the runtime down to around 36 minutes, even on ci.
That's with crunch-dispatch-local executing one container request at a time.
If I give cwltest the -j=8 parameter (for instance) to run 8 of these things at a time, arvados-cwl-runner bombs out like this:
2017-10-03 23:13:58 arvados.arv_put INFO: Resuming upload from cache file /root/.cache/arvados/arv-put/c5dadc18a2dc00619c0a24e33ed5e703 2017-10-03 23:13:58 arvados.arv_put ERROR: arv-put: Another process is already uploading this data. Use --no-cache if this is really what you want. 2017-10-03 23:13:58 cwltool ERROR: Workflow error, try again with --debug for more information: v1.0/cat3-tool.cwl:7:5: keepdocker exited with code 1
The failures are all to do with multiple jobs trying to arv-put (the same) docker images via arv-keepdocker.
Do we have a way to namespace the arv-keepdocker calls, or serialize them?
Updated by Peter Amstutz about 7 years ago
Ward Vandewege wrote:
Switching to using the containers API makes a huge improvement, it takes the runtime down to around 36 minutes, even on ci.
That sounds like a normal amount of time. Looking at the crunch1 dispatching path, the point where it gets hung up seems to be when it runs "sudo crunch-job ..." and crunch2 doesn't have that.
That's with crunch-dispatch-local executing one container request at a time.
If I give cwltest the -j=8 parameter (for instance) to run 8 of these things at a time, arvados-cwl-runner bombs out like this:
[...]
The failures are all to do with multiple jobs trying to arv-put (the same) docker images via arv-keepdocker.
Do we have a way to namespace the arv-keepdocker calls, or serialize them?
I got this error the last time I tried to use the parallel execution option. I thought I filed a bug but maybe not. We should fix it.
Updated by Ward Vandewege about 7 years ago
For future reference, this is an example of the runtime on the ci box as of 3623287cdc05121a86e573b89aebb6e4aa6866b5 on branch 12355-make-cwl-conformance-tests-faster. The big gain came from switching to the containers API, though the branch has some other fixes too.
Test 1: 28 Test 2: 19 Test 3: 17 Test 4: 17 Test 5: 17 Test 6: 21 Test 7: 17 Test 8: 17 Test 9: 17 Test 10: 17 Test 11: 17 Test 12: 17 Test 13: 17 Test 14: 17 Test 15: 16 Test 16: 17 Test 17: 17 Test 18: 17 Test 19: 16 Test 20: 18 Test 21: 17 Test 22: 17 Test 23: 32 Test 24: 17 Test 25: 32 Test 26: 32 Test 27: 32 Test 28: 17 Test 29: 17 Test 30: 31 Test 31: 32 Test 32: 17 Test 33: 17 Test 34: 17 Test 35: 17 Test 36: 17 Test 37: 17 Test 38: 17 Test 39: 17 Test 40: 17 Test 41: 17 Test 42: 32 Test 43: 17 Test 44: 17 Test 45: 17 Test 46: 17 Test 47: 32 Test 48: 47 Test 49: 2 Test 50: 47 Test 51: 17 Test 52: 3 Test 53: 17 Test 54: 17 Test 55: 17 Test 56: 2 Test 57: 18 Test 58: 16 Test 59: 23 Test 60: 8 Test 61: 2 Test 62: 17 Test 63: 32 Test 64: 17 Test 65: 17 Test 66: 17 Test 67: 17 Test 68: 17 Test 69: 32 Test 70: 32 Test 71: 17 Test 72: 17 Test 73: 32 Test 74: 31 Test 75: 48 Test 76: 17 Test 77: 17 Test 78: 17 Test 79: 17 Test 80: 17 Test 81: 1 Test 82: 17 Test 83: 45 Test 84: 19 Test 85: 2 Test 86: 17 Test 87: 17 Test 88: 17 Test 89: 32 Test 90: 17 Test 91: 17 Test 92: 17 Test 93: 17 Test 94: 17 Test 95: 2 Test 96: 2 Test 97: 17 Test 98: 17 Test 99: 1 Test 100: 18 Test 101: 17 Test 102: 2 Test 103: 32 Test 104: 17 Test 105: 17 Test 106: 17 Test 107: 17 Test 108: 17 Test 109: 2 Test 110: 17
107 tests passed, 4 unsupported features All tool tests succeeded + CODE=0 + test 0 = 0 + arvbox stop cwltest cwltest + exit 0 real 35m47.248s user 0m0.324s sys 0m0.332s
Updated by Ward Vandewege about 7 years ago
ready for review: branch 12355-make-cwl-conformance-tests-faster at 832e319fe2ba70e3ae18410238f07281aa929af9
Updated by Peter Amstutz about 7 years ago
@@ -102,7 +102,7 @@ EOF2 chmod +x /tmp/cwltest/arv-cwl-containers env -exec ./run_test.sh EXTRA=--compute-checksum $@ +exec ./run_test.sh RUNNER=/tmp/cwltest/arv-cwl-containers EXTRA=--compute-checksum $@ EOF CODE=$?
Instead of changing the test_with_arvbox.sh
script, you can pass this in on the test_with_arvbox.sh
command line in the Jenkins configuration:
test_with_arvbox.sh --config localdemo RUNNER=/tmp/cwltest/arv-cwl-containers
Rest of the changes LGTM.
Updated by Ward Vandewege about 7 years ago
Peter Amstutz wrote:
[...]
Instead of changing the
test_with_arvbox.sh
script, you can pass this in on thetest_with_arvbox.sh
command line in the Jenkins configuration:[...]
Rest of the changes LGTM.
Well, sure, we could, but test_with_arvbox.sh with the jobs api is malfunctioning (very slow in general; especially so on the ci box). Should we really default to that!?
Updated by Peter Amstutz about 7 years ago
Ward Vandewege wrote:
Peter Amstutz wrote:
[...]
Instead of changing the
test_with_arvbox.sh
script, you can pass this in on thetest_with_arvbox.sh
command line in the Jenkins configuration:[...]
Rest of the changes LGTM.
Well, sure, we could, but test_with_arvbox.sh with the jobs api is malfunctioning (very slow in general; especially so on the ci box). Should we really default to that!?
Guess not. On further thought, setting RUNNER= on the test_with_arvbox.sh command line should still override the default, so it doesn't break anything.
LGTM, please merge.
Updated by Anonymous about 7 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:3910be3219ace18462a128571bfac0b35446a392.