Bug #12355

run-arvados-cwl-conformance-tests really slow

Added by Peter Amstutz 22 days ago. Updated 14 days ago.

Status:ResolvedStart date:
Priority:NormalDue date:
Assignee:Ward Vandewege% Done:

100%

Category:-
Target version:2017-10-11 Sprint
Story points-
Velocity based estimate-

Description

It is taking hours to complete. For some reason the "sudo" command to run crunch-job is taking 75 seconds to start.

2017-09-27_18:41:15.94921 dispatch: sudo -E -u crunch LD_LIBRARY_PATH= PATH=/var/lib/gems/ruby/2.3.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/gems/bin:/usr/local
/go/bin PERLLIB=/usr/src/arvados/sdk/perl/lib PYTHONPATH= RUBYLIB=/var/lib/gems/gems/bundler-1.15.4/lib GEM_PATH= /usr/src/arvados/sdk/cli/bin/crunch-job --job-api-token 2e6f0zp3ralzjqscwu38czli
w6353bgziuat739i74sy12vaav --job mqkj9-8i9sb-0p5a2vn0evw4oa9 --git-dir /var/lib/arvados/internal.git --docker-bin docker --docker-run-args --net=host
2017-09-27_18:41:15.94968 dispatch: job mqkj9-8i9sb-0p5a2vn0evw4oa9
2017-09-27_18:41:15.94970 dispatch: child 28624 start 2017-09-27_18:41:15
2017-09-27_18:41:17.91463 
2017-09-27_18:41:17.91467 2017-09-27 18:41:17 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:17.91467 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:41:23.72848 
2017-09-27_18:41:23.72852 2017-09-27 18:41:23 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:23.72852 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:41:29.56375 
2017-09-27_18:41:29.56379 2017-09-27 18:41:29 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:29.56379 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:41:35.34087 
2017-09-27_18:41:35.34090 2017-09-27 18:41:35 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:35.34090 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:41:41.18088 
2017-09-27_18:41:41.18092 2017-09-27 18:41:41 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:41.18092 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:41:46.98528 
2017-09-27_18:41:46.98531 2017-09-27 18:41:46 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:46.98531 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:41:52.81456 
2017-09-27_18:41:52.81458 2017-09-27 18:41:52 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:52.81459 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:41:58.58492 
2017-09-27_18:41:58.58495 2017-09-27 18:41:58 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:41:58.58495 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:42:04.37249 
2017-09-27_18:42:04.37252 2017-09-27 18:42:04 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:42:04.37252 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:42:10.17346 
2017-09-27_18:42:10.17349 2017-09-27 18:42:10 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:42:10.17350 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:42:15.96954 
2017-09-27_18:42:15.96957 2017-09-27 18:42:15 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:42:15.96958 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:42:21.76766 
2017-09-27_18:42:21.76769 2017-09-27 18:42:21 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:42:21.76769 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:42:27.53407 
2017-09-27_18:42:27.53409 2017-09-27 18:42:27 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:42:27.53410 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:42:33.32951 
2017-09-27_18:42:33.32953 2017-09-27 18:42:33 +0000 -- pipeline_instance mqkj9-d1hrv-f2qy2hjqsxph10c
2017-09-27_18:42:33.32954 cwl-runner mqkj9-8i9sb-j6koxtmv6dtazgd {:failed=>0, :running=>1, :todo=>0, :done=>0}
2017-09-27_18:42:36.65370 mqkj9-8i9sb-0p5a2vn0evw4oa9 !  28626  Sanity check is `docker ps -q`

Related issues

Related to Arvados - Bug #12404: Parallel a-c-r runs interfere in Docker uploads New

Associated revisions

Revision b435e2b5
Added by Ward Vandewege 16 days ago

Run cwl conformance tests with the containers api.

refs #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 6a5b472b
Added by Ward Vandewege 16 days ago

arvbox: bump ruby version up to 2.3.5

refs #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 0bb0c5fc
Added by Ward Vandewege 16 days ago

arvbox: jobs api: crunch-dispatch: stop excessive stat of /etc/localtime

refs #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 8fa3f045
Added by Ward Vandewege 16 days ago

arvbox: containers api: make crunch-dispatch-local poll more agressively

refs #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 3623287c
Added by Ward Vandewege 15 days ago

arvbox: run the rails code in production mode when running in localdemo or
publicdemo.

refs #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 832e319f
Added by Ward Vandewege 15 days ago

arvbox: use overlay2 inside arvbox.

refs #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision d09be5c6
Added by Ward Vandewege 14 days ago

Merge branch 'master' into 12355-make-cwl-conformance-tests-faster

refs #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 3910be32
Added by Ward Vandewege 14 days ago

Merge branch '12355-make-cwl-conformance-tests-faster'

closes #12355

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Tom Morris 22 days ago

  • Target version set to 2017-10-11 Sprint

#2 Updated by Peter Amstutz 22 days ago

  • Description updated (diff)
  • Target version deleted (2017-10-11 Sprint)

#3 Updated by Peter Amstutz 22 days ago

  • Target version set to 2017-10-11 Sprint

#4 Updated by Tom Morris 22 days ago

  • Assignee set to Ward Vandewege

#5 Updated by Ward Vandewege 18 days ago

We run these tests in arvbox in localdemo mode, which generates lots of docker volumes which it leaves lying around.

ci:~# docker volume list |wc
    485     971   41172
ci:~# docker volume prune
...

Total reclaimed space: 99.51GB
ci:~#
ci:~#
ci:~#
ci:~# docker volume list |wc
      6      13     457

#6 Updated by Ward Vandewege 18 days ago

  • Status changed from New to In Progress

#7 Updated by Ward Vandewege 18 days ago

Here's an example of a particularly slow test job (this doesn't happen to all the tests):

root@8bc7b8700ff1:/etc/service# grep 3opjy-8i9sb-8l1vnvznj0hrl4x */log/main/current
crunch-dispatch0/log/main/current:2017-10-01_23:24:35.21536 dispatch: ["git", "--git-dir=/var/lib/arvados/internal.git", "tag", "3opjy-8i9sb-8l1vnvznj0hrl4x", "a08e3bb86caa758df7d33a3df3f6b8c333e47838"]
crunch-dispatch0/log/main/current:2017-10-01_23:24:35.21908 dispatch: sudo -E -u crunch LD_LIBRARY_PATH= PATH=/var/lib/gems/ruby/2.3.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/gems/bin:/usr/local/go/bin PERLLIB=/usr/src/arvados/sdk/perl/lib PYTHONPATH= RUBYLIB=/var/lib/gems/gems/bundler-1.15.4/lib GEM_PATH= /usr/src/arvados/sdk/cli/bin/crunch-job --job-api-token 4hja3jfqayqm5x5m1vu669ivuak5bk6zsenyndkdtg14ibgs3z --job 3opjy-8i9sb-8l1vnvznj0hrl4x --git-dir /var/lib/arvados/internal.git --docker-bin docker --docker-run-args --net=host
crunch-dispatch0/log/main/current:2017-10-01_23:24:35.21964 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x
crunch-dispatch0/log/main/current:2017-10-01_23:24:38.05689 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:24:43.89373 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:24:49.77193 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:24:55.65407 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:01.49044 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:07.33950 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:13.26439 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:19.11267 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:24.95535 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:30.80978 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:36.66150 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:42.51133 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:48.41501 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:25:54.29049 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:00.17030 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:06.02121 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:11.85058 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:17.69224 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:23.59006 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:29.50650 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:35.35775 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:41.16927 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:47.00715 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:52.89632 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:26:58.75690 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:04.59749 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:10.44902 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:16.30613 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:22.19203 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:28.09981 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:33.95345 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:39.79504 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:45.64178 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x queued 2017-10-01T23:24:34.735906000Z
crunch-dispatch0/log/main/current:2017-10-01_23:27:49.55444 3opjy-8i9sb-8l1vnvznj0hrl4x !  6388  Sanity check is `docker ps -q`
crunch-dispatch0/log/main/current:2017-10-01_23:27:49.55447 3opjy-8i9sb-8l1vnvznj0hrl4x !  6388  sanity check: start
crunch-dispatch0/log/main/current:2017-10-01_23:27:49.56418 3opjy-8i9sb-8l1vnvznj0hrl4x !  6388  stderr starting: ['docker','ps','-q']
crunch-dispatch0/log/main/current:2017-10-01_23:27:49.67802 3opjy-8i9sb-8l1vnvznj0hrl4x !  6388  sanity check: exit 0
crunch-dispatch0/log/main/current:2017-10-01_23:27:49.67806 3opjy-8i9sb-8l1vnvznj0hrl4x !  6388  Sanity check OK
crunch-dispatch0/log/main/current:2017-10-01_23:27:50.19404 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  running from /usr/src/arvados/sdk/cli/bin/crunch-job with arvados-cli Gem version(s) 0.1.20170817171636
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.07831 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x {:done=>0, :running=>0, :failed=>0, :todo=>1}
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08984 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  check slurm allocation
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08991 3opjy-8i9sb-8l1vnvznj0hrl4x ! 2017-10-01 23:27:50 arvados.arv_put[12396] INFO: No cache usage requested for this run.
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08993 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  node localhost - 8 slots
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08995 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  start
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08997 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  clean work dirs: start
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.08999 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr starting: ['bash','-ec',' arv-mount --unmount-timeout 10 --unmount-all ${CRUNCH_TMP} rm -rf ${JOB_WORK} ${CRUNCH_INSTALL} ${CRUNCH_TMP}/task ${CRUNCH_TMP}/src* ${CRUNCH_TMP}/*.cid     ']
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09001 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  clean work dirs: exit 0
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09003 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  Install docker image 89f452e02664fbf685bb8b518200ff48+342
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09005 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  docker image hash is sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09007 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  load docker image: start
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09011 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr starting: ['/bin/bash','-o','pipefail','-ec',' loaded() {   id=$(docker inspect --format="{{.ID}}" sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345) || return 1   echo "image ID is $id"   [[ ${id} = sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 ]] } if loaded >&2 2>/dev/null; then   echo >&2 "image is already present"   exit 0 fi echo >&2 "docker image is not present; loading" arv-get 89f452e02664fbf685bb8b518200ff48\\+342\\/sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345\\.tar | docker load if ! loaded >&2; then   echo >&2 "`docker load` exited 0, but image is not found (!)"   exit 1 fi echo >&2 "image loaded successfully" ']
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.09013 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr image ID is sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.10189 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr image is already present
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.10452 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  load docker image: exit 0
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.11504 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  check --memory-swap feature: start
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.12729 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr starting: ['docker','run','--help']
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.25062 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  check --memory-swap feature: exit 0
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.26089 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  check whether default user is UID 0: start
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.27326 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr starting: ['/bin/sh','-ec','docker run --net=host  sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 id --user']
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.71863 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  check whether default user is UID 0: exit 0
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.72874 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  Container will run with default user
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.73907 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  Looking for version a08e3bb86caa758df7d33a3df3f6b8c333e47838 from repository arvados
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.74958 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  Using local repository '/var/lib/arvados/internal.git'
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.76441 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  Version a08e3bb86caa758df7d33a3df3f6b8c333e47838 is commit a08e3bb86caa758df7d33a3df3f6b8c333e47838
crunch-dispatch0/log/main/current:2017-10-01_23:27:51.99798 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  run install script on all workers: start
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.01147 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr starting: ['sh','-c','mkdir -p /tmp/crunch0/opt && cd /tmp/crunch0 && perl -']
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.04300 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  stderr Extracting archive: c2e1effc24b27bfc978c443705532885
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.22986 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  run install script on all workers: exit 0
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.26353 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  script cwl-runner
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.26356 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  script_version a08e3bb86caa758df7d33a3df3f6b8c333e47838
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.26627 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  script_parameters {"arv:enable_reuse":true,"file1":null,"cwl:tool":"3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl#main","arv:on_error":"continue"}
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.27680 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  runtime_constraints {"max_tasks_per_node":0,"min_ram_mb_per_node":1024,"docker_image":"arvados/jobs:a08e3bb86caa758df7d33a3df3f6b8c333e47838"}
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.32018 3opjy-8i9sb-8l1vnvznj0hrl4x ! crunch-job have 1 free slots for 1 initial tasks at this level, 1 nodes, and 8 slots
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.32020 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  start level 0 with 1 slots
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.40196 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  status: 0 done, 0 running, 1 todo
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.41401 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 job_task 3opjy-ot0gb-n0nnqshooftstyr
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.42421 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 child 12610 started on localhost.1
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.47857 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr starting: ['bash','-c','if [ -e \\/tmp\\/crunch0\\/task\\/localhost\\.1 ]; then rm -rf \\/tmp\\/crunch0\\/task\\/localhost\\.1; fi; mkdir -p \\/tmp\\/crunch0 \\/tmp\\/crunch0\\/work \\/tmp\\/crunch0\\/task\\/localhost\\.1 \\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep && cd \\/tmp\\/crunch0 && MEM=$(awk \'($1 == "MemTotal:"){print $2}\' </proc/meminfo) && SWAP=$(awk \'($1 == "SwapTotal:"){print $2}\' </proc/meminfo) && MEMLIMIT=$(( ($MEM * 95) / (1 * 100) )) && let SWAPLIMIT=$MEMLIMIT+$SWAP && declare -a VOLUMES=() && if which crunchrunner >/dev/null ; then VOLUMES+=("--volume=$(which crunchrunner):/usr/local/bin/crunchrunner:ro") ; fi && if test -f /etc/ssl/certs/ca-certificates.crt ; then VOLUMES+=("--volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt:ro") ; elif test -f /etc/pki/tls/certs/ca-bundle.crt ; then VOLUMES+=("--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/arvados/ca-certificates.crt:ro") ; fi && exec arv-mount --read-write --mount-by-pdh=by_pdh --mount-tmp=tmp --crunchstat-interval=10 --allow-other  \\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep --exec crunchstat -cgroup-root=\\/sys\\/fs\\/cgroup -cgroup-parent=docker -cgroup-cid=/tmp/crunch0/3opjy-ot0gb-n0nnqshooftstyr-0.cid -poll=10000 docker run --net=host --name=3opjy-ot0gb-n0nnqshooftstyr-0 --attach=stdout --attach=stderr --attach=stdin -i  --cidfile=/tmp/crunch0/3opjy-ot0gb-n0nnqshooftstyr-0.cid --sig-proxy --memory=${MEMLIMIT}k --memory-swap=${SWAPLIMIT}k --volume=\\/tmp\\/crunch0\\/src\\:\\/tmp\\/crunch0\\/src\\:ro --volume=\\/tmp\\/crunch0\\/opt\\:\\/tmp\\/crunch0\\/opt\\:ro --volume=\\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep\\/by_pdh\\:\\/keep\\:ro --volume=\\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep\\/tmp\\:\\/keep_tmp --volume=/tmp "${VOLUMES[@]}" --env=CRUNCH_SRC\\=\\/tmp\\/crunch0\\/src --env=CRUNCH_JOB_BIN\\=\\/usr\\/src\\/arvados\\/sdk\\/cli\\/bin\\/crunch\\-job --env=CRUNCH_GIT_ARCHIVE_HASH\\=c2e1effc24b27bfc978c443705532885 --env=CRUNCH_JOB_UUID\\=3opjy\\-8i9sb\\-8l1vnvznj0hrl4x --env=CRUNCH_SRC_URL\\=\\/var\\/lib\\/arvados\\/internal\\.git --env=JOB_PARAMETER_ARV\\:ON_ERROR\\=continue --env=JOB_UUID\\=3opjy\\-8i9sb\\-8l1vnvznj0hrl4x --env=TASK_WORK\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 --env=TASK_SEQUENCE\\=0 --env=CRUNCH_JOB_DOCKER_RUN_ARGS\\=\\-\\-net\\=host --env=CRUNCH_JOB_DOCKER_BIN\\=docker --env=CRUNCH_TMP\\=\\/tmp\\/crunch0 --env=TASK_SLOT_NODE\\=localhost --env=TASK_KEEPMOUNT_TMP\\=\\/keep_tmp --env=TASK_SLOT_NUMBER\\=1 --env=ARVADOS_API_TOKEN\\=[...] --env=TASK_TMPDIR\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 --env=ARVADOS_API_HOST_INSECURE\\=1 --env=ARVADOS_API_HOST\\=172\\.17\\.0\\.2\\:8000 --env=JOB_PARAMETER_CWL\\:TOOL\\=3fd0716d9e4ef16a4fe2b04c53240fb6\\+60\\/workflow\\.cwl\\#main --env=CRUNCH_REFRESH_TRIGGER\\=\\/tmp\\/crunch_refresh_trigger --env=CRUNCH_INSTALL\\=\\/tmp\\/crunch0\\/opt --env=CRUNCH_WORK\\=\\/tmp\\/crunch0\\/work --env=TASK_KEEPMOUNT\\=\\/keep --env=CRUNCH_SRC_COMMIT\\=a08e3bb86caa758df7d33a3df3f6b8c333e47838 --env=JOB_WORK\\=\\/tmp\\/crunch\\-job\\-work --env=JOB_SCRIPT\\=cwl\\-runner --env=JOB_PARAMETER_ARV\\:ENABLE_REUSE\\=1 --env=CRUNCH_NODE_SLOTS\\=1 --env=TASK_QSEQUENCE\\=0 --env=JOB_PARAMETER_FILE1\\= --env=TASK_UUID\\=3opjy\\-ot0gb\\-n0nnqshooftstyr --env=HOME\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 sha256\\:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 /bin/sh -c \'python -c "from pkg_resources import get_distribution as get; print \\"Using Arvados SDK version\\", get(\\"arvados-python-client\\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/localhost.1" && if which stdbuf >/dev/null ; then   exec  stdbuf --output=0 --error=0  \\/tmp\\/crunch0\\/src\\/crunch_scripts\\/cwl\\-runner ; else   exec \\/tmp\\/crunch0\\/src\\/crunch_scripts\\/cwl\\-runner ; fi\'']
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.54833 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  status: 0 done, 1 running, 0 todo
crunch-dispatch0/log/main/current:2017-10-01_23:27:52.95995 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: Running [docker run --net=host --name=3opjy-ot0gb-n0nnqshooftstyr-0 --attach=stdout --attach=stderr --attach=stdin -i --cidfile=/tmp/crunch0/3opjy-ot0gb-n0nnqshooftstyr-0.cid --sig-proxy --memory=29342775k --memory-swap=29342775k --volume=/tmp/crunch0/src:/tmp/crunch0/src:ro --volume=/tmp/crunch0/opt:/tmp/crunch0/opt:ro --volume=/tmp/crunch0/task/localhost.1.keep/by_pdh:/keep:ro --volume=/tmp/crunch0/task/localhost.1.keep/tmp:/keep_tmp --volume=/tmp --volume=/usr/local/bin/crunchrunner:/usr/local/bin/crunchrunner:ro --volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt:ro --env=CRUNCH_SRC=/tmp/crunch0/src --env=CRUNCH_JOB_BIN=/usr/src/arvados/sdk/cli/bin/crunch-job --env=CRUNCH_GIT_ARCHIVE_HASH=c2e1effc24b27bfc978c443705532885 --env=CRUNCH_JOB_UUID=3opjy-8i9sb-8l1vnvznj0hrl4x --env=CRUNCH_SRC_URL=/var/lib/arvados/internal.git --env=JOB_PARAMETER_ARV:ON_ERROR=continue --env=JOB_UUID=3opjy-8i9sb-8l1vnvznj0hrl4x --env=TASK_WORK=/tmp/crunch-job-task-work/localhost.1 --env=TASK_SEQUENCE=0 --env=CRUNCH_JOB_DOCKER_RUN_ARGS=--net=host --env=CRUNCH_JOB_DOCKER_BIN=docker --env=CRUNCH_TMP=/tmp/crunch0 --env=TASK_SLOT_NODE=localhost --env=TASK_KEEPMOUNT_TMP=/keep_tmp --env=TASK_SLOT_NUMBER=1 --env=ARVADOS_API_TOKEN=4hja3jfqayqm5x5m1vu669ivuak5bk6zsenyndkdtg14ibgs3z --env=TASK_TMPDIR=/tmp/crunch-job-task-work/localhost.1 --env=ARVADOS_API_HOST_INSECURE=1 --env=ARVADOS_API_HOST=172.17.0.2:8000 --env=JOB_PARAMETER_CWL:TOOL=3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl#main --env=CRUNCH_REFRESH_TRIGGER=/tmp/crunch_refresh_trigger --env=CRUNCH_INSTALL=/tmp/crunch0/opt --env=CRUNCH_WORK=/tmp/crunch0/work --env=TASK_KEEPMOUNT=/keep --env=CRUNCH_SRC_COMMIT=a08e3bb86caa758df7d33a3df3f6b8c333e47838 --env=JOB_WORK=/tmp/crunch-job-work --env=JOB_SCRIPT=cwl-runner --env=JOB_PARAMETER_ARV:ENABLE_REUSE=1 --env=CRUNCH_NODE_SLOTS=1 --env=TASK_QSEQUENCE=0 --env=JOB_PARAMETER_FILE1= --env=TASK_UUID=3opjy-ot0gb-n0nnqshooftstyr --env=HOME=/tmp/crunch-job-task-work/localhost.1 sha256:ca63548c22c43fb0f9dbfe1682f5d925c6f79a88c717e87d046da77bd1a35345 /bin/sh -c python -c "from pkg_resources import get_distribution as get; print \"Using Arvados SDK version\", get(\"arvados-python-client\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/localhost.1" && if which stdbuf >/dev/null ; then   exec  stdbuf --output=0 --error=0  \/tmp\/crunch0\/src\/crunch_scripts\/cwl\-runner ; else   exec \/tmp\/crunch0\/src\/crunch_scripts\/cwl\-runner ; fi]
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.17098 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.19502 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/cpuacct/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/cgroup.procs
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.20549 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/memory/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/memory.stat
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.21576 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: mem 0 cache 0 pgmajfault 1503232 rss
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.22603 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/cpuacct/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/cpuacct.stat
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.23634 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/cpuset/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/cpuset.cpus
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.24647 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: cpu 0.0200 user 0.0100 sys 8 cpus
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.25694 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: notice: reading stats from /sys/fs/cgroup/blkio/docker/faac60713a8bbe459e1edc0a5680cc3604c510470d6c7924df94b89182ad1169/blkio.io_service_bytes
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.26788 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: net:docker0 0 tx 0 rx
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.27754 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr crunchstat: net:eth0 854023 tx 401647313 rx
crunch-dispatch0/log/main/current:2017-10-01_23:27:53.48866 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr Using Arvados SDK version 0.1.20170818194607
crunch-dispatch0/log/main/current:2017-10-01_23:27:54.32346 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: /tmp/crunch0/src/crunch_scripts/cwl-runner a08e3bb86caa758df7d33a3df3f6b8c333e47838 1.0.20170929203319, arvados-python-client 0.1.20170818194607, cwltool 1.0.20170928192020
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19649 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x {:todo=>0, :failed=>0, :running=>1, :done=>0}
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19664 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr Workflow checker warning:
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19668 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:9:11: Source 'file1' of type ["null", "File"] is partially incompatible
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19674 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:26:13:   with sink 'file1' of type "File" 
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19675 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr Workflow checker warning:
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19676 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:9:11: Source 'file1' of type ["null", "File"] is partially incompatible
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19679 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr keep/3fd0716d9e4ef16a4fe2b04c53240fb6+60/workflow.cwl:26:13:   with sink 'file1' of type "File" 
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19681 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [workflow workflow.cwl#main] start
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19683 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step1] start
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.19686 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: [job step1] reused job 3opjy-8i9sb-0xq1n2f22q4u7aq
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.34585 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step1] completed success
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.35637 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step2] start
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.56731 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [step step2] completed success
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.57754 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr [workflow workflow.cwl#main] completed success
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.58777 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: Overall process status is success
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.60485 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 stderr arvados.cwl-runner INFO: Final output collection b17d5f1b3929239388d8e5b7038931c6+59 "Output of main (2017-10-01T23:27:56.584Z)" (3opjy-4zz18-b3fi6roye3wivbc)
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.93023 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 child 12610 on localhost.1 exit 0 success=1
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.94047 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 success in 4 seconds
crunch-dispatch0/log/main/current:2017-10-01_23:27:56.99571 3opjy-8i9sb-8l1vnvznj0hrl4x 6388 0 task output (35 bytes): b17d5f1b3929239388d8e5b7038931c6+59
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.00600 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  1 tasks exited (1 succeeded), checking for new tasks from API server.
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.04712 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  Got 0 new tasks from API server.
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.12886 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  status: 1 done, 0 running, 0 todo
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.13918 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  wait for last 0 children to finish
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.14937 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  Freeze not implemented
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.15969 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  collate
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.60841 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  collated output manifest text to send to API server is 110 bytes with access tokens
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.69536 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  job output b17d5f1b3929239388d8e5b7038931c6+59
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.76883 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  finish
crunch-dispatch0/log/main/current:2017-10-01_23:27:57.93659 3opjy-8i9sb-8l1vnvznj0hrl4x 6388  log collection is 2f64a912eb5b5d2958e5d1e059928021+85
crunch-dispatch0/log/main/current:2017-10-01_23:27:58.08029 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x end
crunch-dispatch0/log/main/current:2017-10-01_23:28:02.03382 cwl-runner 3opjy-8i9sb-8l1vnvznj0hrl4x b17d5f1b3929239388d8e5b7038931c6+59
crunch-dispatch1/log/main/current:2017-10-01_23:24:35.57990 dispatch: ["git", "--git-dir=/var/lib/arvados/internal.git", "tag", "3opjy-8i9sb-8l1vnvznj0hrl4x", "a08e3bb86caa758df7d33a3df3f6b8c333e47838"]
crunch-dispatch1/log/main/current:2017-10-01_23:24:35.58508 dispatch: sudo -E -u crunch LD_LIBRARY_PATH= PATH=/var/lib/gems/ruby/2.3.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/gems/bin:/usr/local/go/bin PERLLIB=/usr/src/arvados/sdk/perl/lib PYTHONPATH= RUBYLIB=/var/lib/gems/gems/bundler-1.15.4/lib GEM_PATH= /usr/src/arvados/sdk/cli/bin/crunch-job --job-api-token 5bg6t2ybjggzim7xude15gmfvk1wk528h0la8dftpkn4580lpn --job 3opjy-8i9sb-8l1vnvznj0hrl4x --git-dir /var/lib/arvados/internal.git --docker-bin docker --docker-run-args --net=host
crunch-dispatch1/log/main/current:2017-10-01_23:24:35.58560 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x
crunch-dispatch1/log/main/current:2017-10-01_23:28:06.87840 3opjy-8i9sb-8l1vnvznj0hrl4x !  6394  Sanity check is `docker ps -q`
crunch-dispatch1/log/main/current:2017-10-01_23:28:06.87843 3opjy-8i9sb-8l1vnvznj0hrl4x !  6394  sanity check: start
crunch-dispatch1/log/main/current:2017-10-01_23:28:06.88624 3opjy-8i9sb-8l1vnvznj0hrl4x !  6394  stderr starting: ['docker','ps','-q']
crunch-dispatch1/log/main/current:2017-10-01_23:28:07.00297 3opjy-8i9sb-8l1vnvznj0hrl4x !  6394  sanity check: exit 0
crunch-dispatch1/log/main/current:2017-10-01_23:28:07.00299 3opjy-8i9sb-8l1vnvznj0hrl4x !  6394  Sanity check OK
crunch-dispatch1/log/main/current:2017-10-01_23:28:07.06402  6394  API method jobs/lock failed: API call /jobs/3opjy-8i9sb-8l1vnvznj0hrl4x/lock failed: 422 Unprocessable Entity {"errors":["#<ArvadosModel::AlreadyLockedError: ArvadosModel::AlreadyLockedError>"],"error_token":"1506900487+40fbf247"}. Retrying at 2017-10-01T23:28:08Z.
crunch-dispatch1/log/main/current:2017-10-01_23:28:08.10061  6394  API method jobs/lock failed: API call /jobs/3opjy-8i9sb-8l1vnvznj0hrl4x/lock failed: 422 Unprocessable Entity {"errors":["#<ArvadosModel::AlreadyLockedError: ArvadosModel::AlreadyLockedError>"],"error_token":"1506900488+e0945fc5"}. Retrying at 2017-10-01T23:28:10Z.
crunch-dispatch1/log/main/current:2017-10-01_23:28:10.13425  6394  API method jobs/lock failed: API call /jobs/3opjy-8i9sb-8l1vnvznj0hrl4x/lock failed: 422 Unprocessable Entity {"errors":["#<ArvadosModel::AlreadyLockedError: ArvadosModel::AlreadyLockedError>"],"error_token":"1506900490+76758d4c"}. Retrying at 2017-10-01T23:28:14Z.
crunch-dispatch1/log/main/current:2017-10-01_23:28:14.16847 3opjy-8i9sb-8l1vnvznj0hrl4x !  6394  Error while locking job, exiting 75
crunch-dispatch1/log/main/current:2017-10-01_23:28:14.17735 dispatch: job 3opjy-8i9sb-8l1vnvznj0hrl4x end

#8 Updated by Ward Vandewege 17 days ago

I instrumented a copy of arvbox to print timestamps, and then parsed those out. This is the result (in seconds). It's missing the timing for test 111 (that's because I'm not printing a timestamp at the end of the run).

jenkins@ci:~/slow-arvbox-test/arvados$ cat screenlog.0 |~/calculate-timings.rb 
Test   1:  43
Test   2:  32
Test   3:  32
Test   4:  77
Test   5: 407
Test   6: 413
Test   7: 422
Test   8: 408
Test   9: 423
Test  10: 604
Test  11: 603
Test  12: 604
Test  13: 619
Test  14: 332
Test  15: 438
Test  16: 287
Test  17: 468
Test  18: 693
Test  19: 288
Test  20: 708
Test  21: 243
Test  22: 438
Test  23: 768
Test  24: 258
Test  25: 633
Test  26: 438
Test  27: 634
Test  28: 152
Test  29: 813
Test  30: 1130
Test  31: 829
Test  32:  62
Test  33: 347
Test  34: 228
Test  35: 287
Test  36: 348
Test  37: 423
Test  38: 227
Test  39: 529
Test  40: 332
Test  41: 513
Test  42: 663
Test  43: 438
Test  44: 347
Test  45: 363
Test  46: 543
Test  47: 152
Test  48:  62
Test  49:  17
Test  50: 108
Test  51:  62
Test  52: 242
Test  53: 693
Test  54: 408
Test  55: 648
Test  56: 258
Test  57: 814
Test  58: 497
Test  59: 624
Test  60: 113
Test  61: 167
Test  62: 273
Test  63: 482
Test  64: 303
Test  65: 678
Test  66: 603
Test  67: 543
Test  68: 558
Test  69: 768
Test  70: 1159
Test  71: 303
Test  72: 362
Test  73: 1054
Test  74: 769
Test  75: 1039
Test  76: 648
Test  77: 498
Test  78: 603
Test  79: 618
Test  80: 618
Test  81:   1
Test  82: 573
Test  83: 435
Test  84: 498
Test  85: 273
Test  86: 587
Test  87: 333
Test  88: 588
Test  89: 603
Test  90: 573
Test  91: 302
Test  92: 679
Test  93: 332
Test  94: 423
Test  95:   1
Test  96: 363
Test  97: 829
Test  98: 633
Test  99:   1
Test 100: 468
Test 101:  92
Test 102:   2
Test 103: 498
Test 104: 303
Test 105: 182
Test 106: 182
Test 107: 392
Test 108: 318
Test 109:   1
Test 110: 303

Here is a comparison with a run on another machine (which is about 50% slower than ci in general):

Test   1:  65
Test   2:  33
Test   3:  33
Test   4:  33
Test   5:  33
Test   6:  41
Test   7:  63
Test   8:  63
Test   9:  78
Test  10:  78
Test  11:  78
Test  12:  78
Test  13:  78
Test  14:  18
Test  15:  62
Test  16:  63
Test  17:  78
Test  18:  78
Test  19:  18
Test  20: 123
Test  21:  18
Test  22:  79
Test  23: 138
Test  24:  18
Test  25: 138
Test  26: 109
Test  27:  78
Test  28:  18
Test  29: 123
Test  30: 153
Test  31: 138
Test  32:  18
Test  33:  94
Test  34:  17
Test  35:  78
Test  36:  79
Test  37:  48
Test  38:  63
Test  39:  93
Test  40:  33
Test  41:  48
Test  42: 138
Test  43:  18
Test  44:  33
Test  45:  94
Test  46:  33
Test  47: 123
Test  48:  78
Test  49:  18
Test  50: 139
Test  51:  63
Test  52:  18
Test  53: 124
Test  54:  18
Test  55: 108
Test  56:  18
Test  57: 124
Test  58:  63
Test  59: 116
Test  60:  42
Test  61:  18
Test  62:  63
Test  63: 108
Test  64:  18
Test  65: 123
Test  66:  63
Test  67:  78
Test  68:  78
Test  69:  93
Test  70: 154
Test  71:  18
Test  72:  63
Test  73: 138
Test  74:  78
Test  75:  94
Test  76:  63
Test  77:  48
Test  78:  63
Test  79:  63
Test  80:  63
Test  81:   2
Test  82:  63
Test  83:  76
Test  84:  33
Test  85:  18
Test  86:  48
Test  87:  63
Test  88:  78
Test  89:  63
Test  90:  63
Test  91:  18
Test  92: 109
Test  93:  17
Test  94:  48
Test  95:   3
Test  96:  48
Test  97: 123
Test  98:  63
Test  99:   2
Test 100:  63
Test 101:  78
Test 102:   4
Test 103:  93
Test 104:  63
Test 105:  18
Test 106:  63
Test 107: 109
Test 108:  63
Test 109:   2
Test 110:  78

#9 Updated by Ward Vandewege 15 days ago

Switching to using the containers API makes a huge improvement, it takes the runtime down to around 36 minutes, even on ci.

That's with crunch-dispatch-local executing one container request at a time.

If I give cwltest the -j=8 parameter (for instance) to run 8 of these things at a time, arvados-cwl-runner bombs out like this:


2017-10-03 23:13:58 arvados.arv_put INFO: Resuming upload from cache file /root/.cache/arvados/arv-put/c5dadc18a2dc00619c0a24e33ed5e703
2017-10-03 23:13:58 arvados.arv_put ERROR: arv-put: Another process is already uploading this data.
         Use --no-cache if this is really what you want.
2017-10-03 23:13:58 cwltool ERROR: Workflow error, try again with --debug for more information:
v1.0/cat3-tool.cwl:7:5: keepdocker exited with code 1

The failures are all to do with multiple jobs trying to arv-put (the same) docker images via arv-keepdocker.

Do we have a way to namespace the arv-keepdocker calls, or serialize them?

#10 Updated by Peter Amstutz 15 days ago

Ward Vandewege wrote:

Switching to using the containers API makes a huge improvement, it takes the runtime down to around 36 minutes, even on ci.

That sounds like a normal amount of time. Looking at the crunch1 dispatching path, the point where it gets hung up seems to be when it runs "sudo crunch-job ..." and crunch2 doesn't have that.

That's with crunch-dispatch-local executing one container request at a time.

If I give cwltest the -j=8 parameter (for instance) to run 8 of these things at a time, arvados-cwl-runner bombs out like this:

[...]

The failures are all to do with multiple jobs trying to arv-put (the same) docker images via arv-keepdocker.

Do we have a way to namespace the arv-keepdocker calls, or serialize them?

I got this error the last time I tried to use the parallel execution option. I thought I filed a bug but maybe not. We should fix it.

#11 Updated by Ward Vandewege 15 days ago

For future reference, this is an example of the runtime on the ci box as of 3623287cdc05121a86e573b89aebb6e4aa6866b5 on branch 12355-make-cwl-conformance-tests-faster. The big gain came from switching to the containers API, though the branch has some other fixes too.

Test   1:  28
Test   2:  19
Test   3:  17
Test   4:  17
Test   5:  17
Test   6:  21
Test   7:  17
Test   8:  17
Test   9:  17
Test  10:  17
Test  11:  17
Test  12:  17
Test  13:  17
Test  14:  17
Test  15:  16
Test  16:  17
Test  17:  17
Test  18:  17
Test  19:  16
Test  20:  18
Test  21:  17
Test  22:  17
Test  23:  32
Test  24:  17
Test  25:  32
Test  26:  32
Test  27:  32
Test  28:  17
Test  29:  17
Test  30:  31
Test  31:  32
Test  32:  17
Test  33:  17
Test  34:  17
Test  35:  17
Test  36:  17
Test  37:  17
Test  38:  17
Test  39:  17
Test  40:  17
Test  41:  17
Test  42:  32
Test  43:  17
Test  44:  17
Test  45:  17
Test  46:  17
Test  47:  32
Test  48:  47
Test  49:   2
Test  50:  47
Test  51:  17
Test  52:   3
Test  53:  17
Test  54:  17
Test  55:  17
Test  56:   2
Test  57:  18
Test  58:  16
Test  59:  23
Test  60:   8
Test  61:   2
Test  62:  17
Test  63:  32
Test  64:  17
Test  65:  17
Test  66:  17
Test  67:  17
Test  68:  17
Test  69:  32
Test  70:  32
Test  71:  17
Test  72:  17
Test  73:  32
Test  74:  31
Test  75:  48
Test  76:  17
Test  77:  17
Test  78:  17
Test  79:  17
Test  80:  17
Test  81:   1
Test  82:  17
Test  83:  45
Test  84:  19
Test  85:   2
Test  86:  17
Test  87:  17
Test  88:  17
Test  89:  32
Test  90:  17
Test  91:  17
Test  92:  17
Test  93:  17
Test  94:  17
Test  95:   2
Test  96:   2
Test  97:  17
Test  98:  17
Test  99:   1
Test 100:  18
Test 101:  17
Test 102:   2
Test 103:  32
Test 104:  17
Test 105:  17
Test 106:  17
Test 107:  17
Test 108:  17
Test 109:   2
Test 110:  17
107 tests passed, 4 unsupported features

All tool tests succeeded
+ CODE=0
+ test 0 = 0
+ arvbox stop
cwltest
cwltest
+ exit 0

real    35m47.248s
user    0m0.324s
sys    0m0.332s

#12 Updated by Ward Vandewege 15 days ago

ready for review: branch 12355-make-cwl-conformance-tests-faster at 832e319fe2ba70e3ae18410238f07281aa929af9

#13 Updated by Peter Amstutz 14 days ago

@@ -102,7 +102,7 @@ EOF2
 chmod +x /tmp/cwltest/arv-cwl-containers

 env
-exec ./run_test.sh EXTRA=--compute-checksum $@
+exec ./run_test.sh RUNNER=/tmp/cwltest/arv-cwl-containers EXTRA=--compute-checksum $@
 EOF

 CODE=$?

Instead of changing the test_with_arvbox.sh script, you can pass this in on the test_with_arvbox.sh command line in the Jenkins configuration:

test_with_arvbox.sh --config localdemo RUNNER=/tmp/cwltest/arv-cwl-containers

Rest of the changes LGTM.

#14 Updated by Ward Vandewege 14 days ago

Peter Amstutz wrote:

[...]

Instead of changing the test_with_arvbox.sh script, you can pass this in on the test_with_arvbox.sh command line in the Jenkins configuration:

[...]

Rest of the changes LGTM.

Well, sure, we could, but test_with_arvbox.sh with the jobs api is malfunctioning (very slow in general; especially so on the ci box). Should we really default to that!?

#15 Updated by Peter Amstutz 14 days ago

Ward Vandewege wrote:

Peter Amstutz wrote:

[...]

Instead of changing the test_with_arvbox.sh script, you can pass this in on the test_with_arvbox.sh command line in the Jenkins configuration:

[...]

Rest of the changes LGTM.

Well, sure, we could, but test_with_arvbox.sh with the jobs api is malfunctioning (very slow in general; especially so on the ci box). Should we really default to that!?

Guess not. On further thought, setting RUNNER= on the test_with_arvbox.sh command line should still override the default, so it doesn't break anything.

LGTM, please merge.

#16 Updated by Anonymous 14 days ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:3910be3219ace18462a128571bfac0b35446a392.

Also available in: Atom PDF