Project

General

Profile

Actions

Idea #11458

closed

Jenkins step to run test_with_arvbox.sh

Added by Peter Amstutz about 7 years ago. Updated almost 7 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
06/21/2017
Due date:
Story points:
-

Description

Add a jenkins step to run the sdk/cwl/test_with_arvbox.sh script. This corresponds to the test step run on ci.commonwl.org and tests that "arvbox localdemo" works and that CWL conformance tests pass (without having deploy to a cloud cluster.)

Concurrent with build-packages-python-ruby. Shouldn't block anyone beyond.

Watch out for resource requirements a bit. Mostly RAM requirements.


Subtasks 1 (0 open1 closed)

Task #11575: ReviewResolvedWard Vandewege06/21/2017Actions
Actions #1

Updated by Peter Amstutz about 7 years ago

  • Description updated (diff)
Actions #2

Updated by Tom Morris about 7 years ago

  • Project changed from Arvados to 40
  • Target version set to 2017-04-26 sprint
Actions #3

Updated by Ward Vandewege about 7 years ago

  • Description updated (diff)
  • Assigned To set to Ward Vandewege
Actions #4

Updated by Ward Vandewege about 7 years ago

  • Description updated (diff)
Actions #5

Updated by Ward Vandewege about 7 years ago

I added the build step. Waiting for CI to run it.

Actions #6

Updated by Ward Vandewege about 7 years ago

Failures all around (looking at build #5 and newer at https://ci.curoverse.com/job/run-arvados-cwl-conformance-tests/).

Weirdly, the runs seem to fail immediately exactly two out of three runs. Daemon.log:

2017-04-18T19:54:15+00:00 time="2017-04-18T19:54:15.6435555Z" level=error msg="containerd: notify OOM events" error="cgroup path for memory not found" 
2017-04-18T19:54:15+00:00 time="2017-04-18T19:54:15.780151646Z" level=error msg="containerd: notify OOM events" error="cgroup path for memory not found" 
2017-04-18T19:54:16+00:00 time="2017-04-18T19:54:16.738829295Z" level=error msg="Error closing logger: invalid argument" 
2017-04-18T19:54:17+00:00 time="2017-04-18T19:54:17.160931883Z" level=error msg="Error setting up exec command in container cwltest: Container 62bbc7e4421cc50554ccacba70b8eb1cc0f89e51826f6c4dfcebad4d5555bb2f is not running" 
2017-04-18T19:54:17+00:00 time="2017-04-18T19:54:17.160984784Z" level=error msg="Handler for POST /v1.26/containers/cwltest/exec returned error: Container 62bbc7e4421cc50554ccacba70b8eb1cc0f89e51826f6c4dfcebad4d5555bb2f is not running" 
2017-04-18T19:54:17+00:00 time="2017-04-18T19:54:17.190574870Z" level=error msg="Error setting up exec command in container cwltest: Container 62bbc7e4421cc50554ccacba70b8eb1cc0f89e51826f6c4dfcebad4d5555bb2f is not running" 
2017-04-18T19:54:17+00:00 time="2017-04-18T19:54:17.190622395Z" level=error msg="Handler for POST /v1.26/containers/cwltest/exec returned error: Container 62bbc7e4421cc50554ccacba70b8eb1cc0f89e51826f6c4dfcebad4d5555bb2f is not running" 

The third run, daemon.log:

2017-04-18T19:56:56+00:00 time="2017-04-18T19:56:56.312489186Z" level=error msg="containerd: notify OOM events" error="cgroup path for memory not found" 

This one loops through all the tests (that is good at least!), but fails them with errors like this:

"HttpError 422 when requesting https://localhost:8000/arvados/v1/jobs?alt=json&find_or_create=true returned "error searching arvados from '570509ab4d2ef93d870fd2b1f2eab178afb1bad9' to 'master', excluding []" this looks like it is unable to find a crunch script commit, ummm

caused by:

Error response from daemon: Container c3a8069044aa7724eb1ada3fd5a5d52df675902a5a31424a6fc7b7dfbb981c02 is not running

Command line testing as the jenkins user, with:

HOME=/var/lib/jenkins/workspace/run-arvados-cwl-conformance-tests ARVBOX_BASE=/var/lib/jenkins/workspace/run-arvados-cwl-conformance-tests/.arvbox ARVADOS_ROOT=/var/lib/jenkins/workspace/run-arvados-cwl-conformance-tests /var/lib/jenkins/workspace/run-arvados-cwl-conformance-tests/sdk/cwl/test_with_arvbox.sh --tag fc2eaa20275146369e451da6cf14f4461c68117c
Actions #7

Updated by Nico César about 7 years ago

this happened to be a 10G limitation of the device mapper default ..

I changed /etc/systemd/system/docker.service.d/local.conf and added --storage-opt dm.basesize=30G to docker daemon

Actions #8

Updated by Tom Morris about 7 years ago

  • Status changed from New to Resolved
  • Target version changed from 2017-04-26 sprint to 2017-05-10 sprint
Actions #9

Updated by Tom Morris about 7 years ago

  • Status changed from Resolved to In Progress
Actions #10

Updated by Ward Vandewege almost 7 years ago

I cleaned up docker on Jenkins today. That made docker behave better, but most (80 out of 85 non-disabled) of the tests are failing now. For example, on run 49 at https://ci.curoverse.com/view/Arvados%20build%20pipeline/job/run-arvados-cwl-conformance-tests/49/console

...
Test [49/87] 
Test failed: /usr/local/bin/cwl-runner --outdir=/tmp/tmp71JP0l --quiet v1.0/metadata.cwl v1.0/cat-job.json
Test metadata
Returned non-zero
2017-05-10 01:52:31 arvados.cwl-runner WARNING: Overall process status is permanentFail
2017-05-10 01:52:31 cwltool WARNING: Final process status is permanentFail
...

I logged into the running docker container and re-ran one test with --debug:

root@da305c7c144b:/tmp/cwltest/common-workflow-language/v1.0# ARVADOS_API_HOST_INSECURE=1 ARVADOS_API_TOKEN=TEMPORARYTOKENREMOVEDFORGOODMEASURE ARVADOS_API_HOST=localhost:8000 /usr/local/bin/cwl-runner --outdir=/tmp/tmp71JP0l --debug v1.0/metadata.cwl v1.0/cat-job.json
2017-05-10 01:56:01 cwltool INFO: /usr/local/bin/cwl-runner 9bf790498c20c9f8ab9bffbf9c8e5311864bd91e 1.0.20170508160600, arvados-python-client 0.1.20170508160600, cwltool 1.0.20170413194156
2017-05-10 01:56:01 cwltool INFO: Resolved 'v1.0/metadata.cwl' to 'file:///tmp/cwltest/common-workflow-language/v1.0/v1.0/metadata.cwl'
2017-05-10 01:56:02 arvados.arv-run INFO: Upload local files: "dcterms.rdf" "foaf.rdf" 
2017-05-10 01:56:02 arvados.keep DEBUG: {u'3krnd-bi6l4-5koabb5k5mkm3kk': OrderedDict([(u'href', u'/keep_services/3krnd-bi6l4-5koabb5k5mkm3kk'), (u'kind', u'arvados#keepService'), (u'etag', u'dray2z2arezfvjd4lgxn7ovg3'), (u'uuid', u'3krnd-bi6l4-5koabb5k5mkm3kk'), (u'owner_uuid', u'3krnd-tpzed-000000000000000'), (u'created_at', u'2017-05-10T01:28:26.974849000Z'), (u'modified_by_client_uuid', u'3krnd-ozdt8-cucbpj0cnxad6ig'), (u'modified_by_user_uuid', u'3krnd-tpzed-000000000000000'), (u'modified_at', u'2017-05-10T01:28:26.974849000Z'), (u'service_host', u'172.17.0.2'), (u'service_port', 25108), (u'service_ssl_flag', False), (u'service_type', u'disk'), (u'read_only', False), ('_service_root', 'http://172.17.0.2:25108/')]), u'3krnd-bi6l4-ol44uiiwf1npa55': OrderedDict([(u'href', u'/keep_services/3krnd-bi6l4-ol44uiiwf1npa55'), (u'kind', u'arvados#keepService'), (u'etag', u'5k7vcnfawclmkrj9msvrq0oso'), (u'uuid', u'3krnd-bi6l4-ol44uiiwf1npa55'), (u'owner_uuid', u'3krnd-tpzed-000000000000000'), (u'created_at', u'2017-05-10T01:28:27.942494000Z'), (u'modified_by_client_uuid', u'3krnd-ozdt8-cucbpj0cnxad6ig'), (u'modified_by_user_uuid', u'3krnd-tpzed-000000000000000'), (u'modified_at', u'2017-05-10T01:28:27.942494000Z'), (u'service_host', u'172.17.0.2'), (u'service_port', 25107), (u'service_ssl_flag', False), (u'service_type', u'disk'), (u'read_only', False), ('_service_root', 'http://172.17.0.2:25107/')])}
2017-05-10 01:56:02 arvados.keep DEBUG: 2456f407b6ff0759723f8f1f9720b2c6+131380: ['http://172.17.0.2:25108/', 'http://172.17.0.2:25107/']
2017-05-10 01:56:02 arvados.keep DEBUG: Pool max threads is 1
2017-05-10 01:56:02 arvados.keep DEBUG: Request: PUT http://172.17.0.2:25108/2456f407b6ff0759723f8f1f9720b2c6
2017-05-10 01:56:02 arvados.keep INFO: PUT 200: 131380 bytes in 39.9429798126 msec (3.137 MiB/sec)
2017-05-10 01:56:02 arvados.keep DEBUG: KeepWriterThread <KeepWriterThread(Thread-1, started daemon 139783366399744)> succeeded 2456f407b6ff0759723f8f1f9720b2c6+131380 http://172.17.0.2:25108/
2017-05-10 01:56:02 arvados.arv-run INFO: Uploaded to 2817d643fe37ca5a3899ef3c4b59047d+83 (3krnd-4zz18-wnw2wji8huuaqb4)
2017-05-10 01:56:02 arvados.arv-run INFO: Upload local files: "hello.txt" 
2017-05-10 01:56:02 arvados.arv-run INFO: Using collection 6acd28dff09b88f5f550950fcb7203a3+53 (3krnd-4zz18-j24auaop34aumx6)
2017-05-10 01:56:02 arvados.cwl-runner.metrics DEBUG: ENTER jobiter 1494381362.65
2017-05-10 01:56:02 arvados.cwl-runner.metrics DEBUG: EXIT jobiter 1494381362.65 0.000149011611938
2017-05-10 01:56:02 arvados.cwl-runner.metrics DEBUG: ENTER run 1494381362.65
2017-05-10 01:56:02 arvados.keep DEBUG: {u'3krnd-bi6l4-5koabb5k5mkm3kk': OrderedDict([(u'href', u'/keep_services/3krnd-bi6l4-5koabb5k5mkm3kk'), (u'kind', u'arvados#keepService'), (u'etag', u'dray2z2arezfvjd4lgxn7ovg3'), (u'uuid', u'3krnd-bi6l4-5koabb5k5mkm3kk'), (u'owner_uuid', u'3krnd-tpzed-000000000000000'), (u'created_at', u'2017-05-10T01:28:26.974849000Z'), (u'modified_by_client_uuid', u'3krnd-ozdt8-cucbpj0cnxad6ig'), (u'modified_by_user_uuid', u'3krnd-tpzed-000000000000000'), (u'modified_at', u'2017-05-10T01:28:26.974849000Z'), (u'service_host', u'172.17.0.2'), (u'service_port', 25108), (u'service_ssl_flag', False), (u'service_type', u'disk'), (u'read_only', False), ('_service_root', 'http://172.17.0.2:25108/')]), u'3krnd-bi6l4-ol44uiiwf1npa55': OrderedDict([(u'href', u'/keep_services/3krnd-bi6l4-ol44uiiwf1npa55'), (u'kind', u'arvados#keepService'), (u'etag', u'5k7vcnfawclmkrj9msvrq0oso'), (u'uuid', u'3krnd-bi6l4-ol44uiiwf1npa55'), (u'owner_uuid', u'3krnd-tpzed-000000000000000'), (u'created_at', u'2017-05-10T01:28:27.942494000Z'), (u'modified_by_client_uuid', u'3krnd-ozdt8-cucbpj0cnxad6ig'), (u'modified_by_user_uuid', u'3krnd-tpzed-000000000000000'), (u'modified_at', u'2017-05-10T01:28:27.942494000Z'), (u'service_host', u'172.17.0.2'), (u'service_port', 25107), (u'service_ssl_flag', False), (u'service_type', u'disk'), (u'read_only', False), ('_service_root', 'http://172.17.0.2:25107/')])}
2017-05-10 01:56:02 arvados.keep DEBUG: bfea1a98a1205ac8c1ee736df5a74154+1177: ['http://172.17.0.2:25108/', 'http://172.17.0.2:25107/']
2017-05-10 01:56:02 arvados.keep DEBUG: Pool max threads is 1
2017-05-10 01:56:02 arvados.keep DEBUG: Request: PUT http://172.17.0.2:25108/bfea1a98a1205ac8c1ee736df5a74154
2017-05-10 01:56:02 arvados.keep INFO: PUT 200: 1177 bytes in 0.91814994812 msec (1.223 MiB/sec)
2017-05-10 01:56:02 arvados.keep DEBUG: KeepWriterThread <KeepWriterThread(Thread-3, started daemon 139783356172032)> succeeded bfea1a98a1205ac8c1ee736df5a74154+1177 http://172.17.0.2:25108/
2017-05-10 01:56:02 arvados.cwl-runner INFO: Using collection 3krnd-4zz18-ils99lp1dnxqob7
2017-05-10 01:56:06 arvados.cwl-runner INFO: Created pipeline 3krnd-d1hrv-kqfi0zqcqe8t7dw
2017-05-10 01:56:06 arvados.cwl-runner.metrics DEBUG: EXIT run 1494381366.67 4.02303695679
2017-05-10 01:56:06 arvados.cwl-runner.metrics DEBUG: ENTER jobiter 1494381366.67
2017-05-10 01:56:06 arvados.cwl-runner.metrics DEBUG: EXIT jobiter 1494381366.67 6.19888305664e-05
2017-05-10 01:56:17 arvados.cwl-runner INFO: [job metadata.cwl] 3krnd-8i9sb-njs3owbh8b8epqq is Failed
2017-05-10 01:56:17 arvados.cwl-runner.metrics DEBUG: ENTER done metadata.cwl 1494381377.69
2017-05-10 01:56:17 arvados.keep DEBUG: 587fa1d6ea668aec313685ef8d6f0086+15210+A3234d087bdfcb4b4c803bd40032c76de01f74705@5924e841: ['http://172.17.0.2:25108/', 'http://172.17.0.2:25107/']
2017-05-10 01:56:17 arvados.keep DEBUG: Request: GET http://172.17.0.2:25108/587fa1d6ea668aec313685ef8d6f0086+15210+A3234d087bdfcb4b4c803bd40032c76de01f74705@5924e841
2017-05-10 01:56:17 arvados.keep INFO: GET 200: 15210 bytes in 0.676870346069 msec (21.430 MiB/sec)
2017-05-10 01:56:17 arvados.cwl-runner INFO: [job metadata.cwl] error log:
2017-05-10 01:56:17 arvados.cwl-runner INFO: 
  2017-05-10_01:56:10 3krnd-8i9sb-njs3owbh8b8epqq 26430  check whether default user is UID 0: exit 0
  2017-05-10_01:56:10 3krnd-8i9sb-njs3owbh8b8epqq 26430  Container will run with default user
  2017-05-10_01:56:10 3krnd-8i9sb-njs3owbh8b8epqq 26430  Looking for version 8ee2f83955b1b71571aa428335340f505c3c2264 from repository arvados
  2017-05-10_01:56:10 3krnd-8i9sb-njs3owbh8b8epqq 26430  Using local repository '/var/lib/arvados/internal.git'
  2017-05-10_01:56:10 3krnd-8i9sb-njs3owbh8b8epqq 26430  Version 8ee2f83955b1b71571aa428335340f505c3c2264 is commit 8ee2f83955b1b71571aa428335340f505c3c2264
  2017-05-10_01:56:10 3krnd-8i9sb-njs3owbh8b8epqq 26430  run install script on all workers: start
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  stderr starting: ['sh','-c','mkdir -p /tmp/crunch0/opt && cd /tmp/crunch0 && perl -']
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  stderr Extracting archive: b0827538b6efe39168867e0b0d32db80
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  run install script on all workers: exit 0
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  script cwl-runner
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  script_version 8ee2f83955b1b71571aa428335340f505c3c2264
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  script_parameters {"cwl:tool":"105cc5a0ce5979e14055636c29b63594+60/workflow.cwl#main","arv:on_error":"continue","file1":{"class":"File","location":"keep:6acd28dff09b88f5f550950fcb7203a3+53/hello.txt","basename":"hello.txt"},"arv:enable_reuse":true}
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  runtime_constraints {"max_tasks_per_node":0,"min_ram_mb_per_node":1024,"docker_image":"arvados/jobs:9bf790498c20c9f8ab9bffbf9c8e5311864bd91e"}
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  start level 0 with 1 slots
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  status: 0 done, 0 running, 1 todo
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 job_task 3krnd-ot0gb-6e22x3km2eccgc7
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 child 26574 started on localhost.1
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr starting: ['bash','-c','if [ -e \\/tmp\\/crunch0\\/task\\/localhost\\.1 ]; then rm -rf \\/tmp\\/crunch0\\/task\\/localhost\\.1; fi; mkdir -p \\/tmp\\/crunch0 \\/tmp\\/crunch0\\/work \\/tmp\\/crunch0\\/task\\/localhost\\.1 \\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep && cd \\/tmp\\/crunch0 && MEM=$(awk \'($1 == "MemTotal:"){print $2}\' </proc/meminfo) && SWAP=$(awk \'($1 == "SwapTotal:"){print $2}\' </proc/meminfo) && MEMLIMIT=$(( ($MEM * 95) / (1 * 100) )) && let SWAPLIMIT=$MEMLIMIT+$SWAP && declare -a VOLUMES=() && if which crunchrunner >/dev/null ; then VOLUMES+=("--volume=$(which crunchrunner):/usr/local/bin/crunchrunner:ro") ; fi && if test -f /etc/ssl/certs/ca-certificates.crt ; then VOLUMES+=("--volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt:ro") ; elif test -f /etc/pki/tls/certs/ca-bundle.crt ; then VOLUMES+=("--volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/arvados/ca-certificates.crt:ro") ; fi && exec arv-mount --read-write --mount-by-pdh=by_pdh --mount-tmp=tmp --crunchstat-interval=10 --allow-other  \\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep --exec crunchstat -cgroup-root=\\/sys\\/fs\\/cgroup -cgroup-parent=docker -cgroup-cid=/tmp/crunch0/3krnd-ot0gb-6e22x3km2eccgc7-0.cid -poll=10000 docker run --net=host --name=3krnd-ot0gb-6e22x3km2eccgc7-0 --attach=stdout --attach=stderr --attach=stdin -i  --cidfile=/tmp/crunch0/3krnd-ot0gb-6e22x3km2eccgc7-0.cid --sig-proxy --memory=${MEMLIMIT}k --memory-swap=${SWAPLIMIT}k --volume=\\/tmp\\/crunch0\\/src\\:\\/tmp\\/crunch0\\/src\\:ro --volume=\\/tmp\\/crunch0\\/opt\\:\\/tmp\\/crunch0\\/opt\\:ro --volume=\\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep\\/by_pdh\\:\\/keep\\:ro --volume=\\/tmp\\/crunch0\\/task\\/localhost\\.1\\.keep\\/tmp\\:\\/keep_tmp --volume=/tmp "${VOLUMES[@]}" --env=CRUNCH_WORK\\=\\/tmp\\/crunch0\\/work --env=TASK_TMPDIR\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 --env=CRUNCH_INSTALL\\=\\/tmp\\/crunch0\\/opt --env=CRUNCH_TMP\\=\\/tmp\\/crunch0 --env=TASK_SEQUENCE\\=0 --env=CRUNCH_SRC_URL\\=\\/var\\/lib\\/arvados\\/internal\\.git --env=TASK_SLOT_NODE\\=localhost --env=JOB_PARAMETER_ARV\\:ON_ERROR\\=continue --env=JOB_PARAMETER_FILE1\\=HASH\\(0x187fb00\\) --env=JOB_PARAMETER_CWL\\:TOOL\\=105cc5a0ce5979e14055636c29b63594\\+60\\/workflow\\.cwl\\#main --env=CRUNCH_GIT_ARCHIVE_HASH\\=b0827538b6efe39168867e0b0d32db80 --env=TASK_KEEPMOUNT_TMP\\=\\/keep_tmp --env=CRUNCH_SRC_COMMIT\\=8ee2f83955b1b71571aa428335340f505c3c2264 --env=CRUNCH_JOB_DOCKER_RUN_ARGS\\=\\-\\-net\\=host --env=CRUNCH_JOB_DOCKER_BIN\\=docker --env=TASK_UUID\\=3krnd\\-ot0gb\\-6e22x3km2eccgc7 --env=CRUNCH_NODE_SLOTS\\=1 --env=ARVADOS_API_HOST_INSECURE\\=1 --env=TASK_WORK\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 --env=CRUNCH_JOB_UUID\\=3krnd\\-8i9sb\\-njs3owbh8b8epqq --env=CRUNCH_SRC\\=\\/tmp\\/crunch0\\/src --env=ARVADOS_API_TOKEN\\=[...] --env=CRUNCH_REFRESH_TRIGGER\\=\\/tmp\\/crunch_refresh_trigger --env=TASK_SLOT_NUMBER\\=1 --env=TASK_QSEQUENCE\\=0 --env=JOB_WORK\\=\\/tmp\\/crunch\\-job\\-work --env=TASK_KEEPMOUNT\\=\\/keep --env=JOB_PARAMETER_ARV\\:ENABLE_REUSE\\=true --env=JOB_SCRIPT\\=cwl\\-runner --env=ARVADOS_API_HOST\\=172\\.17\\.0\\.2\\:8000 --env=JOB_UUID\\=3krnd\\-8i9sb\\-njs3owbh8b8epqq --env=CRUNCH_JOB_BIN\\=\\/usr\\/src\\/arvados\\/sdk\\/cli\\/bin\\/crunch\\-job --env=HOME\\=\\/tmp\\/crunch\\-job\\-task\\-work\\/localhost\\.1 b3d4edfc216115ee0dc474945ada9afb8bbb98d8f72e5c975343535a17697e97 /bin/sh -c \'python -c "from pkg_resources import get_distribution as get; print \\"Using Arvados SDK version\\", get(\\"arvados-python-client\\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/localhost.1" && if which stdbuf >/dev/null ; then   exec  stdbuf --output=0 --error=0  \\/tmp\\/crunch0\\/src\\/crunch_scripts\\/cwl\\-runner ; else   exec \\/tmp\\/crunch0\\/src\\/crunch_scripts\\/cwl\\-runner ; fi\'']
  2017-05-10_01:56:11 3krnd-8i9sb-njs3owbh8b8epqq 26430  status: 0 done, 1 running, 0 todo
  2017-05-10_01:56:12 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr WARNING: Your kernel does not support memory limit capabilities. Limitation discarded.
  2017-05-10_01:56:12 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr Using Arvados SDK version 0.1.20170508160600
  2017-05-10_01:56:12 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner INFO: /tmp/crunch0/src/crunch_scripts/cwl-runner 9bf790498c20c9f8ab9bffbf9c8e5311864bd91e 1.0.20170508160600, arvados-python-client 0.1.20170508160600, cwltool 1.0.20170413194156
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner INFO: [job workflow.cwl#main] reused job 3krnd-8i9sb-akhdtjdyy0er3ga
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner ERROR: Got unknown exception while collecting output for job workflow.cwl#main:
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr Traceback (most recent call last):
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr   File "/usr/lib/python2.7/dist-packages/arvados_cwl/arvjob.py", line 196, in done
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr     log = logc.open(logc.keys()[0])
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr TypeError: 'dict_keys' object does not support indexing
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner WARNING: Overall process status is permanentFail
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner INFO: Final output collection 282fbb8a925551343e98e762fa1ec1be+57 "Output of main (2017-05-10T01:56:14.490Z)" (3krnd-4zz18-q6zi52bo3552mkq)
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 child 26574 on localhost.1 exit 0 success=false
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 failure (#1, permanent) after 3 seconds
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 task output (35 bytes): 282fbb8a925551343e98e762fa1ec1be+57
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430  status: 0 done, 0 running, 1 todo
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430  wait for last 0 children to finish
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430  Freeze not implemented
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430  collate
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430  collated output manifest text to send to API server is 108 bytes with access tokens
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430  job output 282fbb8a925551343e98e762fa1ec1be+57
  2017-05-10_01:56:15 3krnd-8i9sb-njs3owbh8b8epqq 26430  finish
2017-05-10 01:56:17 arvados.keep DEBUG: 99914b932bd37a50b983c5e7c90ae93b+2+A8dacaf148d74af352e16a494270d2fc9897da3f2@5924e841: ['http://172.17.0.2:25108/', 'http://172.17.0.2:25107/']
2017-05-10 01:56:17 arvados.keep DEBUG: Request: GET http://172.17.0.2:25108/99914b932bd37a50b983c5e7c90ae93b+2+A8dacaf148d74af352e16a494270d2fc9897da3f2@5924e841
2017-05-10 01:56:17 arvados.keep INFO: GET 200: 2 bytes in 0.586032867432 msec (0.003 MiB/sec)
2017-05-10 01:56:17 arvados.cwl-runner WARNING: Overall process status is permanentFail
2017-05-10 01:56:17 arvados.cwl-runner.metrics DEBUG: EXIT done metadata.cwl 1494381377.76 0.0739569664001
2017-05-10 01:56:17 arvados.cwl-runner INFO: Final output collection 282fbb8a925551343e98e762fa1ec1be+57
{}
2017-05-10 01:56:17 cwltool WARNING: Final process status is permanentFail

From the above, this appears to be the issue

....
  2017-05-10_01:56:12 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner INFO: /tmp/crunch0/src/crunch_scripts/cwl-runner 9bf790498c20c9f8ab9bffbf9c8e5311864bd91e 1.0.20170508160600, arvados-python-client 0.1.20170508160600, cwltool 1.0.20170413194156
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner INFO: [job workflow.cwl#main] reused job 3krnd-8i9sb-akhdtjdyy0er3ga
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner ERROR: Got unknown exception while collecting output for job workflow.cwl#main:
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr Traceback (most recent call last):
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr   File "/usr/lib/python2.7/dist-packages/arvados_cwl/arvjob.py", line 196, in done
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr     log = logc.open(logc.keys()[0])
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr TypeError: 'dict_keys' object does not support indexing
  2017-05-10_01:56:14 3krnd-8i9sb-njs3owbh8b8epqq 26430 0 stderr arvados.cwl-runner WARNING: Overall process status is permanentFail
....

Hmmm, that looks like legit breakage due to the Python3 support merge. See issue #11308.

In other words, it's possible that this story is now actually done!

Actions #11

Updated by Tom Clegg almost 7 years ago

The dict_keys thing was addressed in #11308#note-18 (notes 18 to 21), merged in 4bc2f36b8e475b0dc1145fa27e665cdd8c85f809. But then I merged an apiserver branch in #11629 that workbench units reveal is broken. Working on that now...

https://ci.curoverse.com/job/run-tests-apps-workbench-units/888/console

Actions #12

Updated by Tom Morris almost 7 years ago

  • Target version changed from 2017-05-10 sprint to 2017-05-24 sprint
Actions #13

Updated by Ward Vandewege almost 7 years ago

  • Target version changed from 2017-05-24 sprint to 2017-06-07 sprint
Actions #14

Updated by Tom Morris almost 7 years ago

  • Project changed from 40 to Arvados
  • Assigned To changed from Ward Vandewege to Peter Amstutz
  • Target version changed from 2017-06-07 sprint to 2017-06-21 sprint
Actions #15

Updated by Nico César almost 7 years ago

nico@ci:~$ docker run -ti --rm -v /data/1/jenkins/workspace/build-packages-centos7/build:/jenkins -v /var/lib/jenkins/workspace/build-packages-centos7:/arvados -v /arvados/services/api/vendor/bundle -v /arvados/apps/workbench/vendor/bundle -v /var/lib/jenkins/workspace/build-packages-centos7/services/api/vendor/cache-centos7:/arvados/services/api/vendor/cache -v /var/lib/jenkins/workspace/build-packages-centos7/apps/workbench/vendor/cache-centos7:/arvados/apps/workbench/vendor/cache --env ARVADOS_DEBUG= --env TARGET=centos7 --env WORKSPACE=/arvados 47cbfd26fc37
[root@be74fe696267 /]# ls /usr/lib64/python2.7/site-packages -l
total 2140
-rw-r--r-- 1 root root    119 Nov  6  2016 README
-rwxr-xr-x 1 root root 166248 Jun 10  2014 _dbus_bindings.so
-rwxr-xr-x 1 root root  11976 Jun 10  2014 _dbus_glib_bindings.so
-rwxr-xr-x 1 root root  50208 Jun 10  2014 _sqlitecache.so
drwxr-xr-x 2 root root     62 Jun  5 03:08 curl
drwxr-xr-x 3 root root   4096 Jun  5 03:08 dbus
-rw-r--r-- 1 root root  15061 Jun 23  2016 drv_libxml2.py
-rw-r--r-- 2 root root  11202 Jun 23  2016 drv_libxml2.pyc
-rw-r--r-- 2 root root  11202 Jun 23  2016 drv_libxml2.pyo
drwxr-xr-x 5 root root   4096 Jun  5 03:08 gi
drwxr-xr-x 2 root root    134 Jun  5 03:08 gpgme
-rw-r--r-- 1 root root    164 Jun 10  2014 liblzma.py
-rw-r--r-- 2 root root    361 Jun 10  2014 liblzma.pyc
-rw-r--r-- 2 root root    361 Jun 10  2014 liblzma.pyo
-rw-r--r-- 1 root root 345539 Jun 23  2016 libxml2.py
-rw-r--r-- 2 root root 449468 Jun 23  2016 libxml2.pyc
-rw-r--r-- 2 root root 449468 Jun 23  2016 libxml2.pyo
-rwxr-xr-x 1 root root 427912 Jun 23  2016 libxml2mod.so
-rwxr-xr-x 1 root root  67888 Jun 10  2014 lzma.so
-rw-r--r-- 1 root root    366 Nov  5  2016 pycurl-7.19.0-py2.7.egg-info
-rwxr-xr-x 1 root root  70744 Nov  5  2016 pycurl.so
-rw-r--r-- 1 root root   1013 Nov 20  2015 pygobject-3.14.0-py2.7-linux-x86_64.egg-info
-rw-r--r-- 1 root root    889 Jun  9  2014 pygpgme-0.3-py2.7.egg-info
drwxr-xr-x 2 root root   4096 Jun  5 03:08 pygtkcompat
drwxr-xr-x 2 root root     86 Jun  5 03:08 pyliblzma-0.5.3-py2.7.egg-info
drwxr-xr-x 2 root root     86 Jun  5 03:08 pyxattr-0.5.1-py2.7.egg-info
drwxr-xr-x 2 root root   4096 Jun  5 03:08 rpm
-rw-r--r-- 1 root root   2510 Aug 18  2009 sqlitecachec.py
-rw-r--r-- 2 root root   2170 Jun 10  2014 sqlitecachec.pyc
-rw-r--r-- 2 root root   2170 Jun 10  2014 sqlitecachec.pyo
-rw-r--r-- 1 root root  32520 Jun  9  2014 xattr.so
-rw-r--r-- 1 root root    213 Jun 10  2014 yum_metadata_parser-1.1.4-py2.7.egg-info
[root@be74fe696267 /]# yum install python-pycurl
Loaded plugins: fastestmirror, ovl
base                                                                                                                                                 | 3.6 kB  00:00:00
centos-sclo-rh                                                                                                                                       | 2.9 kB  00:00:00
centos-sclo-sclo                                                                                                                                     | 2.9 kB  00:00:00
extras                                                                                                                                               | 3.4 kB  00:00:00
localrepo                                                                                                                                            | 2.9 kB  00:00:00
updates                                                                                                                                              | 3.4 kB  00:00:00
localrepo/primary_db                                                                                                                                 | 501 kB  00:00:00
Loading mirror speeds from cached hostfile
 * base: ftpmirror.your.org
 * extras: mirrors.sonic.net
 * updates: mirror.steadfast.net
Resolving Dependencies
--> Running transaction check
---> Package python-pycurl.x86_64 0:7.19.0-19.el7 will be updated
---> Package python-pycurl.x86_64 0:7.19.5.3-3 will be an update
--> Finished Dependency Resolution

Dependencies Resolved

============================================================================================================================================================================
 Package                                     Arch                                 Version                                     Repository                               Size
============================================================================================================================================================================
Updating:
 python-pycurl                               x86_64                               7.19.5.3-3                                  localrepo                               165 k

Transaction Summary
============================================================================================================================================================================
Upgrade  1 Package

Total download size: 165 k
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Updating   : python-pycurl-7.19.5.3-3.x86_64                                                                                                                          1/2
  Cleanup    : python-pycurl-7.19.0-19.el7.x86_64                                                                                                                       2/2
  Verifying  : python-pycurl-7.19.5.3-3.x86_64                                                                                                                          1/2
  Verifying  : python-pycurl-7.19.0-19.el7.x86_64                                                                                                                       2/2

Updated:
  python-pycurl.x86_64 0:7.19.5.3-3

Complete!
[root@be74fe696267 /]# ls /usr/lib64/python2.7/site-packages -l
ls: cannot access /usr/lib64/python2.7/site-packages/pycurl-7.19.0-py2.7.egg-info: No such file or directory
ls: cannot access /usr/lib64/python2.7/site-packages/pycurl.so: No such file or directory
total 2064
-rw-r--r-- 1 root root    119 Nov  6  2016 README
-rwxr-xr-x 1 root root 166248 Jun 10  2014 _dbus_bindings.so
-rwxr-xr-x 1 root root  11976 Jun 10  2014 _dbus_glib_bindings.so
-rwxr-xr-x 1 root root  50208 Jun 10  2014 _sqlitecache.so
drwxr-xr-x 1 root root     62 Jun 12 14:40 curl
drwxr-xr-x 3 root root   4096 Jun  5 03:08 dbus
-rw-r--r-- 1 root root  15061 Jun 23  2016 drv_libxml2.py
-rw-r--r-- 2 root root  11202 Jun 23  2016 drv_libxml2.pyc
-rw-r--r-- 2 root root  11202 Jun 23  2016 drv_libxml2.pyo
drwxr-xr-x 5 root root   4096 Jun  5 03:08 gi
drwxr-xr-x 2 root root    134 Jun  5 03:08 gpgme
-rw-r--r-- 1 root root    164 Jun 10  2014 liblzma.py
-rw-r--r-- 2 root root    361 Jun 10  2014 liblzma.pyc
-rw-r--r-- 2 root root    361 Jun 10  2014 liblzma.pyo
-rw-r--r-- 1 root root 345539 Jun 23  2016 libxml2.py
-rw-r--r-- 2 root root 449468 Jun 23  2016 libxml2.pyc
-rw-r--r-- 2 root root 449468 Jun 23  2016 libxml2.pyo
-rwxr-xr-x 1 root root 427912 Jun 23  2016 libxml2mod.so
-rwxr-xr-x 1 root root  67888 Jun 10  2014 lzma.so
?????????? ? ?    ?         ?            ? pycurl-7.19.0-py2.7.egg-info
?????????? ? ?    ?         ?            ? pycurl.so
-rw-r--r-- 1 root root   1013 Nov 20  2015 pygobject-3.14.0-py2.7-linux-x86_64.egg-info
-rw-r--r-- 1 root root    889 Jun  9  2014 pygpgme-0.3-py2.7.egg-info
drwxr-xr-x 2 root root   4096 Jun  5 03:08 pygtkcompat
drwxr-xr-x 2 root root     86 Jun  5 03:08 pyliblzma-0.5.3-py2.7.egg-info
drwxr-xr-x 2 root root     86 Jun  5 03:08 pyxattr-0.5.1-py2.7.egg-info
drwxr-xr-x 2 root root   4096 Jun  5 03:08 rpm
-rw-r--r-- 1 root root   2510 Aug 18  2009 sqlitecachec.py
-rw-r--r-- 2 root root   2170 Jun 10  2014 sqlitecachec.pyc
-rw-r--r-- 2 root root   2170 Jun 10  2014 sqlitecachec.pyo
-rw-r--r-- 1 root root  32520 Jun  9  2014 xattr.so
-rw-r--r-- 1 root root    213 Jun 10  2014 yum_metadata_parser-1.1.4-py2.7.egg-info

Actions #16

Updated by Nico César almost 7 years ago

same with this:

[root@38c1c0e4d57e /]# ls /usr/lib64/python2.7/site-packages/curl -l
ls: cannot access /usr/lib64/python2.7/site-packages/curl/__init__.py: No such file or directory
ls: cannot access /usr/lib64/python2.7/site-packages/curl/__init__.pyc: No such file or directory
ls: cannot access /usr/lib64/python2.7/site-packages/curl/__init__.pyo: No such file or directory
total 0
?????????? ? ? ? ?            ? __init__.py
?????????? ? ? ? ?            ? __init__.pyc
?????????? ? ? ? ?            ? __init__.pyo

the package that was installed was https://www.rpmfind.net/linux/RPM/centos/7.3.1611/x86_64/Packages/python-pycurl-7.19.0-19.el7.x86_64.html

I also tested

yumdownloader python-pycurl ; rpm -U python-pycurl-7.19.5.3-3.x86_64.rpm
file $( which rpm )
/usr/bin/rpm: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=c2bdddc22ab85cbd7e8ed1dd33afbd03231ce6dc, stripped

which uses rpm (ELF binary not a python script like YUM) with the same results

Actions #17

Updated by Nico César almost 7 years ago

from the latest bug, I confirmed in a new debian8 VM running docker 17.05.0-ce with AUFS in 3.16 kernel DOESN'T have the bug. so this tells us that the overlay2 driver is the issue or the kernel version.

now I'm installing the kernel from bpo

Actions #18

Updated by Peter Amstutz almost 7 years ago

Should be all fixed, now.

Actions #19

Updated by Peter Amstutz almost 7 years ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF