Bug #8828
closed
[Crunch] be more resilient when crunchrunner is not available; also don't test for crunchrunner on API server
Story points:
-
Description
I noticed two small issues introduced after the changes in 8815 - I should have spotted them in review, sorry. Specifically, we had the diagnostics fail on c97qk with:
2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr Running [docker.io run --name=c97qk-ot0gb-n67uuhx6ng9hzib-0 --attach=stdout --attach=stderr --attach=stdin -i --cidfile=/tmp/crunch-job/c97qk-ot0gb-n67uuhx6ng9hzib-0.cid --sig-proxy --memory=3346971k --memory-swap=3346971k --volume=/tmp/crunch-job/src:/tmp/crunch-job/src:ro --volume=/tmp/crunch-job/opt:/tmp/crunch-job/opt:ro --volume=/tmp/crunch-job/task/compute3.1.keep/by_pdh:/keep:ro --volume=/tmp/crunch-job/task/compute3.1.keep/tmp:/keep_tmp --volume=/tmp --volume=:/usr/local/bin/crunchrunner --volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt --env=TASK_KEEPMOUNT_TMP=/keep_tmp --env=CRUNCH_GIT_ARCHIVE_HASH=8e18a89ea517fde50d24eb17b884bc86 --env=CRUNCH_SRC=/tmp/crunch-job/src --env=JOB_UUID=c97qk-8i9sb-etuzc7s3j88iym9 --env=TASK_QSEQUENCE=0 --env=CRUNCH_REFRESH_TRIGGER=/tmp/crunch_refresh_trigger --env=ARVADOS_API_HOST=c97qk.arvadosapi.com --env=TASK_TMPDIR=/tmp/crunch-job-task-work/compute3.1 --env=JOB_WORK=/tmp/crunch-job-work --env=CRUNCH_TMP=/tmp/crunch-job --env=TASK_SLOT_NODE=compute3 --env=CRUNCH_SRC_URL=/var/lib/arvados/internal.git --env=JOB_SCRIPT=hash --env=CRUNCH_WORK=/tmp/crunch-job/work --env=CRUNCH_NODE_SLOTS=1 --env=TASK_SEQUENCE=0 --env=TASK_WORK=/tmp/crunch-job-task-work/compute3.1 --env=JOB_PARAMETER_INPUT=1724fc6b2145c148b894a8da81132ef8+53 --env=ARVADOS_API_TOKEN=42qrvz14riharlxo9qqdighalbu1022iuoyrlj859nbfx8bfyk --env=CRUNCH_JOB_BIN=/usr/local/arvados/src/services/crunch/crunch-job --env=TASK_UUID=c97qk-ot0gb-n67uuhx6ng9hzib --env=TASK_SLOT_NUMBER=1 --env=TASK_KEEPMOUNT=/keep --env=CRUNCH_JOB_UUID=c97qk-8i9sb-etuzc7s3j88iym9 --env=CRUNCH_SRC_COMMIT=4d4c3442e04310d7a88894c105a7cf351fd9f373 --env=CRUNCH_INSTALL=/tmp/crunch-job/opt --env=HOME=/tmp/crunch-job-task-work/compute3.1 f30fae7189adac0948eef3b3386e9ef254f69a8187f9eab99004e2d3650605cd /bin/sh -c python -c "from pkg_resources import get_distribution as get; print \"Using Arvados SDK version\", 
get(\"arvados-python-client\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/compute3.1" && if which stdbuf >/dev/null ; then exec stdbuf --output=0 --error=0 \/tmp\/crunch\-job\/src\/crunch_scripts\/hash ; else exec \/tmp\/crunch\-job\/src\/crunch_scripts\/hash ; fi]
2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr invalid value ":/usr/local/bin/crunchrunner" for flag --volume: bad format for volumes: :/usr/local/bin/crunchrunner
2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr See 'docker.io run --help'.
Two issues here:
a) the 'which crunchrunner' lookup is apparently happening on the API server, not on the compute node: the compute node had crunchrunner installed, but the API server did not. I installed it there, and that fixed the problem. Clearly, the test should happen on the compute node.
b) when 'which crunchrunner' doesn't find the executable, we shouldn't append half a volume statement to the docker run command. With an empty host path the flag becomes '--volume=:/usr/local/bin/crunchrunner', which docker rejects, so the invocation breaks (and the job fails).
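A minimal sketch of the behaviour asked for in (b), written in Python for illustration only; it is not the crunch-job code. The function name crunchrunner_volume_args and the injectable which parameter are hypothetical. The container path is the one visible in the log above. For (a), the lookup would have to run on the compute node; here the caller decides where it runs by passing in the lookup function.

```python
import shutil

# Mount point inside the container, as seen in the failing docker command.
CONTAINER_PATH = "/usr/local/bin/crunchrunner"


def crunchrunner_volume_args(which=shutil.which):
    """Return the docker --volume arguments for crunchrunner.

    `which` is a hypothetical hook: any callable that maps a program
    name to its absolute path, or None if the program is not found.
    It should be evaluated on the compute node, not the API server.
    """
    host_path = which("crunchrunner")
    if not host_path:
        # Not installed: add nothing rather than a malformed
        # "--volume=:/usr/local/bin/crunchrunner".
        return []
    return ["--volume={}:{}".format(host_path, CONTAINER_PATH)]


if __name__ == "__main__":
    # Uses the local PATH; prints [] when crunchrunner is absent.
    print(crunchrunner_volume_args())
```

With this shape, a missing crunchrunner simply leaves the docker command without the extra mount instead of failing the job.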