Bug #8828

closed

[Crunch] be more resilient when crunchrunner is not available; also don't test for crunchrunner on api server

Added by Ward Vandewege about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Description

I noticed two small issues introduced by the changes in #8815; I should have spotted them in review, sorry. Specifically, the diagnostics job on c97qk failed with:

2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr Running [docker.io run --name=c97qk-ot0gb-n67uuhx6ng9hzib-0 --attach=stdout --attach=stderr --attach=stdin -i --cidfile=/tmp/crunch-job/c97qk-ot0gb-n67uuhx6ng9hzib-0.cid --sig-proxy --memory=3346971k --memory-swap=3346971k --volume=/tmp/crunch-job/src:/tmp/crunch-job/src:ro --volume=/tmp/crunch-job/opt:/tmp/crunch-job/opt:ro --volume=/tmp/crunch-job/task/compute3.1.keep/by_pdh:/keep:ro --volume=/tmp/crunch-job/task/compute3.1.keep/tmp:/keep_tmp --volume=/tmp --volume=:/usr/local/bin/crunchrunner --volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt --env=TASK_KEEPMOUNT_TMP=/keep_tmp --env=CRUNCH_GIT_ARCHIVE_HASH=8e18a89ea517fde50d24eb17b884bc86 --env=CRUNCH_SRC=/tmp/crunch-job/src --env=JOB_UUID=c97qk-8i9sb-etuzc7s3j88iym9 --env=TASK_QSEQUENCE=0 --env=CRUNCH_REFRESH_TRIGGER=/tmp/crunch_refresh_trigger --env=ARVADOS_API_HOST=c97qk.arvadosapi.com --env=TASK_TMPDIR=/tmp/crunch-job-task-work/compute3.1 --env=JOB_WORK=/tmp/crunch-job-work --env=CRUNCH_TMP=/tmp/crunch-job --env=TASK_SLOT_NODE=compute3 --env=CRUNCH_SRC_URL=/var/lib/arvados/internal.git --env=JOB_SCRIPT=hash --env=CRUNCH_WORK=/tmp/crunch-job/work --env=CRUNCH_NODE_SLOTS=1 --env=TASK_SEQUENCE=0 --env=TASK_WORK=/tmp/crunch-job-task-work/compute3.1 --env=JOB_PARAMETER_INPUT=1724fc6b2145c148b894a8da81132ef8+53 --env=ARVADOS_API_TOKEN=42qrvz14riharlxo9qqdighalbu1022iuoyrlj859nbfx8bfyk --env=CRUNCH_JOB_BIN=/usr/local/arvados/src/services/crunch/crunch-job --env=TASK_UUID=c97qk-ot0gb-n67uuhx6ng9hzib --env=TASK_SLOT_NUMBER=1 --env=TASK_KEEPMOUNT=/keep --env=CRUNCH_JOB_UUID=c97qk-8i9sb-etuzc7s3j88iym9 --env=CRUNCH_SRC_COMMIT=4d4c3442e04310d7a88894c105a7cf351fd9f373 --env=CRUNCH_INSTALL=/tmp/crunch-job/opt --env=HOME=/tmp/crunch-job-task-work/compute3.1 f30fae7189adac0948eef3b3386e9ef254f69a8187f9eab99004e2d3650605cd /bin/sh -c python -c "from pkg_resources import get_distribution as get; print \"Using Arvados SDK version\", 
get(\"arvados-python-client\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/compute3.1" && if which stdbuf >/dev/null ; then   exec  stdbuf --output=0 --error=0  \/tmp\/crunch\-job\/src\/crunch_scripts\/hash ; else   exec \/tmp\/crunch\-job\/src\/crunch_scripts\/hash ; fi]
2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr invalid value ":/usr/local/bin/crunchrunner" for flag --volume: bad format for volumes: :/usr/local/bin/crunchrunner
2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr See 'docker.io run --help'.

Two issues here:

a) The 'which crunchrunner' lookup apparently runs on the API server, not on the compute node: the compute node had crunchrunner installed, but the API server did not. I installed it on the API server, and that fixed the problem. Clearly, the lookup should happen on the compute node.
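
For point a), one way to keep the lookup on the compute node is to stop resolving the path where the job is dispatched and instead ship a small shell fragment that resolves it where the container actually starts. The sketch below is a hypothetical Python helper (node_script is an illustrative name, not part of crunch-job). It assumes the generated POSIX sh script is executed on the compute node itself, and it keeps the optional flag in the positional parameters so that "$@" expands to nothing when crunchrunner is missing.

```python
import shlex
from typing import List

# Mount point inside the container, as seen in the log above.
CONTAINER_PATH = "/usr/local/bin/crunchrunner"


def node_script(head: List[str], tail: List[str]) -> str:
    """Build a POSIX sh script intended to run *on the compute node*.

    head: argv placed before the optional volume flag (e.g. docker.io run ...)
    tail: argv placed after it (image and command)

    Hypothetical helper: crunchrunner is located with `command -v` when the
    script runs, so the result reflects the node's PATH rather than the
    dispatcher's. If the lookup fails, the positional parameters stay empty
    and "$@" contributes zero arguments.
    """
    return (
        "set --; "
        'if p="$(command -v crunchrunner)"; then '
        f'set -- "--volume=$p:{CONTAINER_PATH}"; '
        "fi; "
        f'{shlex.join(head)} "$@" {shlex.join(tail)}'
    )


if __name__ == "__main__":
    # Print the script that would be handed to the compute node's shell.
    print(node_script(["docker.io", "run", "--rm"], ["IMAGE", "/bin/true"]))
```

Deferring the lookup to the node's own shell also means the API server no longer needs crunchrunner installed at all.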

b) When 'which crunchrunner' doesn't find the executable, we shouldn't append half a volume argument (--volume=:/usr/local/bin/crunchrunner) to the docker run command; that breaks the invocation and thus fails the job.
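
Point b) comes down to treating the lookup result as optional. A minimal Python sketch, with hypothetical function names (the actual crunch-job change may look different), contrasting unconditional formatting, which reproduces the argument docker rejected in the log, with a version that emits the flag only when a path was found. It also strips whitespace, since the output of `which` ends with a newline.

```python
from typing import List, Optional

# Mount point inside the container, as seen in the log above.
CONTAINER_PATH = "/usr/local/bin/crunchrunner"


def naive_volume_arg(host_path: str) -> str:
    # Failure mode from the log: an empty lookup result yields
    # "--volume=:/usr/local/bin/crunchrunner", which docker rejects.
    return f"--volume={host_path}:{CONTAINER_PATH}"


def crunchrunner_volume_args(host_path: Optional[str]) -> List[str]:
    # Hypothetical fix: only bind-mount crunchrunner if the lookup found it;
    # otherwise contribute no arguments so the docker command stays valid.
    host_path = (host_path or "").strip()
    if not host_path:
        return []
    return [f"--volume={host_path}:{CONTAINER_PATH}"]


if __name__ == "__main__":
    lookup = ""  # what an unsuccessful `which crunchrunner` leaves behind
    argv = ["docker.io", "run", "--rm"] + crunchrunner_volume_args(lookup) + ["IMAGE"]
    print(argv)
```

Returning a list rather than a string lets the caller splice zero or one arguments into the argv without any special-casing.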


Subtasks: 1 (0 open, 1 closed)

Task #8861: Review 8828-which-crunchrunner (Resolved, Peter Amstutz, 03/31/2016)

Related issues

Related to Arvados - Idea #8815: [Crunch] crunch-job bind mounts crunchrunner & host certs file at well known location inside container (Resolved, Peter Amstutz, 03/29/2016)
