Bug #8828

closed

[Crunch] be more resilient when crunchrunner is not available; also don't test for crunchrunner on api server

Added by Ward Vandewege about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Description

I noticed two small issues introduced after the changes in 8815 - I should have spotted them in review, sorry. Specifically, we had the diagnostics fail on c97qk with:

2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr Running [docker.io run --name=c97qk-ot0gb-n67uuhx6ng9hzib-0 --attach=stdout --attach=stderr --attach=stdin -i --cidfile=/tmp/crunch-job/c97qk-ot0gb-n67uuhx6ng9hzib-0.cid --sig-proxy --memory=3346971k --memory-swap=3346971k --volume=/tmp/crunch-job/src:/tmp/crunch-job/src:ro --volume=/tmp/crunch-job/opt:/tmp/crunch-job/opt:ro --volume=/tmp/crunch-job/task/compute3.1.keep/by_pdh:/keep:ro --volume=/tmp/crunch-job/task/compute3.1.keep/tmp:/keep_tmp --volume=/tmp --volume=:/usr/local/bin/crunchrunner --volume=/etc/ssl/certs/ca-certificates.crt:/etc/arvados/ca-certificates.crt --env=TASK_KEEPMOUNT_TMP=/keep_tmp --env=CRUNCH_GIT_ARCHIVE_HASH=8e18a89ea517fde50d24eb17b884bc86 --env=CRUNCH_SRC=/tmp/crunch-job/src --env=JOB_UUID=c97qk-8i9sb-etuzc7s3j88iym9 --env=TASK_QSEQUENCE=0 --env=CRUNCH_REFRESH_TRIGGER=/tmp/crunch_refresh_trigger --env=ARVADOS_API_HOST=c97qk.arvadosapi.com --env=TASK_TMPDIR=/tmp/crunch-job-task-work/compute3.1 --env=JOB_WORK=/tmp/crunch-job-work --env=CRUNCH_TMP=/tmp/crunch-job --env=TASK_SLOT_NODE=compute3 --env=CRUNCH_SRC_URL=/var/lib/arvados/internal.git --env=JOB_SCRIPT=hash --env=CRUNCH_WORK=/tmp/crunch-job/work --env=CRUNCH_NODE_SLOTS=1 --env=TASK_SEQUENCE=0 --env=TASK_WORK=/tmp/crunch-job-task-work/compute3.1 --env=JOB_PARAMETER_INPUT=1724fc6b2145c148b894a8da81132ef8+53 --env=ARVADOS_API_TOKEN=42qrvz14riharlxo9qqdighalbu1022iuoyrlj859nbfx8bfyk --env=CRUNCH_JOB_BIN=/usr/local/arvados/src/services/crunch/crunch-job --env=TASK_UUID=c97qk-ot0gb-n67uuhx6ng9hzib --env=TASK_SLOT_NUMBER=1 --env=TASK_KEEPMOUNT=/keep --env=CRUNCH_JOB_UUID=c97qk-8i9sb-etuzc7s3j88iym9 --env=CRUNCH_SRC_COMMIT=4d4c3442e04310d7a88894c105a7cf351fd9f373 --env=CRUNCH_INSTALL=/tmp/crunch-job/opt --env=HOME=/tmp/crunch-job-task-work/compute3.1 f30fae7189adac0948eef3b3386e9ef254f69a8187f9eab99004e2d3650605cd /bin/sh -c python -c "from pkg_resources import get_distribution as get; print \"Using Arvados SDK version\", 
get(\"arvados-python-client\").version">&2 2>/dev/null; mkdir -p "/tmp/crunch-job-work" "/tmp/crunch-job-task-work/compute3.1" && if which stdbuf >/dev/null ; then   exec  stdbuf --output=0 --error=0  \/tmp\/crunch\-job\/src\/crunch_scripts\/hash ; else   exec \/tmp\/crunch\-job\/src\/crunch_scripts\/hash ; fi]
2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr invalid value ":/usr/local/bin/crunchrunner" for flag --volume: bad format for volumes: :/usr/local/bin/crunchrunner
2016-03-30_01:08:05 c97qk-8i9sb-etuzc7s3j88iym9 1470 0 stderr See 'docker.io run --help'.

Two issues here:

a) the 'which crunchrunner' lookup is apparently happening on the API server, not on the compute node: the compute node had crunchrunner installed, but the API server did not. I installed it there, and that fixed the problem. Clearly, the test should happen on the compute node.

b) when 'which crunchrunner' doesn't find the executable, we shouldn't append half a volume statement to the docker run command; that breaks the invocation (and thus fails the job).
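
To illustrate the behavior asked for in (b), here is a minimal Python sketch. It is not the actual crunch-job code. The helper names (optional_volume_args, build_volume_args) are hypothetical; the two container-side paths are taken from the log above. The idea is to emit a --volume flag only when the host-side path really exists, so an empty lookup result never produces a flag like "--volume=:/usr/local/bin/crunchrunner". Per (a), a lookup like this would have to run on the compute node, where docker is invoked.

```python
import os
import shutil

# Container-side mount points, as they appear in the log above.
CRUNCHRUNNER_DEST = "/usr/local/bin/crunchrunner"
CERTS_DEST = "/etc/arvados/ca-certificates.crt"


def optional_volume_args(host_path, container_path):
    """Return ["--volume=SRC:DEST"] if host_path exists, else an empty list.

    Hypothetical helper: a missing or empty host path yields no flag at all,
    rather than a malformed "--volume=:DEST".
    """
    if host_path and os.path.exists(host_path):
        return ["--volume={}:{}".format(host_path, container_path)]
    return []


def build_volume_args(crunchrunner_path, certs_path):
    """Collect the optional bind mounts for the docker run command line."""
    args = []
    args += optional_volume_args(crunchrunner_path, CRUNCHRUNNER_DEST)
    args += optional_volume_args(certs_path, CERTS_DEST)
    return args


if __name__ == "__main__":
    # shutil.which returns None when the executable is not on PATH.
    found = shutil.which("crunchrunner")
    print(build_volume_args(found, "/etc/ssl/certs/ca-certificates.crt"))
```

With this shape, a node without crunchrunner simply gets a docker command line without that bind mount, and the job can still start.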


Subtasks 1 (0 open, 1 closed)

Task #8861: Review 8828-which-crunchrunner | Resolved | Peter Amstutz | 03/31/2016

Related issues

Related to Arvados - Idea #8815: [Crunch] crunch-job bind mounts crunchrunner & host certs file at well known location inside container | Resolved | Peter Amstutz | 03/29/2016
Actions #1

Updated by Ward Vandewege about 8 years ago

  • Description updated (diff)
Actions #2

Updated by Peter Amstutz about 8 years ago

Pushed branch 8828-which-crunchrunner

Actions #3

Updated by Brett Smith about 8 years ago

  • Status changed from New to In Progress
  • Assigned To set to Peter Amstutz
  • Target version set to 2016-04-13 sprint
Actions #4

Updated by Brett Smith about 8 years ago

Both the VOLUME_CERTS declarations need to specify that the certs are being mounted at /etc/arvados/ca-certificates.crt. Otherwise this will break the behavior specified in #8815.

With that fix, this is good to merge, thanks.

(I wonder when we're going to hit the limit on the maximum size of a single command line...)

Actions #5

Updated by Peter Amstutz about 8 years ago

Brett Smith wrote:

Both the VOLUME_CERTS declarations need to specify that the certs are being mounted at /etc/arvados/ca-certificates.crt. Otherwise this will break the behavior specified in #8815.

With that fix, this is good to merge, thanks.

Whoops, thanks for catching that, that's what I get for rushing. Fixed, tested with arvbox, merged, pushed.

(I wonder when we're going to hit the limit on the maximum size of a single command line...)

Actions #6

Updated by Peter Amstutz about 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:49743c080265b270693154d7a327d0433b0a7dbe.
