Bug #12298

closed

[Crunch2] Invalid container output_path causes infinite loop of futile dispatch attempts

Added by Tom Clegg over 6 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -

Description

Submitting a container request with no mounts and {"output_path":"/out"} results in the container being attempted repeatedly with the same failure:

2017-09-20T20:12:56.920079Z Container 9tee4-dz642-6545kgubg82ssq7 was taken from the queue by a dispatch process
2017-09-20T20:12:59.919298925Z Executing container '9tee4-dz642-6545kgubg82ssq7'
2017-09-20T20:12:59.919400984Z Executing on host 'compute0.9tee4.arvadosapi.com'
2017-09-20T20:12:59.978462723Z Fetching Docker image from collection '9e0a4880d0cde36f8dd691345399a1bf+335'
2017-09-20T20:13:00.072853031Z Using Docker image id 'dada2262dd3bc92f615fea9503116516481ef546c5bcf2014901e686d8049b0b'
2017-09-20T20:13:00.076115858Z Docker image is available
2017-09-20T20:13:00.076335565Z While setting up mounts: Output path does not correspond to a writable mount point
2017-09-20T20:13:00.076354805Z Cancelled
2017-09-20T20:13:18.922196414Z arvados API server error: Log cannot be modified in this state (nil, "4fbfdb5c0f48fe803e8ca641e7477e52+60") (422: 422 Unprocessable Entity) returned by 9tee4.arvadosapi.com
2017-09-20T20:13:19.233763Z Container 9tee4-dz642-6545kgubg82ssq7 was returned to the queue
...

Container record:

{
  "uuid": "9tee4-dz642-6545kgubg82ssq7",
  "owner_uuid": "9tee4-tpzed-000000000000000",
  "created_at": "2017-09-20 20:02:51 UTC",
  "modified_at": "2017-09-20 20:15:19 UTC",
  "modified_by_client_uuid": "9tee4-ozdt8-wt0x6s6j9yhycfh",
  "modified_by_user_uuid": "9tee4-tpzed-000000000000000",
  "state": "Queued",
  "started_at": null,
  "finished_at": null,
  "log": null,
  "environment": {
  },
  "cwd": ".",
  "command": [
    "foobar" 
  ],
  "output_path": "/out",
  "mounts": {
  },
  "runtime_constraints": {
    "keep_cache_ram": 268435456,
    "ram": 1000000,
    "vcpus": 1
  },
  "output": null,
  "container_image": "9e0a4880d0cde36f8dd691345399a1bf+335",
  "progress": null,
  "priority": 1,
  "updated_at": null,
  "exit_code": null,
  "auth_uuid": null,
  "locked_by_uuid": null,
  "scheduling_parameters": {
  }
}

Proposed fix

Ideally, add a container request validation so that this mistake is rejected up front and the non-runnable container is never created in the first place.
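
A minimal sketch of what such a validation could check, under stated assumptions: the helper name `output_path_writable?` is hypothetical (it is not part of the Arvados API server), and the rule that only a `tmp` mount or a collection mount marked `writable` counts as writable is an assumption based on the crunch-run error message above. It also assumes `output_path` must match a mount point exactly.

```ruby
# Hypothetical helper, not actual Arvados code.
# Returns true when output_path names a mount that can be written to.
# Assumption: "tmp" mounts are always writable, and "collection" mounts
# are writable only when they carry "writable": true.
def output_path_writable?(output_path, mounts)
  mount = mounts[output_path]
  return false if mount.nil?

  case mount["kind"]
  when "tmp"
    true
  when "collection"
    mount["writable"] == true
  else
    false
  end
end

# The request from this report: no mounts at all, output_path "/out".
puts output_path_writable?("/out", {})
# A request that mounts a tmp directory at the output path.
puts output_path_writable?("/out", { "/out" => { "kind" => "tmp", "capacity" => 1_000_000 } })
```

With a check along these lines, the request described in this report would fail validation instead of producing a container that can never run.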

Either way, fix the "Log cannot be modified in this state (nil, "7951799e5e3e3c02fee1567e718044e7+60")" error so crunch-run can cancel the container instead of retrying it ad nauseam.

Perhaps in source:services/api/app/models/container.rb

@@ -389,7 +389,7 @@ class Container < ArvadosModel
       when Running
         permitted.push :finished_at, :output, :log
       when Queued, Locked
-        permitted.push :finished_at
+        permitted.push :finished_at, :log
       end

     else
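
The effect of the one-line change above can be illustrated with a simplified, standalone sketch. The names `permitted_on_cancel` and `rejected_attrs` and the `allow_log` flag are hypothetical and exist only for this illustration; the real check lives inside the `Container` model and covers more attributes and states.

```ruby
# Simplified stand-in for the state-dependent whitelist in container.rb.
# Everything here is illustrative, not the actual model code.
QUEUED  = "Queued"
LOCKED  = "Locked"
RUNNING = "Running"

# Attributes that may change when a container moves to Cancelled,
# depending on the state it was in before.
# allow_log: false models the old behavior, true models the proposed fix.
def permitted_on_cancel(previous_state, allow_log:)
  permitted = []
  case previous_state
  when RUNNING
    permitted.push :finished_at, :output, :log
  when QUEUED, LOCKED
    permitted.push :finished_at
    permitted.push :log if allow_log
  end
  permitted
end

# Attributes in the update that the whitelist would refuse.
def rejected_attrs(changed, permitted)
  changed - permitted
end

# crunch-run cancels a Locked container and saves its log in the same update.
changed = [:finished_at, :log]
p rejected_attrs(changed, permitted_on_cancel(LOCKED, allow_log: false))
p rejected_attrs(changed, permitted_on_cancel(LOCKED, allow_log: true))
```

In this sketch the old whitelist refuses `:log`, which corresponds to the 422 response in the log excerpt, while the proposed whitelist refuses nothing, so the cancellation can be recorded.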

Subtasks: 1 (0 open, 1 closed)

Task #12299: Review 12298-cancel-fail (Resolved, Tom Clegg, 09/20/2017)

Related issues

  • Related to Arvados - Bug #12246: [Crunch] Better crunch-run error when command not found (Resolved, Tom Clegg, 09/27/2017)
  • Related to Arvados - Bug #12349: [API] Validate container requests "output_path must be in a writable mount" (New)

#1

Updated by Tom Clegg over 6 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg
  • Target version set to 2017-09-27 Sprint

#2

Updated by Peter Amstutz over 6 years ago

12298-cancel-fail LGTM

#3

Updated by Tom Clegg over 6 years ago

  • Status changed from In Progress to Resolved

Moved the "ideally" part of the proposed fix to #12349
