Bug #19437

closed

[crunch-run] Require >1 watchdog errors before giving up and killing docker container

Added by Peter Amstutz over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -
Release relationship: Auto

Description

Observed on a customer cluster. This container appears to have failed multiple times but eventually succeeded: it seems to have run to completion and was only canceled at the very end.

2022-08-31T00:00:01.820945772Z Creating Docker container
2022-08-31T00:00:09.932234553Z Starting container
2022-08-31T00:00:10.896745626Z Waiting for container to finish
2022-08-31T02:25:10.898243240Z Error inspecting container: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/containers/230188325e24f42d3ad8dfd8ceef5c7069733bacdaafe7adaf5bf5a3c4c644f5/json": context deadline exceeded
2022-08-31T02:25:10.898483541Z error in Run: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.21/containers/230188325e24f42d3ad8dfd8ceef5c7069733bacdaafe7adaf5bf5a3c4c644f5/json": context deadline exceeded
2022-08-31T02:38:12.612609772Z copying "/temp.txt" (0 bytes)
2022-08-31T02:38:13.468649279Z Cancelled

Subtasks: 1 (0 open, 1 closed)

Task #19443: Review 19437-docker-watchdog (Resolved, Peter Amstutz, 09/02/2022)

Related issues

Related to Arvados - Bug #20595: "error inspecting container" causing containers to be abandoned (Resolved, Peter Amstutz)
