Bug #5956

closed

[Deployment] Docker configuration changes restart Docker on compute nodes, interrupting running jobs

Added by Sarah Guthrie almost 9 years ago. Updated over 7 years ago.

Status: Duplicate
Priority: High
Assigned To: -
Category: Deployment
Target version: -
Story points: -

Description

Two types of errors are causing long-running pipeline jobs to fail, and the jobs fail on different inputs. The priority is high because the outputs of this pipeline are needed for the paper.

The first failed on the 121st file: data_HG01927_cg_data_ASM_blood_var-GS000013202-ASM.fj (no such file error)
The second failed at the very beginning of the job (no such file error)
The third failed on the 79th file: data_HG00663_cg_data_ASM_lcl_var-GS000016983-ASM.fj (no such file error)

Type 1:

2015-05-07_22:28:38 su92l-8i9sb-d3zljjdswotmscp 13424 602 stderr time="2015-05-07T22:28:38Z" level=fatal msg="Post http:///var/run/docker.sock/v1.18/containers/6cac91fd0d4f232ff19350b1017598fa1a9ee381e414909a6b1d3eb82384d7a9/wait: write unix /var/run/docker.sock: broken pipe. Are you trying to connect to a TLS-enabled daemon without TLS?"

Type 2:

2015-05-07_22:35:39 su92l-8i9sb-d3zljjdswotmscp 13424 599 stderr time="2015-05-07T22:35:39Z" level=fatal msg="Post http:///var/run/docker.sock/v1.18/containers/97d5c61fc5fc9ae76f06db990f95c5df479658d69cfc6737387c98d5ee0957ac/wait: dial unix /var/run/docker.sock: no such file or directory. Are you trying to connect to a TLS-enabled daemon without TLS?"
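
For triage, the two error types can be told apart by the socket operation in the message: Type 1 is a `write` that hit a broken pipe, and Type 2 is a `dial` that found no socket file. Below is a minimal sketch, assuming the job logs are available as plain text lines. The function name `classify` and the labels are hypothetical and are not part of Arvados or Docker; the patterns are copied from the two excerpts above.

```python
import re

# Patterns copied from the two log excerpts in this report.
# The label names are illustrative only.
PATTERNS = {
    "type1_broken_pipe": re.compile(
        r"write unix /var/run/docker\.sock: broken pipe"
    ),
    "type2_socket_missing": re.compile(
        r"dial unix /var/run/docker\.sock: no such file or directory"
    ),
}


def classify(line: str) -> str:
    """Return the label of the first matching pattern, or "other"."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return "other"
```

Running `classify` over each stderr line of a failed job would show whether that job hit Type 1, Type 2, or something else.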

The pipeline instance that failed is:

https://workbench.su92l.arvadosapi.com/pipeline_instances/su92l-d1hrv-kqz04a5cri08gsc


Related issues

Related to Arvados - Bug #7481: Docker Daemon failure or FUSE problem (Duplicate, 10/08/2015)
Has duplicate Arvados - Bug #5959: Failed Jobs on 5/7 (Docker issues?) (Closed, 05/08/2015)