Bug #4124

closed

[Crunch] Socket timed out on send/recv operation causes pipeline failure

Added by Sarah Guthrie over 9 years ago. Updated about 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Story points: -

Description

This pipeline uses one_task_per_input_file. There are over 800 input files. Each task takes about 20 minutes to run. Soon after one of the tasks finishes successfully, this error occurs:

...
Tue Oct 7 17:02:42 2014 9tee4-8i9sb-blb48qaou8uatsi 8876 21 stderr srun: error: slurm_receive_msgs: Socket timed out on send/recv operation
Tue Oct 7 17:02:42 2014 9tee4-8i9sb-blb48qaou8uatsi 8876 21 stderr srun: error: Task launch for 98.24 failed on node compute0: Socket timed out on send/recv operation
Tue Oct 7 17:02:42 2014 9tee4-8i9sb-blb48qaou8uatsi 8876 21 stderr srun: error: Application launch failed: Socket timed out on send/recv operation

Seen on 9tee4 on pipeline instance: https://workbench.9tee4.arvadosapi.com/pipeline_instances/9tee4-d1hrv-5zc2cqp2yzhcmd8
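
For triage, a minimal sketch of how log lines like the ones above could be scanned for this srun failure. The field layout (timestamp, job UUID, then two numeric fields taken here to be a process ID and a task sequence number) is an assumption inferred only from the three lines quoted in this report, and the names `LOG_LINE`, `TIMEOUT_TEXT`, and `find_srun_timeouts` are hypothetical, not part of Crunch or SLURM.

```python
import re

# Assumed layout, inferred from the log excerpt in this report:
# "<weekday> <month> <day> <HH:MM:SS> <year> <job uuid> <number> <number> stderr <message>"
# Treating the two numbers as a process ID and a task sequence number is a guess.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) "
    r"(?P<job>\S+) (?P<pid>\d+) (?P<task>\d+) stderr (?P<message>.*)$"
)

# Error text copied verbatim from the report.
TIMEOUT_TEXT = "Socket timed out on send/recv operation"


def find_srun_timeouts(lines):
    """Return (job, task, message) for each srun error line that mentions the socket timeout."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # elided lines ("...") or lines in another format are skipped
        message = match.group("message")
        if message.startswith("srun: error:") and TIMEOUT_TEXT in message:
            hits.append((match.group("job"), int(match.group("task")), message))
    return hits
```

Called on the lines of a job log, `find_srun_timeouts` returns one tuple per matching srun error line, which makes it easier to see whether the timeouts cluster around a single task or recur across the run.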


Files

@40000000570d24e3389fa8ac.s (121 KB) Nico César, 03/29/2017 09:17 PM

Subtasks 2 (0 open, 2 closed)

Task #11485: Review, Closed, 10/07/2014
Task #11483: Review, Resolved, Tom Clegg, 10/07/2014