Bug #3384

[Crunch] Termination of jobs due to 'Connection timed out'?

Added by Abram Connelly over 9 years ago. Updated over 9 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
Tim Pierce
Category:
Crunch
Target version:
Story points:
-

Description

Pipeline instance qr1hi-8i9sb-n1yv047kymyjtxs failed, although it had run successfully before. The output log collection 2482f18b2f601d248bb4fe93e296b862+87 contains these lines:

2014-07-28_14:02:15 qr1hi-8i9sb-n1yv047kymyjtxs 10767 50 stderr socket.error: [Errno 110] Connection timed out
2014-07-28_14:02:15 qr1hi-8i9sb-n1yv047kymyjtxs 10767 50 stderr srun: error: compute0: task 0: Exited with exit code 1

followed by subsequent job cancellations:

2014-07-28_14:02:16 qr1hi-8i9sb-n1yv047kymyjtxs 10767 54 stderr srun: sending Ctrl-C to job 3133.57
2014-07-28_14:02:16 qr1hi-8i9sb-n1yv047kymyjtxs 10767 54 stderr crunchstat: caught signal:interrupt
Actions #1

Updated by Tom Clegg over 9 years ago

Possible solution (or at least helpful improvement):

[Crunch] API communication fail should result in recording temporary task failure, not permanent.
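
One way to read this suggestion is sketched below in Python. It is only an illustration, not Crunch's actual code: `TRANSIENT_ERRNOS`, `classify_failure`, and `run_task` are hypothetical names, and the choice of which errno values count as transient is an assumption. The idea is that a network error such as `[Errno 110] Connection timed out` is recorded as a temporary failure and retried, while any other exception is recorded as permanent.

```python
import errno
import socket

# Assumption: these errno values indicate a transient network problem.
# The set is illustrative and not taken from Crunch.
TRANSIENT_ERRNOS = {errno.ETIMEDOUT, errno.ECONNRESET, errno.ECONNREFUSED}


def classify_failure(exc):
    """Return 'temporary' for transient network errors, else 'permanent'."""
    if isinstance(exc, socket.timeout):
        return "temporary"
    if isinstance(exc, OSError) and exc.errno in TRANSIENT_ERRNOS:
        return "temporary"
    return "permanent"


def run_task(task, max_attempts=3):
    """Call task(), retrying only while failures are classified as temporary."""
    for attempt in range(1, max_attempts + 1):
        try:
            return {"success": True, "result": task(), "attempts": attempt}
        except Exception as exc:
            kind = classify_failure(exc)
            # Give up on permanent failures, or once the retry budget is spent.
            if kind == "permanent" or attempt == max_attempts:
                return {"success": False, "failure": kind, "attempts": attempt}
```

With this shape, a task that hits a timeout is retried up to `max_attempts` times, and if it still fails the result is marked `temporary` so that a scheduler could decide to try it again later instead of cancelling the remaining tasks.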

Actions #2

Updated by Ward Vandewege over 9 years ago

  • Target version set to Bug Triage
Actions #3

Updated by Ward Vandewege over 9 years ago

  • Project changed from Arvados to 35
Actions #4

Updated by Tim Pierce over 9 years ago

  • Target version changed from Bug Triage to 2014-10-08 sprint
Actions #5

Updated by Tim Pierce over 9 years ago

  • Subject changed from Termination of jobs due to 'Connection timed out'? to [Crunch] Termination of jobs due to 'Connection timed out'?
  • Category set to Crunch
  • Project changed from 35 to Arvados
  • Assigned To set to Tim Pierce
Actions #6

Updated by Tim Pierce over 9 years ago

  • Status changed from New to Closed

Could not reproduce; re-running this pipeline at https://workbench.qr1hi.arvadosapi.com/pipeline_instances/qr1hi-d1hrv-ftse9e4sz35fot7 yielded success.
