Bug #5352

closed

[Crunch] Dispatcher not handling node allocation failures

Added by Abram Connelly about 10 years ago. Updated about 10 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 0.5

Description

Pipeline instance su92l-d1hrv-wqa83p9taz3tvmf failed on the second job of its two-job pipeline.

The logs contain this error:

2015-03-02_04:41:12 {"errors":["State invalid change from Failed to Complete"],"error_token":"1425271270+a34d19c7"} at /usr/local/arvados/src/sdk/perl/lib/Arvados/ResourceProxy.pm line 28

Before it, however, there are errors of this form:

2015-03-02_04:39:57 salloc: error: Unable to allocate resources: Requested nodes are busy
2015-03-02_04:40:37 salloc: Granted job allocation 272
2015-03-02_04:40:37 salloc: error: Unable to allocate resources: Requested nodes are busy
2015-03-02_04:40:38 salloc: error: Unable to allocate resources: Requested nodes are busy

Rerunning the pipeline and its jobs completes successfully, so the problem appears to be transient.


Subtasks 1 (0 open, 1 closed)

Task #5670: Review 5352-crunch-dispatch-salloc-tempfail-wip (Resolved, Peter Amstutz, 04/05/2015)
#5

Updated by Peter Amstutz about 10 years ago

  • Subject changed from Transient pipeline failure to [Crunch] Dispatcher not handling node allocation failures
#6

Updated by Tom Clegg about 10 years ago

  • Target version changed from Bug Triage to 2015-04-29 sprint
#7

Updated by Tom Clegg about 10 years ago

  • Story points set to 0.5
#8

Updated by Tom Clegg about 10 years ago

  • Target version changed from 2015-04-29 sprint to Arvados Future Sprints
#9

Updated by Tom Clegg about 10 years ago

  • Target version changed from Arvados Future Sprints to 2015-04-29 sprint
#10

Updated by Brett Smith about 10 years ago

  • Assigned To set to Brett Smith
#11

Updated by Brett Smith about 10 years ago

  • Status changed from New to In Progress
#15

Updated by Brett Smith about 10 years ago

  • Status changed from In Progress to Resolved