Bug #5352

closed

[Crunch] Dispatcher not handling node allocation failures

Added by Abram Connelly about 9 years ago. Updated about 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: 0.5

Description

Pipeline instance su92l-d1hrv-wqa83p9taz3tvmf failed on the second job of its two-job pipeline.

The logs contain this error:

2015-03-02_04:41:12 {"errors":["State invalid change from Failed to Complete"],"error_token":"1425271270+a34d19c7"} at /usr/local/arvados/src/sdk/perl/lib/Arvados/ResourceProxy.pm line 28

Before it, though, there are salloc errors of the form:

2015-03-02_04:39:57 salloc: error: Unable to allocate resources: Requested nodes are busy
2015-03-02_04:40:37 salloc: Granted job allocation 272
2015-03-02_04:40:37 salloc: error: Unable to allocate resources: Requested nodes are busy
2015-03-02_04:40:38 salloc: error: Unable to allocate resources: Requested nodes are busy

Rerunning the pipeline and its jobs completes successfully, so the issue appears to be transient.
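
To illustrate the behavior being asked for, here is a minimal sketch (not the actual crunch-dispatch code) of how a dispatcher might tell a temporary salloc failure apart from a real job failure. The function names and the "Queued" retry state are assumptions made for illustration; only the salloc message text and the "Failed" and "Complete" states come from the logs above.

```python
import re

# Pattern copied from the salloc lines quoted in this report.
# Treating it as the marker of a temporary failure is an assumption.
TEMPFAIL_PATTERN = re.compile(
    r"salloc: error: Unable to allocate resources: Requested nodes are busy"
)


def is_transient_alloc_failure(log_lines):
    """Return True if any log line reports that the requested nodes were busy."""
    return any(TEMPFAIL_PATTERN.search(line) for line in log_lines)


def next_state(succeeded, log_lines):
    """Pick the next job state (hypothetical helper).

    "Queued" is an assumed name for "put the job back and try again later".
    """
    if succeeded:
        return "Complete"
    if is_transient_alloc_failure(log_lines):
        # Node allocation failed temporarily: retry instead of failing the job.
        return "Queued"
    return "Failed"
```

With a check like this, the job in this report would have been retried rather than marked Failed, which would also avoid the later invalid "Failed to Complete" state change.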


Subtasks 1 (0 open, 1 closed)

Task #5670: Review 5352-crunch-dispatch-salloc-tempfail-wip (Resolved, Peter Amstutz, 04/05/2015)