Feature #4302
[Crunch] Pipelines should not fail immediately after one job failure, but continue running as much as possible
Description
The current pipeline running model is very linear:
If A > B > C > D (the output of A is needed to run B, etc.), and B fails, then C and D do not get run and the pipeline instance fails.
Let's say A > B > C > D and B > E > F. If C fails, D does not get run and the entire pipeline instance fails. BUT what if you want to see whether E and F complete? With the current model, E does get run but its output does not get saved (for example: qr1hi-8i9sb-56vgstlp2wk56vn). I would love to know whether F completes before I go and edit the template and look into C and D, but we cannot, because E's output never gets fed into F.
Let's say there are more of these branches. If B branches out to 20 other jobs and one of those branches fails, the other 19 are affected, which wastes a ton of time. The bioinformatician has to edit the pipeline template, remove the failed jobs, and rerun the 19 other branches; if one of those branches then fails, it means more editing, and so on. A ton of time could be saved if the pipeline ran all the branches to completion (failed or successful) and the editing were done once afterwards. If 10 of those branches actually complete, that saves 10 rounds of editing and 10 times the waiting at your computer. It would be much easier to just wait and edit the template once, after all the jobs are complete.
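To make the request concrete, here is a minimal sketch of the desired scheduling behaviour (in Python; the component graph and the run_job helper are hypothetical, not Crunch's actual code). A component is skipped only when one of its own dependencies failed or was skipped, so independent branches keep running and keep their outputs, and the instance is marked failed only after every runnable branch has been tried.

# Hypothetical sketch: only jobs downstream of a failure are skipped.
def run_job(name):
    # Placeholder: pretend C fails and everything else succeeds.
    return name != "C"

# deps maps each component to the components whose output it needs.
deps = {
    "A": [],
    "B": ["A"],
    "C": ["B"], "D": ["C"],   # branch 1
    "E": ["B"], "F": ["E"],   # branch 2
}

state = {}  # component -> "success" | "failed" | "skipped"

def resolve(name):
    if name in state:
        return state[name]
    # Skip a job only if something it depends on did not succeed.
    for dep in deps[name]:
        if resolve(dep) != "success":
            state[name] = "skipped"
            return state[name]
    state[name] = "success" if run_job(name) else "failed"
    return state[name]

for component in deps:
    resolve(component)

# With C failing: A, B, E, F succeed; only D is skipped. The instance is
# marked failed only after every runnable branch has been tried.
pipeline_failed = any(s == "failed" for s in state.values())

In this sketch E and F still run and their outputs are kept, which is exactly what the scenario above asks for.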
Also, the scenario where the bioinformatician runs a pipeline, walks away for a couple of hours, and comes back to find that nothing has been output would be pretty frustrating: he would have to rerun the pipeline and then do nothing but wait. He could have been analyzing the other branches of the pipeline and doing something useful rather than waiting for the pipeline to finish.