Feature #4302

open

[Crunch] Pipelines should not fail immediately after one job failure, but continue running as much as possible

Added by Bryan Cosca about 10 years ago. Updated 10 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
Crunch
Target version:
Story points:
-
Release:
Release relationship:
Auto

Description

The current pipeline running model is very linear:
Given A > B > C > D (the output of A is needed to run B, and so on), if B fails, C and D do not get run and the pipeline instance fails.

Let's say A > B > C > D and B > E > F. If C fails, D does not get run and the entire pipeline instance fails. But what if you want to see whether E and F complete? With the current model, E does get run but its output does not get saved (for example: qr1hi-8i9sb-56vgstlp2wk56vn). I would love to know whether F completes before I go and edit the template and look into C and D, but we cannot, because E's output does not get fed into F.

Let's say there are more of these branches. If B branches out to 20 other jobs and one of those branches fails, the other 19 are affected, which wastes a ton of time. The bioinformatician has to edit the pipeline template, remove the failed jobs, and rerun the 19 other branches. If one of those branches fails, that means more editing, and so on. A ton of time could be saved if the pipeline ran until all the branches finished (failed or succeeded), with the editing done afterwards. Let's say 10 of those branches actually complete: then you have saved 10 rounds of editing and 10x as much time sitting at your computer. It would be easy to just wait and edit the template once after all jobs are complete.

Also, the scenario where the bioinformatician runs a pipeline and walks away for a couple of hours, only to find that nothing has been output, would be frustrating, because he would have to rerun the pipeline and then do nothing but wait. He could have been analyzing the other branches in the pipeline and doing something useful, rather than waiting for his pipeline to finish.
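
To make the requested behavior concrete, here is a rough sketch of a "keep going" scheduler for the A > B > C > D, B > E > F example. This is not Crunch code: the names run_pipeline, run_job, and keep_going are hypothetical and only illustrate the idea, and it assumes Python 3.9+ for the standard-library graphlib module. A job runs only if everything it depends on completed; jobs downstream of a failure are skipped, and unrelated branches still run.

```python
from graphlib import TopologicalSorter


def run_pipeline(deps, run_job, keep_going=True):
    """Run jobs in dependency order and return a status for each job.

    deps maps a job name to the set of jobs whose output it needs.
    run_job(name) is a hypothetical callback returning True on success.
    """
    status = {}
    for job in TopologicalSorter(deps).static_order():
        if not keep_going and "failed" in status.values():
            # Halt mode: once anything has failed, start nothing new.
            status[job] = "not run"
        elif any(status[d] != "complete" for d in deps.get(job, ())):
            # An upstream job failed or was skipped, so this job has no input.
            status[job] = "skipped"
        else:
            status[job] = "complete" if run_job(job) else "failed"
    return status


# The example from this ticket: A > B > C > D and B > E > F, where C fails.
deps = {"B": {"A"}, "C": {"B"}, "D": {"C"}, "E": {"B"}, "F": {"E"}}
result = run_pipeline(deps, run_job=lambda name: name != "C")
for job in sorted(result):
    print(job, result[job])
```

With keep_going=True, only D is skipped because of the failure in C, while E and F still complete, so the template only needs to be edited once after everything has finished. Passing keep_going=False gives the stop-at-first-failure behavior instead.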

Actions #1

Updated by Brett Smith about 10 years ago

  • Subject changed from Partial Failure for pipelines to [Crunch] Pipelines should not fail immediately after one job failure, but continue running as much as possible
  • Category set to Crunch
  • Target version set to Arvados Future Sprints
Actions #2

Updated by Tom Clegg about 10 years ago

arv-run-pipeline-instance has (had?) this option, but there has never been a way in Workbench to specify whether you want it. (Sometimes you really do want to halt, and stop wasting resources, as soon as something doesn't work as expected. a-r-p-i mimics the "make" / "make -k" convention, i.e., the default is to stop as soon as one thing fails.)

Actions #3

Updated by Ward Vandewege over 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #4

Updated by Peter Amstutz almost 2 years ago

  • Release set to 60
Actions #5

Updated by Peter Amstutz 10 months ago

  • Target version set to Future