Bug #21993

closed

Workflow runner keeps running when its only subprocess is "on hold" (state=queued, priority=0)

Added by Lucas Di Pentima 4 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: CWL
Story points: -
Release:
Release relationship: Auto

Description

Steps to reproduce:

  1. Launch a workflow that starts with a single step (e.g., revsort) and wait for the workflow runner instance to start
  2. Wait until the subprocess appears as Queued
  3. Cancel the subprocess

The workflow runner instance will keep running indefinitely. Not sure if it's a-c-r's job to realize that it should cancel itself.


Files

stuck-wf.png (161 KB), Lucas Di Pentima, 07/09/2024 06:36 PM

Subtasks 1 (0 open, 1 closed)

Task #22012: Review 21993-wf-step-cancel (Resolved, Peter Amstutz, 07/25/2024)

Related issues

Related to Arvados - Bug #20985: Setting priority 0 on a queued container should change it to "cancelled" state (New)
Actions #1

Updated by Lucas Di Pentima 4 months ago

Actions #2

Updated by Brett Smith 4 months ago

  • Related to Bug #20985: Setting priority 0 on a queued container should change it to "cancelled" state added
Actions #3

Updated by Peter Amstutz 4 months ago

Yes, arvados-cwl-runner should notice that the priority went to 0 and do something about it.

Actions #4

Updated by Peter Amstutz 3 months ago

  • Category changed from Crunch to CWL
Actions #5

Updated by Peter Amstutz 3 months ago

  • Assigned To set to Peter Amstutz
Actions #6

Updated by Peter Amstutz 3 months ago

21993-wf-step-cancel @ e19c1309aa6bad138479da3a3a5dd737fca400b7

developer-run-tests: #4358

  • All agreed upon points are implemented / addressed.
    • workflow steps cancelled before being run are now recognized and treated as failed steps
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • manually tested by running a workflow, cancelling the step in workbench, and observing the workflow runner didn't terminate. Then implemented the fix, went through the same process, and the cancelled (state: Committed, priority: 0) workflow step is now recognized as a failed step.
  • Documentation has been updated.
    • n/a
  • Behaves appropriately at the intended scale (describe intended scale).
    • no changes to scale
  • Considered backwards and forwards compatibility issues between client and server.
    • no issues, it now requests the 'priority' field from the container request, but that field has been in the container API from the very beginning
  • Follows our coding standards and GUI style guidelines.
    • yes

This is a straightforward bugfix; the only unusual thing is that it took so long to notice that there was a problem.

Unit testing currently doesn't really cover the main loop of the workflow runner. Integration tests do execute the main loop but it is difficult to test exceptional cases.
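
As a rough illustration of the check described above, here is a minimal sketch in Python. It is not the code from the 21993-wf-step-cancel branch: the function names `is_cancelled_before_run` and `classify_step` are hypothetical, and the container request is assumed to be a plain dict carrying the `state` and `priority` fields mentioned in this ticket. Keeping the check in a pure function like this would also make the exceptional case testable without driving the runner's main loop.

```python
# Hypothetical sketch, not the arvados-cwl-runner implementation.
# Assumes a container request is available as a dict with the
# "state" and "priority" fields discussed in this ticket.

def is_cancelled_before_run(container_request):
    """Return True for a step that was put on hold before it ran.

    Per this ticket, such a step shows up as state "Committed"
    with priority 0.
    """
    return (
        container_request.get("state") == "Committed"
        and container_request.get("priority") == 0
    )


def classify_step(container_request):
    """Map a container request to a coarse status for the runner loop."""
    if container_request.get("state") == "Final":
        return "finished"
    if is_cancelled_before_run(container_request):
        # Treat the cancelled step as failed so the runner stops waiting.
        return "failed"
    return "pending"
```

For example, `classify_step({"state": "Committed", "priority": 0})` returns `"failed"`, while the same request with a positive priority returns `"pending"`.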

Actions #7

Updated by Peter Amstutz 3 months ago

  • Status changed from New to In Progress
Actions #8

Updated by Lucas Di Pentima 3 months ago

This LGTM, thanks!

Actions #9

Updated by Peter Amstutz 3 months ago

  • Status changed from In Progress to Resolved
Actions #10

Updated by Peter Amstutz 2 months ago

  • Release set to 70