Feature #13403

open

[crunch-run] Cancel container on FUSE error

Added by Peter Amstutz almost 6 years ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Story points: 2.0
Release:
Release relationship: Auto

Description

https://dev.arvados.org/issues/13377#note-4

The fact that we kept the container alive for 7 hours retrying seems like a problem, too. In the context of a container, if arv-mount gives up on a FUSE request and returns an error analogous to "filesystem is corrupt / disk is dead" to the caller, should we automatically fail the container?

If arv-mount has a block read error (or really anything that FUSE will turn into EIO), crunch-run should cancel the container. The container request may still retry through its existing retry logic.

Proposed design:

  • arv-mount emits a well-known error string when it returns a major file system error (EIO).
  • crunch-run monitors arv-mount output and looks for major file system errors or out-of-memory errors (MemoryError).
  • On seeing such an error from arv-mount, crunch-run logs an error of its own and cancels the container.
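
The monitoring step in the proposed design could be sketched roughly as follows. This is an illustration only, not crunch-run's actual code: the marker string ARV_MOUNT_FATAL_EIO, the function names, and the cancel callback are all hypothetical, since the ticket does not specify the well-known error string or how cancellation is triggered.

```python
import re
from typing import Callable, Iterable, Optional

# Patterns treated as fatal. ARV_MOUNT_FATAL_EIO is a made-up placeholder for
# the "well-known error string" the ticket proposes; MemoryError is the Python
# exception name the ticket mentions.
FATAL_PATTERNS = [
    re.compile(r"ARV_MOUNT_FATAL_EIO"),
    re.compile(r"\bMemoryError\b"),
]


def find_fatal_error(line: str) -> Optional[str]:
    """Return the line (without its newline) if it matches a fatal pattern."""
    for pattern in FATAL_PATTERNS:
        if pattern.search(line):
            return line.rstrip("\n")
    return None


def monitor(lines: Iterable[str], cancel: Callable[[str], None]) -> bool:
    """Scan arv-mount output line by line.

    On the first fatal line, call cancel() once with a reason and stop.
    Returns True if the container was cancelled, False otherwise.
    """
    for line in lines:
        hit = find_fatal_error(line)
        if hit is not None:
            cancel(f"arv-mount reported a fatal error: {hit}")
            return True
    return False


if __name__ == "__main__":
    sample_output = [
        "mounted collection\n",
        "ARV_MOUNT_FATAL_EIO block read failed\n",
        "this line is never inspected\n",
    ]
    monitor(sample_output, cancel=lambda reason: print("cancel:", reason))
```

Stopping at the first match means the container is cancelled once, instead of being kept alive while reads are retried.
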
Actions #1

Updated by Peter Amstutz almost 6 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz almost 6 years ago

  • Status changed from In Progress to New
Actions #5

Updated by Peter Amstutz almost 6 years ago

  • Description updated (diff)
Actions #6

Updated by Tom Clegg almost 6 years ago

  • Story points set to 2.0
Actions #7

Updated by Tom Morris almost 6 years ago

  • Subject changed from Cancel container on FUSE error to [crunch-run] Cancel container on FUSE error
  • Target version changed from To Be Groomed to Arvados Future Sprints
Actions #9

Updated by Peter Amstutz over 5 years ago

  • Description updated (diff)
Actions #10

Updated by Peter Amstutz over 5 years ago

  • Description updated (diff)
Actions #11

Updated by Peter Amstutz almost 3 years ago

  • Target version deleted (Arvados Future Sprints)
Actions #12

Updated by Peter Amstutz about 1 year ago

  • Release set to 60
Actions #13

Updated by Peter Amstutz about 2 months ago

  • Target version set to Future