Bug #19429

open

Somehow detect that scratch space failed to mount

Added by Peter Amstutz over 1 year ago. Updated 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: Crunch
Target version:
Story points: -
Release:
Release relationship: Auto

Description

One failure mode on cloud compute nodes is that the scratch space fails to mount because of a misconfiguration. When this happens, jobs may run out of disk space with no obvious cause, which is very difficult to debug.

Maybe "arvados-server cloudtest" could include a check that the expected amount of disk space is available, or even check on AWS that we have the right IAM permissions.
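
To make the disk space idea concrete, here is a minimal sketch of such a check in Python. It is not part of "arvados-server cloudtest" and does not reflect how that command is implemented; the function name `check_scratch`, the scratch path, and the threshold are all hypothetical. It only illustrates the two signals mentioned above: whether the scratch directory is actually a mount point, and whether the expected amount of free space is there.

```python
import os
import shutil


def check_scratch(path, min_free_bytes):
    """Return a list of human-readable problems found with the scratch dir.

    `path` and `min_free_bytes` are illustrative parameters, not Arvados
    configuration keys. An empty list means no problem was detected.
    """
    if not os.path.isdir(path):
        return [f"{path} does not exist or is not a directory"]

    problems = []

    # If the mount failed, the directory is usually just a plain directory
    # on the root filesystem rather than a mount point.
    if not os.path.ismount(path):
        problems.append(
            f"{path} is not a mount point; scratch space may have failed to mount"
        )

    # Independently of the mount check, compare free space to what we expect.
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(
            f"only {free} bytes free under {path}, "
            f"expected at least {min_free_bytes}"
        )

    return problems
```

A node boot probe could call, for example, `check_scratch("/tmp", 100 * 2**30)` and fail the test if the returned list is non-empty. The mount point test assumes the scratch space is mounted directly at the given path; if it is mounted at a parent directory, only the free space comparison is meaningful.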

#1 Updated by Peter Amstutz over 1 year ago

  • Description updated

#2 Updated by Peter Amstutz about 1 year ago

  • Release set to 60

#3 Updated by Peter Amstutz 2 months ago

  • Target version set to Future