Feature #18513

closed

Print "exited from signal XY" for exit codes >128

Added by Peter Amstutz 7 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Start date: 01/18/2022
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Release relationship: Auto

Description

When a program is terminated by a signal, crunch-run prints this:

Container exited with code: 139

Instead, this should say something like "Exited with signal 11 (SIGSEGV)" because most people don't actually know that exit codes over 128 mean exiting due to an unhandled signal, and nobody likes doing math.
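For reference, here is the arithmetic worked for the example above. It assumes the common shell/container convention that a process killed by signal N is reported with exit code 128 + N, and Linux signal numbering, where 11 is SIGSEGV:

```latex
\text{exit code} = 128 + N \quad\Longrightarrow\quad N = 139 - 128 = 11 \;(\texttt{SIGSEGV})
```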


Subtasks (1): 0 open, 1 closed

Task #18629: Review 18513-log-signal-exit (Resolved, Tom Clegg, 01/18/2022)


Related issues

Related to Arvados - Feature #17301: Special case report exit_code 137 as likely out of memory error (Resolved, Peter Amstutz, 04/20/2022)

#1

Updated by Peter Amstutz 7 months ago

  • Status changed from New to In Progress
#2

Updated by Peter Amstutz 7 months ago

  • Status changed from In Progress to New
  • Description updated (diff)
#3

Updated by Peter Amstutz 6 months ago

  • Target version set to 2022-01-19 sprint
  • Assigned To set to Tom Clegg
#4

Updated by Tom Clegg 5 months ago

  • Status changed from New to In Progress
Depending on whether there was a signal and whether it's a known signal number, the log message is now one of:
  • Container exited with status code 100
  • Container exited with status code 228 (signal 100)
  • Container exited with status code 137 (signal 9, SIGKILL)
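The three message shapes above can be produced with logic along these lines. This is a minimal sketch, not the actual crunch-run code: it assumes the executor reports exit code 128+N when the process died from signal N, and the signal-name table (`linuxSignalNames`) and function name (`exitMessage`) are hypothetical, with only a few Linux signal numbers listed.

```go
package main

import "fmt"

// linuxSignalNames is a hypothetical, deliberately incomplete table
// mapping Linux signal numbers to names. Signals not listed here are
// reported by number only, like "signal 100" in the examples above.
var linuxSignalNames = map[int]string{
	2:  "SIGINT",
	6:  "SIGABRT",
	9:  "SIGKILL",
	11: "SIGSEGV",
	15: "SIGTERM",
}

// exitMessage builds one of the three log message shapes, assuming an
// exit code above 128 means the process was killed by signal (code - 128).
func exitMessage(code int) string {
	msg := fmt.Sprintf("Container exited with status code %d", code)
	if code <= 128 || code > 255 {
		return msg // ordinary exit: no signal involved
	}
	sig := code - 128
	if name, ok := linuxSignalNames[sig]; ok {
		return fmt.Sprintf("%s (signal %d, %s)", msg, sig, name)
	}
	return fmt.Sprintf("%s (signal %d)", msg, sig)
}

func main() {
	for _, code := range []int{100, 228, 137, 139} {
		fmt.Println(exitMessage(code))
	}
}
```

Note that a program can also call exit(137) on its own, which is indistinguishable from a SIGKILL under this convention; the message keeps the raw status code so that information is not lost.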

While I was in the vicinity, I added a test that the container executors actually report exit codes and signal numbers this way (as opposed to the "wait status" format, where the exit code is stored in the higher bits). This worked, except that I couldn't get any variation of "docker run busybox sh -c 'kill -9 $$'" to work. Perhaps injecting a Go program that divides by zero would work? For now, the test just skips the docker case; I guess we've seen enough "exited 137" to know it works that way.
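To illustrate the difference between the two formats mentioned above, here is a minimal sketch that decodes a raw Linux wait(2) status word and converts it to the 128+N form. It assumes the Linux encoding (exit code in bits 8-15 for a normal exit; signal number in the low 7 bits, with bit 7 as the core-dump flag, for death by signal). The helper names `decodeWaitStatus` and `toShellExitCode` are hypothetical and are not part of crunch-run or its tests.

```go
package main

import "fmt"

// decodeWaitStatus decodes a raw Linux wait(2) status word.
// Normal exit: low 7 bits are zero and the exit code is in bits 8-15.
// Killed by signal: the signal number is in the low 7 bits (bit 7 is
// the core-dump flag). Stopped statuses (low byte 0x7f) are not
// handled in this sketch.
func decodeWaitStatus(status uint32) (exitCode int, signal int) {
	if status&0x7f == 0 {
		return int(status>>8) & 0xff, 0 // exited normally
	}
	return -1, int(status & 0x7f) // terminated by a signal
}

// toShellExitCode converts a decoded status into the 128+N form that
// the container executors are expected to report.
func toShellExitCode(exitCode, signal int) int {
	if signal != 0 {
		return 128 + signal
	}
	return exitCode
}

func main() {
	// Hypothetical raw statuses: exit(100), killed by SIGKILL (9), and
	// killed by SIGSEGV (11) with the core-dump flag (0x80) set.
	for _, status := range []uint32{100 << 8, 9, 11 | 0x80} {
		code, sig := decodeWaitStatus(status)
		fmt.Printf("raw %#x -> exit %d, signal %d, shell-style code %d\n",
			status, code, sig, toShellExitCode(code, sig))
	}
}
```

Because a raw status of 0x8b (SIGSEGV plus core dump) happens to equal 139, mixing up the two formats can be hard to spot by eye, which is one reason to test which format each executor actually returns.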

18513-log-signal-exit @ 3b9c4641a985a53347696b7a77bcde28a92d6e79 -- developer-run-tests: #2878

#5

Updated by Lucas Di Pentima 5 months ago

This LGTM and I think will be super helpful for debugging. Thanks!

#6

Updated by Tom Clegg 5 months ago

  • Status changed from In Progress to Resolved

Applied in changeset arvados-private:commit:arvados|dd056538060528e6f7b7b48183dfcaeac7882638.

#7

Updated by Peter Amstutz 3 months ago

  • Release set to 46
#8

Updated by Peter Amstutz 3 months ago

  • Related to Feature #17301: Special case report exit_code 137 as likely out of memory error added