Feature #17301
Special case report exit_code 137 as likely out of memory error
Description
One of the most common reasons containers fail is running out of memory and being OOM-killed. When this happens, the container exit code is 137. Arvados-cwl-runner should detect this and print a warning, and Workbench2 needs to display container warnings and errors the way Workbench 1 already does.
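For context on why 137 points at memory: by shell convention, an exit status above 128 means the process was terminated by signal (status - 128), and 137 - 128 = 9 is SIGKILL, which is the signal the Linux OOM killer sends. A SIGKILL can also come from other sources (for example an explicit kill), so 137 only indicates a likely OOM. The sketch below is hypothetical (describe_exit_code, OOM_HINT and SIGKILL_NUMBER are illustrative names, not the actual arvados-cwl-runner code); it reuses the warning text later agreed on in this ticket.

```python
import signal

# Hypothetical sketch, not the actual arvados-cwl-runner implementation.
OOM_HINT = ("Container may have been killed for using too much RAM. "
            "Try resubmitting with a higher 'ramMin'.")

SIGKILL_NUMBER = 9  # POSIX SIGKILL; the Linux OOM killer sends this signal


def describe_exit_code(exit_code: int) -> str:
    """Return a short human-readable note for a container exit code."""
    if exit_code == 0:
        return "success"
    if exit_code > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = exit_code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "signal"  # number not known on this platform
        note = f"exited from {name} ({signum})"
        if signum == SIGKILL_NUMBER:
            # 137 == 128 + 9: likely, but not certainly, an OOM kill.
            note += ". " + OOM_HINT
        return note
    return f"exited with code {exit_code}"
```

Calling describe_exit_code(137) yields the "exited from ... (9)" note followed by the RAM hint, while other signals such as 143 (SIGTERM) get only the signal note.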
Related issues
Updated by Peter Amstutz almost 4 years ago
- Category set to Workbench2
- Description updated
Updated by Peter Amstutz almost 4 years ago
- Target version set to 2021-02-17 sprint
Updated by Peter Amstutz almost 4 years ago
- Related to Idea #16945: WB2 Workflows / containers feature parity added
Updated by Peter Amstutz almost 4 years ago
- Release deleted (31)
- Target version deleted (2021-02-17 sprint)
Updated by Peter Amstutz over 2 years ago
- Target version set to 2022-03-30 Sprint
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-03-30 Sprint to 2022-04-13 Sprint
Updated by Peter Amstutz over 2 years ago
- Related to Feature #18513: Print "exited from signal XY" for exit codes >128 added
Updated by Peter Amstutz over 2 years ago
- Category changed from Workbench2 to CWL
Updated by Peter Amstutz over 2 years ago
- Target version changed from 2022-04-13 Sprint to 2022-04-27 Sprint
Updated by Peter Amstutz over 2 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz over 2 years ago
17301-cwl-oom @ c22d90571a1fcb4b52e5387a791e3aefff5be6af
- Add special message about exit code 137
- Rework how runtime_status is updated, now takes the first line of the first message for the main message, and adds all subsequent messages in "details"
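A minimal sketch of the update rule described in this commit message, assuming runtime_status is a plain dict and that details live under a key formed by appending "Detail" to the message kind (e.g. "warningDetail", by analogy with the 'activityDetail' key discussed in review; the exact key names are an assumption). update_runtime_status is a hypothetical name, and the sketch also assumes that the remaining lines of the first message go into details along with later messages.

```python
from typing import Dict


def update_runtime_status(status: Dict[str, str], kind: str, message: str) -> Dict[str, str]:
    """Merge a message of the given kind ("error" or "warning") into status.

    Hypothetical sketch: the first line of the first message becomes
    status[kind]; everything after that is appended to status[kind + "Detail"].
    """
    detail_key = kind + "Detail"
    lines = message.splitlines() or [""]
    if kind not in status:
        # First message: its first line is the headline, the rest are details.
        status[kind] = lines[0]
        rest = lines[1:]
    else:
        # Later messages go entirely into details.
        rest = lines
    if rest:
        existing = status.get(detail_key)
        combined = ([existing] if existing else []) + rest
        status[detail_key] = "\n".join(combined)
    return status
```

Keeping the headline fixed to the first message means the UI summary stays stable while later messages accumulate in details.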
workbench re-run:
Updated by Lucas Di Pentima over 2 years ago
Reviewing c22d905
- The code assumes that runtime_status['activityDetail'] is legal. Do we know whether it's at least accepted by RailsAPI/controller? (The documentation doesn't mention it.)
- The warning message seems a little too wordy to me. I was thinking we could have an indexed documentation page to point users to for fuller explanations of the summarized messages we display in WB2's UI. Food for thought; not sure it should apply to this story.
- In executor.py:
  - Line 264: That comment seems to be outdated now.
  - Line 268: There's a trailing semicolon.
- If we're going to use runtime_status as a kind of logging store (as I understand it, any error or warning will be appended to this field), we'll need to think about how to handle long text in WB2.
Updated by Peter Amstutz over 2 years ago
Lucas Di Pentima wrote:
Reviewing c22d905
- The code assumes that runtime_status['activityDetail'] is legal. Do we know whether it's at least accepted by RailsAPI/controller? (The documentation doesn't mention it.)
Since a-c-r never posts an 'activity' status, I just took it out.
- The warning message seems a little too wordy to me. I was thinking we could have an indexed documentation page to point users to for fuller explanations of the summarized messages we display in WB2's UI. Food for thought; not sure it should apply to this story.
I cut the text back to "Container may have been killed for using too much RAM. Try resubmitting with a higher 'ramMin'."
- In executor.py:
  - Line 264: That comment seems to be outdated now.
  - Line 268: There's a trailing semicolon.
Fixed
- If we're going to use runtime_status as a kind of logging store (as I understand it, any error or warning will be appended to this field), we'll need to think about how to handle long text in WB2.
I added a 40-line limit to details.
17301-cwl-oom @ 332b0d1b4a9095f4e43893ec741f901b74b36ceb
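One way to apply the 40-line cap on details, as a hypothetical helper (append_detail and MAX_DETAIL_LINES are illustrative names, not taken from the branch). The thread doesn't say whether the earliest or the latest lines are kept; this sketch keeps the earliest ones, on the assumption that the first failure is usually the most informative.

```python
MAX_DETAIL_LINES = 40  # cap from the review reply; the constant name is hypothetical


def append_detail(existing: str, new_text: str, max_lines: int = MAX_DETAIL_LINES) -> str:
    """Append new_text to existing detail text, keeping at most max_lines lines."""
    lines = existing.splitlines() if existing else []
    lines.extend(new_text.splitlines())
    # Keep the earliest lines and drop the overflow (assumed policy).
    return "\n".join(lines[:max_lines])
```

Bounding the stored text at write time keeps the container record small, so WB2 never has to render an unbounded log.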
Updated by Lucas Di Pentima over 2 years ago
Updates LGTM, but I don't understand why these tests failed: developer-run-tests-remainder #3208
Updated by Peter Amstutz over 2 years ago
This was annoying because it wasn't failing for me locally.
I fixed up the test cases to make sure RuntimeStatusLoggingHandler gets removed from the global logger.
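The underlying issue is that a handler attached to the root logger outlives the test that installed it, so later tests in the same process see its side effects; this is one plausible way a suite can pass locally but fail on CI when test order or grouping differs. A minimal sketch of the cleanup pattern using the standard unittest and logging modules, with a hypothetical RecordingHandler standing in for RuntimeStatusLoggingHandler:

```python
import logging
import unittest


class RecordingHandler(logging.Handler):
    """Hypothetical stand-in for RuntimeStatusLoggingHandler."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


class RuntimeStatusTest(unittest.TestCase):
    def setUp(self):
        self.handler = RecordingHandler()
        logging.getLogger().addHandler(self.handler)
        # Registered cleanups run even if the test (or the rest of setUp)
        # fails, so the handler never leaks into later tests.
        self.addCleanup(logging.getLogger().removeHandler, self.handler)

    def test_warning_is_captured(self):
        logging.getLogger("example.component").warning("container exited with 137")
        self.assertEqual(self.handler.messages, ["container exited with 137"])
```

Run it with python -m unittest. addCleanup is preferred over tearDown here because tearDown is skipped when setUp raises, while already-registered cleanups still run.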
Updated by Peter Amstutz over 2 years ago
- Status changed from In Progress to Resolved