Idea #16222

closed

Handle container live logs in a more scalable way

Added by Peter Amstutz about 4 years ago. Updated 27 days ago.

Status: Resolved
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

Containers send their live logs to the database. Unfortunately, when a very large number of containers are running, this can overwhelm the database, causing the API server to become unresponsive and start returning 503 errors.

We need better system behavior and/or a new architecture so that large logging volumes do not cripple the system. Ideally, the solution should not require extensive tuning the way the current logging parameters do; in practice, that tuning only ever happens after a critical failure.

This solution should maintain two key features of the current system:

  • Live logs are delivered to the browser in a reasonable amount of time (latency should be seconds, not minutes)
  • Logs are stored for long enough that if a compute node running a container fails abruptly, there is a reasonable period during which an admin doing a post-mortem can access the logs leading right up to the point that the compute node went away.
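
To illustrate how the first requirement interacts with database load, here is a minimal sketch, not a design proposed in this ticket. It assumes a hypothetical `LogBatcher` helper that buffers log lines and hands them to a `sink` callable (standing in for whatever store is used) once either a line-count limit or a latency limit is reached. The names `LogBatcher`, `sink`, `max_lines`, and `max_latency` are all made up for this example and do not correspond to existing Arvados code or configuration.

```python
import time


class LogBatcher:
    """Hypothetical helper: buffer log lines and flush them in batches.

    Batching bounds the number of writes per container, while the
    max_latency limit keeps delivery delay in the range of seconds.
    """

    def __init__(self, sink, max_lines=100, max_latency=2.0, clock=time.monotonic):
        self.sink = sink                # callable that receives a list of lines
        self.max_lines = max_lines      # flush when this many lines are buffered
        self.max_latency = max_latency  # flush when the oldest line is this old (seconds)
        self.clock = clock              # injectable clock, useful for testing
        self.buffer = []
        self.oldest = None              # arrival time of the oldest buffered line

    def add(self, line):
        """Buffer one line and flush if a limit has been reached."""
        if not self.buffer:
            self.oldest = self.clock()
        self.buffer.append(line)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the buffer is full or its oldest line is too old."""
        if not self.buffer:
            return
        too_many = len(self.buffer) >= self.max_lines
        too_old = self.clock() - self.oldest >= self.max_latency
        if too_many or too_old:
            self.flush()

    def flush(self):
        """Hand all buffered lines to the sink as one batch."""
        if self.buffer:
            self.sink(self.buffer)
            self.buffer = []
            self.oldest = None
```

In this sketch a caller would also need to invoke `maybe_flush()` periodically (for example from a timer) so that a quiet container's last few lines are not held past `max_latency`. The sketch does not address the second requirement; retaining logs after an abrupt node failure depends on where `sink` stores them.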

Related issues

Is duplicate of Arvados Epics - Idea #16442: Scalable + reliable container logging (Resolved, 03/15/2023 - 08/31/2023)

#2

Updated by Peter Amstutz about 4 years ago

  • Description updated

#3

Updated by Peter Amstutz almost 4 years ago

  • Related to Idea #16442: Scalable + reliable container logging added

#5

Updated by Peter Amstutz about 1 year ago

  • Release set to 60

#6

Updated by Tom Clegg 11 months ago

  • Related to deleted (Idea #16442: Scalable + reliable container logging)

#7

Updated by Tom Clegg 11 months ago

  • Is duplicate of Idea #16442: Scalable + reliable container logging added

#8

Updated by Tom Clegg 11 months ago

  • Status changed from New to Closed

#9

Updated by Peter Amstutz 27 days ago

  • Release deleted (60)

#10

Updated by Peter Amstutz 27 days ago

  • Status changed from Closed to Resolved