Story #14284

Send real time container logs to a suitable log distribution system (instead of adding rows to the postgres logs table)

Added by Tom Clegg 6 months ago. Updated 29 days ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Story points: -

Description

(split from #10181)

Job output does not belong in the database logs table, and it should be possible to direct it to non-Arvados logging systems.

As a sysadmin, I'd rather my postgres database not fill up with hundreds of GB of job output logs. Besides requiring a large amount of storage on the volume where the postgres database lives, this also tends to make queries to the logs table that have nothing to do with job output logging (i.e. those using it in its role as more of an audit log, such as checking for recent changes to collections) take ridiculously long. I think it would be best if no job output at all were stored in the central postgres database.

In conjunction with the above story regarding storing in-progress job logs to Keep, it would be great if some other system, better suited to buffering and distributing recent job output, were used to make real-time job output available. It would be great if the output could be sent via an existing log broker system such as logstash or fluentd, so that the logs could be directed not only to whatever component Arvados uses to buffer and deliver them to consumers (such as via the existing websockets interface) but also to other non-Arvados logging systems (where we may be running the rest of the ELK/EFK stack for search and visualisation).
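
To make the requested fan-out concrete, here is a minimal sketch in Python of what it might look like. Nothing in it is Arvados code: the class names (`RecentBuffer`, `JsonLinesSink`, `LogDispatcher`), the event fields, and the placeholder container identifier are all hypothetical. It assumes each log line is published once and delivered to every configured sink, with a bounded in-memory buffer standing in for whatever component would serve real-time consumers, and a newline-delimited JSON stream standing in for the hand-off to a broker such as logstash or fluentd.

```python
import io
import json
from collections import deque
from datetime import datetime, timezone


class RecentBuffer:
    """Hypothetical sink: keeps only the newest `maxlen` events in memory."""

    def __init__(self, maxlen=1000):
        self.events = deque(maxlen=maxlen)

    def emit(self, event):
        self.events.append(event)


class JsonLinesSink:
    """Hypothetical sink: writes one JSON object per line to a stream.

    The stream stands in for a file or socket that an external log
    forwarder would be configured to read.
    """

    def __init__(self, stream):
        self.stream = stream

    def emit(self, event):
        self.stream.write(json.dumps(event, sort_keys=True) + "\n")
        self.stream.flush()


class LogDispatcher:
    """Publishes each log line to every sink; no database row is written."""

    def __init__(self, sinks):
        self.sinks = list(sinks)

    def publish(self, container_uuid, stream_name, text):
        event = {
            "container_uuid": container_uuid,
            "stream": stream_name,
            "text": text,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        for sink in self.sinks:
            sink.emit(event)
        return event


def demo():
    recent = RecentBuffer(maxlen=2)
    forwarded = io.StringIO()
    dispatcher = LogDispatcher([recent, JsonLinesSink(forwarded)])
    for i in range(3):
        # "example-container-uuid" is a placeholder, not a real identifier.
        dispatcher.publish("example-container-uuid", "stdout", f"line {i}")
    return recent, forwarded.getvalue()
```

In this sketch the in-memory buffer drops the oldest lines once it is full, while the JSON-lines stream receives every line, which mirrors the split between short-lived real-time delivery and longer-term storage or search in an external system.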


Related issues

Related to Arvados - Feature #10181: Crunch job output logging improvement stories (Resolved, 2017-02-16)

History

#1 Updated by Tom Clegg 6 months ago

  • Related to Feature #10181: Crunch job output logging improvement stories added

#2 Updated by Tom Morris 29 days ago

  • Target version set to To Be Groomed
