Idea #14284
Closed
Send real time container logs to a suitable log distribution system (instead of adding rows to the postgres logs table)
Description
(split from #10181)
Job output does not belong in the database logs table, and it should be possible to direct it to non-Arvados logging systems.
As a sysadmin, I'd rather my postgres database not fill up with hundreds of GB of job output logs. Besides requiring a large amount of storage on the volume where the postgres database lives, this also makes queries to the logs table that have nothing to do with job output logging (i.e. queries that use the table as an audit log, such as checking for recent changes to collections) take ridiculously long. I think it would be best if no job output at all were stored in the central postgres database. In conjunction with the above story about storing in-progress job logs to Keep, it would be great if some other system, better suited to buffering and distributing recent job output, were used to make real-time job output available. Ideally the output would be sent via an existing log broker such as logstash or fluentd, so that the logs could be directed not only to whatever component Arvados uses to buffer and deliver them to consumers (such as via the existing websockets interface) but also to other non-Arvados logging systems (where we may be running the rest of the ELK/EFK stack for search and visualisation).
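To make the request concrete, here is a rough sketch of the fan-out described above: each log line goes to a bounded in-memory buffer for real-time consumers and, separately, to a forwarder standing in for an external broker, with nothing written to postgres. All names here (`LogRouter`, `RecentLogBuffer`, `JsonLinesForwarder`, the record fields) are hypothetical illustrations, not existing Arvados, logstash, or fluentd APIs.

```python
import json
from collections import deque
from typing import Callable, Deque, Dict, Iterable, List

LogRecord = Dict[str, str]
Sink = Callable[[LogRecord], None]


class RecentLogBuffer:
    """Hypothetical sink: keeps only the newest `maxlen` records per container."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._maxlen = maxlen
        self._by_container: Dict[str, Deque[LogRecord]] = {}

    def __call__(self, record: LogRecord) -> None:
        buf = self._by_container.setdefault(
            record["container_uuid"], deque(maxlen=self._maxlen)
        )
        buf.append(record)  # the oldest record is dropped once the buffer is full

    def recent(self, container_uuid: str) -> List[LogRecord]:
        return list(self._by_container.get(container_uuid, ()))


class JsonLinesForwarder:
    """Hypothetical sink: serialises each record as one JSON line.

    `write` stands in for whatever transport reaches the external broker
    (a socket, a file tailed by an agent, and so on); that part is assumed.
    """

    def __init__(self, write: Callable[[str], None]) -> None:
        self._write = write

    def __call__(self, record: LogRecord) -> None:
        self._write(json.dumps(record, sort_keys=True) + "\n")


class LogRouter:
    """Sends every log line to all configured sinks."""

    def __init__(self, sinks: Iterable[Sink]) -> None:
        self._sinks = list(sinks)

    def emit(self, container_uuid: str, stream: str, text: str) -> None:
        record = {"container_uuid": container_uuid, "stream": stream, "text": text}
        for sink in self._sinks:
            sink(record)
```

For example, `LogRouter([RecentLogBuffer(maxlen=500), JsonLinesForwarder(sys.stdout.write)])` would keep the last 500 lines per container available for live viewing while streaming every line as JSON to standard output, where a log shipper could pick it up.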
Related issues
Updated by Tom Clegg about 6 years ago
- Related to Feature #10181: Crunch job output logging improvement stories added
Updated by Peter Amstutz almost 5 years ago
- Target version deleted (To Be Groomed)
- Status changed from New to Closed