Tom Clegg, 04/03/2024 08:08 PM
Troubleshooting aids
Troubleshoot usage problems:
- Improve error messages (e.g., clients should not crash and dump a stack trace when a server is slow/unresponsive)
- Idea #21581: Crunch saves compute node journals to collections readable only by administrators
- Idea #21424: Way to run a diagnostic container that captures all system logs, not just Crunch's
- Save snapshot of internals (goroutines / memory profile) of specified system service(s) to a collection, and provide instructions for viewing
- Save the last N minutes of logs from all Arvados services running on this host
- Turn on debug mode temporarily, without restarting services
- Scan metrics for recent "near/at capacity" signals
- Probe for proper nginx/proxy config (e.g., max request body size)
Updated by Tom Clegg 9 months ago · 5 revisions