Tom Clegg, 04/03/2024 07:40 PM
h1. Troubleshooting aids

Troubleshoot usage problems:
* Improve error messages (e.g., clients should not crash and dump a stack trace when a server is slow or unresponsive)

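A minimal sketch of the error-message idea above: a client catches connectivity failures at its entry point and prints one actionable line instead of a traceback. The names @describe_failure@ and @run_client@ are hypothetical and not part of any Arvados client; Python is used only for illustration.

```python
import socket
import sys
import urllib.error


def describe_failure(exc, url):
    """Turn a low-level network exception into a one-line, user-facing message."""
    # urllib wraps many socket-level errors in URLError; unwrap to find the cause.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, (socket.timeout, TimeoutError)):
        return f"{url}: server did not respond in time; it may be slow or overloaded"
    if isinstance(reason, ConnectionRefusedError):
        return f"{url}: connection refused; is the service running?"
    return f"{url}: request failed: {reason}"


def run_client(action, url):
    """Run action(url); on a network failure print one line to stderr and return 1."""
    try:
        action(url)
        return 0
    except OSError as exc:  # URLError, TimeoutError, ConnectionRefusedError are all OSError
        print(f"error: {describe_failure(exc, url)}", file=sys.stderr)
        return 1
```

Here @action@ stands for whatever request the client makes. Only network-level failures are translated; other exceptions still propagate, so genuine bugs remain visible.
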
Troubleshoot compute nodes/images:
* {{issue(21581)}}
* {{issue(21424)}}

Troubleshoot Arvados system services:
* Save a snapshot of internals (goroutines / memory profile) of specified system service(s) to a collection, and provide instructions for viewing it
* Save the last N minutes of logs from all Arvados services running on this host

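One possible shape for the log-collection item, assuming the services run as systemd units and log to the journal (an assumption; this page does not say where logs live). @journalctl_command@ and @save_recent_logs@ are hypothetical helpers, and the caller supplies the unit names.

```python
import subprocess


def journalctl_command(units, minutes):
    """Build a journalctl invocation covering the last `minutes` minutes for `units`."""
    if not isinstance(minutes, int) or minutes <= 0:
        raise ValueError("minutes must be a positive integer")
    # "--since=-15min" is a relative timestamp: 15 minutes before now.
    cmd = ["journalctl", "--no-pager", f"--since=-{minutes}min"]
    for unit in units:
        cmd += ["--unit", unit]  # --unit may be repeated to select several units
    return cmd


def save_recent_logs(units, minutes, dest_path):
    """Write the selected journal entries to dest_path; needs journalctl on PATH."""
    with open(dest_path, "wb") as out:
        subprocess.run(journalctl_command(units, minutes), stdout=out, check=True)
```

Uploading the resulting file to a collection is left out here, since that step depends on which client tooling is available on the host.
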
Expose config/scaling issues:
* Scan metrics for recent "near/at capacity" signals
* Probe for proper nginx/proxy config (e.g., max request body size)
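
A rough sketch of the metrics-scan item, assuming the services expose metrics in the Prometheus text format and that each capacity signal is a "used" gauge paired with a "limit" gauge. The pairing, the 0.9 threshold, and all names are assumptions for illustration; it checks a single scrape, so covering "recent" signals would mean repeating it over stored samples.

```python
def parse_samples(text):
    """Parse Prometheus text-format lines into {metric_name: value}.

    Simplification: labels are dropped, so if a name repeats the last sample wins.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        if "}" in line:
            name = line.split("{", 1)[0]
            rest = line.rsplit("}", 1)[1]
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[name] = float(fields[0])  # an optional timestamp field is ignored
    return samples


def near_capacity(samples, pairs, threshold=0.9):
    """Return (used_name, ratio) for each (used, limit) pair at or above threshold."""
    hits = []
    for used_name, limit_name in pairs:
        used, limit = samples.get(used_name), samples.get(limit_name)
        if used is None or not limit:
            continue  # metric missing, or limit is zero: nothing to compare
        ratio = used / limit
        if ratio >= threshold:
            hits.append((used_name, ratio))
    return hits
```

For example, a scrape containing @example_requests_current 19@ and @example_requests_max 20@ (placeholder names) would be reported with a ratio of 0.95.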