
Revision 1 (Tom Clegg, 04/03/2024 07:40 PM) → Revision 2/5 (Tom Clegg, 04/03/2024 08:08 PM)

h1. Troubleshooting aids 

Troubleshoot usage problems:
* Improve error messages (e.g., when a server is slow or unresponsive, clients should report a clear error instead of crashing with a stack dump)

Troubleshoot compute nodes/images:
* {{issue(21581)}}
* {{issue(21424)}}

Troubleshoot Arvados system services:
* Save a snapshot of the internals (goroutines, memory profile) of the specified system service(s) to a collection, and provide instructions for viewing it
* Save the last N minutes of logs from all Arvados services running on this host
* Temporarily turn on debug mode without restarting services

Expose config/scaling issues:
* Scan metrics for recent "near capacity" or "at capacity" signals
* Probe for correct nginx/proxy configuration (e.g., max request body size)
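A rough sketch of the metrics scan. It reads Prometheus text exposition format and flags any usage gauge at or above a fraction of its paired limit gauge. All metric names are hypothetical placeholders. Fetching the text from a service's metrics endpoint is not shown. Labeled series are skipped to keep the parser small.

```go
package main

import (
	"bufio"
	"sort"
	"strconv"
	"strings"
)

// parseGauges collects unlabeled "name value [timestamp]" samples,
// skipping comments, blank lines, and labeled series.
func parseGauges(text string) map[string]float64 {
	vals := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.Contains(line, "{") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[1], 64); err == nil {
			vals[fields[0]] = v
		}
	}
	return vals
}

// nearCapacity returns (sorted) the usage metrics whose value is at least
// frac times the value of the limit metric paired with them in limits.
func nearCapacity(vals map[string]float64, limits map[string]string, frac float64) []string {
	var hot []string
	for usage, limit := range limits {
		u, okU := vals[usage]
		l, okL := vals[limit]
		if okU && okL && l > 0 && u >= frac*l {
			hot = append(hot, usage)
		}
	}
	sort.Strings(hot)
	return hot
}
```

A single scrape only shows the current state. Catching "recent" signals would mean running this periodically or querying a Prometheus server for the peak over a time window.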