[arvados-ws] Websocket server stops processing events, but stays connected
Sometimes, after successfully processing hundreds or thousands of events, arvados-ws enters a state where clients no longer receive any events. The EventsIn counter at /status.json stops increasing, which indicates that arvados-ws isn't receiving events from PostgreSQL.
Clients can still connect and stay connected, and the once-per-minute empty "ping" message still works.
The cause is unknown.
#2 Updated by Tom Clegg over 1 year ago
- Assigned To set to Tom Clegg
- Status changed from New to In Progress
Not sure whether this is related to the observed failures, but it seems worth fixing either way. Arvados-ws pings its PostgreSQL listener periodically, but hasn't been checking the returned error. With this change, if the ping fails, arvados-ws will log the error and exit, so that it gets restarted.
#5 Updated by Tom Clegg over 1 year ago
Replaces the old status/debug.json stuff with Prometheus metrics. Also refactors services/ws to share service-startup code and to be distributed inside arvados-server, like controller, boot, install, dispatchcloud, etc.