Bug #11070
closed[arvados-ws] job logs are not being updated in real time
Description
In workbench, job logs are updated in real time via websockets. On our test installations with arvados-ws, that is no longer working. According to the arvados-ws logs, workbench is subscribing to the updates, so the problem appears to be somewhere in arvados-ws.
Updated by Tom Clegg almost 8 years ago
Lots of connections from arvados-ws to postgresql sitting in CLOSE_WAIT, which doesn't seem right:
9tee4:~# ls -lart --full-time /proc/2322/fd/ total 0 dr-xr-xr-x 9 root root 0 2016-12-28 19:27:58.596931467 +0000 .. dr-x------ 2 root root 0 2017-01-06 12:22:12.549920285 +0000 . lrwx------ 1 root root 64 2017-01-31 18:05:35.954714735 +0000 508 -> socket:[661990687] lrwx------ 1 root root 64 2017-02-01 23:23:12.710549212 +0000 99 -> socket:[129111377] lrwx------ 1 root root 64 2017-02-01 23:23:12.710549212 +0000 98 -> socket:[129110706] ... lrwx------ 1 root root 64 2017-02-01 23:23:12.710549212 +0000 1 -> socket:[18678] lr-x------ 1 root root 64 2017-02-01 23:23:12.710549212 +0000 0 -> /dev/null lrwx------ 1 root root 64 2017-02-08 05:32:54.873000662 +0000 513 -> socket:[664409635] 9tee4:~# netstat -nape | grep arvados-ws | grep tcp6 | wc -l 509 9tee4:~# netstat -nape | grep arvados-ws | grep CLOSE_WAIT | wc -l 500
Updated by Tom Clegg almost 8 years ago
I ran tcpdump on the active connection, and I could see the once-per-minute "ping" between arvados-ws and postgresql. But if I generate a log entry (e.g., by modifying my user record) there is no evidence of a "notify" event on the wire.
Updated by Tom Clegg almost 8 years ago
- didn't tell the pq library's listener to stop listening (which is why there was still 1 ping every minute on the wire), and
- didn't exit the program (so clients could still connect to the webserver goroutine, and wait for events that will never come).
Updated by Tom Clegg almost 8 years ago
11070-ws-listener-problem @ 0fc6eaead0bf7691e99d19e74ec33636909001a7
testing on 9tee4 now.
to test behavior after postgresql connection loss (as postgres superuser):- select * from pg_stat_activity;
- select pg_terminate_backend(pid) from pg_stat_activity where pid=XXXXX;
Updated by Tom Clegg almost 8 years ago
- Category set to API
- Status changed from New to In Progress
- Assigned To set to Tom Clegg
Updated by Tom Clegg almost 8 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:08c06bc813d96634b362e8c5736341fb2d874f59.