Websocket server » History » Version 7

Peter Amstutz, 08/19/2016 02:40 AM

h1. Websocket server

(early draft)

{{toc}}

h2. Background

The Rails API server can function as a websocket server. Clients (notably Workbench, arv-mount, arv-ws) use it to listen for events without polling.

Problems with current implementation:
* Unreliable. See #9427, #8277
* Resource-heavy (one postgres connection per connected client, uses lots of memory)
* Logging is not very good
* Updates look like database records instead of API responses (e.g., computed fields are missing, collection manifest_text has no signatures)
* Offers an API for catching up on missed events after disconnecting/reconnecting, but this API (let alone the code) isn't enough to offer a "don't miss any events, don't send any events twice" guarantee. See #9388

#8460

h2. Desired features

Monotonically increasing event IDs, so clients can (meaningfully) request "all matching events since X"

h2. Design sketch (TC)

New server, written in Go.

One goroutine per connected client.

One database connection receiving notifications about new logs. (Possibly still N database connections serving "catch-up" messages to N clients.)
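The fan-out described above could be sketched roughly as follows. This is only an illustration, not the planned implementation: @Event@, @Hub@, and all method names are hypothetical, the single notification source is modelled as a plain Go channel rather than a real database connection, and dropping events for a client whose buffer is full is an assumption (a real server might disconnect slow clients instead).

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical stand-in for a decoded log notification.
type Event struct {
	ID   uint64
	Body string
}

// Hub fans events from one notification source out to per-client channels.
type Hub struct {
	mu      sync.Mutex
	clients map[chan Event]struct{}
}

func NewHub() *Hub {
	return &Hub{clients: make(map[chan Event]struct{})}
}

// Subscribe registers a client and returns its buffered event channel.
func (h *Hub) Subscribe(buf int) chan Event {
	ch := make(chan Event, buf)
	h.mu.Lock()
	h.clients[ch] = struct{}{}
	h.mu.Unlock()
	return ch
}

// Unsubscribe removes a client and closes its channel.
func (h *Hub) Unsubscribe(ch chan Event) {
	h.mu.Lock()
	if _, ok := h.clients[ch]; ok {
		delete(h.clients, ch)
		close(ch)
	}
	h.mu.Unlock()
}

// Broadcast offers ev to every client without blocking and returns the
// number of clients whose buffer was full (assumption: those miss the event).
func (h *Hub) Broadcast(ev Event) (dropped int) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for ch := range h.clients {
		select {
		case ch <- ev:
		default:
			dropped++
		}
	}
	return dropped
}

// Run consumes the single notification source until it is closed.
func (h *Hub) Run(source <-chan Event) {
	for ev := range source {
		h.Broadcast(ev)
	}
}

func main() {
	hub := NewHub()
	var wg sync.WaitGroup
	var chans []chan Event
	for id := 1; id <= 2; id++ {
		ch := hub.Subscribe(4)
		chans = append(chans, ch)
		wg.Add(1)
		// One goroutine per connected client.
		go func(id int, ch <-chan Event) {
			defer wg.Done()
			for ev := range ch {
				fmt.Printf("client %d: event %d %s\n", id, ev.ID, ev.Body)
			}
		}(id, ch)
	}

	source := make(chan Event) // stands in for the one database connection
	done := make(chan struct{})
	go func() { hub.Run(source); close(done) }()
	for i := uint64(1); i <= 3; i++ {
		source <- Event{ID: i, Body: "update"}
	}
	close(source)
	<-done
	for _, ch := range chans {
		hub.Unsubscribe(ch)
	}
	wg.Wait()
}
```

The point of the shape is that the number of database connections stays constant while the number of goroutines grows with the number of clients.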

h2. Design sketch (PA)

When a client connects, it can request a new session (with an event filter) or ask to resume an existing session from a given event id.

Each session has a session id and is associated with a user, an event channel, an event queue, event filters, and exactly one websocket connection.

Clients can create multiple sessions on the same websocket. When a session is created, the client can request a replay of events from the database, either "last N" or "since time T".

Resuming a session associates the session with a new websocket connection (must be the same user). Resuming a session replays any queued events occurring after the given event id.

Orderly websocket disconnects tear down any associated sessions. Abrupt disconnects maintain the sessions for N minutes. Clients can also end individual sessions without disconnecting the websocket.

NOTIFY sends the full JSON-encoded record.

The websocket server's database LISTEN connection receives the NOTIFY, deserializes the record, assigns a sequence number to the event, gets @object_uuid@ and @object_owner_uuid@, and determines the set of users who have permission to view the log record using an in-memory cache of the permissions graph.

It then intersects the set of users who can view the record with the users associated with active sessions, and adds the record to the queue of each session belonging to one of those users.
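A minimal sketch of the intersection step, assuming the permission lookup has already produced a set of user UUIDs. The @Event@ and @Session@ types and the @Dispatch@ function are hypothetical names for illustration, and the permissions cache itself is not modelled here.

```go
package main

import "fmt"

// Event is a hypothetical decoded log record with its assigned sequence number.
type Event struct {
	Seq        uint64
	ObjectUUID string
}

// Session is a hypothetical per-session record holding queued events.
type Session struct {
	ID       string
	UserUUID string
	Queue    []Event
}

// Dispatch queues ev on every active session whose user appears in readers
// (the set of users permitted to see the record). It returns the number of
// sessions that received the event.
func Dispatch(ev Event, readers map[string]bool, sessionsByUser map[string][]*Session) int {
	n := 0
	for user, sessions := range sessionsByUser {
		if !readers[user] {
			continue // user has active sessions but may not view this record
		}
		for _, s := range sessions {
			s.Queue = append(s.Queue, ev)
			n++
		}
	}
	return n
}

func main() {
	s1 := &Session{ID: "s1", UserUUID: "user-a"}
	s2 := &Session{ID: "s2", UserUUID: "user-b"}
	active := map[string][]*Session{"user-a": {s1}, "user-b": {s2}}

	// Assumption: the permissions cache reported these readers for the record.
	readers := map[string]bool{"user-a": true, "user-c": true}

	n := Dispatch(Event{Seq: 1, ObjectUUID: "example-object"}, readers, active)
	fmt.Println("sessions notified:", n)
	fmt.Println("s1 queue length:", len(s1.Queue), "s2 queue length:", len(s2.Queue))
}
```

Iterating over users with active sessions, rather than over all permitted readers, keeps the cost proportional to the number of connected users.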

The goroutine for each session receives the new event on the event channel. It first applies the session filters to determine whether the event should be propagated. (This should be a Go-based implementation of Arvados query filters and not touch the database.) If the event passes the filters, it is added to a queue and can be sent to the websocket client.
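As a rough illustration of in-memory filtering, the sketch below evaluates a small subset of filter operators against an event represented as a string map. The @Filter@ type, the @Match@ function, the choice of operators (@=@, @!=@, @in@, @not in@), and the attribute names in the example are all assumptions made for illustration. A real implementation would need to cover the full Arvados filter syntax and typed values.

```go
package main

import "fmt"

// Filter is a hypothetical in-memory form of one [attribute, operator, operand] filter.
type Filter struct {
	Attr     string
	Op       string
	Operands []string // one value for "=" and "!=", any number for "in" and "not in"
}

// matchOne evaluates a single filter. A missing attribute never equals anything.
func matchOne(ev map[string]string, f Filter) (bool, error) {
	val, present := ev[f.Attr]
	found := false
	if present {
		for _, o := range f.Operands {
			if o == val {
				found = true
				break
			}
		}
	}
	switch f.Op {
	case "=", "in":
		return found, nil
	case "!=", "not in":
		return !found, nil
	default:
		return false, fmt.Errorf("unsupported operator %q", f.Op)
	}
}

// Match reports whether ev satisfies every filter (filters are ANDed).
// It never touches the database.
func Match(ev map[string]string, filters []Filter) (bool, error) {
	for _, f := range filters {
		ok, err := matchOne(ev, f)
		if err != nil || !ok {
			return false, err
		}
	}
	return true, nil
}

func main() {
	ev := map[string]string{"event_type": "update", "object_owner_uuid": "owner-1"}
	filters := []Filter{
		{Attr: "event_type", Op: "in", Operands: []string{"create", "update"}},
		{Attr: "object_owner_uuid", Op: "=", Operands: []string{"owner-1"}},
	}
	ok, err := Match(ev, filters)
	fmt.Println(ok, err)
}
```

An empty filter list matches every event, which corresponds to a session that asked for no filtering.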

The websocket client receives the event and sends an acknowledgement with the sequence number. The server receives the acknowledgement and removes from the queue all messages with a sequence number less than or equal to the acknowledged one.
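The acknowledgement handling could look something like the sketch below. @Queue@, @Push@, @Ack@, and @After@ are hypothetical names. The sketch assumes events are pushed in increasing sequence order and that the queue is owned by a single session goroutine, so it does no locking. @After@ illustrates how the same queue could serve the replay on session resume.

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a hypothetical queued message carrying its sequence number.
type Event struct {
	Seq  uint64
	Body string
}

// Queue holds unacknowledged events in increasing Seq order.
type Queue struct {
	events []Event
}

// Push appends an event; callers are assumed to push in increasing Seq order.
func (q *Queue) Push(ev Event) {
	q.events = append(q.events, ev)
}

// firstAfter returns the index of the first event with Seq greater than seq.
func (q *Queue) firstAfter(seq uint64) int {
	return sort.Search(len(q.events), func(i int) bool {
		return q.events[i].Seq > seq
	})
}

// Ack removes every event with Seq <= seq and returns how many were removed.
func (q *Queue) Ack(seq uint64) int {
	i := q.firstAfter(seq)
	// Copy the tail so the acknowledged events can be garbage collected.
	q.events = append([]Event(nil), q.events[i:]...)
	return i
}

// After returns a copy of the queued events with Seq > seq, e.g. for replay
// when a session is resumed from a given event id.
func (q *Queue) After(seq uint64) []Event {
	return append([]Event(nil), q.events[q.firstAfter(seq):]...)
}

// Len reports the number of unacknowledged events.
func (q *Queue) Len() int { return len(q.events) }

func main() {
	var q Queue
	for seq := uint64(1); seq <= 5; seq++ {
		q.Push(Event{Seq: seq, Body: "update"})
	}
	removed := q.Ack(3)
	fmt.Println("removed:", removed, "remaining:", q.Len())
	fmt.Println("replay after 4:", q.After(4))
}
```

Because acknowledgements are cumulative, a repeated or stale acknowledgement simply removes nothing.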

h2. Libraries

Websocket:
* https://godoc.org/golang.org/x/net/websocket

PostgreSQL:
* https://godoc.org/github.com/lib/pq via https://godoc.org/database/sql
* https://godoc.org/github.com/lib/pq#hdr-Notifications and https://godoc.org/github.com/lib/pq/listen_example

h2. Obstacles

#8565, #8566

h2. Related

It might be expedient to offload synchronization to some existing software that does this well.
* "Apache Zookeeper":https://zookeeper.apache.org/doc/trunk/zookeeperOver.html -- "Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications the responsibility of implementing coordination services from scratch."
* Google "Chubby":http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf paper
* NSQ http://nsq.io/