Websocket server » History » Version 10

Tom Clegg, 12/13/2016 09:17 PM

h1. Websocket server

(draft for v1)

{{toc}}

See also: [[Events API]]

h1. Messages

Each message is JSON-encoded as an object with exactly one key. The key indicates the message type, and the value contains the message content.

This allows clients and servers to decode messages efficiently: decode the first token to determine the message type, then (if the message content is relevant) decode the message payload into an appropriate data structure.

<pre><code class="javascript">
good: {"error":{"code":418,"text":"I'm a teapot"}}

bad:  {"errorCode":418,"errorText":"I'm a teapot"}
</code></pre>
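
The two-step decoding described above could be sketched in Go roughly as follows, assuming the standard @encoding/json@ package. The names @decodeEnvelope@ and @errorPayload@ are hypothetical and not part of the server; the payload shape is taken from the "good" example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// errorPayload mirrors the "error" example above (hypothetical type name).
type errorPayload struct {
	Code int    `json:"code"`
	Text string `json:"text"`
}

// decodeEnvelope splits a message into its type (the single top-level key)
// and its still-encoded payload, without decoding the payload itself.
func decodeEnvelope(msg []byte) (string, json.RawMessage, error) {
	var env map[string]json.RawMessage
	if err := json.Unmarshal(msg, &env); err != nil {
		return "", nil, err
	}
	if len(env) != 1 {
		return "", nil, errors.New("message must have exactly one key")
	}
	for msgType, payload := range env {
		return msgType, payload, nil
	}
	return "", nil, errors.New("unreachable")
}

func main() {
	msgType, payload, err := decodeEnvelope([]byte(`{"error":{"code":418,"text":"I'm a teapot"}}`))
	if err != nil {
		panic(err)
	}
	// Decode the payload only if this message type is relevant to us.
	if msgType == "error" {
		var e errorPayload
		if err := json.Unmarshal(payload, &e); err != nil {
			panic(err)
		}
		fmt.Println(msgType, e.Code, e.Text)
	}
}
```

In this sketch the "bad" form is rejected because it has two top-level keys.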

Clients must ignore any unrecognized keys they encounter in the payload. This allows the server to add features without breaking existing clients.
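
For a Go client, one way to get this behaviour is to decode payloads into a struct that lists only the fields the client understands: @json.Unmarshal@ skips keys with no matching field by default. The sketch below assumes that approach; @eventPayload@, @parseEvent@ and the @futureField@ key are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// eventPayload lists only the fields this hypothetical client understands.
type eventPayload struct {
	MsgID int64  `json:"msgID"`
	Type  string `json:"type"`
	UUID  string `json:"uuid"`
}

// parseEvent decodes an event payload. Keys without a matching struct field
// are silently skipped by encoding/json.
func parseEvent(payload []byte) (eventPayload, error) {
	var ev eventPayload
	err := json.Unmarshal(payload, &ev)
	return ev, err
}

func main() {
	// "futureField" stands in for a key added by a newer server.
	ev, err := parseEvent([]byte(`{"msgID":12345,"type":"update","uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i","futureField":{"x":1}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.MsgID, ev.Type, ev.UUID)
}
```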

h2. setAuth

After establishing a connection, and before subscribing to any streams, the client must supply an authorization token.

The server acknowledges successful authorization with an "auth" message.

<pre><code class="javascript">
client: {
          "setAuth":{"token":"3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi"}
        }

server: {
          "auth":{"uuid":"zzzzz-gj3su-077z32aux8dg2s1"}
        }
</code></pre>

Unsuccessful authorization results in an "authError" message.

<pre><code class="javascript">
client: {
          "setAuth":{
            "token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}

server: {
          "authError":{
            "errorText":"invalid or expired token"}}
</code></pre>
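
From the client side, the exchange above might be handled along these lines. This is only a sketch: @setAuthMessage@ and @checkAuthReply@ are hypothetical helper names, and the transport (the websocket connection itself) is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// setAuthMessage builds the first message a client sends after connecting.
func setAuthMessage(token string) ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"setAuth": map[string]string{"token": token},
	})
}

// checkAuthReply returns the UUID from an "auth" reply, or an error carrying
// the errorText from an "authError" reply.
func checkAuthReply(reply []byte) (string, error) {
	var env map[string]json.RawMessage
	if err := json.Unmarshal(reply, &env); err != nil {
		return "", err
	}
	if raw, ok := env["auth"]; ok {
		var a struct {
			UUID string `json:"uuid"`
		}
		if err := json.Unmarshal(raw, &a); err != nil {
			return "", err
		}
		return a.UUID, nil
	}
	if raw, ok := env["authError"]; ok {
		var e struct {
			ErrorText string `json:"errorText"`
		}
		if err := json.Unmarshal(raw, &e); err != nil {
			return "", err
		}
		return "", errors.New(e.ErrorText)
	}
	return "", errors.New("unexpected reply to setAuth")
}

func main() {
	msg, err := setAuthMessage("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
	// A canned server reply, copied from the failure example above.
	_, err = checkAuthReply([]byte(`{"authError":{"errorText":"invalid or expired token"}}`))
	fmt.Println("auth failed:", err)
}
```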

h2. subscribe

Subscribe to an event stream.

If the given ETag does not match the current ETag, the server should send an update event right away: a mismatch means the client has already missed one or more updates since the version it has cached.

<pre><code class="javascript">
client: {
          "subscribe":{
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
            "etag":"9u32836jpz7i046sd84gu190h"}}

server: {
          "event":{
            "msgID":12345,
            "type":"update",
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>
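
On the server side, the ETag comparison might look something like the sketch below. It assumes the server can look up the object's current ETag and has a msgID to assign; @subscribeRequest@, @event@ and @catchUpEvent@ are hypothetical names, and the field layout simply follows the example above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// subscribeRequest mirrors the "subscribe" payload shown above.
type subscribeRequest struct {
	UUID string `json:"uuid"`
	ETag string `json:"etag"`
}

// event mirrors the "event" payload shown above.
type event struct {
	MsgID int64  `json:"msgID"`
	Type  string `json:"type"`
	UUID  string `json:"uuid"`
	ETag  string `json:"etag"`
}

// catchUpEvent returns an encoded "update" event if the client's cached ETag
// is stale, or nil if the client is already up to date.
func catchUpEvent(req subscribeRequest, currentETag string, msgID int64) ([]byte, error) {
	if req.ETag == currentETag {
		return nil, nil
	}
	return json.Marshal(map[string]event{
		"event": {MsgID: msgID, Type: "update", UUID: req.UUID, ETag: currentETag},
	})
}

func main() {
	req := subscribeRequest{UUID: "zzzzz-4zz18-1g4g0vhpjn9wq7i", ETag: "9u32836jpz7i046sd84gu190h"}
	msg, err := catchUpEvent(req, "1wfdizt65l5w597jf5lojf8jm", 12345)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(msg))
}
```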

When a client subscribes to a stream X, but is not authorized to read the object with UUID X (or there is no such object), the server sends an error message. This does not terminate the connection, nor does it affect any other streams.

<pre><code class="javascript">
client: {
          "subscribe":{
            "uuid":"zzzzz-tpzed-000000000000000",
            "etag":"x"}}

server: {
          "subscribeError":{
            "uuid":"zzzzz-tpzed-000000000000000",
            "errorText":"forbidden"}}
</code></pre>

h2. Container and job logging events

[[Events API]] &rarr; "Non-state-changing events"

<pre><code class="javascript">
client: {
          "subscribe":{
            "uuid":"zzzzz-dz642-logscontainer03",
            "etag":"2qtm62j6zb3nx5zud8b5v0ayl",
            "select":["logs.event_type","logs.properties.text"]}}

server: {
          "event":{
            "msgID":12346,
            "type":"log",
            "uuid":"zzzzz-dz642-logscontainer03",
            "etag":"2qtm62j6zb3nx5zud8b5v0ayl",
            "log":{
              "event_type":"stderr",
              "properties":{
                "text":"foo\n"}}}}
</code></pre>

h2. Update events

<pre><code class="javascript">
server: {
          "event":{
            "msgID":12345,
            "type":"update",
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>

h2. Create events

<pre><code class="javascript">
server: {
          "event":{
            "msgID":12345,
            "type":"create",
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>

h2. Delete events

<pre><code class="javascript">
server: {
          "event":{
            "msgID":12345,
            "type":"delete",
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
</code></pre>

The etag reflects the last state of the object before it was deleted.

TBD: Should the etag be omitted instead?

Note: The logs table (and the old websocket API) use(d) a different event type: "destroy".

h2. Missed events

Zero or more events for a single stream have been skipped:

<pre><code class="javascript">
server: {
          "eventsMissed":{
            "msgID":12347,
            "uuid":"zzzzz-dz642-logscontainer03"}}
</code></pre>

Zero or more events on one or more of the subscribed streams have been skipped:

<pre><code class="javascript">
server: {
          "eventsMissed":{
            "msgID":12348}}
</code></pre>
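
A client that caches objects by UUID could react to the two forms of "eventsMissed" roughly as sketched here: drop one cache entry when a @uuid@ is given, or drop everything when it is absent. The @cache@ type and @handleEventsMissed@ method are hypothetical, and what a real client does after invalidating (for example, re-fetching the objects) is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cache maps an object UUID to the ETag the client has cached for it.
type cache map[string]string

// handleEventsMissed invalidates the cache entries affected by an
// "eventsMissed" payload and returns how many entries were dropped.
func (c cache) handleEventsMissed(payload []byte) (int, error) {
	var m struct {
		MsgID int64  `json:"msgID"`
		UUID  string `json:"uuid"`
	}
	if err := json.Unmarshal(payload, &m); err != nil {
		return 0, err
	}
	if m.UUID != "" {
		// Events were skipped on a single stream.
		if _, ok := c[m.UUID]; !ok {
			return 0, nil
		}
		delete(c, m.UUID)
		return 1, nil
	}
	// No uuid: any subscribed stream may be out of date.
	n := len(c)
	for uuid := range c {
		delete(c, uuid)
	}
	return n, nil
}

func main() {
	c := cache{
		"zzzzz-dz642-logscontainer03": "2qtm62j6zb3nx5zud8b5v0ayl",
		"zzzzz-4zz18-1g4g0vhpjn9wq7i": "1wfdizt65l5w597jf5lojf8jm",
	}
	n, err := c.handleEventsMissed([]byte(`{"msgID":12347,"uuid":"zzzzz-dz642-logscontainer03"}`))
	fmt.Println(n, err, len(c))
	n, err = c.handleEventsMissed([]byte(`{"msgID":12348}`))
	fmt.Println(n, err, len(c))
}
```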

h1. Server implementation

h2. Architecture

Go server with one goroutine serving each connection.

A single goroutine receives incoming events and assigns msgID numbers.

Each connection has an outgoing event queue. Leave room for the ability to resize a connection's outgoing queue dynamically, provided no subscriptions are active: this way privileged clients can request bigger queues.

Common events should be serialized once and distributed to all connections. This avoids serializing each event N times, and allows outgoing queues to share a single message buffer for a given event.

If practical, when a connection's outgoing queue fills up, send a "missed events" signal and discard all buffered events (and, of course, any incoming events that arrive while the buffer is full). After a "missed events" signal the client needs to assume its cache is out of date anyway. Recovery from a temporary backlog should be faster if, when skipping events, we skip as many as we can.
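
One possible shape for such a queue is sketched below. It is deliberately simplified: it is not safe for concurrent use (a real server would need a mutex or channels between the event goroutine and each connection's writer), and @outQueue@, @enqueue@ and @drain@ are hypothetical names. Messages are passed as already-serialized byte slices so that every queue can share the same buffer.

```go
package main

import "fmt"

// outQueue is a bounded outgoing queue for one connection.
type outQueue struct {
	buf    [][]byte // pre-serialized messages, shared between queues
	max    int      // capacity; could be raised for privileged clients
	missed bool     // true if events were discarded since the last drain
}

func newOutQueue(max int) *outQueue {
	return &outQueue{max: max}
}

// enqueue adds a message. If the queue is full, it discards everything
// buffered, remembers that events were missed, and keeps dropping incoming
// messages until the writer drains the queue.
func (q *outQueue) enqueue(msg []byte) {
	if q.missed {
		return
	}
	if len(q.buf) >= q.max {
		q.buf = nil
		q.missed = true
		return
	}
	q.buf = append(q.buf, msg)
}

// drain returns the buffered messages and whether a "missed events" signal
// should be sent to the client, then resets the queue.
func (q *outQueue) drain() (msgs [][]byte, missed bool) {
	msgs, missed = q.buf, q.missed
	q.buf, q.missed = nil, false
	return msgs, missed
}

func main() {
	// Serialize once, then hand the same buffer to every connection's queue.
	msg := []byte(`{"event":{"msgID":12345,"type":"update"}}`)
	queues := []*outQueue{newOutQueue(2), newOutQueue(2)}
	for i := 0; i < 3; i++ {
		for _, q := range queues {
			q.enqueue(msg)
		}
	}
	for _, q := range queues {
		msgs, missed := q.drain()
		fmt.Println(len(msgs), missed)
	}
}
```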

h2. Logging

Print JSON-formatted log entries on stderr.

Print a log entry when a client connects.

Print a log entry when a client disconnects. Show counters for:
* Number of streams (UUIDs) added while connection was up
* Number of streams removed
* Number of events sent
* Number of bytes sent
* Total time spent waiting for Write() to return (or a better way to measure congestion?)
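
As an illustration only, a disconnect entry with the counters listed above could be produced like this. The struct names, JSON key names, and the choice to report write-wait time in seconds are all assumptions; the draft does not specify a log schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// connStats holds the per-connection counters (hypothetical key names).
type connStats struct {
	StreamsAdded     int     `json:"streamsAdded"`
	StreamsRemoved   int     `json:"streamsRemoved"`
	EventsSent       int     `json:"eventsSent"`
	BytesSent        int64   `json:"bytesSent"`
	WriteWaitSeconds float64 `json:"writeWaitSeconds"`
}

// logEntry is one JSON log line. Stats is omitted on "connect" entries.
type logEntry struct {
	Time       string     `json:"time"`
	Event      string     `json:"event"`
	RemoteAddr string     `json:"remoteAddr"`
	Stats      *connStats `json:"stats,omitempty"`
}

// formatLog encodes an entry as a single JSON line (without the newline).
func formatLog(e logEntry) (string, error) {
	buf, err := json.Marshal(e)
	return string(buf), err
}

func main() {
	line, err := formatLog(logEntry{
		Time:       time.Now().UTC().Format(time.RFC3339),
		Event:      "disconnect",
		RemoteAddr: "10.0.0.1:4321", // placeholder address
		Stats: &connStats{
			StreamsAdded: 3, StreamsRemoved: 1,
			EventsSent: 42, BytesSent: 9000, WriteWaitSeconds: 0.25,
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Fprintln(os.Stderr, line)
}
```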

h2. Libraries

Websocket:
* https://godoc.org/golang.org/x/net/websocket

PostgreSQL:
* https://godoc.org/github.com/lib/pq via https://godoc.org/database/sql
* https://godoc.org/github.com/lib/pq#hdr-Notifications and https://godoc.org/github.com/lib/pq/listen_example

h1. Problems with old/current implementation

(Lessons to avoid re-learning next time...)

The Rails API server can function as a websocket server. Clients (notably Workbench, arv-mount, arv-ws) use it to listen for events without polling.

Problems with current implementation:
* Unreliable. See #9427, #8277
* Resource-heavy (one postgres connection per connected client, uses lots of memory)
* Logging is not very good
* Updates look like database records instead of API responses (e.g., computed fields are missing, collection manifest_text has no signatures)
* Offers an API for catching up on missed events after disconnecting/reconnecting, but this API (let alone the code) isn't enough to offer a "don't miss any events, don't send any events twice" guarantee. See #9388

#8460