Project

General

Profile

Websocket server » History » Version 11

Tom Clegg, 12/13/2016 09:20 PM

1 1 Tom Clegg
h1. Websocket server
2
3 10 Tom Clegg
(draft for v1)
4 1 Tom Clegg
5
{{toc}}
6
7 11 Tom Clegg
See also:
8
* [[Events API]]
9
* [[Hacking websocket server]]
10 1 Tom Clegg
11 8 Tom Clegg
h1. Messages
12 1 Tom Clegg
13 8 Tom Clegg
Each message is JSON-encoded as an object with exactly one key. The key indicates the message type, and the value contains the message content.
14 1 Tom Clegg
15 8 Tom Clegg
This allows clients and servers to decode messages efficiently: decode the first token to determine the message type, then (if the message content is relevant) decode the message payload into an appropriate data structure.
16 1 Tom Clegg
17 8 Tom Clegg
<pre><code class="javascript">
18
good: {"error":{"code":418,"text":"I'm a teapot"}}
19 1 Tom Clegg
20 8 Tom Clegg
bad:  {"errorCode":418,"errorText":"I'm a teapot"}
21
</code></pre>
22 1 Tom Clegg
23 8 Tom Clegg
Clients must ignore any unrecognized keys they encounter in the payload. This allows the server to add features without breaking existing clients.
24 1 Tom Clegg
25 8 Tom Clegg
h2. setAuth
26 1 Tom Clegg
27 8 Tom Clegg
After establishing a connection, and before subscribing to any streams, the client must supply an authorization token.
28 1 Tom Clegg
29 8 Tom Clegg
Successful authorization is acknowledged.
30 1 Tom Clegg
31 8 Tom Clegg
<pre><code class="javascript">
32
client: {
33
          "setAuth":{"token":"3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi"}
34
        }
35 1 Tom Clegg
36 8 Tom Clegg
server: {
37
          "auth":{"uuid":"zzzzz-gj3su-077z32aux8dg2s1"}
38
        }
39
</code></pre>
40 1 Tom Clegg
41 8 Tom Clegg
Unsuccessful authorization results in an error.
42 1 Tom Clegg
43 8 Tom Clegg
<pre><code class="javascript">
44
client: {
45
          "setAuth":{
46
            "token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}
47 6 Peter Amstutz
48 8 Tom Clegg
server: {
49
          "authError":{
50
            "errorText":"invalid or expired token"}}
51
</code></pre>
52 1 Tom Clegg
53 8 Tom Clegg
h2. subscribe
54 1 Tom Clegg
55 8 Tom Clegg
Subscribe to an event stream.
56 1 Tom Clegg
57 8 Tom Clegg
If the given ETag does not match the current ETag, the server should send an update event right away: this means the client has already missed one or more updates since the version it has cached.
58 1 Tom Clegg
59 8 Tom Clegg
<pre><code class="javascript">
60
client: {
61
          "subscribe":{
62
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
63
            "etag":"9u32836jpz7i046sd84gu190h"}}
64 1 Tom Clegg
65 8 Tom Clegg
server: {
66
          "event":{
67
            "msgID":12345,
68
            "type":"update",
69
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
70
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
71
</code></pre>
72 1 Tom Clegg
73 8 Tom Clegg
When a client subscribes to a stream X, but is not authorized to read the object with UUID X (or there is no such object), the server sends an error message. This does not terminate the connection, nor does it affect any other streams.
74 1 Tom Clegg
75 8 Tom Clegg
<pre><code class="javascript">
76
client: {
77
          "subscribe":{
78
            "uuid":"zzzzz-tpzed-000000000000000",
79
            "etag":"x"}}
80 1 Tom Clegg
81 8 Tom Clegg
server: {
82
          "subscribeError":{
83
            "uuid":"zzzzz-tpzed-000000000000000",
84
            "errorText":"forbidden"}}
85
</code></pre>
86
87
h2. Container and job logging events
88
89
[[Events API]] &rarr; "Non-state-changing events"
90
91
<pre><code class="javascript">
92
client: {
93
          "subscribe":{
94
            "uuid":"zzzzz-dz642-logscontainer03",
95
            "etag":"2qtm62j6zb3nx5zud8b5v0ayl",
96
            "select":["logs.event_type","logs.properties.text"]}}
97
98
server: {
99
          "event":{
100
            "msgID":12346,
101
            "type":"log",
102
            "uuid":"zzzzz-dz642-logscontainer03",
103
            "etag":"2qtm62j6zb3nx5zud8b5v0ayl",
104
            "log":{
105
              "event_type":"stderr",
106
              "properties":{
107
                "text":"foo\n"}}}}
108
</code></pre>
109
110
h2. Update events
111
112 9 Tom Clegg
<pre><code class="javascript">
113
server: {
114
          "event":{
115
            "msgID":12345,
116
            "type":"update",
117
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
118
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
119
</code></pre>
120
121 8 Tom Clegg
h2. Create events
122
123 9 Tom Clegg
<pre><code class="javascript">
124
server: {
125
          "event":{
126
            "msgID":12345,
127
            "type":"create",
128
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
129
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
130
</code></pre>
131
132 8 Tom Clegg
h2. Delete events
133 9 Tom Clegg
134
<pre><code class="javascript">
135
server: {
136
          "event":{
137
            "msgID":12345,
138
            "type":"delete",
139
            "uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
140
            "etag":"1wfdizt65l5w597jf5lojf8jm"}}
141
</code></pre>
142
143
The etag reflects the last state of the object before it was deleted.
144
145
TBD: Should the etag be omitted instead?
146
147
Note: The logs table (and the old websocket API) use(d) a different event type: "destroy".
148 8 Tom Clegg
149
h2. Missed events
150
151
Zero or more events for a single stream have been skipped:
152
153
<pre><code class="javascript">
154
server: {
155
          "eventsMissed":{
156
            "msgID":12347,
157
            "uuid":"zzzzz-dz642-logscontainer03"}}
158
</code></pre>
159
160
Zero or more events on one or more of the subscribed streams have been skipped:
161
162
<pre><code class="javascript">
163
server: {
164
          "eventsMissed":{
165
            "msgID":12348}}
166
</code></pre>
167
168
h1. Server implementation
169
170
h2. Architecture
171
172
Go server with a goroutine serving each connection.
173
174
One goroutine receives incoming events and assigns msgID numbers.
175
176
Each connection has an outgoing event queue. Leave room for ability to resize a connection's outgoing queue dynamically, provided no subscriptions are active: this way privileged clients can request bigger queues.
177
178
Common events should be serialized once and distributed to all connections. This avoids serializing each event N times, and allows outgoing queues to share a single message buffer for a given event.
179
180
If practical, when a connection's outgoing queue fills up, send a "missed events" signal and discard all buffered events (and, of course, any incoming events that arrive while the buffer is full). After a "missed events" signal the client needs to assume its cache is out of date anyway. Expect a faster recovery from a temporary backlog if, when skipping events, we skip as many as we can.
181
182
h2. Logging
183
184
Print JSON-formatted log entries on stderr.
185
186
Print a log entry when a client connects.
187
188
Print a log entry when a client disconnects. Show counters for:
189
* Number of streams (UUIDs) added while connection was up
190
* Number of streams removed
191
* Number of events sent
192
* Number of bytes sent
193
* Total time spent waiting for Write() to return (or a better way to measure congestion?)
194
195 2 Tom Clegg
h2. Libraries
196
197
Websocket:
198
* https://godoc.org/golang.org/x/net/websocket
199
200
PostgreSQL:
201
* https://godoc.org/github.com/lib/pq via https://godoc.org/database/sql
202 1 Tom Clegg
* https://godoc.org/github.com/lib/pq#hdr-Notifications and https://godoc.org/github.com/lib/pq/listen_example
203
204 8 Tom Clegg
h1. Problems with old/current implementation
205 3 Tom Clegg
206 8 Tom Clegg
(Lessons to avoid re-learning next time...)
207 3 Tom Clegg
208 8 Tom Clegg
The Rails API server can function as a websocket server. Clients (notably Workbench, arv-mount, arv-ws) use it to listen for events without polling.
209 4 Tom Clegg
210 8 Tom Clegg
Problems with current implementation:
211
* Unreliable. See #9427, #8277
212
* Resource-heavy (one postgres connection per connected client, uses lots of memory)
213
* Logging is not very good
214
* Updates look like database records instead of API responses (e.g., computed fields are missing, collection manifest_text has no signatures)
215
* Offers an API for catching up on missed events after disconnecting/reconnecting, but this API (let alone the code) isn't enough to offer a "don't miss any events, don't send any events twice" guarantee. See #9388
216
217
#8460