Websocket server¶
(draft for v1 -- note this is not the v0 API implemented by arvados-ws as of July 2018.)
- Table of contents
- Websocket server
- Messages
- Server implementation
- Problems with old/current implementation
Messages¶
Each message is JSON-encoded as an object with exactly one key. The key indicates the message type, and the value contains the message content.
This allows clients and servers to decode messages efficiently: decode the first token to determine the message type, then (if the message content is relevant) decode the message payload into an appropriate data structure.
good: {"error":{"code":418,"text":"I'm a teapot"}}
bad: {"errorCode":418,"errorText":"I'm a teapot"}
Clients must ignore any unrecognized keys they encounter in the payload. This allows the server to add features without breaking existing clients.
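A minimal Go sketch of the two-step decoding described above, assuming the standard encoding/json package. It keeps the payload undecoded with json.RawMessage instead of reading tokens one at a time. The names DecodeEnvelope and ErrorPayload are hypothetical and not part of the draft.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ErrorPayload mirrors the "error" example above. encoding/json ignores
// unrecognized keys by default, which matches the rule for clients.
type ErrorPayload struct {
	Code int    `json:"code"`
	Text string `json:"text"`
}

// DecodeEnvelope (hypothetical helper) returns the message type, i.e. the
// single top-level key, and the still-encoded payload.
func DecodeEnvelope(data []byte) (string, json.RawMessage, error) {
	var env map[string]json.RawMessage
	if err := json.Unmarshal(data, &env); err != nil {
		return "", nil, err
	}
	if len(env) != 1 {
		return "", nil, errors.New("message must have exactly one key")
	}
	for msgType, payload := range env {
		return msgType, payload, nil
	}
	return "", nil, errors.New("unreachable")
}

func main() {
	raw := []byte(`{"error":{"code":418,"text":"I'm a teapot","extra":true}}`)
	msgType, payload, err := DecodeEnvelope(raw)
	if err != nil {
		panic(err)
	}
	// Decode the payload only if this message type is relevant.
	if msgType == "error" {
		var e ErrorPayload
		if err := json.Unmarshal(payload, &e); err != nil {
			panic(err)
		}
		fmt.Println(e.Code, e.Text)
	}
}
```

The "bad" form above is rejected by this helper because it has two top-level keys.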
setAuth¶
After establishing a connection, and before subscribing to any streams, the client must supply an authorization token.
Successful authorization is acknowledged.
client: {
"setAuth":{"token":"3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi"}
}
server: {
"auth":{"uuid":"zzzzz-gj3su-077z32aux8dg2s1"}
}
Unsuccessful authorization results in an error.
client: {
"setAuth":{
"token":"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}}
server: {
"authError":{
"errorText":"invalid or expired token"}}
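A client-side sketch of this exchange in Go, assuming only the message shapes shown above. BuildSetAuth and ParseAuthReply are hypothetical helper names, and the transport (the websocket connection itself) is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// BuildSetAuth encodes a setAuth request for the given token.
func BuildSetAuth(token string) ([]byte, error) {
	return json.Marshal(map[string]map[string]string{
		"setAuth": {"token": token},
	})
}

// ParseAuthReply returns the UUID from an "auth" reply, or an error built
// from an "authError" reply.
func ParseAuthReply(data []byte) (string, error) {
	var reply struct {
		Auth *struct {
			UUID string `json:"uuid"`
		} `json:"auth"`
		AuthError *struct {
			ErrorText string `json:"errorText"`
		} `json:"authError"`
	}
	if err := json.Unmarshal(data, &reply); err != nil {
		return "", err
	}
	switch {
	case reply.Auth != nil:
		return reply.Auth.UUID, nil
	case reply.AuthError != nil:
		return "", fmt.Errorf("authorization failed: %s", reply.AuthError.ErrorText)
	default:
		return "", errors.New("unexpected reply to setAuth")
	}
}

func main() {
	req, err := BuildSetAuth("3kg6k6lzmp9kj5cpkcoxie963cmvjahbt2fod9zru30k1jqdmi")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))

	uuid, err := ParseAuthReply([]byte(`{"auth":{"uuid":"zzzzz-gj3su-077z32aux8dg2s1"}}`))
	fmt.Println(uuid, err)
}
```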
subscribe¶
Subscribe to an event stream.
If the given ETag does not match the current ETag, the server should send an update event right away: a mismatch means the client has already missed one or more updates since the version it has cached.
client: {
"subscribe":{
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"9u32836jpz7i046sd84gu190h"}}
server: {
"event":{
"msgID":12345,
"type":"update",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
When a client subscribes to a stream X, but is not authorized to read the object with UUID X (or there is no such object), the server sends an error message. This does not terminate the connection, nor does it affect any other streams.
client: {
"subscribe":{
"uuid":"zzzzz-tpzed-000000000000000",
"etag":"x"}}
server: {
"subscribeError":{
"uuid":"zzzzz-tpzed-000000000000000",
"errorText":"forbidden"}}
Container and job logging events¶
Events API → "Non-state-changing events"
client: {
"subscribe":{
"uuid":"zzzzz-dz642-logscontainer03",
"etag":"2qtm62j6zb3nx5zud8b5v0ayl",
"select":["logs.event_type","logs.properties.text"]}}
server: {
"event":{
"msgID":12346,
"type":"log",
"uuid":"zzzzz-dz642-logscontainer03",
"etag":"2qtm62j6zb3nx5zud8b5v0ayl",
"log":{
"event_type":"stderr",
"properties":{
"text":"foo\n"}}}}
Update events¶
server: {
"event":{
"msgID":12345,
"type":"update",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
Create events¶
server: {
"event":{
"msgID":12345,
"type":"create",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
Delete events¶
server: {
"event":{
"msgID":12345,
"type":"delete",
"uuid":"zzzzz-4zz18-1g4g0vhpjn9wq7i",
"etag":"1wfdizt65l5w597jf5lojf8jm"}}
The etag reflects the last state of the object before it was deleted.
TBD: Should the etag be omitted instead?
Note: The logs table (and the old websocket API) use(d) a different event type: "destroy".
Missed events¶
Zero or more events for a single stream have been skipped:
server: {
"eventsMissed":{
"msgID":12347,
"uuid":"zzzzz-dz642-logscontainer03"}}
Zero or more events on one or more of the subscribed streams have been skipped:
server: {
"eventsMissed":{
"msgID":12348}}
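One way a client might react to these two forms, sketched in Go. The Cache type and the StaleAfterMissed helper are hypothetical. The sketch assumes that an eventsMissed payload without a uuid invalidates every subscribed stream, as the text above suggests.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// EventsMissed mirrors the payload shown above. UUID is empty when the
// server does not say which stream was affected.
type EventsMissed struct {
	MsgID int64  `json:"msgID"`
	UUID  string `json:"uuid"`
}

// Cache is a hypothetical client-side cache: subscribed UUID -> cached ETag.
type Cache map[string]string

// StaleAfterMissed returns the subscribed UUIDs whose cached state can no
// longer be trusted after the given eventsMissed payload.
func StaleAfterMissed(cache Cache, payload []byte) ([]string, error) {
	var m EventsMissed
	if err := json.Unmarshal(payload, &m); err != nil {
		return nil, err
	}
	if m.UUID != "" {
		if _, ok := cache[m.UUID]; ok {
			return []string{m.UUID}, nil
		}
		return nil, nil
	}
	// No uuid given: any subscribed stream may have skipped events.
	stale := make([]string, 0, len(cache))
	for uuid := range cache {
		stale = append(stale, uuid)
	}
	sort.Strings(stale)
	return stale, nil
}

func main() {
	cache := Cache{
		"zzzzz-dz642-logscontainer03": "2qtm62j6zb3nx5zud8b5v0ayl",
		"zzzzz-4zz18-1g4g0vhpjn9wq7i": "1wfdizt65l5w597jf5lojf8jm",
	}
	one, _ := StaleAfterMissed(cache, []byte(`{"msgID":12347,"uuid":"zzzzz-dz642-logscontainer03"}`))
	all, _ := StaleAfterMissed(cache, []byte(`{"msgID":12348}`))
	fmt.Println(len(one), len(all))
}
```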
Server implementation¶
Architecture¶
Go server with a goroutine serving each connection.
One goroutine receives incoming events and assigns msgID numbers.
Each connection has an outgoing event queue. Leave room for resizing a connection's outgoing queue dynamically, provided no subscriptions are active, so that privileged clients can request bigger queues.
Common events should be serialized once and distributed to all connections. This avoids serializing each event N times, and allows outgoing queues to share a single message buffer for a given event.
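A rough Go sketch of "serialize once, distribute to all", combined with the single goroutine that assigns msgID numbers. Broadcaster, Subscribe and Publish are hypothetical names. The sketch ignores per-connection permissions and simply drops the message for a full queue, since overflow handling is a separate concern.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event mirrors the "event" payload used in the examples above.
type Event struct {
	MsgID uint64 `json:"msgID"`
	Type  string `json:"type"`
	UUID  string `json:"uuid"`
	ETag  string `json:"etag"`
}

// Broadcaster (hypothetical) assigns msgIDs and fans out encoded events.
// Publish and Subscribe are meant to be called from one goroutine only,
// so nextID and queues need no locking in this sketch.
type Broadcaster struct {
	nextID uint64
	queues []chan []byte
}

// Subscribe registers an outgoing queue with the given capacity.
func (b *Broadcaster) Subscribe(size int) <-chan []byte {
	q := make(chan []byte, size)
	b.queues = append(b.queues, q)
	return q
}

// Publish assigns the next msgID, encodes the event once, and hands the
// same buffer to every queue. Receivers must treat it as read-only.
func (b *Broadcaster) Publish(ev Event) error {
	b.nextID++
	ev.MsgID = b.nextID
	buf, err := json.Marshal(map[string]Event{"event": ev})
	if err != nil {
		return err
	}
	for _, q := range b.queues {
		select {
		case q <- buf:
		default: // queue full; overflow handling is out of scope here
		}
	}
	return nil
}

func main() {
	b := &Broadcaster{}
	q1, q2 := b.Subscribe(4), b.Subscribe(4)
	if err := b.Publish(Event{Type: "update", UUID: "zzzzz-4zz18-1g4g0vhpjn9wq7i", ETag: "1wfdizt65l5w597jf5lojf8jm"}); err != nil {
		panic(err)
	}
	fmt.Println(string(<-q1))
	fmt.Println(string(<-q2))
}
```

Sending a []byte over a channel copies only the slice header, so all queues refer to one underlying buffer per event.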
If practical, when a connection's outgoing queue fills up, send a "missed events" signal and discard all buffered events (and, of course, any incoming events that arrive while the buffer is full). After a "missed events" signal the client needs to assume its cache is out of date anyway. Expect a faster recovery from a temporary backlog if, when skipping events, we skip as many as we can.
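The overflow policy above might look like this in Go. OutQueue, Push and Drain are hypothetical names. The sketch assumes that, once the queue overflows, incoming events keep being dropped until the writer next drains the queue and learns that it must send a "missed events" signal.

```go
package main

import (
	"fmt"
	"sync"
)

// OutQueue is a hypothetical bounded outgoing queue for one connection.
type OutQueue struct {
	mu     sync.Mutex
	buf    [][]byte
	limit  int
	missed bool // true while events are being skipped
}

// NewOutQueue returns a queue that holds at most limit messages.
func NewOutQueue(limit int) *OutQueue {
	return &OutQueue{limit: limit}
}

// Push enqueues msg. If the queue is full, it discards everything buffered
// so far (and msg itself) and remembers that events were missed.
func (q *OutQueue) Push(msg []byte) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.missed {
		return // still skipping: drop as many events as we can
	}
	if len(q.buf) >= q.limit {
		q.buf = nil
		q.missed = true
		return
	}
	q.buf = append(q.buf, msg)
}

// Drain hands all pending messages to the writer. When missed is true the
// writer should send an "eventsMissed" message; msgs is empty in that case.
func (q *OutQueue) Drain() (msgs [][]byte, missed bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	msgs, missed = q.buf, q.missed
	q.buf, q.missed = nil, false
	return msgs, missed
}

func main() {
	q := NewOutQueue(2)
	for _, m := range []string{"a", "b", "c", "d"} {
		q.Push([]byte(m))
	}
	msgs, missed := q.Drain()
	fmt.Println(len(msgs), missed)
}
```

Dropping the whole backlog at once means the writer has nothing stale left to flush before the client starts refreshing its cache.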
Logging¶
Print JSON-formatted log entries on stderr.
Print a log entry when a client connects.
Print a log entry when a client disconnects. Show counters for:
- Number of streams (UUIDs) added while connection was up
- Number of streams removed
- Number of events sent
- Number of bytes sent
- Total time spent waiting for Write() to return (or a better way to measure congestion?)
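A sketch of a disconnect log entry in Go, using encoding/json and os.Stderr from the standard library. The type names and the JSON field names (msg, remoteAddr, streamsAdded and so on) are assumptions; the draft does not specify a log schema.

```go
package main

import (
	"encoding/json"
	"os"
	"time"
)

// ConnStats holds the per-connection counters listed above.
type ConnStats struct {
	StreamsAdded   int
	StreamsRemoved int
	EventsSent     int
	BytesSent      int64
	WriteWait      time.Duration // total time spent waiting for Write() to return
}

// disconnectEntry fixes the field order and names of the log line.
type disconnectEntry struct {
	Msg              string  `json:"msg"`
	RemoteAddr       string  `json:"remoteAddr"`
	StreamsAdded     int     `json:"streamsAdded"`
	StreamsRemoved   int     `json:"streamsRemoved"`
	EventsSent       int     `json:"eventsSent"`
	BytesSent        int64   `json:"bytesSent"`
	WriteWaitSeconds float64 `json:"writeWaitSeconds"`
}

// FormatDisconnect returns one JSON-encoded log entry (without newline).
func FormatDisconnect(remoteAddr string, s ConnStats) ([]byte, error) {
	return json.Marshal(disconnectEntry{
		Msg:              "disconnect",
		RemoteAddr:       remoteAddr,
		StreamsAdded:     s.StreamsAdded,
		StreamsRemoved:   s.StreamsRemoved,
		EventsSent:       s.EventsSent,
		BytesSent:        s.BytesSent,
		WriteWaitSeconds: s.WriteWait.Seconds(),
	})
}

// LogDisconnect prints the entry on stderr, one entry per line.
func LogDisconnect(remoteAddr string, s ConnStats) error {
	line, err := FormatDisconnect(remoteAddr, s)
	if err != nil {
		return err
	}
	_, err = os.Stderr.Write(append(line, '\n'))
	return err
}

func main() {
	stats := ConnStats{
		StreamsAdded:   3,
		StreamsRemoved: 1,
		EventsSent:     42,
		BytesSent:      4096,
		WriteWait:      250 * time.Millisecond,
	}
	if err := LogDisconnect("10.0.0.1:1234", stats); err != nil {
		panic(err)
	}
}
```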
Libraries¶
Websocket:
PostgreSQL:
- https://godoc.org/github.com/lib/pq via https://godoc.org/database/sql
- https://godoc.org/github.com/lib/pq#hdr-Notifications and https://godoc.org/github.com/lib/pq/listen_example
Problems with old/current implementation¶
(Lessons to avoid re-learning next time...)
The Rails API server can function as a websocket server. Clients (notably Workbench, arv-mount, arv-ws) use it to listen for events without polling.
Problems with current implementation:
- Unreliable. See #9427, #8277
- Resource-heavy (one postgres connection per connected client, uses lots of memory)
- Logging is not very good
- Updates look like database records instead of API responses (e.g., computed fields are missing, collection manifest_text has no signatures)
- Offers an API for catching up on missed events after disconnecting/reconnecting, but this API (let alone the code) isn't enough to offer a "don't miss any events, don't send any events twice" guarantee. See #9388
Updated by Tom Clegg over 6 years ago · 12 revisions