Events API

(draft)

See also: Websocket server

Purpose

The Events API serves to notify processes about events that interest them as soon as possible after those events happen.

(The history of events that have happened in the past is also interesting, but that's addressed by the Logs API, not the Events API.)

Conceptual model

An event reports a change to the state of an object.

The fact that an object's state has changed is meaningful only when its previous state is known. For example, if a client asks "tell me the next time object X changes" at nearly the same time X changes, the response depends on whether the request arrives before or after the change occurs.

Therefore, the Events API should support operations like:
  • "tell me the current state of X, and then notify me next time it changes"
  • "tell me as soon as X differs from my cached copy that has Etag E"
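The two operations above can be reduced to one decision made when a stream starts. The following is a minimal sketch, assuming a hypothetical helper `initial_response` and a made-up return shape; neither is part of the Events API.

```python
# Hypothetical sketch: the function name and return value are illustrative only.
def initial_response(current_etag, cached_etag=None):
    """Decide what a newly started stream sends first.

    cached_etag=None models "tell me the current state of X, and then
    notify me next time it changes". Any other value models "tell me as
    soon as X differs from my cached copy that has Etag E".
    """
    if cached_etag is None or cached_etag != current_etag:
        # The client has no copy, or its copy is stale: report the state now.
        return {"etag": current_etag}
    # The client's copy is current: stay silent until the next change.
    return None
```

Because the comparison is made against the etag rather than against the arrival time of the request, the outcome does not depend on whether the request arrives just before or just after a change.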

An "event stream" is a sequence of events about an object, starting from an implicit or explicit known state.

Essential features

Multiple streams

The Events API supports multiplexing event streams on a single connection. The cost of setting up and maintaining an event channel can be non-trivial, and the sequence of events concerning multiple related objects may be significant.

It is possible to add and remove event streams on an existing connection, without interrupting other streams.

It is permitted to hold a connection open with no event streams, but the server may close such connections after some time threshold.
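As an illustration of multiplexing, the sketch below tracks the set of streams on one connection. The class and method names (`Connection`, `subscribe`, `unsubscribe`, `is_idle`) are assumptions made for this example, not a proposed interface.

```python
# Hypothetical sketch: models per-connection stream bookkeeping only.
class Connection:
    def __init__(self):
        self.streams = set()  # UUIDs of the streams open on this connection

    def subscribe(self, uuid):
        # Adding a stream leaves every other stream untouched.
        self.streams.add(uuid)

    def unsubscribe(self, uuid):
        # Removing a stream that is not open is a no-op.
        self.streams.discard(uuid)

    def wants(self, uuid):
        return uuid in self.streams

    def is_idle(self):
        # An idle connection is allowed, but the server may close it later.
        return not self.streams
```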

Delivery guarantees

In general, the Events API cannot guarantee that every event will be delivered.

However, there are specific cases where it is beneficial (and practical) to detect missed events and notify the client.

If some events are dropped but the event stream is still open (for example, a server-side buffer overflows because a client is receiving data too slowly), the server must indicate this to the client no later than the next event delivery.
  • The "missed events" signal may specify a single event stream (UUID); if not, the client must interpret this as "events may have been missed on all active streams".
  • The "missed events" signal does not necessarily specify the number of missed events.
  • The server is permitted to send a "missed events" signal even if no events were missed.
Depending on the application, a client might respond to a "missed events" signal by
  • restarting the affected streams immediately
  • restarting the affected streams only if they stay silent for some timeout period
  • doing nothing
  • hanging up
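Whichever response a client chooses, it first has to work out which streams the signal covers. A minimal sketch, assuming the signal arrives as a dict with an optional "uuid" key (a hypothetical message shape):

```python
# Hypothetical sketch: the signal format is an assumption for illustration.
def affected_streams(signal, active_streams):
    """Return the set of streams that may have missed events."""
    uuid = signal.get("uuid")
    if uuid is None:
        # No stream named: events may have been missed on all active streams.
        return set(active_streams)
    # A stream was named: only that stream is affected, if it is still open.
    return {uuid} & set(active_streams)
```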

Event message content

Each event includes, at minimum, the UUID and Etag of the changed object.

Non-state-changing events (logs)

Container/job log messages (e.g., stderr) should be available through the Events API, even though they don't correspond to an etag change in any object.

Given that the etag does not change, the client is obviously interested in other attributes of the event itself (e.g., stderr text), so those attributes must be
  • included with the event payload, or
  • stored in a Log object whose UUID is included in the event payload, or
  • both of the above.

(to be discussed)

Each non-state-changing event should include the UUID of the relevant Log object.

Each non-state-changing event should include the attributes of the relevant Log object itself.

Additional features

Event sequence

With the current API server, it may be possible to update an object twice in quick succession such that the modification timestamps are out of order: i.e., the current state of object X has modification time T1, even though the same object previously had modification time T2>T1. If this occurs, the Events API must return the T2 update before the T1 update (or not return the T2 update at all).

In order to support delivery mechanisms where messages are re-ordered in transit, the Events API should assign a strictly increasing integer ID to each event sent over a given connection. Client pseudocode:

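lastID = {}       # uuid -> ID of the newest event applied so far
currentEtag = {}  # uuid -> etag carried by that event

def receiveEvent(id, uuid, newEtag):
  if uuid in lastID and lastID[uuid] > id:
    # already received a newer update for this object
    return
  currentEtag[uuid] = newEtag
  lastID[uuid] = id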

Note these IDs are connection-specific: they cannot be used to reconnect and resume an event stream.

Server-side event filters

Some clients will only be interested in a subset of possible changes. For example, a pipeline runner wants to know as soon as a container's "state" attribute changes, but might not care about other changes like "priority" or "progress".

Possible API features for reducing unnecessary work and network traffic:
  1. Allow clients to describe which attributes are interesting, e.g., "select":["state"]
  2. With each event, provide the list of changed attributes, e.g., "changed":["state","output","log"], but not the attribute values themselves

These features might be tricky to implement efficiently for attributes that are computed on the fly.
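The two features can be sketched together as follows. The helper names `changed_attributes` and `should_deliver` are hypothetical, and the sketch assumes that the old and new attribute values are both available as dicts, which may not hold for attributes computed on the fly.

```python
# Hypothetical sketch: helper names and data shapes are assumptions.
def changed_attributes(old, new):
    """List the names of attributes whose values differ, without the values."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))

def should_deliver(changed, select=None):
    """Deliver only if an attribute the client selected has changed."""
    if select is None:
        return True  # no filter given: every change is interesting
    return any(attr in select for attr in changed)
```

For example, a pipeline runner that passes select=["state"] would not be woken by a change that touches only "priority".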

Including object attributes with events

Some clients perform a GET request in response to every event received. For the sake of efficiency and convenience, if desired by the client, the Events API should perform that request internally, and supply the response along with the event.

Clients should be able to control (separately for each stream) the list of object attributes to include with each event. This list corresponds to the "select" parameter for the "get object" REST API.

By default, only the "uuid" and "etag" attributes are included. It is not possible to un-select those attributes.

The values for any returned attributes must be identical to the values that would be returned in a GET response.
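A minimal sketch of how a per-stream attribute list could shape the payload. The function `event_payload` is hypothetical; `obj` stands in for the object as a GET response would return it.

```python
# Hypothetical sketch: the function name and argument shapes are illustrative.
REQUIRED = ("uuid", "etag")  # always included; cannot be un-selected

def event_payload(obj, select=()):
    """Build the payload from the object as a GET response would return it."""
    keys = list(REQUIRED) + [k for k in select if k not in REQUIRED]
    # Values are copied unchanged, so they match the GET response exactly.
    return {k: obj[k] for k in keys if k in obj}
```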

Null stream

To simplify implementation of clients that subscribe to event streams but also retrieve some objects without listening for events, a client should be able to use the Events API to retrieve the current state of an object without subscribing to the object's event stream.

Ownership-change events

Some clients need to know when an object is added to or removed from a project.

When an object's owner_uuid changes, this event should be sent to:
  1. all clients subscribed to the object itself
  2. all clients subscribed to the old owner_uuid
  3. all clients subscribed to the new owner_uuid

Likewise, subscribing to stream X should cause clients to receive messages when a new object is created with owner_uuid=X, and when an object with owner_uuid=X is deleted.
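The routing rule above can be sketched as a function that returns the set of streams to notify. The name `streams_to_notify` is hypothetical, and using None for "no owner" (creation has no old owner, deletion has no new owner) is an assumption of this sketch.

```python
# Hypothetical sketch: None marks a missing owner (object created or deleted).
def streams_to_notify(uuid, old_owner, new_owner):
    """Return the stream UUIDs whose subscribers should receive the event."""
    streams = {uuid}  # clients subscribed to the object itself
    for owner in (old_owner, new_owner):
        if owner is not None:
            streams.add(owner)  # clients subscribed to the old or new owner
    return streams
```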

"Owner and children" subscription

Some clients need to know when any object in a project (or other type of group) changes.

When subscribing to a group or user X, clients should have the option to receive events about objects whose owner_uuid is X, even if the event does not change the owner_uuid.

In order to avoid races using etags, the client or server would have to send the initial/cached etag for a (sometimes large) number of child objects.

Alternatively, the server can send a message acknowledging the subscription, and guarantee that no events will be silently missed after the acknowledgement is sent. If a client needs to avoid races, it must invalidate its cache of child objects upon receiving the acknowledgement message.
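On the client side, the acknowledgement approach might look like the sketch below. The message types "subscribed" and "event" and the `ChildCache` class are invented for illustration; the wire format is not defined here.

```python
# Hypothetical sketch: message types and field names are assumptions.
class ChildCache:
    def __init__(self):
        self.etags = {}  # child uuid -> last known etag

    def handle(self, message):
        if message["type"] == "subscribed":
            # Anything cached before the acknowledgement may be stale.
            self.etags.clear()
        elif message["type"] == "event":
            self.etags[message["uuid"]] = message["etag"]
```

Clearing the cache costs one round of refetching, but it avoids sending an etag for every child object when the subscription starts.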

Event batches

When sending a sequence of events that differ only in etag (i.e., they refer to the same object UUID and the payload consists of just the new etag), the server is permitted to send just the last event, and silently skip the rest.
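One way to apply this rule to a queue of pending events is sketched below. The function names are hypothetical, and the sketch takes "sequence" to mean consecutive events in the queue, which is an interpretation rather than something stated above.

```python
# Hypothetical sketch: events are dicts; "etag-only" means no extra attributes.
def is_etag_only(event):
    return set(event) == {"uuid", "etag"}

def coalesce(pending):
    """Collapse consecutive etag-only events about the same object."""
    out = []
    for event in pending:
        if (out and is_etag_only(event) and is_etag_only(out[-1])
                and out[-1]["uuid"] == event["uuid"]):
            out[-1] = event  # keep only the latest etag for this run
        else:
            out.append(event)
    return out
```

Events that carry additional attributes are never skipped, and the relative order of events about different objects is preserved.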

Client and use case examples

Workbench
  • "Object updated" events can trigger a page/section refresh.
  • "Job/container stderr" events add text content to the page.
  • Current implementation listens for all events, and filters them by UUID on the client side (inefficient!).
Future Workbench (single-page app)
  • "Get object now, and again whenever updated; re-render whenever response arrives" is a likely pattern.
  • During page transitions, connection will stay open but subscriptions will change (adjacent pages will often have overlapping subscriptions).
  • Will still want to display stderr messages as they arrive, when a container/job log is on the screen.
arv-mount
  • Current implementation listens for all events, and filters them by UUID on the client side (even when falling back to polling).
arv-ws (generic command line tool)
  • Current default mode listens for all events.
  • Offers "listen to events for given UUIDs" mode (compatible).
  • Offers "listen to events with arbitrary filters" mode (incompatible).
arvados-cwl-runner
  • Current implementation (2016-10-25) emulates a subscription by polling current state for a centrally tracked set of UUIDs, converting the responses to look like "update with new attributes" log entries, and passing them to an event handler.

Comparison with initial websocket API

The approach described here differs from the initial puma-based websocket service:
  • It is no longer possible to listen for events without filtering by UUID.
  • By default, events are compact (previously, "update" events included contents of all old and new database columns).
  • Races are addressed using etags. The server is not expected to replay an arbitrary set of past events in sequence.