Message queue

Arvados needs a (better) message-passing facility for internal use (e.g., container dispatch) and as a way to offer pubsub-like APIs (e.g., live container logs).

Motivation / background

Sometimes Arvados clients and system components need to pass messages to one another. Often, REST makes sense. But in some cases (e.g., it's not practical for the sender to know the message destination(s), or it's not efficient for the sender to initiate an HTTP request to each destination) something more like pubsub is needed.

Currently (2017) we use PostgreSQL notify/listen as a pubsub device.
  • This is very inefficient, primarily because PostgreSQL notification payloads are very short (under 8000 bytes in the default configuration). We pass longer messages by inserting the message into a "logs" table on disk and sending the resulting row ID to subscribers, who then select the real message. Eventually (depending on server config) we delete old rows.
  • Arvados userspace processes cannot connect directly to PostgreSQL; this approach requires an intermediary (arvados-ws) to implement the Arvados permission model.
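
To make the cost of the current approach concrete, here is a minimal sketch of the "store the message, notify with the ID, select it back" pattern. It is not the actual Arvados code: sqlite3 stands in for PostgreSQL so the sketch runs without a server, a queue stands in for the notify/listen channel, and the names publish, receive, and trim (and the keep_last policy) are hypothetical.

```python
import queue
import sqlite3

# Stand-in for the PostgreSQL "logs" table; sqlite3 is used only so the
# sketch runs without a database server.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, body TEXT)")

# Stand-in for a notify/listen channel: it only ever carries short payloads.
channel = queue.Queue()


def publish(body):
    """Store the full message, then notify subscribers with its row ID."""
    cur = db.execute("INSERT INTO logs (body) VALUES (?)", (body,))
    db.commit()
    channel.put(str(cur.lastrowid))  # the notification is just the ID
    return cur.lastrowid


def receive():
    """Wait for a notification, then fetch the real message by ID."""
    row_id = int(channel.get(timeout=1))
    row = db.execute("SELECT body FROM logs WHERE id = ?", (row_id,)).fetchone()
    return row[0]


def trim(keep_last):
    """Delete all but the newest keep_last rows (hypothetical retention policy)."""
    db.execute(
        "DELETE FROM logs WHERE id NOT IN "
        "(SELECT id FROM logs ORDER BY id DESC LIMIT ?)",
        (keep_last,),
    )
    db.commit()
```

Every message costs one write plus one extra read per subscriber, and a cleanup job on top, which is the overhead a dedicated message queue would avoid.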

Components/features that need a message queue

  • showing live container logs in workbench/terminal
  • crunch stderr logs
  • system logs
  • container added to queue → noticed by dispatch
  • container cancelled → noticed by dispatch
  • container state/progress has changed, workbench should update page
  • container finished, here is the output (maybe REST API is still best for this)
  • get current state of object and notify me when it changes relative to that
  • arvados-ws (“cache invalidation”) -- workbench, fuse
  • (future) "latest state of object" service, similar to arvados-ws but more convenient for the client
  • permission graph is updated (?)

Desired capabilities of a message queue

  • Support for WebSocket transport to enable use by browser
  • Good client libraries available for Go, Python, JavaScript, Ruby
  • Can apply Arvados permission model
    • Cancel/silence subscription if relevant permission is revoked while subscription is open
  • Pub/sub of live change events on records
    • Transactional subscribe request that returns most recent record + subscribes to subsequent changes (alternatively, this might be a separate service built on the message queue + REST API)
  • Pub/sub of live Container logs
    • Transactional subscribe request that returns recent log history + subscribes to subsequent logs (optional?) (alternatively, this might be a separate service built on the message queue + Keep, if crunch-run checkpoints logs periodically)
    • In-order delivery of logs
  • Recover from transient network problems
    • Transparently catch up on missed events
  • Easy to deploy (fits into our stack)
    • Server can be imported as a module and run in-process by our Go binaries
  • Scales to thousands of topics/subscriptions per router process
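
The "transactional subscribe", in-order delivery, and catch-up items above can be illustrated with a small in-memory sketch. It assumes every event on a topic gets a monotonically increasing sequence number and that the server keeps a bounded backlog; the Topic class, its method names, and the after_seq parameter are hypothetical, not part of any existing Arvados or WAMP API.

```python
import threading
from collections import deque


class Topic:
    """In-memory topic that numbers events so clients can resume."""

    def __init__(self, history=1000):
        self._lock = threading.Lock()
        self._next_seq = 1
        self._backlog = deque(maxlen=history)  # (seq, event) pairs
        self._subscribers = {}                 # sub_id -> callback
        self._next_sub = 1

    def publish(self, event):
        with self._lock:
            seq = self._next_seq
            self._next_seq += 1
            self._backlog.append((seq, event))
            # Deliver under the lock so all subscribers see events in order.
            # Simplification: callbacks must not call back into this topic.
            for callback in list(self._subscribers.values()):
                callback(seq, event)
        return seq

    def subscribe(self, callback, after_seq=0):
        """Return (sub_id, missed); atomic with respect to publish()."""
        with self._lock:
            missed = [(s, e) for s, e in self._backlog if s > after_seq]
            sub_id = self._next_sub
            self._next_sub += 1
            self._subscribers[sub_id] = callback
        return sub_id, missed

    def unsubscribe(self, sub_id):
        """Could also be called by a permission check to silence a subscription."""
        with self._lock:
            self._subscribers.pop(sub_id, None)
```

Because the backlog read and the subscription happen under the same lock, no event can fall between the returned history and the first live delivery. A client that reconnects after a network problem passes the last sequence number it saw as after_seq and receives only what it missed, as long as the backlog still holds it.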

Implementation

Investigate WAMP:

http://wamp-proto.org/

It uses WebSockets as its principal transport, and has client library implementations for the languages we care about (Go, Python, JavaScript).

https://github.com/gammazero/nexus

https://github.com/crossbario/autobahn-python

https://github.com/crossbario/autobahn-js

Probably what we want to do is run a "WAMP router" that sits in the middle of the log producers, the logging microservice, and log listeners (browser).
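
For a sense of what would travel between a log producer, the router, and a browser, here is a rough sketch of the three WAMP pub/sub messages involved, built as JSON arrays with the Python standard library only. The message type codes (PUBLISH 16, SUBSCRIBE 32, EVENT 36) and field order follow the WAMP basic profile; the helper function names are hypothetical, and a real client would use one of the libraries listed above rather than build messages by hand.

```python
import json

# Message type codes from the WAMP basic profile.
PUBLISH, SUBSCRIBE, EVENT = 16, 32, 36


def subscribe_msg(request_id, topic):
    # [SUBSCRIBE, Request|id, Options|dict, Topic|uri]
    return json.dumps([SUBSCRIBE, request_id, {}, topic])


def publish_msg(request_id, topic, args):
    # [PUBLISH, Request|id, Options|dict, Topic|uri, Arguments|list]
    return json.dumps([PUBLISH, request_id, {}, topic, list(args)])


def parse_event(raw):
    # [EVENT, Subscription|id, Publication|id, Details|dict, Arguments|list]
    msg = json.loads(raw)
    if msg[0] != EVENT:
        raise ValueError("not an EVENT message")
    subscription_id, publication_id, _details = msg[1:4]
    args = msg[4] if len(msg) > 4 else []
    return subscription_id, publication_id, args
```

In this picture a listener would send subscribe_msg for a topic URI such as "org.arvados.example.container_log" (a made-up name), a producer would send publish_msg to the same topic, and the router would fan out EVENT messages to each subscriber.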

There are currently 15 WAMP routers listed at http://wamp-proto.org/implementations/