Bug #9427

closed

[Websockets] event queue backlogged

Added by Peter Amstutz over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: -
Target version:
Story points: -

Description

The websockets server currently uses the EventMachine library for multiplexing events. EventMachine runs a single-threaded event queue (despite the fact that we use it with Puma, which is a thread-per-connection web server).

We're seeing severe lag, where new connections respond slowly or not at all. This seems to happen whenever any connection is using the "catch up" functionality to replay past events. After instrumenting the server, we can see how long each send takes:

2016-06-16_20:30:05.76986 #<Faye::WebSocket:0x000000049298d0> sending 28643742
2016-06-16_20:30:05.87824 #<Faye::WebSocket:0x000000049298d0> sent 28643742
2016-06-16_20:30:05.87832 #<Faye::WebSocket:0x000000049298d0> sending 28643743
2016-06-16_20:30:05.99874 #<Faye::WebSocket:0x000000049298d0> sent 28643743
2016-06-16_20:30:05.99882 #<Faye::WebSocket:0x000000049298d0> sending 28643745
2016-06-16_20:30:06.10723 #<Faye::WebSocket:0x000000049298d0> sent 28643745
2016-06-16_20:30:06.10731 #<Faye::WebSocket:0x000000049298d0> sending 28643746
2016-06-16_20:30:06.21399 #<Faye::WebSocket:0x000000049298d0> sent 28643746
2016-06-16_20:30:06.21407 #<Faye::WebSocket:0x000000049298d0> sending 28643749
2016-06-16_20:30:06.32665 #<Faye::WebSocket:0x000000049298d0> sent 28643749
2016-06-16_20:30:06.32675 #<Faye::WebSocket:0x000000049298d0> sending 28643750
2016-06-16_20:30:06.43588 #<Faye::WebSocket:0x000000049298d0> sent 28643750

In this trace, each invocation of websocket.send() takes about 100 ms (between roughly 107 and 120 ms from each "sending" line to its "sent" line). This is probably due to a slow or backlogged client connection. Because all events are currently processed in a single thread, the websocket server can't run faster than its slowest client.
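To make the per-send cost concrete, here is a small sketch that computes the gap between each "sending" and "sent" line in the trace above. The helper names (`seconds_of_day`, `send_durations`) are made up for this illustration and are not part of the websockets server; the parsing assumes every line has the four whitespace-separated fields seen in the log and that all lines fall on the same day.

```ruby
# Trace lines copied verbatim from the log above.
TRACE = <<~'LOG'
  2016-06-16_20:30:05.76986 #<Faye::WebSocket:0x000000049298d0> sending 28643742
  2016-06-16_20:30:05.87824 #<Faye::WebSocket:0x000000049298d0> sent 28643742
  2016-06-16_20:30:05.87832 #<Faye::WebSocket:0x000000049298d0> sending 28643743
  2016-06-16_20:30:05.99874 #<Faye::WebSocket:0x000000049298d0> sent 28643743
  2016-06-16_20:30:05.99882 #<Faye::WebSocket:0x000000049298d0> sending 28643745
  2016-06-16_20:30:06.10723 #<Faye::WebSocket:0x000000049298d0> sent 28643745
  2016-06-16_20:30:06.10731 #<Faye::WebSocket:0x000000049298d0> sending 28643746
  2016-06-16_20:30:06.21399 #<Faye::WebSocket:0x000000049298d0> sent 28643746
  2016-06-16_20:30:06.21407 #<Faye::WebSocket:0x000000049298d0> sending 28643749
  2016-06-16_20:30:06.32665 #<Faye::WebSocket:0x000000049298d0> sent 28643749
  2016-06-16_20:30:06.32675 #<Faye::WebSocket:0x000000049298d0> sending 28643750
  2016-06-16_20:30:06.43588 #<Faye::WebSocket:0x000000049298d0> sent 28643750
LOG

# Convert a "YYYY-MM-DD_HH:MM:SS.fffff" stamp to seconds since midnight.
# Assumes all lines fall on the same day, as they do in this trace.
def seconds_of_day(stamp)
  hours, minutes, seconds = stamp.split("_").last.split(":")
  hours.to_i * 3600 + minutes.to_i * 60 + seconds.to_f
end

# Returns { event_id => seconds between its "sending" and "sent" lines }.
def send_durations(trace)
  started = {}
  durations = {}
  trace.each_line do |line|
    stamp, _socket, state, id = line.split
    next if id.nil?

    case state
    when "sending" then started[id] = seconds_of_day(stamp)
    when "sent"    then durations[id] = seconds_of_day(stamp) - started.fetch(id)
    end
  end
  durations
end

durations = send_durations(TRACE)
mean = durations.values.sum / durations.size
puts format("mean send time: %.1f ms", mean * 1000)
puts format("upper bound for one thread: %.1f events/s", 1 / mean)
```

For these six events the mean comes out to roughly 110 ms per send, so a single thread that waits on this client can deliver only about nine events per second to all clients combined.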

This may also be affected by a related performance problem: every event has to go through a permission check to determine whether the user is allowed to read it, and as the number of groups readable by a user becomes large, this check becomes expensive.

Proposed solution:

We need a thread per connection for blocking operations (accessing the database, sending events).

It would be nice if we could create an EventMachine "reactor" per thread; however, this doesn't appear to be possible.

EventMachine has thread pool functionality with "defer". We may be able to make use of that instead of next_tick.
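As a rough illustration of the thread-per-connection idea, the sketch below gives each connection its own queue and worker thread, so a blocking send to a slow client only delays that client. It uses plain Ruby `Thread` and `Queue` rather than EventMachine's "defer", and `FakeSocket`, `transmit`, `ConnectionWorker`, and `run_demo` are hypothetical names for this example, not part of the server or of Faye.

```ruby
# Hypothetical stand-in for a websocket; `delay` simulates a slow client.
class FakeSocket
  attr_reader :sent

  def initialize(delay)
    @delay = delay
    @sent = []
  end

  # Plays the role of websocket.send(): blocks for `delay` seconds per event.
  def transmit(event)
    sleep(@delay)
    @sent << event
  end
end

# One queue and one worker thread per connection (hypothetical class).
class ConnectionWorker
  STOP = Object.new # sentinel telling the worker thread to exit

  def initialize(socket)
    @socket = socket
    @queue = Queue.new
    @thread = Thread.new { drain }
  end

  # Called by the publisher; returns immediately without touching the socket.
  def enqueue(event)
    @queue << event
  end

  # Deliver everything already queued, then stop the worker thread.
  def close
    @queue << STOP
    @thread.join
  end

  private

  # Runs in the worker thread: blocking sends only stall this connection.
  def drain
    loop do
      event = @queue.pop
      break if event.equal?(STOP)

      @socket.transmit(event)
    end
  end
end

def run_demo
  slow = FakeSocket.new(0.05)
  fast = FakeSocket.new(0)
  slow_worker = ConnectionWorker.new(slow)
  fast_worker = ConnectionWorker.new(fast)

  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  (1..5).each do |event|
    slow_worker.enqueue(event) # the slow client is queued first on purpose
    fast_worker.enqueue(event)
  end
  fast_worker.close
  fast_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  slow_worker.close

  { fast: fast.sent, slow: slow.sent, fast_elapsed: fast_elapsed }
end

result = run_demo
puts format("fast client received %d events in %.3f s", result[:fast].size, result[:fast_elapsed])
puts format("slow client received %d events", result[:slow].size)
```

The queues here are unbounded, which keeps the sketch short; a real server would probably want to cap the backlog per connection and disconnect clients that fall too far behind.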


Subtasks 1 (0 open, 1 closed)

Task #9432: Review 9427-threaded-websockets, Resolved, Peter Amstutz, 06/17/2016

Related issues 1 (0 open, 1 closed)

Related to Arvados - Bug #9388: [Websockets] events are skipped, Resolved, Peter Amstutz, 06/10/2016