After some time under heavy load, the websockets in our system stop working.

The puma log shows nothing, but the nginx log shows an error of the following form for every request:

2016/01/21 11:05:35 [error] 8741#0: *812590 recv() failed (104: Connection reset by peer) while reading response header from upstream, client:, server:, request: "GET /websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34 HTTP/1.1", upstream: "", host: ""

Connecting directly to the puma server with curl fails with a similar message:

$ curl http://localhost:8100/websocket?api_token=27fk7czgtrthf9th2e7dnrieny7npr0mk5gqmrqgj0mp4ded34
curl: (56) Recv failure: Connection reset by peer

When I strace the process, all I see is a futex call that seems to never return:

# ps auxwww | grep 'www-data.*puma' | grep -v grep
www-data 9247 96.6 0.2 10139760 576192 ? Sl 01:26 560:43 puma 2.8.2 (tcp://
# strace -p 9247
Process 9247 attached - interrupt to quit
futex(0x7f4381dd1744, FUTEX_WAIT_PRIVATE, 1, NULL
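
Note that strace -p without -f normally attaches only to the thread whose ID equals the PID (the main thread), so a blocked futex there does not by itself show what the other puma threads are doing. A minimal sketch for listing the state of every thread of a process, assuming a Linux host with /proc mounted; the helper name list_thread_states is hypothetical and not part of puma or strace:

```shell
# Hypothetical helper: print "<thread-id> <state>" for each thread of a process.
# Assumes Linux with /proc mounted. State letters follow proc(5),
# e.g. R = running, S = sleeping, D = uninterruptible wait.
list_thread_states() {
    pid="$1"
    for task in /proc/"$pid"/task/*; do
        # Skip entries that vanished (thread exited) or an unmatched glob.
        [ -r "$task/status" ] || continue
        tid="${task##*/}"
        state="$(awk '/^State:/ {print $2}' "$task/status" 2>/dev/null)"
        echo "$tid $state"
    done
}

# Example usage (run as root or as the process owner):
#   list_thread_states 9247
# To trace all threads rather than only the main one:
#   strace -f -p 9247
```

A thread that shows state R while the main thread sits in the futex would be the first candidate to look at, given the high CPU usage in the ps output above.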

I killed the puma process, which caused runsv to restart it. Everything seems to be fine again for now, but I suspect the hang will recur at some point.

Our puma version reports itself as: Version 2.8.2 (ruby 2.1.7-p400), codename: Sir Edmund Percival Hillary

Is duplicate of Arvados - Bug #8323: [API] Puma hangs forever on a futex, requiring restart (Resolved, 01/29/2016)
Updated by Brett Smith about 8 years ago

  • Status changed from New to Duplicate

#8323 documents the server-side issue in more detail.


