Bug #4591

[API] When websockets server runs out of memory, it should exit so it can be restarted, instead of wedging.

Added by Bryan Cosca about 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Brett Smith
Category:
API
Target version:
Start date:
12/02/2014
Due date:
% Done:

100%

Estimated time:
(Total: 1.00 h)
Story points:
0.5

Description

Symptom: Pages not refreshing due to failure to connect to websockets

bcosc:
If I start a new instance, the page does not refresh to show that the job has started or is running.

Peter:
"Iceweasel can't establish a connection to the server at wss://ws.qr1hi.arvadosapi.com/websocket"

Tom:
http://ruby-doc.org/core-2.1.4/Thread.html#method-c-abort_on_exception-3D ?


Subtasks

Task #4701: Review 4591-websockets-raise-oom-wip (Resolved, Brett Smith)


Related issues

Has duplicate Arvados - Bug #4623: No auto-update of pipeline times in browser (Closed, 11/20/2014)

Associated revisions

Revision 1af2d4f7
Added by Brett Smith almost 6 years ago

Merge branch '4591-websockets-raise-oom-wip'

Closes #4591, #4701.

History

#1 Updated by Bryan Cosca about 6 years ago

also from queued to pending

#2 Updated by Bryan Cosca about 6 years ago

or pending to complete for that matter

#3 Updated by Peter Amstutz about 6 years ago

  • Subject changed from workbench fails to refresh at pipeline instances when jobs are "Not ready" to [OPS] Websockets not working
  • Target version set to Bug Triage

#4 Updated by Peter Amstutz about 6 years ago

  • Subject changed from [OPS] Websockets not working to [OPS] Pages not refreshing due to failure to connect to websockets

#5 Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)

#6 Updated by Tom Clegg about 6 years ago

  • Subject changed from [OPS] Pages not refreshing due to failure to connect to websockets to [API] When websockets server runs out of memory, it should exit so it can be restarted, instead of wedging.
  • Description updated (diff)
  • Category set to API

#7 Updated by Brett Smith about 6 years ago

  • Assigned To set to Brett Smith
  • Target version changed from Bug Triage to 2014-12-10 sprint

#8 Updated by Brett Smith almost 6 years ago

  • Status changed from New to In Progress

#9 Updated by Peter Amstutz almost 6 years ago

Per Tom's comment in the description, let's try setting Thread.abort_on_exception = true and see if that breaks anything. The websockets server retains very little state so it is best to just kill it with extreme prejudice at the first sign of trouble.
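
For reference, here is a minimal sketch of what the setting changes. It is not the Arvados server code: the helper name `worker_failure_reaches_main?` is hypothetical, and the out-of-memory condition is simulated by raising `NoMemoryError` by hand. It assumes it is called from the main thread. With the flag off, a failing worker thread dies quietly and the process carries on. With the flag on, the same failure is re-raised in the main thread.

```ruby
# Hypothetical helper (illustration only): start a worker thread that fails,
# then report whether the failure was re-raised in the main thread.
# Must be called from the main thread.
def worker_failure_reaches_main?(abort_flag)
  previous = Thread.abort_on_exception
  Thread.abort_on_exception = abort_flag
  # Newer Rubies print a report for dying threads; silence it if the
  # setter exists so the example stays quiet.
  Thread.report_on_exception = false if Thread.respond_to?(:report_on_exception=)

  begin
    # Simulated failure; a real out-of-memory condition is not needed here.
    worker = Thread.new { raise NoMemoryError, "simulated out-of-memory" }
    # Wait without join: join would re-raise the error regardless of the flag.
    sleep 0.01 while worker.alive?
    sleep 0.05
    false # the worker died, and the main thread never noticed
  rescue NoMemoryError
    true  # the worker's exception was delivered to the main thread
  ensure
    Thread.abort_on_exception = previous
  end
end
```

The sketch rescues the error only so it can report the result. In a server that leaves the exception unrescued, the main thread would terminate and the process would exit, which is what lets a supervisor restart it.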

#10 Updated by Brett Smith almost 6 years ago

Peter Amstutz wrote:

Per Tom's comment in the description, let's try setting Thread.abort_on_exception = true and see if that breaks anything. The websockets server retains very little state so it is best to just kill it with extreme prejudice at the first sign of trouble.

Seems to be fine. Tested this locally with a hacked arv-ws that was rigged up to send a non-JSON string. That got back the "malformed request" response as expected. Then a normal arv-ws could connect without trouble. Now at 95d12ecb.
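
A rough sketch of why that test is a useful check, under stated assumptions: `handle_message`, the response hashes, and the sample `subscribe` request are all hypothetical and only stand in for the real handler. The idea is that an error the handler rescues itself (here a JSON parse failure) never becomes an unhandled thread exception, so `Thread.abort_on_exception = true` should not affect it.

```ruby
require "json"

# Hypothetical handler (not the Arvados implementation): parse one incoming
# message and answer with a response hash instead of raising.
def handle_message(raw)
  request = JSON.parse(raw)
  { "status" => "ok", "request" => request }
rescue JSON::ParserError
  # The parse error is rescued here, so it never escapes the thread.
  { "status" => "error", "message" => "malformed request" }
end

Thread.abort_on_exception = true

# One bad client followed by one well-formed client, each served in its
# own thread. Thread#value waits for the thread and returns its result.
replies = ["this is not JSON", '{"method":"subscribe"}'].map do |raw|
  Thread.new { handle_message(raw) }.value
end

replies.each { |reply| puts reply.to_json }
```

Only exceptions that nothing rescues would bring the process down under this setting, so a bad request of this kind and a fatal error are treated differently.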

#11 Updated by Peter Amstutz almost 6 years ago

Great. LGTM

#12 Updated by Brett Smith almost 6 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:1af2d4f71f6a7ba4374f8490ef1b4f0b972e2dec.

#13 Updated by Ward Vandewege almost 6 years ago

  • Story points set to 0.5
