Bug #8931

[SDK] websocket event thread crash

Added by Peter Amstutz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Start date: 04/25/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

EventClient and PollClient in the Python SDK should catch exceptions in a couple of places where they currently do not. An uncaught exception crashes the event thread, and anything that depends on that thread gets wedged:

  • The callback to user code, to avoid crashing the event handler thread
    • When this happens, call thread.interrupt_main() to raise an exception in the main thread, and force it to deal with the error.
  • When PollClient makes calls to the API server
    • Use the same style of RetryLoop that one_task_per_input_file uses. It can retry infinitely, but take care that the backoff numbers make sense.
  • Test that PollClient retries API failures.
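The description asks for two things: catching exceptions raised by the user callback (and then calling interrupt_main()), and wrapping PollClient's API calls in a retry loop with backoff. The sketch below shows both in a plain-Python polling thread. It is not the SDK's code: PollClientSketch, backoff_delays, and the fetch_events/on_event callables are hypothetical, and a simple generator stands in for the SDK's RetryLoop. The 2 s starting delay, growth factor of 2, and 60 s cap are assumed values, chosen only so that the wait between retries stays bounded while retrying forever. It is written for Python 3, where the thread module is named _thread.

```python
import _thread  # Python 3 name for the Python 2 "thread" module
import logging
import threading
import time

_logger = logging.getLogger("poll_sketch")


def backoff_delays(start=2.0, growth=2.0, max_wait=60.0):
    """Yield an endless series of retry delays (seconds), capped at max_wait."""
    delay = start
    while True:
        yield min(delay, max_wait)
        delay *= growth


class PollClientSketch(threading.Thread):
    """Hypothetical stand-in for PollClient; not the Arvados SDK class."""

    def __init__(self, fetch_events, on_event, poll_time=15.0,
                 sleep=time.sleep, interrupt=_thread.interrupt_main):
        super().__init__(daemon=True)
        self.fetch_events = fetch_events  # hypothetical: callable that queries the API server
        self.on_event = on_event          # user-supplied callback
        self.poll_time = poll_time
        self.sleep = sleep                # injectable so tests need not really wait
        self.interrupt = interrupt        # injectable so tests need not interrupt main
        self._halt = threading.Event()

    def _fetch_with_retry(self):
        # Retry API failures indefinitely, waiting longer after each failure.
        for delay in backoff_delays():
            if self._halt.is_set():
                return []
            try:
                return self.fetch_events()
            except Exception:
                _logger.warning("API request failed; retrying in %.0f s",
                                delay, exc_info=True)
                self.sleep(delay)

    def run(self):
        while not self._halt.is_set():
            for event in self._fetch_with_retry():
                try:
                    self.on_event(event)
                except Exception:
                    # Don't let a bug in user code silently kill this thread:
                    # log it, then raise KeyboardInterrupt in the main thread.
                    _logger.exception("Unhandled exception in event callback")
                    self.interrupt()
                    return
            self._halt.wait(self.poll_time)

    def close(self):
        self._halt.set()
```

In normal use the thread would be started with `start()` and stopped with `close()`. Injecting `sleep` and `interrupt` is one way to cover the last bullet: a test can feed in a fetch function that fails a few times and check that it is retried, without real waits.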

Subtasks

Task #8977: Review 8931-event-thread-catch-exceptions (Resolved, Radhika Chippada)


Related issues

Related to Arvados - Bug #8928: [SDKs] EventClient crashes after API server returns 504 (Feedback)

Related to Arvados - Bug #9051: [SDKs] EventClient fails to reconnect after HandshakeError from last connection (Resolved)

Associated revisions

Revision 47a79960
Added by Peter Amstutz over 3 years ago

Merge branch '8931-event-thread-catch-exceptions' closes #8931

History

#1 Updated by Peter Amstutz over 3 years ago

  • Description updated (diff)
  • Estimated time set to 0.50 h

#2 Updated by Brett Smith over 3 years ago

  • Target version set to Arvados Future Sprints

#3 Updated by Brett Smith over 3 years ago

  • Description updated (diff)
  • Category set to SDKs
  • Story points set to 0.5

#4 Updated by Peter Amstutz over 3 years ago

  • Subject changed from [SDK] event thread crash to [SDK] websocket event thread crash

#5 Updated by Brett Smith over 3 years ago

  • Assigned To set to Peter Amstutz
  • Target version changed from Arvados Future Sprints to 2016-04-27 sprint

#6 Updated by Peter Amstutz over 3 years ago

  • Status changed from New to In Progress

#7 Updated by Peter Amstutz over 3 years ago

8931-event-thread-catch-exceptions is ready for review

#8 Updated by Radhika Chippada over 3 years ago

Reviewing 89e091b3

  • Should we also add an else clause to the tries_left for loop at line 175, similar to the one at line 150? What happens if an exception other than ApiError is raised?
  • max_wait: "Maximum time to wait between retries." Should this say "Maximum number of seconds to wait …" instead?
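The review question turns on Python's for/else semantics: a loop's else clause runs only when the loop finishes without hitting break, and an exception that no except clause matches propagates out of the loop immediately, skipping the else entirely. The branch code at lines 150 and 175 is not shown here, so this is a generic sketch rather than the SDK's loop; ApiError and call_with_retries are hypothetical stand-ins, and the 1, 2, 4 s backoff is an assumed schedule.

```python
class ApiError(Exception):
    """Hypothetical stand-in for the SDK's API error type."""


def call_with_retries(request, tries=3, sleep=lambda seconds: None):
    """Call request(), retrying ApiError; any other exception propagates at once."""
    last_error = None
    for attempt in range(tries):
        try:
            result = request()
            break  # success: the loop's else clause is skipped
        except ApiError as err:
            last_error = err
            if attempt + 1 < tries:
                sleep(2 ** attempt)  # back off 1, 2, 4, ... seconds
    else:
        # Runs only if no attempt reached `break`, i.e. every try raised ApiError.
        raise RuntimeError("giving up after %d tries" % tries) from last_error
    return result
```

Without the else clause, a loop that exhausts its tries would fall through with no result and fail later in a less obvious way; with it, the caller gets an explicit error that chains the last ApiError.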

#9 Updated by Peter Amstutz over 3 years ago

8931-event-thread-catch-exceptions back to you @ 75c049d

This includes a possible fix for #9051: it now creates a new _EventClient on each reconnect retry.
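The reconnect change for #9051 amounts to building a fresh client object on every retry instead of reusing one that may have been left in a broken state by a failed handshake. A minimal sketch of that pattern, assuming a hypothetical make_client factory (in the SDK this would construct an _EventClient, whose arguments are not shown here) and an assumed backoff that doubles from 1 s up to a 60 s cap:

```python
import time


def connect_fresh(make_client, tries=5, sleep=time.sleep, max_wait=60.0):
    """Build a brand-new client for every attempt instead of reusing a failed one.

    make_client is a hypothetical factory; any exception it raises
    (for example a handshake error) counts as a failed attempt.
    """
    delay = 1.0
    last_error = None
    for _ in range(tries):
        try:
            return make_client()  # new object, so no state leaks from the last failure
        except Exception as err:
            last_error = err
            sleep(delay)
            delay = min(delay * 2, max_wait)
    raise ConnectionError("could not reconnect after %d tries" % tries) from last_error
```

Because each attempt starts from a newly constructed object, a half-open socket or stale handshake state from the previous connection cannot poison the next attempt.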

#10 Updated by Peter Amstutz over 3 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:47a79960c81ea689445f2040b24cb76729afab06.
