Bug #8931

closed

[SDK] websocket event thread crash

Added by Peter Amstutz over 8 years ago. Updated over 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Story points: 0.5

Description

EventClient and PollClient in the Python SDK should catch exceptions in a couple of places where they currently do not. When an exception goes uncaught, the event thread crashes and anything that depends on it gets wedged:

  • The callback to user code, to avoid crashing the event handler thread
    • When the callback raises, call thread.interrupt_main() to raise an exception in the main thread and force it to deal with the error.
  • When PollClient makes calls to the API server
    • Use the same style of RetryLoop that one_task_per_input_file uses. It can retry infinitely, but take care that the backoff numbers make sense.
  • Test that PollClient retries API failures.
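The callback-handling point above can be illustrated with a minimal sketch. safe_dispatch and start_event_thread are hypothetical names, not the SDK's API. The sketch assumes Python 3, where the module the description calls thread is named _thread. The on_error hook is injectable so the behavior can be exercised without actually interrupting the main thread.

```python
import _thread
import logging
import threading

logger = logging.getLogger("event_thread_sketch")


def safe_dispatch(callback, event, on_error=_thread.interrupt_main):
    """Run a user callback without letting its exception kill the event thread.

    Hypothetical helper. Returns True if the callback succeeded, False if it
    raised. On failure the traceback is logged and on_error() is called; by
    default that is _thread.interrupt_main(), which raises KeyboardInterrupt
    in the main thread so it is forced to deal with the error.
    """
    try:
        callback(event)
        return True
    except Exception:
        logger.exception("Unexpected exception from event callback")
        on_error()
        return False


def start_event_thread(events, callback, on_error=_thread.interrupt_main):
    """Dispatch events on a daemon thread that survives bad callbacks (hypothetical)."""
    results = []

    def run():
        for event in events:
            results.append(safe_dispatch(callback, event, on_error))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, results
```

Logging inside the worker thread before calling on_error preserves the original traceback, which the main thread would otherwise never see, because it only receives a KeyboardInterrupt.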

Subtasks: 1 (0 open, 1 closed)

Task #8977: Review 8931-event-thread-catch-exceptions | Resolved | Radhika Chippada | 04/25/2016

Related issues: 2 (0 open, 2 closed)

Related to Arvados - Bug #8928: [SDKs] EventClient crashes after API server returns 504 | Closed | Jiayong Li
Related to Arvados - Bug #9051: [SDKs] EventClient fails to reconnect after HandshakeError from last connection | Resolved

#1

Updated by Peter Amstutz over 8 years ago

  • Description updated
  • Estimated time set to 0.50 h

#2

Updated by Brett Smith over 8 years ago

  • Target version set to Arvados Future Sprints

#3

Updated by Brett Smith over 8 years ago

  • Description updated
  • Category set to SDKs
  • Story points set to 0.5

#4

Updated by Peter Amstutz over 8 years ago

  • Subject changed from [SDK] event thread crash to [SDK] websocket event thread crash

#5

Updated by Brett Smith over 8 years ago

  • Assigned To set to Peter Amstutz
  • Target version changed from Arvados Future Sprints to 2016-04-27 sprint

#6

Updated by Peter Amstutz over 8 years ago

  • Status changed from New to In Progress

#7

Updated by Peter Amstutz over 8 years ago

8931-event-thread-catch-exceptions is ready for review

#8

Updated by Radhika Chippada over 8 years ago

Reviewing 89e091b3

  • Should we also add an else statement in the tries_left for loop at line 175, similar to line 150? What happens if an exception other than ApiError is caught?
  • The max_wait description reads “Maximum time to wait between retries”; should we say “Maximum number of seconds to wait …” instead?
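Both review questions concern how the retry loop treats exceptions other than ApiError and what unit max_wait is in. The following is a minimal sketch of one way to structure a retry loop with capped exponential backoff that can also retry forever, as the description allows. It is not the code in 89e091b3 nor the SDK's RetryLoop class. ApiError, backoff_delays, and call_with_retries are hypothetical names, and sleep is injectable so retries can be tested without waiting.

```python
import time


class ApiError(Exception):
    """Hypothetical stand-in for the SDK's API error type."""


def backoff_delays(start=1.0, growth=2.0, max_wait=60.0):
    """Yield delays in seconds: start, start*growth, ..., capped at max_wait.

    max_wait is the maximum number of seconds to wait between retries.
    """
    delay = start
    while True:
        yield min(delay, max_wait)
        delay = min(delay * growth, max_wait)


def call_with_retries(request, num_retries=None, sleep=time.sleep, **backoff):
    """Call request() until it succeeds, retrying only on ApiError.

    num_retries=None retries forever; otherwise at most num_retries retries.
    Any other exception type propagates to the caller immediately.
    """
    delays = backoff_delays(**backoff)
    retries = 0
    while True:
        try:
            return request()
        except ApiError:
            if num_retries is not None and retries >= num_retries:
                raise  # out of retries: re-raise the last ApiError
            retries += 1
            sleep(next(delays))
```

Letting non-API exceptions escape at once is one possible answer to the first question; a programming error in the request is unlikely to be fixed by waiting and retrying.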

#9

Updated by Peter Amstutz over 8 years ago

8931-event-thread-catch-exceptions back to you @ 75c049d

Includes a possible fix for #9051: a new _EventClient is now created on each reconnect retry.
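The reconnect change can be sketched as a loop that calls a factory on every attempt. Each retry then starts from a fresh client object instead of reusing one left in a bad state by a failed handshake. reconnect and make_client are hypothetical names. The sketch catches any Exception because the concrete handshake error type belongs to the websocket library, which this example does not assume.

```python
import time


def reconnect(make_client, max_attempts=None, delay=1.0, max_wait=60.0, sleep=time.sleep):
    """Return a connected client, building a new one on every attempt.

    make_client is a hypothetical zero-argument factory standing in for
    constructing an _EventClient; it is expected to raise if the connection
    or handshake fails. max_attempts=None retries forever. delay and max_wait
    are in seconds; the delay doubles after each failure, capped at max_wait.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            # A fresh object each time: nothing carries over from a failed connection.
            return make_client()
        except Exception:
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_wait)
```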

#10

Updated by Peter Amstutz over 8 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:47a79960c81ea689445f2040b24cb76729afab06.
