Bug #8931
closed
[SDK] websocket event thread crash
Added by Peter Amstutz over 8 years ago.
Updated over 8 years ago.
Description
EventClient and PollClient in the Python SDK should catch exceptions in a couple of places where they currently do not. The uncaught exceptions crash the event thread, and things that depend on it get wedged:
- The callback to user code, to avoid crashing the event handler thread
  - When this happens, call thread.interrupt_main() to raise an exception in the main thread and force it to deal with the error.
- When PollClient makes calls to the API server
  - Use the same style of RetryLoop that one_task_per_input_file uses. It can retry infinitely, but take care that the backoff numbers make sense.
  - Test that PollClient retries API failures.
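A minimal sketch of the first point, not the SDK's actual code: guarded_callback, run_events, and the on_fatal parameter are hypothetical names used for illustration. It wraps the user callback so an exception is logged and the main thread is interrupted instead of the event thread dying silently. In Python 3 the function is spelled _thread.interrupt_main(); in Python 2 it is thread.interrupt_main().

```python
import _thread
import logging
import threading

_logger = logging.getLogger("event_sketch")


def guarded_callback(on_event, on_fatal=_thread.interrupt_main):
    """Wrap a user callback so an exception cannot silently kill the event thread.

    on_fatal defaults to _thread.interrupt_main, which raises
    KeyboardInterrupt in the main thread. It is a parameter here only so
    the behavior can be exercised without interrupting the interpreter.
    """
    def wrapper(event):
        try:
            on_event(event)
        except Exception:
            # Log the traceback, then hand the problem to the main thread.
            _logger.exception("Unhandled exception in event callback")
            on_fatal()
    return wrapper


def run_events(events, on_event, on_fatal=_thread.interrupt_main):
    """Deliver events on a background thread (stand-in for the event thread)."""
    callback = guarded_callback(on_event, on_fatal)

    def loop():
        for event in events:
            callback(event)  # a failing callback no longer ends this loop

    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker
```

With the default on_fatal, the main thread sees a KeyboardInterrupt and has to decide whether to shut down or carry on, rather than waiting forever on a dead event thread.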
- Description updated (diff)
- Estimated time set to 0.50 h
- Target version set to Arvados Future Sprints
- Description updated (diff)
- Category set to SDKs
- Story points set to 0.5
- Subject changed from [SDK] event thread crash to [SDK] websocket event thread crash
- Assigned To set to Peter Amstutz
- Target version changed from Arvados Future Sprints to 2016-04-27 sprint
- Status changed from New to In Progress
8931-event-thread-catch-exceptions is ready for review
Reviewing 89e091b3
- Should we also add an else statement in the tries_left for loop at line 175, similar to line 150? What happens if an exception other than ApiError is caught?
- max_wait: Maximum time to wait between retries — should we say “Maximum number of seconds to wait …” instead?
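To make both review points concrete, here is a rough sketch of an unbounded retry loop with capped exponential backoff. It is not the SDK's RetryLoop; backoff_delays, call_with_retries, is_retryable, and the default numbers are assumptions made for illustration. Here max_wait is a number of seconds, and any exception the caller does not mark as retryable is re-raised rather than swallowed.

```python
import time


def backoff_delays(initial=1.0, factor=2.0, max_wait=60.0):
    """Yield an endless series of delays in seconds, growing up to max_wait.

    max_wait is the maximum number of seconds to wait between retries.
    """
    delay = initial
    while True:
        yield min(delay, max_wait)
        delay = min(delay * factor, max_wait)


def call_with_retries(request, is_retryable, sleep=time.sleep, **backoff):
    """Call request() until it succeeds, sleeping between retryable failures.

    Exceptions for which is_retryable(exc) is false propagate immediately,
    so unexpected errors are not retried forever.
    """
    for delay in backoff_delays(**backoff):
        try:
            return request()
        except Exception as exc:
            if not is_retryable(exc):
                raise
            sleep(delay)
```

Capping the delay keeps an infinite retry loop from drifting toward hour-long waits after a long outage, while still backing off quickly at the start.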
8931-event-thread-catch-exceptions back to you @ 75c049d
Includes possible fix for #9051 by having it create a new _EventClient on each reconnect retry.
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:47a79960c81ea689445f2040b24cb76729afab06.