Bug #9051
[SDKs] EventClient fails to reconnect after HandshakeError from last connection (closed)
Status:
Resolved
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
-
Story points:
-
#1
Updated by Jiayong Li over 8 years ago
- File snap_gatk_HG01953.log_01 added
- File snap_gatk_HG01953.log_02 added
Ran snap_gatk on the HG01953 exome with arvados-cwl-runner, with websockets enabled (using arvados branch 8931-event-thread-catch-exceptions).
log_01 shows:
2016-04-21 01:51:21 arvados.events[40865] WARNING: Unexpected close. Reconnecting.
2016-04-21 01:51:22 arvados.events[40865] WARNING: Error 'Invalid response status: 502 Bad Gateway' during websocket reconnect. Will retry after 5s.
followed by a traceback:
Traceback (most recent call last):
  File "/home/jiayong/miniconda2/lib/python2.7/site-packages/arvados_python_client-0.1.20160412180035-py2.7.egg/arvados/events.py", line 119, in on_closed
    self.ec.connect()
  File "build/bdist.linux-x86_64/egg/ws4py/client/__init__.py", line 231, in connect
    self.process_response_line(response_line)
  File "build/bdist.linux-x86_64/egg/ws4py/client/__init__.py", line 284, in process_response_line
    raise HandshakeError("Invalid response status: %s %s" % (code, status))
HandshakeError: Invalid response status: 502 Bad Gateway
2016-04-21 01:51:27 arvados.events[40865] WARNING: Error ''NoneType' object has no attribute 'getsockopt'' during websocket reconnect. Will retry after 5s.
Traceback (most recent call last):
  File "/home/jiayong/miniconda2/lib/python2.7/site-packages/arvados_python_client-0.1.20160412180035-py2.7.egg/arvados/events.py", line 119, in on_closed
    self.ec.connect()
  File "build/bdist.linux-x86_64/egg/ws4py/client/__init__.py", line 207, in connect
    self.sock = ssl.wrap_socket(self.sock, **self.ssl_options)
  File "/home/jiayong/miniconda2/lib/python2.7/ssl.py", line 911, in wrap_socket
    ciphers=ciphers)
  File "/home/jiayong/miniconda2/lib/python2.7/ssl.py", line 535, in __init__
    if sock.getsockopt(SOL_SOCKET, SO_TYPE) != SOCK_STREAM:
AttributeError: 'NoneType' object has no attribute 'getsockopt'
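The two tracebacks suggest that the client object was reused after the failed handshake, and that its socket was already None on the second connect() call. Below is a minimal, stdlib-only sketch of that failure pattern and of one possible workaround: building a fresh client for every retry. It does not use ws4py or the Arvados SDK. FakeClient, FlakyServer, reconnect and the local HandshakeError class are all hypothetical stand-ins, and this is not necessarily how the SDK was actually fixed.

```python
import time


class HandshakeError(Exception):
    """Stand-in for the handshake failure seen in the first traceback."""


class FlakyServer:
    """Hypothetical server that rejects the first `failures` handshakes."""

    def __init__(self, failures):
        self.failures = failures

    def accepts_handshake(self):
        if self.failures > 0:
            self.failures -= 1
            return False
        return True


class FakeClient:
    """Hypothetical one-shot client: a failed handshake drops its socket."""

    def __init__(self, server):
        self.server = server
        self.sock = object()  # placeholder for a real socket

    def connect(self):
        if self.sock is None:
            # Mirrors the second traceback: the socket is gone on reuse.
            raise AttributeError(
                "'NoneType' object has no attribute 'getsockopt'")
        if not self.server.accepts_handshake():
            self.sock = None
            raise HandshakeError("Invalid response status: 502 Bad Gateway")
        return self


def reconnect(make_client, attempts=5, delay=0.0):
    """Retry with a fresh client per attempt instead of reusing a failed one."""
    last_error = None
    for _ in range(attempts):
        client = make_client()  # new object, so no stale socket state
        try:
            return client.connect()
        except HandshakeError as err:
            last_error = err
            time.sleep(delay)
    raise last_error or RuntimeError("no connection attempts were made")
```

Calling reconnect(lambda: FakeClient(server)) keeps retrying through transient handshake failures, whereas calling connect() twice on the same FakeClient reproduces the AttributeError from the log.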
log_02 shows a deadlock.
2016-04-21 05:42:29 arvados.cwl-runner[7143] ERROR: Workflow is deadlocked, no runnable jobs and not waiting on any pending jobs.
2016-04-21 05:42:29 arvados.cwl-runner[7143] ERROR: Caught unhandled exception, marking pipeline as failed. Error was: <class 'cwltool.errors.WorkflowException'>
Updated by Brett Smith over 8 years ago
- Subject changed from [Websocket] Connection closed during long running jobs, and reconnecting error to [SDKs] EventClient fails to reconnect after HandshakeError from last connection
- Status changed from New to Feedback
Updated by Jiayong Li over 8 years ago
That's fantastic. I'll keep this ticket in mind the next time I run pipelines.
Updated by Tom Morris about 8 years ago
- Status changed from Feedback to Resolved