Bug #3846

closed

[SDK] Improve timeout handling in Python KeepClient

Added by Brett Smith over 9 years ago. Updated over 9 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version: -
Story points: -

Description

The Python KeepClient is impatient. If you're dealing with full 64MiB blocks, you're likely to run into timeout exceptions, which KeepClient.KeepService does not catch. This is usually fatal to arv-get and arv-put when they are pointed at a proxy (where the client has to wait until all replicas are done before it hears anything back), and it results in a series of tracebacks when arv-put is pointed directly at disks. Here's a sample traceback from arv-get:

Traceback (most recent call last):
  File "/usr/local/bin/arv-get", line 202, in <module>
    for data in f.readall():
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 157, in readall
    data = self.read(size)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 140, in read
    data = self._stream.readfrom(locator+segmentoffset, segmentsize)
  File "/usr/local/lib/python2.7/dist-packages/arvados/stream.py", line 266, in readfrom
    data += self._keep.get(locator)[segmentoffset:segmentoffset+segmentsize]
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 604, in get
    blob = keep_service.get(http, locator)
  File "/usr/local/lib/python2.7/dist-packages/arvados/keep.py", line 237, in get
    headers=self.get_headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1593, in request
    (response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1335, in _request
    (response, content) = self._conn_request(conn, request_uri, method, body, headers)
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 1257, in _conn_request
    conn.connect()
  File "/usr/local/lib/python2.7/dist-packages/httplib2/__init__.py", line 913, in connect
    raise socket.error, msg
socket.timeout: timed out

There are a few improvements to be made here:

  • KeepClient should use an appropriate timeout for each Keep request. The worst case is PUTting 64MiB to a proxy, where we have to wait for all the replicas to be written.
  • KeepClient should catch socket.timeout exceptions and handle them just like other transient exceptions (i.e., add it to KeepClient.KeepService.HTTP_ERRORS).
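
To make the two proposals concrete, here is a minimal sketch, not the actual KeepClient code. The names request_timeout, try_services, and TRANSIENT_ERRORS, along with the base timeout and minimum bandwidth figures, are assumptions chosen for illustration; TRANSIENT_ERRORS stands in for KeepClient.KeepService.HTTP_ERRORS.

```python
import socket

# Hypothetical stand-in for KeepClient.KeepService.HTTP_ERRORS, with
# socket.timeout added so timeouts are treated like other transient errors.
TRANSIENT_ERRORS = (socket.timeout, ConnectionError)

BLOCK_SIZE_MAX = 64 * 1024 * 1024  # a full 64MiB block


def request_timeout(nbytes, replicas=1, base=20.0, min_bytes_per_sec=1 << 20):
    """Return a timeout in seconds scaled to payload size and replica count.

    The base and bandwidth defaults are illustrative assumptions, not
    values taken from the SDK.
    """
    transfer_seconds = nbytes / float(min_bytes_per_sec)
    # A proxy answers only after every replica is written, so the
    # transfer allowance is multiplied by the number of replicas.
    return base + transfer_seconds * max(1, replicas)


def try_services(services, attempt):
    """Call attempt(service) on each service until one succeeds.

    Transient errors, including socket.timeout, are recorded and the next
    service is tried. Returns (result, errors); result is None if every
    service failed.
    """
    errors = []
    for service in services:
        try:
            return attempt(service), errors
        except TRANSIENT_ERRORS as error:
            errors.append((service, error))
    return None, errors
```

With these assumed defaults, request_timeout(BLOCK_SIZE_MAX, replicas=2) allows 148 seconds for the worst case described above, and a service that times out no longer aborts the whole read or write as long as another service responds.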
#1

Updated by Brett Smith over 9 years ago

  • Status changed from New to In Progress
  • Assigned To set to Brett Smith
#2

Updated by Brett Smith over 9 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:4ef537243058616754efde56438a193626556bca.
