Bug #5037

closed

[SDKs] Improve Python SDK thread safety and document pitfalls

Added by Brett Smith almost 10 years ago. Updated almost 10 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: SDKs
Target version:
Story points: 0.5

Description

One of our users wrote a Crunch script that, in part, does something like this:

import arvados
import os

def loc_size(coll_loc):
    # This function just needs to read the entire collection.
    # Details aren't important.
    cr = arvados.CollectionReader(coll_loc)
    return sum(len(data) for cf in cr.all_files() for data in cf.readall())

cr = arvados.CollectionReader(COLL_LOC1)
print list(cr.all_streams())

child_pid = os.fork()
if not child_pid:
    print "Child size:", loc_size(COLL_LOC2)
else:
    print "Parent size:", loc_size(COLL_LOC2)
    print os.waitpid(child_pid, 0)

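This breaks horribly because all the CollectionReaders end up implicitly sharing a cached API object, and that object is not thread-safe. The first CollectionReader creates the cached object before the fork, so the parent and the child then both talk over the same underlying connection. (It specifically manifests as SSL record errors like "wrong version number" and "decryption failed or bad record mac".)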

We need to help users avoid these pitfalls. This could be a combination of SDK code changes and documentation noting the issues. We need more discussion to figure out what's worth doing.
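
As a starting point for that discussion, here is a minimal sketch of one way a script could avoid the sharing: keep one client per (process, thread) pair instead of one global cached object. This is an illustration, not the SDK's implementation. PerProcessThreadCache and its factory argument are hypothetical names, and the factory is assumed to build a fresh client on each call (in a real script it might wrap arvados.api('v1')).

```python
import os
import threading


class PerProcessThreadCache(object):
    """Hypothetical helper: hand out one client per (process, thread) pair."""

    def __init__(self, factory):
        # factory is an assumed zero-argument callable that builds a new client.
        self._factory = factory
        self._local = threading.local()

    def get(self):
        pid = os.getpid()
        # Thread-local state survives os.fork() in the forking thread, so the
        # pid is compared as well: a forked child builds its own client rather
        # than reusing the one (and the connection) created by its parent.
        if getattr(self._local, 'pid', None) != pid:
            self._local.client = self._factory()
            self._local.pid = pid
        return self._local.client


# Stand-in factory for illustration; any zero-argument callable works.
clients = PerProcessThreadCache(object)
main_client = clients.get()
```

Repeated calls to clients.get() from the same thread of the same process return the same object, while another thread or a forked child gets a separate one. The cost is one extra connection per thread or process.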


Subtasks 4 (0 open, 4 closed)

Task #5117: Review 5037-python-sdk-thread-safe (Resolved, Tom Clegg, 01/20/2015)
Task #5116: Remove connection pooling feature from arvados.api() (Resolved, Tom Clegg, 01/20/2015)
Task #5166: Remove cache=False from Python SDK clients (bump version requirements instead) (Resolved, Tom Clegg, 01/20/2015)
Task #5170: Review 5037-nonocache (Resolved, Tom Clegg, 01/20/2015)