Bug #10445
open
Fix memory leak in Python SDK Collection class
Added by Tom Clegg about 8 years ago.
Updated 11 months ago.
Description
Currently, each new CollectionReader creates its own API client, Keep client, and block cache unless the caller supplies an API client object (and Keep client object?). If a caller creates 10 CollectionReaders and reads 64 MiB from each one, the program will use 640 MiB. Possibly due to HTTP KeepAlive behavior, Python does not reclaim the memory even after the caller drops its references to the CollectionReaders. For example, this script leaks memory and network connections:
import arvados
uuid = '......'
for i in range(20):
    cr = arvados.collection.CollectionReader(uuid)
    for fn in cr:
        f = cr.open(fn)
        f.read()
        f.close()
Proposed improvement:
Share block caches between auto-instantiated API clients that use the same settings.
This also applies to KeepClient and BlockManager.
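One way the proposed sharing could look is a small registry that hands out one cache per distinct group of settings. This is only a sketch: `BlockCache`, `shared_block_cache`, and the choice of `(api_host, api_token, max_bytes)` as the settings key are hypothetical names invented for illustration, not part of the Arvados SDK.

```python
import threading
from collections import OrderedDict


class BlockCache:
    """Hypothetical size-bounded block cache with LRU eviction."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._blocks = OrderedDict()
        self._size = 0
        self._lock = threading.Lock()

    def get(self, locator):
        with self._lock:
            data = self._blocks.get(locator)
            if data is not None:
                # Mark the block as most recently used.
                self._blocks.move_to_end(locator)
            return data

    def put(self, locator, data):
        with self._lock:
            if locator in self._blocks:
                self._size -= len(self._blocks.pop(locator))
            self._blocks[locator] = data
            self._size += len(data)
            # Evict least recently used blocks until under the limit,
            # but always keep the block that was just stored.
            while self._size > self.max_bytes and len(self._blocks) > 1:
                _, old = self._blocks.popitem(last=False)
                self._size -= len(old)

    def total_bytes(self):
        with self._lock:
            return self._size


_shared_caches = {}
_shared_caches_lock = threading.Lock()


def shared_block_cache(api_host, api_token, max_bytes=256 * 2**20):
    """Return the one cache associated with this combination of settings."""
    key = (api_host, api_token, max_bytes)
    with _shared_caches_lock:
        if key not in _shared_caches:
            _shared_caches[key] = BlockCache(max_bytes)
        return _shared_caches[key]
```

With a registry like this, total cache memory is bounded per distinct configuration rather than per reader, so creating more readers with the same settings does not multiply the cache footprint.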
Related issues
1 (1 open — 0 closed)
The read cache is actually in KeepClient. Reads go through the block manager, but sharing the read cache is a matter of using a shared KeepClient object.
(Technically, you could even create multiple KeepClient objects and initialize them with the same KeepBlockCache.)
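The effect of initializing several clients with the same cache can be illustrated with a toy model. `FakeKeepClient` and `cached_bytes` below are hypothetical stand-ins written for this sketch (a plain dict plays the role of the block cache); they are not the real `KeepClient` or `KeepBlockCache` classes, and the 1 KiB block size is arbitrary.

```python
class FakeKeepClient:
    """Hypothetical stand-in for a Keep client; not the real SDK class."""

    def __init__(self, block_cache=None):
        # Without an injected cache, each client gets a private one,
        # mirroring the per-reader behavior described in this issue.
        self.block_cache = {} if block_cache is None else block_cache
        self.fetches = 0

    def get(self, locator):
        if locator not in self.block_cache:
            self.fetches += 1
            self.block_cache[locator] = self._fetch(locator)
        return self.block_cache[locator]

    def _fetch(self, locator):
        # Pretend to download a 1 KiB block from a Keep server.
        return bytes(1024)


def cached_bytes(clients):
    """Total bytes held in caches, counting each distinct cache object once."""
    caches = {id(c.block_cache): c.block_cache for c in clients}
    return sum(len(data) for cache in caches.values() for data in cache.values())


# Ten clients with private caches versus ten clients sharing one cache.
private_clients = [FakeKeepClient() for _ in range(10)]
shared_cache = {}
shared_clients = [FakeKeepClient(block_cache=shared_cache) for _ in range(10)]

for client in private_clients + shared_clients:
    client.get("block-1")

private_total = cached_bytes(private_clients)
shared_total = cached_bytes(shared_clients)
shared_fetches = sum(c.fetches for c in shared_clients)
```

In this model the private clients hold ten copies of the same block, while the shared clients hold one copy and fetch it only once, which is the same multiplication the 10 x 64 MiB example in the description points at.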
- Description updated (diff)
- Target version set to Arvados Future Sprints
- Target version deleted (Arvados Future Sprints)
- Target version set to 2022-11-23 sprint
- Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
- Description updated (diff)
- Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
- Subject changed from [SDKs] Fix memory leak in Python SDK Collection class to Fix memory leak in Python SDK Collection class
- Story points changed from 1.0 to 2.0
- Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint
- Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
- Release set to 59
- Target version deleted (2023-02-01 sprint)
- Target version set to To be scheduled
- Target version changed from To be scheduled to Future