Bug #10445

Updated by Tom Clegg over 7 years ago

Currently, each new CollectionReader creates its own _BlockManager, along with its own API client, Keep client, and block cache unless the caller supplies an API client object _(and Keep client object?)_. If a caller creates 10 CollectionReaders and reads 64 MiB from each one, the program will use 640 MiB. Possibly due to HTTP KeepAlive behavior, Python does not reclaim the memory even after the caller drops all references to the CollectionReaders. For example, this script leaks memory and network connections:

 <pre><code class="python"> 
 import arvados 

 uuid = '......' 
 api = arvados.api('v1') 
 for i in range(20): 
     cr = arvados.collection.CollectionReader(uuid, api_client=api) 
     for fn in cr: 
         f = cr.open(fn) 
         f.read() 
         f.close() 
 </code></pre> 

 Proposed improvements: 

 1. Share block caches between CollectionReaders that use the same settings, when possible. (Is this possible for CollectionWriters too?) 

 2. If a Collection class has its own private (non-shared) block cache, clear it when all files are closed. 

 3. Provide a "clear cache" or "close" method (on the Collection class?) so a caller can explicitly reclaim memory. (Generally, the caller is in a better position than the SDK to predict whether the cache is going to be useful or merely waste memory.) 
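
A rough sketch of how items 1 and 3 could fit together, in plain Python rather than SDK code. Everything here is hypothetical: @SharedBlockCache@, @get_shared_cache@, the @settings@ dict, and the 256 MiB default are illustrative names and values, not part of the Arvados SDK. The idea is one size-bounded cache per distinct settings mapping, plus an explicit @clear()@ that the caller can invoke to reclaim memory.

```python
import collections
import threading


class SharedBlockCache:
    """Hypothetical size-bounded LRU cache of data blocks (not an SDK class)."""

    def __init__(self, max_bytes=256 * 1024 * 1024):
        self.max_bytes = max_bytes
        self._blocks = collections.OrderedDict()  # locator -> bytes
        self._size = 0
        self._lock = threading.Lock()

    def get(self, locator):
        with self._lock:
            data = self._blocks.get(locator)
            if data is not None:
                self._blocks.move_to_end(locator)  # mark as most recently used
            return data

    def put(self, locator, data):
        with self._lock:
            old = self._blocks.pop(locator, None)
            if old is not None:
                self._size -= len(old)
            self._blocks[locator] = data
            self._size += len(data)
            # Evict least recently used blocks, always keeping the newest one.
            while self._size > self.max_bytes and len(self._blocks) > 1:
                _, evicted = self._blocks.popitem(last=False)
                self._size -= len(evicted)

    def clear(self):
        """Item 3: let the caller drop all cached blocks explicitly."""
        with self._lock:
            self._blocks.clear()
            self._size = 0

    def size(self):
        with self._lock:
            return self._size


_caches = {}
_caches_lock = threading.Lock()


def get_shared_cache(settings):
    """Item 1: return the same cache for equal settings (values must be hashable)."""
    key = frozenset(settings.items())
    with _caches_lock:
        if key not in _caches:
            _caches[key] = SharedBlockCache()
        return _caches[key]
```

With something like this, 10 readers using the same settings would draw on one bounded cache instead of 10 separate ones, and a caller that knows it is done reading could call @clear()@ rather than wait for garbage collection.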
