Feature #13062

[SDK] Reduce collection class memory footprint

Added by Peter Amstutz 10 months ago. Updated 10 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
Start date:
Due date:
% Done:
0%

Estimated time:
Story points:
-

Description

Reduce the memory footprint of the collection classes so that arv-mount and arvados-cwl-runner can run on smaller, cheaper nodes.

General approach: instead of parsing the whole manifest once and creating Python objects for every directory and file up front, reparse the relevant parts of the manifest and create Python objects on demand.

Possible strategy:

  • Initial manifest parsing creates an index that maps each directory path to the one or more manifest streams that describe the contents of that directory (referenced by offset or by using memoryview).
  • When the contents of a Collection or Subcollection are needed, look up the stream(s) associated with that directory in the index and parse only those.
  • Consider doing something similar at the individual file level, loading file "segments" only on demand (this may cost more overall if it turns out the client is going to visit most of the files in a given directory anyway).
  • Make it possible for a caching strategy to evict loaded collection contents and file segments.
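The index and on-demand parsing described in the first, second and fourth bullets could look roughly like the sketch below. It is a minimal illustration, not the SDK's implementation: build_stream_index, LazyCollection, files, subdirectories and evict are hypothetical names. It assumes each manifest line is one stream of the form "stream_name locator... pos:size:filename...", that filenames inside file tokens contain no "/", and that \040 (an escaped space) is the only escape that needs decoding. Streams are referenced by byte offset rather than by memoryview.

```python
from collections import defaultdict


def build_stream_index(manifest: bytes) -> dict:
    """Map each stream (directory) name to the byte ranges of its manifest lines.

    Only the stream names are decoded; the rest of each line stays unparsed.
    """
    index = defaultdict(list)
    start = 0
    while start < len(manifest):
        end = manifest.find(b"\n", start)
        if end == -1:
            end = len(manifest)
        name_end = manifest.find(b" ", start, end)
        if name_end != -1:  # skip blank or malformed lines
            index[manifest[start:name_end].decode()].append((start, end))
        start = end + 1
    return dict(index)


class LazyCollection:
    """Hypothetical collection that parses a directory's streams on first access."""

    def __init__(self, manifest: bytes):
        self._manifest = manifest
        self._index = build_stream_index(manifest)
        self._loaded = {}  # path -> {filename: size}; candidates for eviction

    def subdirectories(self, path: str) -> list:
        """Names of direct children of path, found from the index alone."""
        prefix = path + "/"
        return sorted({name[len(prefix):].split("/")[0]
                       for name in self._index if name.startswith(prefix)})

    def files(self, path: str) -> dict:
        """Return {filename: size} for path, parsing its streams only if needed."""
        if path not in self._loaded:
            files = {}
            for start, end in self._index.get(path, []):
                for tok in self._manifest[start:end].split(b" ")[1:]:
                    parts = tok.split(b":", 2)
                    # File tokens are pos:size:name; block locators contain no ':'.
                    if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
                        name = parts[2].decode().replace("\\040", " ")
                        files[name] = files.get(name, 0) + int(parts[1])
            self._loaded[path] = files
        return self._loaded[path]

    def evict(self, path: str) -> None:
        """Drop parsed contents; they are reparsed from the manifest on demand."""
        self._loaded.pop(path, None)
```

For example, LazyCollection(manifest).files("./sub") parses only the lines whose stream name is ./sub; every other directory stays as unparsed bytes until something asks for it. The sketch leaves out the file-level segment idea, and its evict() ignores the challenge below: it forgets parsed data even if a caller still holds it.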

Challenges:

  • We can't evict anything from the cache that has been returned to the (Python SDK) user unless we can determine that it is no longer being held (this may require a reference counting scheme).
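One possible way to handle this in CPython, offered as a sketch and not as an approach the SDK has chosen, is to let the interpreter's own reference counting do the bookkeeping through the standard weakref module. The cache keeps a small LRU of strong references and tracks every loaded object with a weak reference, so an object evicted from the LRU is returned again (instead of being reparsed) for as long as some caller still holds it. EvictingCache, DirectoryContents, load and capacity are hypothetical names.

```python
import weakref
from collections import OrderedDict


class DirectoryContents:
    """Hypothetical parsed contents of one directory.

    A plain dict cannot be weakly referenced, so the data is wrapped in a class.
    """

    def __init__(self, files):
        self.files = files


class EvictingCache:
    """Holds at most `capacity` entries strongly (LRU order); any other entry
    survives only while a caller still references it."""

    def __init__(self, load, capacity=2):
        self._load = load                # reparses a directory from the manifest
        self._capacity = capacity
        self._strong = OrderedDict()     # path -> object, least recently used first
        self._weak = weakref.WeakValueDictionary()  # path -> object, if still alive

    def get(self, path):
        obj = self._weak.get(path)
        if obj is None:                  # never loaded, or already garbage collected
            obj = self._load(path)
            self._weak[path] = obj
        self._strong[path] = obj
        self._strong.move_to_end(path)
        while len(self._strong) > self._capacity:
            self._strong.popitem(last=False)  # drop the least recently used entry
        return obj
```

Returning the same object while it is still referenced also avoids handing out two different objects for one directory, which matters if callers modify collection contents. Outside CPython, unreferenced objects may linger until the garbage collector runs, so reuse of evicted entries there is best-effort.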

History

#1 Updated by Peter Amstutz 10 months ago

  • Status changed from New to In Progress

#2 Updated by Peter Amstutz 10 months ago

  • Description updated
  • Status changed from In Progress to New
