Feature #13062

open

[SDK] Reduce collection class memory footprint

Added by Peter Amstutz about 6 years ago. Updated 2 months ago.

Status: New
Priority: Normal
Assigned To: -
Category: -
Target version:
Story points: -
Release:
Release relationship: Auto

Description

Reduce the memory footprint of the collection class so that arv-mount and arvados-cwl-runner use less memory and can run on smaller, cheaper nodes.

General approach: instead of parsing the manifest once and creating Python objects for every directory and file, reparse the manifest and create Python objects on demand.

Possible strategy:

  • Initial manifest parsing creates an index that maps each directory path to one or more manifest streams (by offset or by using memoryview) which describe the contents of that directory.
  • When the contents of a Collection or Subcollection are needed, look up the stream(s) associated with the directory in the index and parse them.
  • Consider doing something similar at the individual file level: only load "segments" on demand. This may come at the cost of higher overhead if the client ends up visiting most of the files in a given directory anyway.
  • Make it possible for a caching strategy to evict loaded collection contents / file segments.
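
The strategy above could be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: build_stream_index, parse_stream and LazyDirectory are hypothetical names, the manifest is assumed to be normalized text with one stream per line (stream name, block locators, then position:size:filename tokens), and unescaping of names (e.g. \040) is omitted.

```python
import re
from collections import defaultdict

# File tokens look like "position:size:filename"; block locators contain no colon.
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(\S+)$")


def build_stream_index(manifest: bytes):
    """Map each stream (directory) name to the byte spans of its manifest lines.

    Only the stream name is decoded here; the rest of each line is left unparsed.
    """
    index = defaultdict(list)
    pos = 0
    while pos < len(manifest):
        end = manifest.find(b"\n", pos)
        if end == -1:
            end = len(manifest)
        if end > pos:  # skip empty lines
            space = manifest.find(b" ", pos, end)
            name_end = space if space != -1 else end
            name = manifest[pos:name_end].decode("utf-8")
            index[name].append((pos, end))
        pos = end + 1
    return dict(index)


def parse_stream(manifest: bytes, span):
    """Parse a single stream line into (block locators, file segments)."""
    start, end = span
    tokens = manifest[start:end].decode("utf-8").split(" ")
    locators, segments = [], []
    for tok in tokens[1:]:  # tokens[0] is the stream name
        m = FILE_TOKEN.match(tok)
        if m:
            segments.append((m.group(3), int(m.group(1)), int(m.group(2))))
        else:
            locators.append(tok)
    return locators, segments


class LazyDirectory:
    """Hypothetical directory object that parses its streams on first access."""

    def __init__(self, manifest: bytes, spans):
        self._manifest = manifest
        self._spans = spans
        self._files = None  # not loaded yet

    def files(self):
        if self._files is None:
            files = defaultdict(list)
            for span in self._spans:
                _, segments = parse_stream(self._manifest, span)
                for name, position, size in segments:
                    files[name].append((position, size))
            self._files = dict(files)
        return self._files

    def evict(self):
        """Drop the parsed contents; they are rebuilt on the next files() call."""
        self._files = None
```

With this layout the only long-lived state per directory is a list of (start, end) offsets, and evict() lets a caching strategy discard the parsed contents. Slicing through a memoryview instead of the bytes object might avoid some copies, at the cost of converting back to bytes before decoding.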

Challenges:

  • Can't evict from the cache anything that has been returned to the (Python SDK) user unless we can determine that it is no longer being held (this may require a reference counting scheme).
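
One possible way to approach this challenge, offered only as a sketch, is to lean on Python's own reference tracking through the weakref module rather than a hand-written reference counting scheme. In the example below, Node and EvictingCache are hypothetical names that are not part of the SDK. The cache keeps strong references to a bounded number of recently used objects and only weak references to everything else, so an evicted object that the user still holds is returned again unchanged, while one that nobody holds can be freed.

```python
import weakref
from collections import OrderedDict


class Node:
    """Hypothetical stand-in for a loaded directory or file object."""

    def __init__(self, path):
        self.path = path


class EvictingCache:
    """Bounded cache that never duplicates an object the user still holds."""

    def __init__(self, loader, max_strong=2):
        self._loader = loader          # callable: path -> new object
        self._max_strong = max_strong  # number of objects kept alive by the cache
        self._strong = OrderedDict()   # least recently used entry comes first
        self._weak = weakref.WeakValueDictionary()  # every object still alive

    def get(self, path):
        obj = self._strong.get(path)
        if obj is None:
            # Evicted from the strong set, but possibly still held by the user.
            obj = self._weak.get(path)
        if obj is None:
            obj = self._loader(path)
            self._weak[path] = obj
        self._strong[path] = obj
        self._strong.move_to_end(path)
        while len(self._strong) > self._max_strong:
            # Drop only the cache's strong reference; the object survives
            # for as long as someone else references it.
            self._strong.popitem(last=False)
        return obj

    def is_loaded(self, path):
        """True while the object for this path is still alive somewhere."""
        return path in self._weak
```

This relies on the interpreter releasing unreferenced objects promptly, which CPython's reference counting does for objects that are not part of a reference cycle. Objects with parent/child cycles, or other Python implementations, may only release memory after a garbage collection pass.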