Feature #13062

open

[SDK] Reduce collection class memory footprint

Added by Peter Amstutz about 6 years ago. Updated about 2 months ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
Story points:
-
Release:
Release relationship:
Auto

Description

Reduce the memory footprint of the collection class so that arv-mount and arvados-cwl-runner need less memory and can run on smaller, cheaper nodes.

General approach: instead of parsing the manifest once and creating Python objects for every directory and file, reparse the relevant part of the manifest and create Python objects on demand.

Possible strategy:

  • Initial manifest parsing creates an index that maps each directory path to one or more manifest streams (by offset or by using memoryview) which describe the contents of that directory.
  • When the contents of a Collection or Subcollection are needed, look up the stream(s) associated with that directory in the index and parse them.
  • Consider doing something similar at the individual file level and only load "segments" on demand. This may come at the cost of higher overhead if it turns out the client is going to visit most of the files in a given directory anyway.
  • Make it possible for a caching strategy to evict loaded collection contents / file segments.
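
The indexing and on-demand parsing steps above could be sketched roughly as follows. This is only an illustration, not the SDK's implementation: the class name `LazyManifestIndex` and its methods are hypothetical, and the parser handles just the basic manifest layout (one stream per line: stream name, block locators, then `position:size:filename` tokens, with octal escapes such as `\040` in names).

```python
import re
from collections import defaultdict

# File tokens look like "position:size:filename"; block locators do not match.
_FILE_TOKEN = re.compile(r"^(\d+):(\d+):(\S+)$")


def _unescape(name):
    # Manifest names escape special characters as backslash + 3 octal digits.
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), name)


class LazyManifestIndex:
    """Hypothetical index: directory path -> byte ranges of its streams."""

    def __init__(self, manifest_text):
        self._buf = manifest_text.encode("utf-8")
        self._view = memoryview(self._buf)
        self._index = defaultdict(list)
        pos, n = 0, len(self._buf)
        while pos < n:
            end = self._buf.find(b"\n", pos)
            if end == -1:
                end = n
            if end > pos:
                # Only the stream name (first token) is decoded up front.
                sp = self._buf.find(b" ", pos, end)
                name_end = sp if sp != -1 else end
                name = _unescape(self._buf[pos:name_end].decode("utf-8"))
                self._index[name].append((pos, end))
            pos = end + 1

    def directories(self):
        return sorted(self._index)

    def list_files(self, dirpath):
        """Parse the stream(s) for one directory; returns {filename: size}."""
        files = {}
        for start, end in self._index.get(dirpath, ()):
            tokens = bytes(self._view[start:end]).decode("utf-8").split(" ")
            for tok in tokens[1:]:
                m = _FILE_TOKEN.match(tok)
                if m:
                    fname = _unescape(m.group(3))
                    # A file may span several segments; sum their sizes.
                    files[fname] = files.get(fname, 0) + int(m.group(2))
        return files
```

With this layout, only the directories a client actually visits pay the cost of token parsing and object creation; the rest of the manifest stays as a single bytes buffer.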

Challenges:

  • Can't evict anything from the cache that has been returned to the (Python SDK) user unless we can determine it is no longer being held (this may require a reference counting scheme).
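
One way to sidestep explicit reference counting, offered here only as an assumption and not as something this ticket proposes, is to let the cache hold strong references to a bounded number of loaded objects while also tracking every live object through the standard library's `weakref.WeakValueDictionary`. Evicting from the strong set is then always safe: if the user still holds the object, the same instance is handed back on the next lookup; otherwise it is garbage collected and reparsed on demand. The names `DirectoryContents` and `EvictableCache` are hypothetical.

```python
import weakref
from collections import OrderedDict


class DirectoryContents:
    """Hypothetical stand-in for the parsed contents of one directory."""

    def __init__(self, path, files):
        self.path = path
        self.files = files


class EvictableCache:
    def __init__(self, loader, max_loaded=2):
        self._loader = loader          # callable: path -> DirectoryContents
        self._max = max_loaded
        self._strong = OrderedDict()   # LRU of objects the cache keeps alive
        self._weak = weakref.WeakValueDictionary()  # all objects still alive

    def get(self, path):
        obj = self._strong.get(path)
        if obj is None:
            # Evicted from the LRU, but possibly still held by the caller.
            obj = self._weak.get(path)
        if obj is None:
            obj = self._loader(path)
            self._weak[path] = obj
        self._strong[path] = obj
        self._strong.move_to_end(path)
        while len(self._strong) > self._max:
            # Drop only the cache's own reference; user references survive.
            self._strong.popitem(last=False)
        return obj
```

The trade-off is that memory is only reclaimed once the user drops their reference, so this bounds what the cache itself retains rather than total usage.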
Actions #1

Updated by Peter Amstutz about 6 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz about 6 years ago

  • Description updated (diff)
  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz almost 3 years ago

  • Target version deleted (To Be Groomed)
Actions #4

Updated by Peter Amstutz about 1 year ago

  • Release set to 60
Actions #5

Updated by Peter Amstutz about 2 months ago

  • Target version set to Future