Task #4837

closed

Feature #4823: [SDKs] Good Collection API for Python SDK

[SDKs] Define API and in-memory data structure for collections in Python SDK

Added by Tim Pierce over 9 years ago. Updated almost 4 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Target version: -

Description

A good data structure for representing the content of a collection in memory will facilitate cleaner, more efficient code for manipulating that content.

Conceptually, a collection contains files and collections (analogous to subdirectories). Currently there are two distinct types of collection:
  1. Collections that correspond to records in the Arvados database (have uuid, portable data hash, manifest_text)
  2. Subdirectories within collections (encoded as stream names and filenames with slashes)

A collection has a name and an array of items. Each item is either a file or a collection.

Each file has a name and an array of data segments.

Each data segment is either
  1. a Keep locator and a list of byte ranges, or
  2. a data buffer (useful when some data is not written to Keep yet).

The file's content is defined as the concatenation of the specified byte ranges from all data segments, in the order given.
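The concatenation rule can be sketched in Python as follows. This is a minimal illustration, not the SDK's API: the class names `KeepSegment` and `BufferSegment`, the function `file_content`, and the caller-supplied `fetch_block` callable are all hypothetical, and byte ranges are assumed to be half-open `[start, end)` pairs, which the description does not state explicitly.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union


@dataclass
class KeepSegment:
    """Byte ranges taken from one Keep block (hypothetical class name)."""
    locator: str
    byte_ranges: List[Tuple[int, int]]  # assumed half-open: [start, end)


@dataclass
class BufferSegment:
    """Data held in memory, not yet written to Keep (hypothetical class name)."""
    data: bytes


@dataclass
class File:
    name: str
    data_segments: List[Union[KeepSegment, BufferSegment]] = field(default_factory=list)


def file_content(f: File, fetch_block: Callable[[str], bytes]) -> bytes:
    """Concatenate the byte ranges of every segment, in the order given.

    fetch_block is a stand-in for whatever reads a block by its locator.
    """
    parts = []
    for seg in f.data_segments:
        if isinstance(seg, BufferSegment):
            parts.append(seg.data)  # a buffer contributes all of its bytes
        else:
            block = fetch_block(seg.locator)
            for start, end in seg.byte_ranges:
                parts.append(block[start:end])
    return b"".join(parts)


# Toy usage with fake locators and an in-memory "block store".
blocks = {"blockA": b"0123456789", "blockB": b"abcdefghij"}
demo = File("demo.txt", [
    KeepSegment("blockA", [(0, 3), (7, 9)]),
    BufferSegment(b"--"),
    KeepSegment("blockB", [(2, 4)]),
])
demo_content = file_content(demo, blocks.__getitem__)  # b"01278--cd"
```

Keeping the two segment kinds as separate types lets unwritten data sit alongside Keep-backed data in one ordered list, so a file can be read back before all of its content has been flushed.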

Example:

File{
  name: "foo.txt",
  data_segments: [
    {
      locator: "6a4ff0499484c6c79c95cd8c566bd25f+249025",
      byte_ranges: [ [0, 75], [250, 300], [150, 200] ]
    },
    {
      locator: "ea10d51bcf88862dbcc36eb292017dfd+45",
      byte_ranges: [ [30, 40] ]
    }
  ]
}

(Examples here are written in JSON-ish notation for convenience. This is not intended to be used as a data interchange format.)

This represents a file named foo.txt consisting of bytes 0-75, 250-300, and 150-200 from block 6a4ff0499484c6c79c95cd8c566bd25f+249025, followed by bytes 30-40 from block ea10d51bcf88862dbcc36eb292017dfd+45.
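As a rough check of the foo.txt example, the file's length and the validity of its ranges can be computed directly from the structure. The sketch below uses plain dicts and two hypothetical helpers, `file_size` and `ranges_fit`. It assumes half-open `[start, end)` ranges (not stated in the description) and relies on the number after the first `+` in a Keep locator being the block size.

```python
# The foo.txt example, written as plain Python data.
foo = {
    "name": "foo.txt",
    "data_segments": [
        {
            "locator": "6a4ff0499484c6c79c95cd8c566bd25f+249025",
            "byte_ranges": [(0, 75), (250, 300), (150, 200)],
        },
        {
            "locator": "ea10d51bcf88862dbcc36eb292017dfd+45",
            "byte_ranges": [(30, 40)],
        },
    ],
}


def file_size(f):
    """Total length of the file, assuming half-open [start, end) ranges."""
    return sum(
        end - start
        for seg in f["data_segments"]
        for start, end in seg["byte_ranges"]
    )


def block_size(locator):
    """Block size taken from a locator of the form <hash>+<size>[+hints]."""
    return int(locator.split("+")[1])


def ranges_fit(f):
    """True if every byte range lies inside the block it refers to."""
    return all(
        0 <= start <= end <= block_size(seg["locator"])
        for seg in f["data_segments"]
        for start, end in seg["byte_ranges"]
    )


foo_size = file_size(foo)  # 75 + 50 + 50 + 10 = 185 under the half-open assumption
foo_ok = ranges_fit(foo)   # True: no range runs past the end of its block
```

Note that the ranges need not be sorted or contiguous: the second and third ranges of the first block appear in reverse order, and the file's content follows the listed order rather than the order within the block.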

An example for an entire collection:

Collection{
  name: "dir",
  items: [
    File{
      name: "foo.txt",
      data_segments: [
        {
          locator: "ea10d51bcf88862dbcc36eb292017dfd+45",
          byte_ranges: [ [0, 45] ]
        }
      ]
    },
    Collection{
      name: "subdir",
      items: [
        File{
          name: "bar.txt",
          data_segments: [
            {
              locator: "cdd549ae79fe6640fa3d5c6261d8303c+195",
              byte_ranges: [ [0, 195] ]
            }
          ]
        }
      ]
    }
  ]
}
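The nested structure can be sketched in Python with two small classes and a recursive walk that flattens the tree back into slash-separated paths, similar to how subdirectories are currently encoded. The class layout and the `walk_files` helper are hypothetical illustrations, not the SDK's actual API, and data segments are kept as plain dicts for brevity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple, Union


@dataclass
class File:
    name: str
    data_segments: list = field(default_factory=list)


@dataclass
class Collection:
    name: str
    # Each item is either a File or a nested Collection.
    items: List[Union["File", "Collection"]] = field(default_factory=list)


def walk_files(coll: Collection, prefix: str = "") -> Iterator[Tuple[str, File]]:
    """Yield (path, file) pairs for every file, depth-first, in item order."""
    path = prefix + coll.name
    for item in coll.items:
        if isinstance(item, Collection):
            yield from walk_files(item, path + "/")
        else:
            yield path + "/" + item.name, item


# The "dir" example collection from the description.
example = Collection("dir", [
    File("foo.txt", [
        {"locator": "ea10d51bcf88862dbcc36eb292017dfd+45",
         "byte_ranges": [(0, 45)]},
    ]),
    Collection("subdir", [
        File("bar.txt", [
            {"locator": "cdd549ae79fe6640fa3d5c6261d8303c+195",
             "byte_ranges": [(0, 195)]},
        ]),
    ]),
])

paths = [p for p, _ in walk_files(example)]
# paths == ["dir/foo.txt", "dir/subdir/bar.txt"]
```

Because a subdirectory is just another `Collection` in the `items` list, the same code handles database-backed collections and subdirectories without distinguishing between them.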
