Task #4837

Updated by Tom Clegg about 10 years ago

A good data structure for representing the content of a collection in memory will facilitate cleaner, more efficient code for manipulating that content.

Conceptually, a collection contains files and collections (analogous to subdirectories). Currently there are two distinct types of collection:
# Collections that correspond to records in the Arvados database (these have a uuid, a portable data hash, and a manifest_text)
# Subdirectories within collections (encoded as stream names and filenames with slashes)

A collection has a name and an array of items. Each item is either a file or a collection.

Each file in a collection has a name and an array of data segments.

Each data segment is either
# a list of block specifiers, each of which consists of a Keep locator and a list of byte ranges, or
# a data buffer (useful when some data has not been written to Keep yet).

The file's content is defined as the concatenation of the specified byte ranges from all data segments, in the order given.
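The structure described above can be sketched with Python dataclasses. This is only an illustration, not an existing SDK API: the class names @BlockSegment@ and @BufferSegment@ and the @size()@ helper are hypothetical. It also assumes that each block-backed segment carries one locator with its byte ranges, and that byte ranges are half-open @[start, end)@ pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class BlockSegment:
    """Data stored in Keep: a block locator plus the byte ranges taken from it."""
    locator: str
    byte_ranges: List[Tuple[int, int]]  # assumption: half-open [start, end)

    def size(self) -> int:
        return sum(end - start for start, end in self.byte_ranges)


@dataclass
class BufferSegment:
    """Data held in memory because it has not been written to Keep yet."""
    data: bytes

    def size(self) -> int:
        return len(self.data)


@dataclass
class File:
    name: str
    data_segments: List[Union[BlockSegment, BufferSegment]] = field(default_factory=list)

    def size(self) -> int:
        # File length is the sum of the lengths of its segments, in order.
        return sum(segment.size() for segment in self.data_segments)


@dataclass
class Collection:
    name: str
    # Each item is either a file or a nested collection (a "subdirectory").
    items: List[Union[File, "Collection"]] = field(default_factory=list)


foo = File("foo.txt", [
    BlockSegment("ea10d51bcf88862dbcc36eb292017dfd+45", [(0, 45)]),
    BufferSegment(b"not yet in Keep"),
])
root = Collection("dir", [foo, Collection("subdir")])
```

Keeping the buffer case as its own segment type lets code that appends data to a file defer the write to Keep without changing how the file is represented.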

Example of a file:
 <pre> 
 File{
   name: "foo.txt",
   data_segments: [
     { 
       locator: "6a4ff0499484c6c79c95cd8c566bd25f+249025", 
       byte_ranges: [ [0, 75], [250, 300], [150, 200] ] 
     }, 
     { 
       locator: "ea10d51bcf88862dbcc36eb292017dfd+45", 
       byte_ranges: [ [30, 40] ] 
     } 
   ] 
 } 
 </pre> 

(Examples here are written in JSON-ish notation for convenience. This is not intended to be used as a data interchange format.)

This represents a file named @foo.txt@, consisting of bytes 0-75, 250-300, and 150-200 from block @6a4ff0499484c6c79c95cd8c566bd25f+249025@, followed by bytes 30-40 from block @ea10d51bcf88862dbcc36eb292017dfd+45@.
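To make the concatenation rule concrete, here is a minimal sketch of reading a file's content from its data segments. The @fetch_block@ callable is a hypothetical stand-in for a Keep client, the @data@ key for buffer segments is an assumption, and byte ranges are assumed to be half-open @[start, end)@ pairs. The block contents are made up for the illustration.

```python
from typing import Callable, Dict, List


def read_file(file: Dict, fetch_block: Callable[[str], bytes]) -> bytes:
    """Concatenate the byte ranges of every data segment, in the order given."""
    parts: List[bytes] = []
    for segment in file["data_segments"]:
        if "data" in segment:
            # Buffer segment: data that has not been written to Keep yet.
            parts.append(segment["data"])
            continue
        block = fetch_block(segment["locator"])
        for start, end in segment["byte_ranges"]:
            parts.append(block[start:end])  # assumption: half-open [start, end)
    return b"".join(parts)


# Stand-in for Keep: a dict from locator to block contents (made-up data).
fake_keep = {
    "block-a": b"abcdefghij",
    "block-b": b"0123456789",
}

example = {
    "name": "foo.txt",
    "data_segments": [
        {"locator": "block-a", "byte_ranges": [[0, 3], [7, 10], [4, 6]]},
        {"locator": "block-b", "byte_ranges": [[2, 5]]},
        {"data": b"!"},
    ],
}

content = read_file(example, fake_keep.__getitem__)
print(content)  # prints b'abchijef234!'
```

Note that the ranges within a block need not be in ascending order: as in the @foo.txt@ example, the third range may come from an earlier offset than the second.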

 An example for an entire collection: 
 <pre> 
 Collection{
   name: "dir",
   items: [
     File{
       type: "file",
       name: "foo.txt",
       data_segments: [
         {
           locator: "ea10d51bcf88862dbcc36eb292017dfd+45",
           byte_ranges: [ [0, 45] ]
         }
       ]
     },
     Collection{
       type: "collection",
       name: "subdir",
       items: [
         File{
           type: "file",
           name: "bar.txt",
           data_segments: [
             {
               locator: "cdd549ae79fe6640fa3d5c6261d8303c+195",
               byte_ranges: [ [0, 195] ]
             }
           ]
         }
       ]
     }
   ]
 }
 </pre> 
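Because nested collections stand in for subdirectories, a recursive walk over the structure recovers the slash-separated paths. The sketch below uses plain dictionaries that mirror the example above. The function name @file_paths@ is hypothetical, and the walk relies on the @type@ field to tell nested collections from files.

```python
from typing import Dict, Iterator


def file_paths(collection: Dict, prefix: str = "") -> Iterator[str]:
    """Yield the slash-separated path of every file, depth first."""
    base = prefix + collection["name"] + "/"
    for item in collection["items"]:
        if item.get("type") == "collection":
            # A nested collection plays the role of a subdirectory.
            yield from file_paths(item, base)
        else:
            yield base + item["name"]


tree = {
    "name": "dir",
    "items": [
        {
            "type": "file",
            "name": "foo.txt",
            "data_segments": [
                {"locator": "ea10d51bcf88862dbcc36eb292017dfd+45",
                 "byte_ranges": [[0, 45]]},
            ],
        },
        {
            "type": "collection",
            "name": "subdir",
            "items": [
                {
                    "type": "file",
                    "name": "bar.txt",
                    "data_segments": [
                        {"locator": "cdd549ae79fe6640fa3d5c6261d8303c+195",
                         "byte_ranges": [[0, 195]]},
                    ],
                },
            ],
        },
    ],
}

paths = list(file_paths(tree))
print(paths)  # prints ['dir/foo.txt', 'dir/subdir/bar.txt']
```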
