Idea #15960

Updated by Peter Amstutz 6 months ago

Right now, the feature of automatic HTTP download in @cwl-runner@ is effectively fulfilling this function for users (although it copies the data into the local keepstore). Users would probably like it if it were expanded to also support copying s3:// URLs.

However, the big idea for this epic is on-demand retrieval from external storage -- we fetch the data from the external system on demand.

Previous designs involved reading all the data to generate content hashes. This involved:

# Going through each file of an external file system or bucket and hashing 64 MB byte ranges to get a block hash
# Keeping a database that maps the block hash to a URL and byte range
# Constructing a collection using these blocks with file/directory structure matching the external file system
# Creating a special keepstore volume type that uses the hash database to fetch block contents from the external source on demand as if it were a normal keep service

The current design is outlined in https://dev.arvados.org/issues/21936 and involves storing locators to external data in the manifest. The block identifiers are based on hashing the locator (and other metadata) instead of the content.
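
To make the difference between the two approaches concrete, here is a minimal Python sketch. It is not the implementation from the linked issue. The names @ExternalRange@, @byte_ranges@, @content_block_id@ and @locator_block_id@, the @etag@ metadata field, and the identifier layout are all hypothetical and chosen only for illustration. The "hash+size" shape mirrors ordinary Keep block locators, but the real format for external data may differ.

```python
import hashlib
from typing import Iterator, NamedTuple

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB byte ranges, as described above


class ExternalRange(NamedTuple):
    """Hypothetical locator for a byte range in an external object."""
    url: str
    offset: int
    length: int


def byte_ranges(url: str, file_size: int,
                block_size: int = BLOCK_SIZE) -> Iterator[ExternalRange]:
    """Split an external file of known size into block-sized byte ranges."""
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        yield ExternalRange(url, offset, length)
        offset += length


def content_block_id(data: bytes) -> str:
    """Content-hash approach: every byte must be read before the ID exists."""
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def locator_block_id(rng: ExternalRange, etag: str = "") -> str:
    """Locator-hash approach (assumed layout): hash the locator and metadata.

    No data is fetched; the ID depends only on where the bytes live and on
    metadata (here a hypothetical etag) that changes when the object changes.
    """
    key = f"{rng.url}\n{rng.offset}\n{rng.length}\n{etag}".encode()
    return f"{hashlib.md5(key).hexdigest()}+{rng.length}"
```

With this layout, identifiers for a whole bucket could be computed from a listing of object names and sizes alone, while @content_block_id@ needs a full download of every object. The trade-off is that a locator-based identifier no longer verifies the content, which is presumably why other metadata is included in the hash.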
