Idea #15960

Updated by Peter Amstutz over 1 year ago

Right now, the automatic HTTP download feature in @cwl-runner@ effectively fulfills this function for users (although it copies the data into the local keepstore). Users would probably like it if it were expanded to also support copying from s3:// URLs.

However, the big idea for this epic is on-demand retrieval from external storage: instead of copying everything up front, we fetch the data from the external system only when it is read.

 This involves: 

 # Going through each file of an external file system or bucket and hashing each 64 MB byte range to get a block hash
 # Keeping a database that maps the block hash to a URL and byte range 
 # Constructing a collection using these blocks with file/directory structure matching the external file system 
 # Creating a special keepstore volume type that uses the block hash database to fetch block contents from the external source on demand as if it were a normal keep service 
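
The indexing and on-demand fetch steps above could be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: it assumes block hashes are MD5 digests with a @+size@ suffix in the style of Keep locators, it uses an in-memory dict in place of the block hash database, and @index_file@, @fetch_block@, @BlockSource@ and the @read_range@ callback (which would wrap an HTTP Range request or an S3 ranged read) are all hypothetical names.

```python
import hashlib
from typing import Callable, Dict, List, NamedTuple

# 64 MB byte ranges, as in step 1 (hypothetical constant name).
BLOCK_SIZE = 64 * 1024 * 1024


class BlockSource(NamedTuple):
    """Where one block lives in the external system (step 2)."""
    url: str
    offset: int
    length: int


# Hypothetical callback: read_range(url, offset, length) -> bytes.
RangeReader = Callable[[str, int, int], bytes]


def index_file(url: str, size: int, read_range: RangeReader,
               index: Dict[str, BlockSource],
               block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash each byte range of one external file and record its source.

    Returns the ordered list of block locators for the file, which is
    what a collection would reference for it (step 3).
    """
    locators = []
    for offset in range(0, size, block_size):
        length = min(block_size, size - offset)
        data = read_range(url, offset, length)
        # Assumption: locator is "<md5 hex>+<size>".
        locator = f"{hashlib.md5(data).hexdigest()}+{length}"
        index[locator] = BlockSource(url, offset, length)
        locators.append(locator)
    return locators


def fetch_block(locator: str, read_range: RangeReader,
                index: Dict[str, BlockSource]) -> bytes:
    """Serve a block by fetching it from the external source (step 4)."""
    src = index[locator]
    data = read_range(src.url, src.offset, src.length)
    # Re-hash to detect external data that changed after indexing.
    if f"{hashlib.md5(data).hexdigest()}+{len(data)}" != locator:
        raise IOError(f"content at {src.url} no longer matches {locator}")
    return data
```

Re-hashing on fetch is one possible design choice here: because the external file can change after it was indexed, a content-addressed volume would need some way to notice that the bytes no longer match the locator.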
