Idea #21936

Updated by Peter Amstutz 26 days ago

* Manifest format extended to support a block "hint" that links to an external resource, plus a second hint giving the offset within that external resource
** The block "size" must be the content size, not the size of the string that was hashed to produce the MD5 (see below)
* Keepstore gets an API that takes an external resource URL (s3://), verifies that the object is accessible, fetches its metadata, generates the MD5, and returns a manifest stream fragment
** The block identifier is an MD5 sum computed from the locator, version, ETag, offset, and length
** For versioned buckets, the version should be included in the locator
** For non-versioned buckets, the metadata should
* A Python SDK method that takes an external resource URL and calls keepstore to get a manifest stream fragment
* Keepstore supports fetching blocks that have an external resource hint
* The Python and Go SDKs handle blocks with external resource hints, where the MD5 is a hash of the locator hint rather than of the content itself
** Cache management might need some attention
* arvados-cwl-runner supports S3 object inputs by using this API to create a collection with links to external resources
* Keep-balance ignores blocks with external links
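The manifest, block-identifier, and read-side items above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the planned implementation. The hint letters `X` (external URL) and `O` (offset), the base64url encoding of the URL so it fits the usual hint character set (letters, digits, `@`, `_`, `-`), the newline-joined string that gets hashed, the `?versionId=` suffix for versioned objects, the splitting of large objects into 64 MiB blocks (Keep's maximum block size), and all helper names (`ExternalObject`, `block_md5`, `external_locator`, `manifest_fragment`, `parse_external`, `range_header`, `check_block`) are hypothetical. The sketch keeps the content length in the locator's size field, and a reader can only check an external block's length, since its MD5 covers the hint rather than the data.

```python
import base64
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical hint letters; the proposal does not fix a syntax yet.
URL_HINT = "X"     # external resource URL, base64url-encoded
OFFSET_HINT = "O"  # byte offset within the external resource

KEEP_MAX_BLOCK = 64 * 1024 * 1024                    # Keep's maximum block size (64 MiB)
EMPTY_BLOCK = "d41d8cd98f00b204e9800998ecf8427e+0"   # locator for zero bytes


@dataclass
class ExternalObject:
    url: str                       # e.g. "s3://bucket/key"
    etag: str
    size: int                      # content length in bytes
    version: Optional[str] = None  # None for non-versioned buckets


def encode_hint(text: str) -> str:
    # base64url without padding uses only [A-Za-z0-9_-], which fits in a hint.
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii").rstrip("=")


def decode_hint(value: str) -> str:
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def block_md5(obj: ExternalObject, offset: int, length: int) -> str:
    # Hash a canonical description of the block (assumed format), not its content.
    canonical = "\n".join([obj.url, obj.version or "", obj.etag, str(offset), str(length)])
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def external_locator(obj: ExternalObject, offset: int, length: int) -> str:
    # Versioned objects carry the version in the URL hint (hypothetical suffix).
    target = obj.url if obj.version is None else f"{obj.url}?versionId={obj.version}"
    return "+".join([
        block_md5(obj, offset, length),
        str(length),  # content size, not the size of the hashed string
        URL_HINT + encode_hint(target),
        OFFSET_HINT + str(offset),
    ])


def manifest_fragment(obj: ExternalObject, filename: str) -> str:
    # One stream line: ". <locators...> 0:<size>:<filename>"
    locators = [
        external_locator(obj, off, min(KEEP_MAX_BLOCK, obj.size - off))
        for off in range(0, obj.size, KEEP_MAX_BLOCK)
    ] or [EMPTY_BLOCK]
    # Manifests escape backslash and space as octal sequences.
    name = filename.replace("\\", "\\134").replace(" ", "\\040")
    return f". {' '.join(locators)} 0:{obj.size}:{name}\n"


def parse_external(locator: str) -> Tuple[str, int, Optional[str], Optional[int]]:
    # Returns (md5, length, url or None, offset or None); other hints are ignored.
    md5, size, *hints = locator.split("+")
    url, offset = None, None
    for hint in hints:
        if hint.startswith(URL_HINT):
            url = decode_hint(hint[1:])
        elif hint.startswith(OFFSET_HINT):
            offset = int(hint[1:])
    return md5, int(size), url, offset


def range_header(offset: int, length: int) -> str:
    # HTTP Range end positions are inclusive.
    return f"bytes={offset}-{offset + length - 1}"


def check_block(locator: str, data: bytes) -> bool:
    md5, length, url, _ = parse_external(locator)
    if url is None:
        return hashlib.md5(data).hexdigest() == md5  # ordinary Keep block
    # External block: the MD5 covers the hint, so only the length can be checked.
    return len(data) == length
```

For a 100 MiB versioned object this yields one stream line with two external locators (64 MiB at offset 0 and 36 MiB at offset 67108864), and `range_header` gives the inclusive byte range a keepstore would request from S3 for each block.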

Assumptions:
* Keepstore and compute nodes have permission, via IAM instance roles, to read the S3 buckets where the resources are located

Possibly required, TBD:
* Store credentials for S3 buckets in the Arvados config.yml, for keepstore to use when IAM instance roles are not available.