Idea #21936
Updated by Peter Amstutz 10 months ago
* The manifest format is extended so that a block locator can carry a "hint" linking to an external resource, plus a second "hint" giving the offset within that external resource.
** The block "size" must be the content size, not the size of the string that was hashed to produce the MD5 (see below).
* Keepstore gets an API that takes an external resource URL (s3://), verifies that the object is accessible, fetches its metadata, generates the MD5, and returns a manifest stream fragment.
** The block identifier is an MD5 sum computed from the locator, version, ETag, offset, and length.
** For versioned buckets, the version should be included in the locator.
** For non-versioned buckets, the metadata should
* A Python SDK method takes an external resource URL and calls Keepstore to get a manifest stream fragment.
* Keepstore supports fetching blocks that have an external resource hint.
* The Python and Go SDKs handle blocks with external resource hints, where the MD5 is a hash of the locator hint rather than of the content itself.
** Cache management might need some attention.
* arvados-cwl-runner supports S3 object inputs by using this API to create a collection with links to external resources.
* Keep-balance ignores blocks with external links.

Assumptions:

* Keepstore and compute nodes have permission to read the S3 buckets where the resources are located, via IAM instance roles.

Possibly required, TBD:

* Store credentials associated with S3 buckets in the Arvados config.yml, for Keepstore to use when IAM instance roles are not available.
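A minimal sketch of the Keepstore step that turns object metadata into a manifest stream fragment, assuming the object's URL, ETag, size, and (optional) version ID have already been fetched. Everything concrete here is hypothetical and not part of the proposal: the hint letters `E` (external URL) and `O` (offset), base64url-encoding the URL so the hint avoids `:` and `/`, joining the identifying fields with newlines before hashing, encoding the version as a `?versionId=` suffix, and splitting large objects at Keep's 64 MiB block size. The point it illustrates is that the MD5 covers identifying metadata while the size field carries the content length.

```python
import base64
import hashlib
import posixpath
from dataclasses import dataclass
from typing import List, Optional

# Assumption: external blocks reuse Keep's 64 MiB maximum block size.
MAX_BLOCK_SIZE = 64 * 1024 * 1024
# MD5 of the empty string: the locator Arvados uses for an empty block.
EMPTY_BLOCK = "d41d8cd98f00b204e9800998ecf8427e+0"


@dataclass
class ExternalObject:
    url: str                       # e.g. "s3://bucket/dir/reads.bam"
    etag: str                      # ETag reported by the object store
    size: int                      # content size in bytes
    version: Optional[str] = None  # version ID, versioned buckets only


def block_md5(obj: ExternalObject, offset: int, length: int) -> str:
    """MD5 of the fields that identify a byte range, not of its content."""
    key = "\n".join([obj.url, obj.version or "", obj.etag, str(offset), str(length)])
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def encode_hint_value(text: str) -> str:
    """Base64url without '=' padding, so the hint avoids ':' and '/'."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii").rstrip("=")


def external_locator(obj: ExternalObject, offset: int, length: int) -> str:
    # The size field is the content length, not the length of the hashed key.
    # Hypothetical hint letters: "E" = external URL, "O" = offset in the object.
    url = obj.url if obj.version is None else f"{obj.url}?versionId={obj.version}"
    return f"{block_md5(obj, offset, length)}+{length}+E{encode_hint_value(url)}+O{offset}"


def manifest_stream_fragment(obj: ExternalObject, block_size: int = MAX_BLOCK_SIZE) -> str:
    """One-file manifest stream whose blocks all point into the external object."""
    locators: List[str] = [
        external_locator(obj, offset, min(block_size, obj.size - offset))
        for offset in range(0, obj.size, block_size)
    ] or [EMPTY_BLOCK]
    # Manifest file names escape spaces as \040.
    name = posixpath.basename(obj.url).replace(" ", "\\040")
    return f". {' '.join(locators)} 0:{obj.size}:{name}\n"
```

Because the ETag (and the version, when present) is part of the hashed key, importing the same URL again after the object has changed yields different block identifiers.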
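For the read path, where Keepstore fetches a block that has an external resource hint, a sketch of turning such a locator back into a ranged read could look like the following. It assumes a hypothetical encoding (an `E` hint holding the base64url-encoded URL without padding, and an `O` hint holding the byte offset); neither letter comes from the proposal. The locator's size field gives the length, and the result is expressed as an HTTP byte range, whose end position is inclusive.

```python
import base64
from typing import NamedTuple


class ExternalRange(NamedTuple):
    url: str     # external object URL, e.g. "s3://bucket/key"
    offset: int  # byte offset of this block within the object
    length: int  # block content length (the locator's size field)


def decode_hint_value(value: str) -> str:
    """Reverse a base64url encoding whose '=' padding was stripped."""
    padding = "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(value + padding).decode("utf-8")


def parse_external_locator(locator: str) -> ExternalRange:
    fields = locator.split("+")
    length = int(fields[1])
    url, offset = None, 0
    for hint in fields[2:]:
        if hint.startswith("E"):      # hypothetical external URL hint
            url = decode_hint_value(hint[1:])
        elif hint.startswith("O"):    # hypothetical offset hint
            offset = int(hint[1:])
    if url is None:
        raise ValueError("locator has no external resource hint")
    if length <= 0:
        raise ValueError("external blocks must be non-empty")
    return ExternalRange(url, offset, length)


def http_range_header(r: ExternalRange) -> str:
    """Range value covering the block; both ends of a byte range are inclusive."""
    return f"bytes={r.offset}-{r.offset + r.length - 1}"
```

S3's GetObject request accepts a value of this form in its `Range` parameter, so one request per block would fetch only that block's bytes.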
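Since the MD5 of an external block is a hash of its identifying metadata rather than its content, the SDKs cannot verify fetched data against it, and Keep-balance has nothing on Keep volumes to replicate or garbage-collect for it. A language-neutral sketch of both checks, written in Python for consistency, assuming a hypothetical hint letter `E` marks an external resource (the real SDKs and Keep-balance are not shown here and may differ):

```python
import hashlib
from typing import Iterable, List

EXTERNAL_HINT = "E"  # hypothetical hint letter for an external resource URL


def has_external_hint(locator: str) -> bool:
    """True if any hint after the size field marks an external resource."""
    return any(h.startswith(EXTERNAL_HINT) for h in locator.split("+")[2:])


def verify_block(locator: str, data: bytes) -> bool:
    """Check fetched data against its locator."""
    fields = locator.split("+")
    md5_hex, size = fields[0], int(fields[1])
    if len(data) != size:
        return False
    if has_external_hint(locator):
        # The MD5 covers identifying metadata, not content, so only the
        # length can be checked for external blocks.
        return True
    return hashlib.md5(data).hexdigest() == md5_hex


def blocks_for_balancing(locators: Iterable[str]) -> List[str]:
    """Keep-balance sketch: skip blocks that live outside Keep."""
    return [loc for loc in locators if not has_external_hint(loc)]
```

A content cache keyed by locator would still work for external blocks in this sketch, but because nothing verifies their content, a cached copy could diverge from a non-versioned object that changes in place; this is one reason cache management may need attention.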