Objects as pseudo-blocks in Keep

Idea for accessing external objects via Keep (specifically S3)

The approach we've bounced around for a while has been to take an object, split it into 64 MiB blocks, and record each block's hash in a database along with a reference to the object and offset.

Here is a different approach to this idea. (Tom floated a version of this at one of our engineering meetings but I don't think we fully explored it at the time).

For an S3 object 1234 bytes long located at s3://bucket/key:

ffffffffffffffffffffffffffffffff+512+B(base64 encode of s3://bucket/key)+C256

The ffff... indicates it is a special block (we could also use 0000..., 0f0f0f..., etc.). Another idea would be to use a hash of the size and the +B and +C hints. Alternatively, S3 also offers checksums of objects, so we could use the MD5 of the full object.

  • It is 512 bytes long.
  • The hint +B means the data should be fetched from the s3:// URL, which is base64 encoded (this is necessary to match our locator syntax).
  • The hint +C means read from offset 256 bytes.
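
As a rough sketch, constructing such a locator might look like the Go snippet below. The placeholder hash and the +B/+C hint names follow the proposal above; the choice of the URL-safe base64 alphabet without padding (so the encoded URL cannot collide with the + delimiters) is an assumption, not a settled part of the design.

    package main

    import (
        "encoding/base64"
        "fmt"
    )

    // Placeholder "hash" marking a pseudo-block, per the proposal above.
    const placeholderHash = "ffffffffffffffffffffffffffffffff"

    // pseudoBlockLocator is hypothetical: it builds a locator for `length`
    // bytes starting at `offset` within the object at s3url.
    func pseudoBlockLocator(s3url string, length, offset int64) string {
        // URL-safe base64 without padding keeps the URL within the locator
        // character set (assumption; the exact encoding is not settled here).
        encoded := base64.RawURLEncoding.EncodeToString([]byte(s3url))
        return fmt.Sprintf("%s+%d+B%s+C%d", placeholderHash, length, encoded, offset)
    }

    func main() {
        // The 512-byte read at offset 256 from the example above.
        fmt.Println(pseudoBlockLocator("s3://bucket/key", 512, 256))
    }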

Large files can be split, e.g.

ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C0
ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C67108864
ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C134217728

However, this repeats the +B portion a bunch of times, so we could allow the manifest to describe oversized blocks:

ffffffffffffffffffffffffffffffff+1000000000+B(base64 encode of s3://bucket/key)+C0

Implementation-wise, this would be split into 64 MiB chunks at runtime when the manifest is loaded. The block cache would need to use the full locator (with +B and +C).
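
A minimal sketch of that runtime splitting, assuming the 64 MiB block size and the locator format above (the function is illustrative, not existing SDK code):

    package main

    import (
        "encoding/base64"
        "fmt"
    )

    const maxBlockSize = 1 << 26 // 64 MiB

    // splitOversized expands one oversized pseudo-block into 64 MiB (or
    // smaller, for the tail) pseudo-blocks, adjusting the +C offset hint.
    func splitOversized(encodedURL string, size, baseOffset int64) []string {
        var locators []string
        for off := int64(0); off < size; off += maxBlockSize {
            length := size - off
            if length > maxBlockSize {
                length = maxBlockSize
            }
            locators = append(locators, fmt.Sprintf(
                "ffffffffffffffffffffffffffffffff+%d+B%s+C%d",
                length, encodedURL, baseOffset+off))
        }
        return locators
    }

    func main() {
        url := base64.RawURLEncoding.EncodeToString([]byte("s3://bucket/key"))
        // The 1000000000-byte oversized block from the example above.
        for _, loc := range splitOversized(url, 1000000000, 0) {
            fmt.Println(loc)
        }
    }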

Add support for locators of this type to Keepstore, which already has code to interact with S3 buckets. This avoids adding such code to the client.

Keepstore would need to be able to read the buckets. This could be done with a blanket policy (allow Keepstore/compute nodes to read specific buckets) and/or by adding a feature to store AWS credentials in Arvados such that Keepstore, holding the user's API token, can fetch and use them (for example, storing them on the API token record).
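
For illustration only, the Keepstore side could decode the +B hint and issue a ranged GetObject against the referenced bucket. This sketch uses the AWS SDK for Go v2 with whatever credentials are available to the process (e.g. an instance role under the blanket-policy idea above); the parsing rules and the function name are assumptions, not existing Keepstore code:

    package pseudoblock

    import (
        "context"
        "encoding/base64"
        "fmt"
        "io"
        "strconv"
        "strings"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/service/s3"
    )

    // readPseudoBlock is hypothetical: it parses a locator of the proposed
    // form (placeholder hash, size, +B base64 URL, +C offset) and reads the
    // corresponding byte range from S3.
    func readPseudoBlock(ctx context.Context, locator string) ([]byte, error) {
        parts := strings.Split(locator, "+")
        if len(parts) < 4 {
            return nil, fmt.Errorf("not a pseudo-block locator: %s", locator)
        }
        length, err := strconv.ParseInt(parts[1], 10, 64)
        if err != nil {
            return nil, err
        }
        var s3url string
        var offset int64
        for _, hint := range parts[2:] {
            switch {
            case strings.HasPrefix(hint, "B"):
                decoded, derr := base64.RawURLEncoding.DecodeString(hint[1:])
                if derr != nil {
                    return nil, derr
                }
                s3url = string(decoded)
            case strings.HasPrefix(hint, "C"):
                offset, err = strconv.ParseInt(hint[1:], 10, 64)
                if err != nil {
                    return nil, err
                }
            }
        }
        bucketAndKey := strings.SplitN(strings.TrimPrefix(s3url, "s3://"), "/", 2)
        if len(bucketAndKey) != 2 {
            return nil, fmt.Errorf("cannot parse bucket and key from %q", s3url)
        }
        cfg, err := config.LoadDefaultConfig(ctx)
        if err != nil {
            return nil, err
        }
        client := s3.NewFromConfig(cfg)
        out, err := client.GetObject(ctx, &s3.GetObjectInput{
            Bucket: aws.String(bucketAndKey[0]),
            Key:    aws.String(bucketAndKey[1]),
            // HTTP Range is inclusive: bytes offset .. offset+length-1.
            Range: aws.String(fmt.Sprintf("bytes=%d-%d", offset, offset+length-1)),
        })
        if err != nil {
            return nil, err
        }
        defer out.Body.Close()
        return io.ReadAll(out.Body)
    }

Because the read uses an HTTP Range request, Keepstore never has to download or cache the whole object in order to serve a single 64 MiB pseudo-block.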

For S3 specifically, if we include ?versionId= on all URLs, the blocks can be assumed to be immutable.

Advantages

  • This strategy is a lot like how we approach federation.
  • If locators of this type are supported by Keepstore, then the Go and Python SDKs require relatively few changes (they continue to fetch blocks from Keepstore).
  • Does not require downloading and indexing files.

Disadvantages

  • Can't verify file contents.
