Peter Amstutz, 05/28/2024 10:18 PM


Objects as pseudo-blocks in Keep

Idea for accessing external objects via Keep (specifically S3)

The thought we've bounced around for a while has been to read the contents of an object, split it into 64 MiB blocks, and record each block hash in a database along with a reference to the object and offset.

Here is a different approach to this idea. (Tom floated a version of this at one of our engineering meetings, but we didn't fully explore it at the time.)

Block id

For an S3 object of 1234 bytes located at s3://bucket/key, a pseudo-block locator referring to part of that object could look like this:

ffffffffffffffffffffffffffffffff+512+B(base64 encode of s3://bucket/key)+C256

By my research, some S3 metadata values such as ETag are the MD5 of the content in certain circumstances, but this isn't true in general (for example, multipart uploads produce a different ETag). So for these pseudo-blocks, I propose deriving the hash from the (size, +B, +C) hints.

For S3 specifically, if the bucket supports versioning and we use ?versionId= on all URLs, blocks can be treated as immutable.

In the example locator above:

  • The block is 512 bytes long.
  • The hint +B means data should be fetched from an s3:// URL. The URL is base64 encoded (this is necessary to match our locator syntax).
  • The hint +C means read starting at an offset of 256 bytes.

So this describes the byte range starting at offset 256 and ending just before offset 768 (that is, bytes 256 through 767 of the object).
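To make the locator layout concrete, here is a minimal sketch of building and parsing such a locator. The function names are hypothetical, the all-f hash is just the placeholder used in the examples above, and the choice of URL-safe base64 with padding stripped is an assumption (made so the encoded URL contains no "+", "/" or "=" characters); the proposal does not pin down the exact encoding.

```python
import base64
import re

# Placeholder hash, as used in the examples in this page.
PLACEHOLDER_HASH = "f" * 32

# Assumed layout: <hash>+<size>+B<base64 url>+C<offset>
LOCATOR_RE = re.compile(r"^([0-9a-f]{32})\+(\d+)\+B([A-Za-z0-9_-]+)\+C(\d+)$")


def encode_url(url):
    """URL-safe base64 without padding (an assumption, see lead-in)."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")


def decode_url(token):
    """Inverse of encode_url: restore the padding, then decode."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def make_locator(url, size, offset):
    """Build a pseudo-block locator for `size` bytes of `url` starting at `offset`."""
    return f"{PLACEHOLDER_HASH}+{size}+B{encode_url(url)}+C{offset}"


def parse_locator(locator):
    """Return (url, start, end) where the byte range is [start, end)."""
    match = LOCATOR_RE.match(locator)
    if match is None:
        raise ValueError(f"not a pseudo-block locator: {locator!r}")
    _hash, size, token, offset = match.groups()
    start = int(offset)
    return decode_url(token), start, start + int(size)


if __name__ == "__main__":
    loc = make_locator("s3://bucket/key", 512, 256)
    print(loc)
    print(parse_locator(loc))
```

Round-tripping the example from this section gives back the s3:// URL and the half-open range 256 to 768.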

Block stream

Large files can be split, e.g.

ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C0 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C67108864 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C134217728

However, this repeats the +B portion many times, so we could allow the manifest to describe oversized blocks:

ffffffffffffffffffffffffffffffff+1000000000+B(base64 encode of s3://bucket/key)+C0

Implementation-wise, an oversized block would be split into 64 MiB chunks (as in the previous example) at runtime when the manifest is loaded, and collapsed back into the oversized form when the manifest is saved. The block cache would need to key on the full locator (including +B and +C) or have some other means of distinguishing regular Keep blocks from these external-reference pseudo-blocks.
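The split-on-load and collapse-on-save steps can be sketched as plain arithmetic on (offset, size) pairs for a single object. This is only an illustration: the function names are hypothetical, and a real implementation would also have to compare the +B hints before merging ranges.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB = 67108864 bytes


def split_range(offset, size, block_size=BLOCK_SIZE):
    """Split one oversized range into chunks of at most block_size bytes."""
    chunks = []
    end = offset + size
    while offset < end:
        length = min(block_size, end - offset)
        chunks.append((offset, length))
        offset += length
    return chunks


def merge_ranges(chunks):
    """Collapse adjacent (offset, size) chunks of the same object back together."""
    merged = []
    for offset, size in chunks:
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This chunk starts exactly where the previous one ends: extend it.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged


if __name__ == "__main__":
    chunks = split_range(0, 1000000000)
    print(len(chunks), chunks[-1])
    print(merge_ranges(chunks))
```

For the 1000000000-byte example above, this yields 14 full 64 MiB chunks followed by a final chunk of 60475904 bytes at offset 939524096, and merging them restores the single oversized range.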

Keepstore support

Add support for locators of this type to Keepstore. Keepstore already needs to be able to interact with S3 buckets.
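As a rough sketch of what serving such a locator might involve, the size and +C offset map onto an HTTP Range request against the object. The code below is in Python for illustration only (Keepstore itself is not written in Python), and the function names are hypothetical. The `get_object` parameter is assumed to be a callable with the same keyword arguments and return shape as the boto3 S3 client's `get_object`, passed in so the sketch does not have to import boto3.

```python
def range_header(offset, size):
    """HTTP Range header value for `size` bytes starting at `offset`.

    HTTP byte ranges are inclusive at both ends, hence the -1.
    """
    return f"bytes={offset}-{offset + size - 1}"


def fetch_block(get_object, bucket, key, offset, size, version_id=None):
    """Fetch one pseudo-block's bytes using a boto3-style get_object callable."""
    kwargs = {"Bucket": bucket, "Key": key, "Range": range_header(offset, size)}
    if version_id is not None:
        # Pin the read to one object version so the block stays immutable.
        kwargs["VersionId"] = version_id
    data = get_object(**kwargs)["Body"].read()
    if len(data) != size:
        raise IOError(f"expected {size} bytes, got {len(data)}")
    return data
```

With a real client this might be called as `fetch_block(boto3.client("s3").get_object, "bucket", "key", 256, 512)`, which requests bytes 256 through 767.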

Keepstore would need to be able to read the buckets. This could be done either with a blanket policy (allow keepstore/compute nodes to read specific buckets) and/or by adding a feature to store AWS credentials in Arvados in a way such that Keepstore, having the user's API token, is able to fetch them and use them (such as on the API token record).

This interacts awkwardly with Arvados sharing: without additional features, sharing a collection doesn't mean the recipient can actually read its contents.

SDK support

This approach limits the amount of S3-specific code directly in the client -- the goal should be to avoid having to import boto3.

The Collection class gets a new "import_from_s3()" method (or maybe an overload of the "copy" method) which takes the s3:// URL. It contacts the Keepstore server, provides the s3:// URL, and gets back the appropriately formatted block locator. Keepstore should check that the object exists and that the user can access it, and get the current versionId.
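The server-side half of that exchange (look up the object, record its current versionId, and hand back a locator) might look roughly like the sketch below. Everything here is an assumption for illustration: the function name is hypothetical, the hash is the all-f placeholder from the examples above, the URL is encoded as unpadded URL-safe base64, and `head_object` is a callable shaped like the boto3 S3 client's `head_object`, passed in so the sketch itself does not import boto3.

```python
import base64
from urllib.parse import urlencode

PLACEHOLDER_HASH = "f" * 32  # placeholder hash, as in the examples above


def locator_for_s3_object(bucket, key, head_object):
    """Return a single (possibly oversized) pseudo-block locator for an object.

    head_object is expected to raise if the object is missing or the caller
    is not allowed to read it, which covers the existence and access checks.
    """
    meta = head_object(Bucket=bucket, Key=key)
    url = f"s3://{bucket}/{key}"
    version_id = meta.get("VersionId")
    if version_id:
        # Pin the locator to the current version of the object.
        url += "?" + urlencode({"versionId": version_id})
    token = base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")
    return f"{PLACEHOLDER_HASH}+{meta['ContentLength']}+B{token}+C0"
```

A client-side "import_from_s3()" would then only need to send the s3:// URL and add the returned locator to the collection's manifest.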

Advantages

  • This strategy is similar to how we approach federation, which reduces the number of dramatic changes in the architecture
  • If locators of this type are supported by Keepstore, then Go and Python SDKs require relatively few changes (they continue to fetch blocks from Keepstore).
  • Does not require downloading and indexing files
  • Can still get a unique PDH for the collection
  • Can mix S3 objects and regular Keep objects, so Arvados becomes generally useful for organizing data in buckets. Changes in Keep don't propagate down to the bucket, but moving data once it has been written is crappy in S3 anyway, so you don't do it.

Disadvantages

  • Can't verify file contents.
  • Requires working with AWS access control, whether by granting blanket read access ahead of time to certain specific buckets, storing credentials, or some other mechanism we haven't designed yet
  • Sharing a collection with another person requires granting permission in both Arvados and AWS.
  • The fact that a given manifest contains references to S3 objects is opaque to the user and could produce confusing errors
  • Given an s3:// id for an object, can't efficiently find which collections use it (but this is a feature currently missing from Keep in general; keep-balance could do something here if needed, or we could implement a block index in the future)
