Objects as pseudo-blocks in Keep » History » Version 4

Peter Amstutz, 05/29/2024 01:38 PM

h1. Objects as pseudo-blocks in Keep
Idea for accessing external objects via Keep (specifically S3)
The thought we've bounced around for a while has been to read the contents of an object, split it into 64 MiB blocks, and record each block hash in a database along with a reference to the object and offset.
Here is a different approach to this idea.  (Tom floated a version of this at one of our engineering meetings but I think I didn't like it / we didn't fully explore it at the time).
h3. Block id
For an S3 object of 1234 bytes located at @s3://bucket/key@:
ffffffffffffffffffffffffffffffff+512+B(base64 encode of s3://bucket/key)+C256
Based on my research, some values such as the S3 ETag can be an MD5 checksum in certain circumstances, but this isn't true in general.  So for these pseudo-blocks, I propose deriving the block's identity from its (size, @+B@, @+C@) hints rather than from a real content hash.
For S3 specifically, if the bucket supports versioning and we use @?versionId=@ on all URLs, blocks can be treated as immutable.
In this example:
* It is 512 bytes long.
* The hint @+B@ means the data should be fetched from an @s3://@ URL.  The URL is base64-encoded, which is necessary to match our locator syntax.
* The hint @+C@ means read starting at byte offset 256.

So this describes the range of bytes from 256 to 768.
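
To make the locator concrete, here is a minimal Python sketch of parsing one.  The function name is hypothetical, and it assumes the @+B@ payload uses urlsafe base64 so the encoded URL contains no @+@ characters that would collide with the hint separators:

```python
import base64

def parse_pseudo_block_locator(locator):
    """Split a pseudo-block locator into (size, source URL, byte offset).

    Hypothetical helper; assumes the +B payload is urlsafe base64.
    """
    parts = locator.split("+")
    # parts[0] is the all-'f' placeholder hash; identity comes from the hints.
    size = int(parts[1])
    url, offset = None, 0
    for hint in parts[2:]:
        if hint.startswith("B"):
            url = base64.urlsafe_b64decode(hint[1:]).decode()
        elif hint.startswith("C"):
            offset = int(hint[1:])
    return size, url, offset

encoded = base64.urlsafe_b64encode(b"s3://bucket/key").decode()
loc = "f" * 32 + "+512+B" + encoded + "+C256"
print(parse_pseudo_block_locator(loc))  # (512, 's3://bucket/key', 256)
```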
h3. Block stream
Large files can be split, e.g.
ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C0 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C67108864 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C134217728
However, this repeats the @+B@ portion many times, so we could allow the manifest to describe oversized blocks:
ffffffffffffffffffffffffffffffff+1000000000+B(base64 encode of s3://bucket/key)+C0
Implementation-wise, an oversized block would be split into 64 MiB chunks (as in the previous example) at runtime when the manifest is loaded, and re-compressed into the oversized form when the manifest is saved.  The block cache would need to use the full locator (with @+B@ and @+C@), or have some other means of distinguishing regular Keep blocks from these external-reference pseudo-blocks.
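
A minimal Python sketch of the runtime split (function and constant names are hypothetical), expanding one oversized pseudo-block back into 64 MiB pseudo-block locators:

```python
import base64

KEEP_BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB

def expand_oversized_block(url, total_size, base_offset=0):
    """Expand one oversized pseudo-block into 64 MiB pseudo-block locators.

    Hypothetical helper illustrating the runtime split described above;
    uses urlsafe base64 so the encoded URL cannot contain '+'.
    """
    encoded = base64.urlsafe_b64encode(url.encode()).decode()
    locators = []
    offset, remaining = base_offset, total_size
    while remaining > 0:
        chunk = min(KEEP_BLOCK_SIZE, remaining)
        locators.append("f" * 32 + f"+{chunk}+B{encoded}+C{offset}")
        offset += chunk
        remaining -= chunk
    return locators

chunks = expand_oversized_block("s3://bucket/key", 1000000000)
print(len(chunks))  # 15: fourteen full 64 MiB blocks plus a remainder
```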
h3. Keepstore support
Add support for locators of this type to Keepstore.  Keepstore already needs to be able to interact with S3 buckets.  
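
For a read of one of these pseudo-blocks, Keepstore would translate the locator's size and @+C@ offset into an S3 ranged GET.  A small sketch (the helper name is hypothetical; the @get_object@ call is shown only as a comment since credentials are discussed below):

```python
def s3_range_header(offset, size):
    """HTTP Range header for a pseudo-block; bytes are inclusive at both ends."""
    return f"bytes={offset}-{offset + size - 1}"

# A keepstore-side read of the earlier example block (offset 256, size 512)
# would issue roughly (boto3 call sketched, credentials elided):
#   s3.get_object(Bucket="bucket", Key="key",
#                 Range=s3_range_header(256, 512),
#                 VersionId=version_id)
print(s3_range_header(256, 512))  # bytes=256-767
```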
Keepstore would need to be able to read the buckets.  This could be done with a blanket policy (allowing Keepstore/compute nodes to read specific buckets), by adding a feature to store AWS credentials in Arvados such that Keepstore, given the user's API token, can fetch and use them (for example, on the API token record), or both.
This interacts awkwardly with Arvados sharing: without additional features, sharing a collection doesn't mean the recipient can actually read the data.
h3. SDK support
This approach limits the amount of S3-specific code directly in the client -- the goal should be to avoid having to import boto3.  
The Collection class gets a new @import_from_s3()@ method (or maybe an overload of the @copy@ method) which takes the @s3://@ URL.  This contacts the Keepstore server, provides the S3 URL, and gets back an appropriately formatted block locator.  Keepstore should check that the object exists and that the user can access it, and fetch the current @versionId@.
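
As a sketch of what Keepstore might hand back (the helper name and exact URL form are assumptions, not settled design), the locator for an imported object could be constructed like this:

```python
import base64

def make_pseudo_block_locator(s3_url, size, version_id=None, offset=0):
    """Build a pseudo-block locator covering a whole S3 object.

    Hypothetical helper: appending ?versionId= pins the object version so
    the pseudo-block can be treated as immutable (per the versioning note
    above); urlsafe base64 avoids '+' in the encoded URL.
    """
    if version_id:
        s3_url += "?versionId=" + version_id
    encoded = base64.urlsafe_b64encode(s3_url.encode()).decode()
    return "f" * 32 + f"+{size}+B{encoded}+C{offset}"

loc = make_pseudo_block_locator("s3://bucket/key", 1234, version_id="3sL4kq")
```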
h3. Advantages
* This strategy is similar to how we approach federation, which reduces the number of dramatic changes to the architecture.
* If locators of this type are supported by Keepstore, then the Go and Python SDKs require relatively few changes (they continue to fetch blocks from Keepstore).
* Does not require downloading and indexing files.
* Can still get a unique PDH for the collection.
* Can mix S3 objects and regular Keep blocks, so Arvados becomes generally useful for organizing data in buckets (although changes in Keep don't propagate down to the bucket, moving data once it has been written is awkward in S3 anyway, so you don't do it).
h3. Disadvantages
* Can't verify file contents.
* Requires working with AWS access control, whether by granting blanket read access to specific buckets ahead of time, storing credentials, or some other mechanism we haven't designed yet.
* Sharing a collection with another person requires granting permission in both Arvados and AWS.
* The fact that a given manifest contains references to S3 objects is not apparent to the user and could produce confusing errors.
* Given an @s3://@ id for an object, we can't efficiently find which collections use it (but this is a feature currently missing from Keep in general; keep-balance could do something here if needed, or we could implement a block index in the future).