Objects as pseudo-blocks in Keep » History » Version 2
Peter Amstutz, 05/28/2024 08:45 PM
h1. Objects as pseudo-blocks in Keep
Idea for accessing external objects via Keep (specifically S3)
The way we've bounced around for a while has been to take an object, split it into 64 MiB blocks, and record each block hash in a database along with a reference to the object and offset.
Here is a different approach to this idea. (Tom floated a version of this at one of our engineering meetings, but I don't think we fully explored it at the time.)
For an S3 object 1234 bytes long located at s3://bucket/key:
ffffffffffffffffffffffffffffffff+512+B(base64 encode of s3://bucket/key)+C256
The ffff... indicates it is a special block (we could also use 0000... or 0f0f0f..., etc). Another idea would be to use a hash of the size, @+B@, and @+C@ hints. Alternatively, S3 also offers checksums of files, so we could use the MD5 of the full object.
* It is 512 bytes long.
* The hint @+B@ means data should be fetched from the s3:// URL, which is base64 encoded (this is necessary to match our locator syntax).
* The hint @+C@ means read from offset 256 bytes.
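As a sketch of how such a locator might be built and parsed (the function names are made up for illustration; URL-safe base64 is assumed here, since standard base64 output can contain a literal @+@, which would collide with the @+hint@ separators):

```python
import base64

# Hypothetical helpers illustrating the proposed locator layout; the names
# are invented for this sketch. URL-safe base64 is assumed so the encoded
# URL cannot itself contain a "+".
PLACEHOLDER_HASH = "f" * 32  # marks a pseudo-block (could equally be all zeros)

def make_pseudo_locator(url, size, offset):
    """Build a pseudo-block locator for `size` bytes at `offset` of `url`."""
    b64 = base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")
    return f"{PLACEHOLDER_HASH}+{size}+B{b64}+C{offset}"

def parse_pseudo_locator(locator):
    """Return (url, size, offset) parsed from a pseudo-block locator."""
    parts = locator.split("+")
    if parts[0] != PLACEHOLDER_HASH:
        raise ValueError("not a pseudo-block locator")
    size = int(parts[1])
    url, offset = None, 0
    for hint in parts[2:]:
        if hint.startswith("B"):
            pad = "=" * (-len(hint[1:]) % 4)  # restore stripped padding
            url = base64.urlsafe_b64decode(hint[1:] + pad).decode()
        elif hint.startswith("C"):
            offset = int(hint[1:])
    return url, size, offset

loc = make_pseudo_locator("s3://bucket/key", 512, 256)
# round-trips back to the original URL, size, and offset
assert parse_pseudo_locator(loc) == ("s3://bucket/key", 512, 256)
```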
Large files can be split, e.g.
ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C0 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C67108864 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C134217728
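The per-chunk offsets can be generated mechanically. A sketch (the helper name is hypothetical; URL-safe base64 is assumed so the encoded URL cannot contain a literal @+@):

```python
import base64

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB

# Sketch only: emit one pseudo-block locator per 64 MiB slice of an object.
# The helper name is made up for illustration.
def split_locators(url, total_size, block_size=BLOCK_SIZE):
    b64 = base64.urlsafe_b64encode(url.encode()).decode().rstrip("=")
    for offset in range(0, total_size, block_size):
        size = min(block_size, total_size - offset)
        yield f"{'f' * 32}+{size}+B{b64}+C{offset}"

locs = list(split_locators("s3://bucket/key", 150_000_000))
assert len(locs) == 3                   # two full 64 MiB blocks + remainder
assert locs[0].endswith("+C0")
assert locs[2].endswith("+C134217728")
```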
However this repeats the @+B@ portion a bunch of times, so we could allow the manifest to describe oversized blocks:
ffffffffffffffffffffffffffffffff+1000000000+B(base64 encode of s3://bucket/key)+C0
Implementation-wise, this would be split into 64 MiB chunks at runtime when the manifest is loaded. The block cache would need to use the full locator (with @+B@ and @+C@).
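A sketch of that runtime expansion (the function name and the simplified locator parsing are assumptions for illustration, not an existing Arvados API):

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB

# Sketch only: expand one oversized pseudo-block locator into standard-sized
# sub-locators when the manifest is loaded, carrying the +B hint through and
# adjusting the +C offset for each chunk.
def expand_oversized(locator, block_size=BLOCK_SIZE):
    parts = locator.split("+")
    fake_hash, size = parts[0], int(parts[1])
    b_hint = next(p for p in parts[2:] if p.startswith("B"))
    base_offset = next((int(p[1:]) for p in parts[2:] if p.startswith("C")), 0)
    if size <= block_size:
        yield locator  # already a normal-sized block
        return
    for off in range(0, size, block_size):
        chunk = min(block_size, size - off)
        yield f"{fake_hash}+{chunk}+{b_hint}+C{base_offset + off}"

big = "f" * 32 + "+1000000000+BczM6Ly9idWNrZXQva2V5+C0"
chunks = list(expand_oversized(big))
assert len(chunks) == 15               # ceil(1e9 / 64 MiB) chunks
assert sum(int(c.split("+")[1]) for c in chunks) == 1000000000
```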
Add support for locators of this type to Keepstore, which already has code to interact with S3 buckets. This avoids adding such code to the client.
Keepstore would need to be able to read the buckets. This could be done either with a blanket policy (allow Keepstore/compute nodes to read specific buckets) and/or by adding a feature to store AWS credentials in Arvados in a way such that Keepstore, having the user's API token, can fetch and use them (such as on the API token record).
For S3 specifically, if we include @?versionId=@ on all URLs, the blocks can be assumed to be immutable.
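A trivial sketch of pinning a version before the URL is encoded into the locator (the helper name and example version id are invented):

```python
# Illustration only: append ?versionId= so the locator refers to a fixed
# object version rather than whatever the key currently points at.
def pin_version(url, version_id):
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}versionId={version_id}"

assert pin_version("s3://bucket/key", "abc123") == "s3://bucket/key?versionId=abc123"
```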
Advantages
* This strategy is a lot like how we approach federation.
* If locators of this type are supported by Keepstore, then the Go and Python SDKs require relatively few changes (they continue to fetch blocks from Keepstore).
* Does not require downloading and indexing files.
Disadvantages
* Can't verify file contents.