Revision 2 (Peter Amstutz, 05/28/2024 08:45 PM) → Revision 3/7 (Peter Amstutz, 05/28/2024 10:18 PM)

h1. Objects as pseudo-blocks in Keep 

 Idea for accessing external objects via Keep (specifically S3) 

 The thought we've bounced around for a while has been to read the contents of an object, split it into 64 MiB blocks, and record each block hash in a database along with a reference to the object and offset.

 Here is a different approach to this idea. (Tom floated a version of this at one of our engineering meetings but we didn't fully explore it at the time.)

 h3. Block id 

 For an S3 object 1234 bytes long located at s3://bucket/key:

 ffffffffffffffffffffffffffffffff+512+B(base64 encode of s3://bucket/key)+C256 

 By my research, some values such as the ETag can be the MD5 of the object in certain circumstances, but this isn't true in general. So for these pseudo-blocks, I propose deriving the hash from (size, @+B@, @+C@).

 For S3 specifically, if the bucket supports versioning and we use @?versionId=@ on all URLs, blocks can be treated as immutable.

 In this example: 

 * It is 512 bytes long. 
 * The hint @+B@ means data should be fetched from a URL. In this case it is base64 encoded (this is necessary to match our locator syntax).
 * The hint @+C@ means read from offset 256 bytes. 

 So this describes the range of bytes from 256 to 768. 
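The locator layout described above can be sketched in a few lines of Python. This is only an illustration, not an existing Arvados API: the function names are hypothetical, the use of URL-safe base64 without padding is an assumption (chosen so the hint contains no @+@, @/@ or @=@), and hashing the hint string with MD5 is just one possible reading of "deriving the hash from (size, +B, +C)". The @Range@ header shows how a server might request exactly those bytes over HTTP.

```python
import base64
import hashlib

# Placeholder hash taken from the example locator above.
MARKER_HASH = "f" * 32


def encode_url(url):
    """URL-safe base64 without padding (assumption: keeps '+', '/', '=' out of the hint)."""
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii").rstrip("=")


def decode_url(encoded):
    """Inverse of encode_url: restore the stripped padding, then decode."""
    padding = "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded + padding).decode("utf-8")


def make_locator(url, offset, size, derive_hash=False):
    """Build a pseudo-block locator for `size` bytes of `url` starting at `offset`."""
    hints = f"{size}+B{encode_url(url)}+C{offset}"
    if derive_hash:
        # Hypothetical derivation: MD5 over the size and hint text.
        digest = hashlib.md5(hints.encode("ascii")).hexdigest()
    else:
        digest = MARKER_HASH
    return f"{digest}+{hints}"


def parse_locator(locator):
    """Return (url, first_byte, end_byte); end_byte is exclusive."""
    _digest, size, *hints = locator.split("+")
    url = None
    offset = None
    for hint in hints:
        if hint.startswith("B"):
            url = decode_url(hint[1:])
        elif hint.startswith("C"):
            offset = int(hint[1:])
    if url is None or offset is None:
        raise ValueError("not a pseudo-block locator")
    return url, offset, offset + int(size)


def range_header(first_byte, end_byte):
    """HTTP Range header value; HTTP byte ranges are inclusive at both ends."""
    return f"bytes={first_byte}-{end_byte - 1}"
```

For the example above, parsing the locator for offset 256 and size 512 gives the byte range 256 to 768 (end exclusive), which corresponds to the request header @Range: bytes=256-767@.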

 h3. Block stream 

 Large files can be split, e.g. 

 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C0 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C67108864 ffffffffffffffffffffffffffffffff+67108864+B(base64 encode of s3://bucket/key)+C134217728 

 However, this repeats the @+B@ portion a bunch of times, so we could allow the manifest to describe oversized blocks:

 ffffffffffffffffffffffffffffffff+1000000000+B(base64 encode of s3://bucket/key)+C0 

 Implementation-wise, this would be split into the previous example of 64 MiB chunks at runtime when the manifest is loaded (and re-compressed when the manifest is saved). The block cache would need to use the full locator (with @+B@ and @+C@) or have some other means of distinguishing regular Keep blocks from these external reference pseudo-blocks.
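A rough sketch of the load-time split, assuming the locator layout from the examples above. The function name @split_oversized@ is hypothetical, and the sketch keeps the original hash field unchanged; if the hash were derived from the size and hints, it would have to be recomputed for each chunk.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MiB, the regular Keep block size


def split_oversized(locator, block_size=BLOCK_SIZE):
    """Expand one oversized pseudo-block locator into chunks of at most block_size bytes."""
    digest, size, *hints = locator.split("+")
    total = int(size)
    base_offset = 0
    other_hints = []
    for hint in hints:
        if hint.startswith("C"):
            base_offset = int(hint[1:])  # starting offset within the object
        else:
            other_hints.append(hint)     # e.g. the +B hint, carried over unchanged
    chunks = []
    for start in range(0, total, block_size):
        length = min(block_size, total - start)
        parts = [digest, str(length), *other_hints, f"C{base_offset + start}"]
        chunks.append("+".join(parts))
    return chunks
```

Applied to the 1000000000-byte example, this yields 14 full 64 MiB chunks followed by one final chunk of 60475904 bytes. Re-compressing on save would be the reverse operation: merging adjacent chunks that share the same @+B@ hint and have contiguous offsets.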

 h3. Keepstore support 

 Add support for locators of this type to Keepstore. Keepstore already needs to be able to interact with S3 buckets. This avoids adding such code to the client.

 Keepstore would need to be able to read the buckets. This could be done either with a blanket policy (allow keepstore/compute nodes to read specific buckets) and/or by adding a feature to store AWS credentials in Arvados in a way such that Keepstore, having the user's API token, is able to fetch them and use them (such as on the API token record).

 This interacts awkwardly with Arvados sharing: sharing a collection doesn't mean the recipient can actually read it without additional features.

 h3. SDK support

 This approach limits the amount of S3-specific code directly in the client; the goal should be to avoid having to import boto3.

 The Collection class gets a new "import_from_s3()" method (or maybe an overload of the "copy" method) which takes the s3:// URL. This contacts the Keepstore server, provides the s3:// URL, and gets back the appropriately formatted block locator. Keepstore should check that the object exists and the user can access it, and get the current versionId.

 h3. Advantages 

 * This strategy is similar to how we approach federation, which reduces the number of dramatic changes in the architecture.
 * If locators of this type are supported by Keepstore, then the Go and Python SDKs require relatively few changes (they continue to fetch blocks from Keepstore).
 * Does not require downloading and indexing files 
 * Can still get a unique PDH for the collection 
 * Can mix S3 objects and regular Keep objects. Arvados now becomes generally useful for organizing data in buckets (although changes in Keep don't propagate down to the bucket; moving data once it has been written is crappy in S3 anyway, so you don't do it).

 h3. Disadvantages 

 * Can't verify file contents. 
 * Requires working with AWS access control, whether by granting blanket read access ahead of time to certain specific buckets, storing credentials, or some other mechanism we haven't designed yet 
 * Sharing a collection with another person requires granting permission in both Arvados and AWS. 
 * The fact that a given manifest contains references to S3 objects is opaque to the user and could produce confusing errors 
 * Given an s3:// URL for an object, can't efficiently find which collections use it (but this is a feature currently missing from Keep in general; keep-balance could do something here if needed, or we could implement a block index in the future)