h1. Keep S3 gateway 

 See [[Keep service hints]] for more background. 

 h2. High level design 

 Each remote storage service (e.g., an S3 bucket) in use at a given Arvados installation is supported by one keep server process, running with a flag like @-volume=s3:/mappings:bucketname:s3credentials@ instead of @-volumes=/tmp/1,/tmp/2@. 

 Operations to support: 

 * Given an S3 bucket and optional prefix/object, read the data from S3, update the {locator, S3 segment} map, and return signed block locators to the client. 
 * Given an S3 bucket and optional prefix/object, create a collection that references the S3 data and return the collection UUID. (This is more suitable for larger datasets because the data transfer can be done asynchronously after the collection UUID has been returned to the client.) 
 * Given a locator, read the data from S3 and return it to the client. 

 h2. Specifics 

 h3. API for writing 

 @POST /manifest_text@ - read objects from S3 and add/update map entries. Respond with a manifest that references the indexed data. 
 * If the request body is of the form @{"S3Path":"s3://abucket/aprefix/anobject"}@ -- read segments (up to 64MiB each) from the specified object and construct a manifest with a single file. 
 * If the request body is of the form @{"S3Path":"s3://abucket/aprefix/"}@ or @{"S3Path":"s3://abucket/"}@ -- read all objects (with the given prefix, if any) from the bucket and construct a manifest with one file per object read. 
 * It is easy to make a request that takes a long time and generates a lot of network traffic. At a minimum, the worker must exit if the client closes the connection. 
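
 Example request (illustrative; the hashes, block sizes, and @+K@ gateway hint in the response are hypothetical): 
 <pre> 
 POST /manifest_text HTTP/1.1 
 Host: zzzzz.arvadosapi.com 
 Content-type: application/json 

 { 
  "S3Path":"s3://abucket/aprefix/anobject" 
 } 
 </pre> 

 Response: 
 <pre> 
 HTTP/1.1 200 OK 
 Content-type: text/plain 

 . acbd18db4cc2f85cedef654fccc4a4d8+67108864+Kzzzzz-bi6l4-0123456789abcde 37b51d194a7513e45b56f6524f2d51f2+1048576+Kzzzzz-bi6l4-0123456789abcde 0:68157440:anobject 
 </pre> 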

 @POST /collection@ - read objects from S3, add/update map entries, and add the objects as files in a new collection. Respond with the UUID of the new collection. 
 * Request body is a JSON-encoded hash with an @"S3Path"@ key, as with @POST /manifest_text@. 
 * If the request body hash has a @"collection"@ key, its value must be a hash, and it will be passed to arvados.v1.collections.create when creating the new collection. (This sets the parent project, name, and description of the new collection, for example.) This @"collection"@ hash must not contain a @"manifest_text"@ key. 

 Example request: 
 <pre> 
 POST /collection HTTP/1.1 
 Host: zzzzz.arvadosapi.com 
 Content-type: application/json 

 { 
  "S3Path":"s3://abucket/aprefix/anobject", 
  "collection":{ 
   "owner_uuid":"zzzzz-j7d0g-12345abcde12345", 
   "name":"Something stored in S3" 
  } 
 } 
 </pre> 

 Response: 
 <pre> 
 HTTP/1.1 200 OK 
 Content-type: application/json 

 { 
  "uuid":"zzzzz-4zz18-abcde12345abcde" 
 } 
 </pre> 

 @DELETE /collection/{uuid}@ - stop working on a job that was started with @POST /collection@. 
 * Returns 404 if a collection with the specified uuid does not exist (according to the API server, when using the token provided in the @DELETE@ request). 
 * Returns 200 if a collection with the specified uuid does exist and this server is not doing any work on it (regardless of whether work was being done before the @DELETE@ request). 
 * Returns 5xx if an error prevented the server from asserting either of the above cases (e.g., work could not be cancelled, or there was an error looking up the collection). 
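
 Example request and response (UUID reused from the example above): 
 <pre> 
 DELETE /collection/zzzzz-4zz18-abcde12345abcde HTTP/1.1 
 Host: zzzzz.arvadosapi.com 

 HTTP/1.1 200 OK 
 </pre> 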

 h3. API for reading 

 @GET /{locator}@ - Look up the given locator in the map. Retrieve the data segment from S3, and return it to the client. Return an error if the hash of the retrieved data does not match the locator. 
 * Verify the signature portion of the locator (@+Ahash@timestamp@) before doing anything else. 
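
 Example request and response (the hash, size hint, signature, and token placeholder are illustrative): 
 <pre> 
 GET /acbd18db4cc2f85cedef654fccc4a4d8+3+A1f4b0bc7583c2a7f9102c395f4ffc5e3af1f08cc@565b0a3f HTTP/1.1 
 Host: zzzzz.arvadosapi.com 
 Authorization: OAuth2 <api token here> 

 HTTP/1.1 200 OK 
 Content-Length: 3 

 foo 
 </pre> 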

 h3. Implementation 

 Likely, some parts of keepproxy and keepstore should be refactored to share code more effectively. 
 * keepstore logs & answers client queries, verifies hashes, answers index/status queries, reads/writes data blocks on disk, enforces per-disk mutexes. 
 * keepproxy logs & answers client queries, verifies hashes, connects to other keep services. 
 * keepgw logs & answers client queries, verifies hashes, answers index/status queries, reads/writes a local {hash, remote object} index, connects to remote services. 

 Possibilities: 
 * Refactor the keepstore command to consist of just the "unix volume" code; move everything else into packages like keep_server and hash_checking_reader. Create a new keepgw-s3 command. 
 * Extend the keepstore command to use backing-store modules like -volume=unix:/foo and -volume=s3:bucketid. 
 * Extend the keepproxy command to use backing-store modules like S3 as an alternative to keep disk services. 
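
 If the backing-store-module approach is taken, the shared code might be organized around an interface along these lines (a rough sketch; the names and method set are illustrative, not an existing Arvados API): 
 <pre> 
 // Hypothetical backing-store abstraction shared by keepstore's
 // existing unix-directory code and a new S3 module.
 package keepgw

 type Volume interface {
 	// Get returns the block stored under the given locator hash.
 	Get(hash string) ([]byte, error)
 	// Put stores a block under the given hash.
 	Put(hash string, data []byte) error
 	// Index returns "hash+size timestamp" lines for every stored
 	// block whose hash begins with prefix.
 	Index(prefix string) (string, error)
 	// Status reports free space, error counts, etc.
 	Status() string
 }

 // UnixVolume would wrap the current on-disk layout; S3Volume would
 // translate Get into S3 reads via the local {hash, remote object}
 // map, and Put into S3 writes plus a map update.
 type UnixVolume struct{ Root string }
 type S3Volume struct{ MapDir, Bucket, CredentialsPath string }
 </pre> 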

 The {hash, remote object} mapping can be stored in the local filesystem. 
 * A given hash can map to more than one remote object. It's worth remembering all such remote objects: if one disappears or changes, a different one should be attempted next. Suggestion: For each hash, we have a text file with one line per remote data object matching the hash. 
 * When remote objects are bigger than 64 MiB, the mapping will actually be {hash, remote object segment}. This should be easy to manage if remote object references are always stored as @"offset:length:remote_object_path"@. 
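
 For example, the map entry for one hash might be a text file like this (hypothetical directory layout, offsets, and object paths): 
 <pre> 
 # /var/keepgw/map/acbd18db4cc2f85cedef654fccc4a4d8 
 0:67108864:abucket/aprefix/anobject 
 134217728:67108864:abucket/otherprefix/copy-of-same-data 
 </pre> 
 Either line can be used to serve the block; if the first object has disappeared or changed, the reader falls back to the second. 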

 h2. Related changes 

 When using local filesystems as data stores, keepstore should accept @-volume=/tmp/foo -volume=/tmp/bar@ (in addition to @-volumes=/tmp/foo,/tmp/bar@ for backward compatibility). See https://golang.org/src/flag/example_test.go 
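
 A minimal sketch of the repeatable flag, following the @flag.Value@ pattern in the linked example (flag and type names are illustrative): 
 <pre> 
 package main

 import (
 	"flag"
 	"fmt"
 	"strings"
 )

 // volumeList collects every -volume argument given on the command line.
 type volumeList []string

 func (v *volumeList) String() string { return strings.Join(*v, ",") }

 func (v *volumeList) Set(value string) error {
 	*v = append(*v, value)
 	return nil
 }

 func main() {
 	var volumes volumeList
 	flag.Var(&volumes, "volume", "backing store (may be repeated)")
 	// The old -volumes=/a,/b form could still be accepted for backward
 	// compatibility by splitting its value on commas into the same list.
 	flag.Parse()
 	fmt.Println("volumes:", volumes)
 }
 </pre> 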

 h2. Open questions and risks 

 How does the gateway know its own UUID so it can write the appropriate +Kuuid locator hints when constructing a manifest? 

 How does a client know how much progress has been made on a @"POST /collection"@ request? The worker could update the collection object each time an object is written, or each time a locator (64 MiB segment) is indexed, and this behavior could be toggled during the initial API call. But how does a caller know whether the work is finished? 
 * The collection "expires_at" attribute could be set to some non-null value, to indicate that the collection is ephemeral, until it is complete. This would help avoid accidental use of partially-written collections. It would also provide automatic clean-up of partially written collections, but still permit "resume" (assuming "resume" starts before the @expires_at@ time arrives). 
 * How should the worker communicate the expected total collection size, number of blocks/files, or finish time? This could be written in the collection's @properties@ hash under a key chosen by convention; see the sketch below. (In the pathological case where the client provides a conflicting key in @{"collection":{"properties":{...}}}@, progress information would be unavailable.) 
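
 For example, the worker could keep something like this up to date in the collection's @properties@ (the key name and fields are illustrative, not an established convention): 
 <pre> 
 { 
  "properties":{ 
   "keepgw_progress":{ 
    "bytes_expected":68157440, 
    "bytes_written":67108864, 
    "finished":false 
   } 
  } 
 } 
 </pre> 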

 How does a client know when a @"POST /collection"@ request has been abandoned? 
 * The worker could delete the collection in case of error, but this would make "resume" impossible. 

 How should the server indicate to the client that progress is being made during a @"POST /manifest_text"@ request? 
 * Use HTTP chunked transfer encoding to return the manifest text one token at a time? (This could also help detect closed connections sooner.) 

 The @DELETE /collection/{uuid}@ API cancels a worker thread, but at face value looks like it will delete a collection. (In a sense it _can_ delete a collection by cancelling work and leaving the partial collection with a non-null @expires_at@ value, but if the job is finished, the effect is nothing at all like "delete collection".) Perhaps it should be renamed to something more like @DELETE /queue/{uuid}@? Perhaps it should have different responses for "cancelled as a result of this request" and "already cancelled or was never happening"? 

 Should there be an API for providing credentials in a POST request? The choice of credentials to use for each data segment could be stored in the map: 
 * <pre> 
 { 
  "locator" : [{"S3Path":"bucket/prefix/object","credential_id":"local_credential_set_id"}, ...], 
  ... 
 } 
 </pre> 
 * @local_credential_set_id@ could be the hash (and filename of local cache) of a key pair. 

 Other than by having admin privileges, how can a client establish permission to use S3 credentials (which are already known by the gateway server) during a POST request? 

 h2. Future work 

 A "resume" API would be useful.