Keep S3 gateway » History » Version 3

Tom Clegg, 03/26/2015 04:45 AM

{{>toc}}

h1. Keep S3 gateway

See [[Keep service hints]] for more background.

h2. Overview

h3. Objective

The Keep S3 gateway is a Keep-compatible interface to Amazon S3. It allows programs like Workbench, arv-get, and arv-mount to read data stored in S3 without adding any S3-specific code of their own.

Currently, it does not address _writing_ to S3. It is useful when some data is already stored in S3 -- and should continue to be stored only in S3, rather than being copied locally -- and that data is to be used by Arvados programs: for example, running a Crunch job with publicly available S3-hosted datasets as input.

h3. High level design

Remote data is made available to a given Arvados installation by a gateway server process, similar to @keepstore@ but started with options like @-volume=s3:/mapping-store-path:/s3-credentials-path@ instead of @-volumes=/tmp/1,/tmp/2@.
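
The volume argument syntax above might be parsed along these lines. This is only a sketch: the type name, field names, and error handling are assumptions for illustration, not the actual keepstore implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// S3VolumeSpec holds the pieces of a -volume=s3:... argument.
// The three-part layout (type, mapping store, credentials) follows
// the example flag in the text; the names here are illustrative.
type S3VolumeSpec struct {
	MappingStorePath string
	CredentialsPath  string
}

// ParseVolumeArg splits "s3:/mapping-store-path:/s3-credentials-path"
// into its components, rejecting anything that is not an s3 volume.
func ParseVolumeArg(arg string) (*S3VolumeSpec, error) {
	parts := strings.SplitN(arg, ":", 3)
	if len(parts) != 3 || parts[0] != "s3" {
		return nil, fmt.Errorf("unsupported volume spec %q", arg)
	}
	return &S3VolumeSpec{MappingStorePath: parts[1], CredentialsPath: parts[2]}, nil
}

func main() {
	spec, err := ParseVolumeArg("s3:/mapping-store-path:/s3-credentials-path")
	if err != nil {
		panic(err)
	}
	fmt.Println(spec.MappingStorePath, spec.CredentialsPath)
}
```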

Operations to support:

* Given an S3 bucket and optional prefix/object, read the data from S3, update the {locator, S3 segment} map, and return signed block locators to the client.
* Given an S3 bucket and optional prefix/object, create a collection that references the S3 data and return the collection UUID. (This is more suitable for larger datasets because the data transfer can be done asynchronously after the collection UUID has been returned to the client.)
* Given a locator, read the data from S3 and return it to the client.

h2. Specifics

h3. Detailed design

h4. API for writing

@POST /manifest_text@ - read objects from S3 and add/update map entries. Respond with a manifest that references the indexed data.
* If the request body is of the form @{"S3Path":"s3://abucket/aprefix/anobject"}@ -- read segments (up to 64 MiB each) from the specified object and construct a manifest with a single file.
* If the request body is of the form @{"S3Path":"s3://abucket/aprefix/"}@ or @{"S3Path":"s3://abucket/"}@ -- read all objects (with the given prefix, if any) from the bucket and construct a manifest with one file per object read.
* It is easy to make a request that takes a long time and generates a lot of network traffic. At a minimum, the worker must stop if the client closes the connection.
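
The 64 MiB segmenting rule above can be sketched as follows. This is illustrative only: real code would stream each segment from S3 and hash it, rather than just computing offsets.

```go
package main

import "fmt"

const blockSize = 64 << 20 // 64 MiB, the maximum Keep block size

// Segments returns the (offset, length) pairs to read from an S3
// object of the given size, so each read fits in one Keep block.
func Segments(objectSize int64) [][2]int64 {
	var segs [][2]int64
	for off := int64(0); off < objectSize; off += blockSize {
		length := objectSize - off
		if length > blockSize {
			length = blockSize
		}
		segs = append(segs, [2]int64{off, length})
	}
	return segs
}

func main() {
	// A 150 MiB object becomes three segments: 64 + 64 + 22 MiB.
	for _, s := range Segments(150 << 20) {
		fmt.Println(s[0], s[1])
	}
}
```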

@POST /collection@ - read objects from S3, add/update map entries, and add the objects as files in a new collection. Respond with the UUID of the new collection.
* Request body is a JSON-encoded hash with an @"S3Path"@ key, as with @POST /manifest_text@.
* If the request body hash has a @"collection"@ key, its value must be a hash, and it will be passed to arvados.v1.collections.create when creating the new collection. (This sets the parent project, name, and description of the new collection, for example.) This @"collection"@ hash must not contain a @"manifest_text"@ key.

Example request:
<pre>
POST /collection HTTP/1.1
Host: zzzzz.arvadosapi.com
Content-type: application/json

{
 "S3Path":"s3://abucket/aprefix/anobject",
 "collection":{
  "owner_uuid":"zzzzz-j7d0g-12345abcde12345",
  "name":"Something stored in S3"
 }
}
</pre>

Response:
<pre>
HTTP/1.1 200 OK
Content-type: application/json

{
 "uuid":"zzzzz-4zz18-abcde12345abcde"
}
</pre>

@DELETE /collection/{uuid}@ - stop working on a job that was started with @POST /collection@.
* Returns 404 if a collection with the specified uuid does not exist (according to the API server, when using the token provided in the @DELETE@ request).
* Returns 200 if a collection with the specified uuid does exist and this server is not doing any work on it (regardless of whether work was being done before the @DELETE@ request).
* Returns 5xx if an error prevented the server from asserting either of the above cases (e.g., work could not be cancelled, or there was an error looking up the collection).

h4. API for reading

@GET /{locator}@ - Look up the given locator in the map. Retrieve the data segment from S3, and return it to the client. Return an error if the hash of the retrieved data does not match the locator.
* Verify the signature portion of the locator (@+Ahash@timestamp@) before doing anything else.
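
Signature verification might look roughly like the sketch below: an HMAC over the locator, the reader's API token, and an expiry timestamp, computed with a site-wide permission secret. The message layout and the HMAC-SHA1 choice here follow the general idea only, not necessarily the exact Keep wire format.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha1"
	"fmt"
)

// signLocator appends a +A<signature>@<timestamp> hint to a bare
// locator. Message layout and hash choice are illustrative.
func signLocator(secret []byte, locator, apiToken, expiryHex string) string {
	mac := hmac.New(sha1.New, secret)
	fmt.Fprintf(mac, "%s %s %s", locator, apiToken, expiryHex)
	return fmt.Sprintf("%s+A%x@%s", locator, mac.Sum(nil), expiryHex)
}

// verifySignedLocator recomputes the signature and compares in constant
// time; a real server would also reject expired timestamps.
func verifySignedLocator(secret []byte, signed, locator, apiToken, expiryHex string) bool {
	return hmac.Equal([]byte(signed), []byte(signLocator(secret, locator, apiToken, expiryHex)))
}

func main() {
	secret := []byte("example-permission-key")
	loc := "acbd18db4cc2f85cedef654fccc4a4d8+3"
	signed := signLocator(secret, loc, "some-api-token", "564f5b60")
	fmt.Println(verifySignedLocator(secret, signed, loc, "some-api-token", "564f5b60")) // true
}
```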

h4. Mapping locators to S3 objects

The {hash, remote object} mapping can be stored in the local filesystem.
* A given hash can map to more than one remote object. It is worth remembering all such remote objects: if one disappears or changes, a different one can be tried next. Suggestion: for each hash, keep a text file with one line per remote data object matching the hash.
* When remote objects are bigger than 64 MiB, the mapping will actually be {hash, remote object segment}. This is easy to manage if remote object references are always stored as @"offset:length:remote_object_path"@.
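
The per-line @"offset:length:remote_object_path"@ format is simple to parse and serialize. A sketch, with illustrative type and function names:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// SegmentRef locates one Keep block's worth of data inside a remote
// object, matching the "offset:length:remote_object_path" line format.
type SegmentRef struct {
	Offset int64
	Length int64
	Path   string
}

// String serializes a SegmentRef as one map-file line.
func (r SegmentRef) String() string {
	return fmt.Sprintf("%d:%d:%s", r.Offset, r.Length, r.Path)
}

// ParseSegmentRef parses one line of a per-hash map file.
func ParseSegmentRef(line string) (SegmentRef, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 {
		return SegmentRef{}, fmt.Errorf("malformed entry %q", line)
	}
	off, err1 := strconv.ParseInt(parts[0], 10, 64)
	length, err2 := strconv.ParseInt(parts[1], 10, 64)
	if err1 != nil || err2 != nil {
		return SegmentRef{}, fmt.Errorf("malformed entry %q", line)
	}
	return SegmentRef{Offset: off, Length: length, Path: parts[2]}, nil
}

func main() {
	ref, err := ParseSegmentRef("67108864:23068672:abucket/aprefix/anobject")
	if err != nil {
		panic(err)
	}
	fmt.Println(ref.Path, ref.Length)
}
```

Note that @SplitN@ with a limit of 3 keeps any further colons inside the object path intact.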

h3. Code location

The source code for the server command will be in @/services/keepgw-s3@.

Likely, some parts of keepproxy and keepstore should be refactored to share code more effectively.
* keepstore logs & answers client queries, verifies hashes, answers index/status queries, reads/writes data blocks on disk, enforces per-disk mutexes.
* keepproxy logs & answers client queries, verifies hashes, connects to other Keep services.
* keepgw logs & answers client queries, verifies hashes, answers index/status queries, reads/writes a local {hash, remote object} index, connects to remote services.

Possibilities:
* Refactor the keepstore command to consist of just the "unix volume" code; move everything else into packages like keep_server and hash_checking_reader. Create a new keepgw-s3 command.
92
* Extend the keepstore command to use backing-store modules like -volume=unix:/foo and -volume=s3:bucketid.
93
* Extend the keepproxy command to use backing-store modules like S3 as an alternative to keep disk services.
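
The backing-store module idea can be sketched as a small interface that a unix (local disk) volume and an S3-backed volume would both implement. Names are illustrative; keepstore's eventual interface may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Volume is a minimal backing-store interface: enough for a read-only
// S3 gateway volume and a read/write local-disk volume to share.
type Volume interface {
	Get(locator string) ([]byte, error)
	Put(locator string, data []byte) error
}

// MemVolume is an in-memory stand-in for a unix (local disk) volume.
type MemVolume struct{ blocks map[string][]byte }

func NewMemVolume() *MemVolume { return &MemVolume{blocks: map[string][]byte{}} }

func (v *MemVolume) Get(loc string) ([]byte, error) {
	if b, ok := v.blocks[loc]; ok {
		return b, nil
	}
	return nil, errors.New("block not found")
}

func (v *MemVolume) Put(loc string, data []byte) error {
	v.blocks[loc] = data
	return nil
}

func main() {
	// The server would hold a []Volume and try each in turn.
	var vol Volume = NewMemVolume()
	_ = vol.Put("acbd18db4cc2f85cedef654fccc4a4d8+3", []byte("foo"))
	data, _ := vol.Get("acbd18db4cc2f85cedef654fccc4a4d8+3")
	fmt.Println(string(data))
}
```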

h3. Open questions and risks

How does the gateway know its own UUID, so it can write the appropriate @+Kuuid@ locator hints when constructing a manifest?

How does a client know how much progress has been made on a @"POST /collection"@ request? The worker could update the collection object each time an object is written, or each time a locator (64 MiB segment) is indexed, and this behavior could be selected in the initial API call. But how does a caller know whether the work is finished?
* The collection's @expires_at@ attribute could be set to some non-null value, indicating an ephemeral collection, until the work is complete. This would help avoid accidental use of partially written collections. It would also provide automatic clean-up of partially written collections, while still permitting "resume" (assuming the resume starts before the @expires_at@ time arrives).
* How should the worker communicate the expected total collection size, number of blocks/files, or finish time? This could be written in the collection's @properties@ hash under a key chosen by convention. (In the pathological case where the client provides a conflicting key in @{"collection":{"properties":{...}}}@, progress information would be unavailable.)

How does a client know when a @"POST /collection"@ request has been abandoned?
* The worker could delete the collection in case of error, but this would make "resume" impossible.

How should the server indicate to the client that progress is being made during a @"POST /manifest_text"@ request?
* Use HTTP chunked transfer encoding to return the manifest text one token at a time? (This could also help detect closed connections sooner.)

The @DELETE /collection/{uuid}@ API cancels a worker thread, but at face value it looks like it will delete a collection. (In a sense it _can_ delete a collection, by cancelling work and leaving the partial collection with a non-null @expires_at@ value; but if the job is finished, the effect is nothing at all like "delete collection".) Perhaps it should be renamed to something more like @DELETE /queue/{uuid}@? Perhaps it should return different responses for "cancelled as a result of this request" and "already cancelled or was never happening"?

Should there be an API for providing credentials in a POST request? The choice(s) of credentials to use for each data segment could be stored in the map:
* <pre>
{
 "locator" : [{"S3Path":"bucket/prefix/object","credential_id":"local_credential_set_id"}, ...],
 ...
}
</pre>
* @local_credential_set_id@ could be the hash (and filename of local cache) of a key pair.

Other than by having admin privileges, how can a client establish permission to use S3 credentials (which are already known to the gateway server) during a POST request?

h3. Future work

A "resume" API would be useful.