Keep S3 gateway » History » Version 1

Tom Clegg, 03/23/2015 08:55 PM

h1. Keep S3 gateway
See [[Keep service hints]] for more background.
h2. High level design
Each remote storage service (e.g., an S3 bucket) in use at a given Arvados installation is supported by one keep server process, running with a flag like @-volume=s3:/mappings:bucketname:s3credentials@ instead of @-volumes=/tmp/1,/tmp/2@.
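As a rough illustration of how such a flag value might be taken apart, here is a minimal Go sketch. The field meanings (mapping directory, bucket name, credentials) are guesses based only on the names in the example flag above, and @S3VolumeSpec@ and @ParseS3VolumeSpec@ are hypothetical names, not existing keepstore code.

```go
package main

import (
	"fmt"
	"strings"
)

// S3VolumeSpec holds the fields of a hypothetical
// "s3:<mapping dir>:<bucket>:<credentials>" volume argument.
// The field meanings are assumptions made for this sketch.
type S3VolumeSpec struct {
	MappingDir  string
	Bucket      string
	Credentials string
}

// ParseS3VolumeSpec splits the argument on the first three colons, so the
// final (credentials) field may itself contain colons.
func ParseS3VolumeSpec(arg string) (S3VolumeSpec, error) {
	parts := strings.SplitN(arg, ":", 4)
	if len(parts) != 4 || parts[0] != "s3" {
		return S3VolumeSpec{}, fmt.Errorf("expected s3:<mapping dir>:<bucket>:<credentials>, got %q", arg)
	}
	for _, p := range parts[1:] {
		if p == "" {
			return S3VolumeSpec{}, fmt.Errorf("empty field in %q", arg)
		}
	}
	return S3VolumeSpec{MappingDir: parts[1], Bucket: parts[2], Credentials: parts[3]}, nil
}

func main() {
	spec, err := ParseS3VolumeSpec("s3:/mappings:bucketname:s3credentials")
	fmt.Println(spec, err)
}
```

Whether the last field names a credentials file, a profile, or something else is left open here; the sketch only separates the fields.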
h2. Specifics
Some parts of keepproxy and keepstore should probably be refactored to share code more effectively. Compare what each service does:
* keepstore logs & answers client queries, verifies hashes, answers index/status queries, reads/writes data blocks on disk, enforces per-disk mutexes.
* keepproxy logs & answers client queries, verifies hashes, connects to other keep services.
* keepgw logs & answers client queries, verifies hashes, answers index/status queries, reads/writes a local {hash, remote object} index, connects to remote services.
Possibilities:
* Refactor the keepstore command to consist of just the "unix volume" code; move everything else into packages like keep_server and hash_checking_reader. Create a new keepgw-s3 command.
* Extend the keepstore command to use backing-store modules like -volume=unix:/foo and -volume=s3:bucketid.
* Extend the keepproxy command to use backing-store modules like S3 as an alternative to keep disk services.
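Whichever option is chosen, the shared part is roughly "verify hashes on top of some backing store". A minimal Go sketch of that split follows. The @Volume@ interface, @UnixVolume@, @MemVolume@ (an in-memory stand-in for a remote store such as S3) and @VerifiedGet@ are all hypothetical names, and the sketch assumes blocks are identified by the hex MD5 of their content.

```go
package main

import (
	"crypto/md5"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Volume is a hypothetical backing-store interface; the name and
// methods are assumptions made for this sketch.
type Volume interface {
	Get(hash string) ([]byte, error)
	Put(hash string, data []byte) error
}

// UnixVolume keeps each block in a file named after its hash.
type UnixVolume struct{ Root string }

func (v UnixVolume) Get(hash string) ([]byte, error) {
	return os.ReadFile(filepath.Join(v.Root, hash))
}

func (v UnixVolume) Put(hash string, data []byte) error {
	return os.WriteFile(filepath.Join(v.Root, hash), data, 0o644)
}

// MemVolume is an in-memory stand-in for a remote store such as S3.
type MemVolume struct{ blocks map[string][]byte }

func NewMemVolume() MemVolume { return MemVolume{blocks: map[string][]byte{}} }

func (v MemVolume) Get(hash string) ([]byte, error) {
	data, ok := v.blocks[hash]
	if !ok {
		return nil, os.ErrNotExist
	}
	return data, nil
}

func (v MemVolume) Put(hash string, data []byte) error {
	v.blocks[hash] = append([]byte(nil), data...) // store a private copy
	return nil
}

var ErrHashMismatch = errors.New("hash mismatch")

// HashOf returns the hex MD5 of data (assumed block identifier).
func HashOf(data []byte) string { return fmt.Sprintf("%x", md5.Sum(data)) }

// VerifiedGet is the shared code path: it reads from any Volume and
// rejects data whose hash does not match the requested one.
func VerifiedGet(v Volume, hash string) ([]byte, error) {
	data, err := v.Get(hash)
	if err != nil {
		return nil, err
	}
	if HashOf(data) != hash {
		return nil, ErrHashMismatch
	}
	return data, nil
}

func main() {
	dir, err := os.MkdirTemp("", "keep-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	data := []byte("foo")
	for _, v := range []Volume{UnixVolume{Root: dir}, NewMemVolume()} {
		if err := v.Put(HashOf(data), data); err != nil {
			panic(err)
		}
		got, err := VerifiedGet(v, HashOf(data))
		fmt.Printf("%T: %q %v\n", v, got, err)
	}
}
```

With an interface like this, the first two options differ mainly in packaging: separate commands that each link one implementation, or one command that selects an implementation per @-volume@ argument.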
The {hash, remote object} mapping can be stored in the local filesystem.
* A given hash can map to more than one remote object. It's worth remembering all such remote objects: if one disappears or changes, a different one should be attempted next. Suggestion: For each hash, we have a text file with one line per remote data object matching the hash.
* When remote objects are bigger than 64 MiB, the mapping will actually be {hash, remote object segment}. This should be easy to manage if remote object references are always stored as @"offset:length:remote_object_path"@.
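The per-hash text file and the @"offset:length:remote_object_path"@ format could be handled along these lines. This is only a sketch: @RemoteRef@, @ParseRemoteRef@, @ParseMapping@, @FetchAny@ and @ErrNoRefs@ are hypothetical names, and the fetch function is supplied by the caller, since the remote API is not specified here.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// RemoteRef is one line of a per-hash mapping file:
// "offset:length:remote_object_path".
type RemoteRef struct {
	Offset int64
	Length int64
	Path   string
}

var ErrNoRefs = errors.New("no remote objects listed for this hash")

// ParseRemoteRef splits on the first two colons only, so the remote
// object path may itself contain colons.
func ParseRemoteRef(line string) (RemoteRef, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 || parts[2] == "" {
		return RemoteRef{}, fmt.Errorf("malformed reference %q", line)
	}
	offset, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil || offset < 0 {
		return RemoteRef{}, fmt.Errorf("bad offset in %q", line)
	}
	length, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil || length < 0 {
		return RemoteRef{}, fmt.Errorf("bad length in %q", line)
	}
	return RemoteRef{Offset: offset, Length: length, Path: parts[2]}, nil
}

// ParseMapping parses the contents of one mapping file, one reference
// per line, skipping blank lines.
func ParseMapping(text string) ([]RemoteRef, error) {
	var refs []RemoteRef
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		ref, err := ParseRemoteRef(line)
		if err != nil {
			return nil, err
		}
		refs = append(refs, ref)
	}
	return refs, nil
}

// FetchAny tries each reference in order and returns the first result
// that fetch accepts. A real fetch would also verify the block hash, so
// that a changed remote object counts as a failure.
func FetchAny(refs []RemoteRef, fetch func(RemoteRef) ([]byte, error)) ([]byte, error) {
	lastErr := ErrNoRefs
	for _, ref := range refs {
		data, err := fetch(ref)
		if err == nil {
			return data, nil
		}
		lastErr = err
	}
	return nil, lastErr
}

func main() {
	refs, err := ParseMapping("0:3:bucketname/a.txt\n128:3:bucketname/archive.bin\n")
	if err != nil {
		panic(err)
	}
	data, err := FetchAny(refs, func(r RemoteRef) ([]byte, error) {
		if r.Path == "bucketname/a.txt" {
			return nil, errors.New("object disappeared") // simulated failure
		}
		return []byte("foo"), nil
	})
	fmt.Printf("%q %v\n", data, err)
}
```

Keeping the path as the last field means offsets and lengths parse unambiguously even when object names contain colons.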
h2. Related changes
When using local filesystems as data stores, keepstore should accept @-volume=/tmp/foo -volume=/tmp/bar@ (in addition to @-volumes=/tmp/foo,/tmp/bar@ for backward compatibility). See https://golang.org/src/flag/example_test.go
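A minimal sketch of accepting both forms with the standard @flag@ package: a @flag.Value@ accumulates repeated @-volume@ arguments, and @-volumes@ is kept as a comma-separated alias that appends to the same list. @volumeList@ and @ParseVolumes@ are hypothetical names, and @FlagSet.Func@ requires Go 1.16 or later.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// volumeList implements flag.Value; every occurrence of the flag
// appends one entry instead of overwriting the previous value.
type volumeList []string

func (v *volumeList) String() string { return strings.Join(*v, ",") }

func (v *volumeList) Set(value string) error {
	if value == "" {
		return fmt.Errorf("empty volume")
	}
	*v = append(*v, value)
	return nil
}

// ParseVolumes accepts repeated -volume flags and, for backward
// compatibility, a comma-separated -volumes flag. Both feed one list,
// in command-line order.
func ParseVolumes(args []string) ([]string, error) {
	var vols volumeList
	fs := flag.NewFlagSet("keepstore", flag.ContinueOnError)
	fs.Var(&vols, "volume", "volume to use; may be given more than once")
	fs.Func("volumes", "comma-separated list of volumes (older form)", func(s string) error {
		for _, p := range strings.Split(s, ",") {
			if err := vols.Set(p); err != nil {
				return err
			}
		}
		return nil
	})
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return []string(vols), nil
}

func main() {
	vols, err := ParseVolumes([]string{"-volume=/tmp/foo", "-volume=/tmp/bar"})
	fmt.Println(vols, err)
}
```

Only @-volumes@ splits on commas in this sketch, so a single @-volume@ value is passed through unchanged.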