Revision 1/4
Tom Clegg, 03/23/2015 08:55 PM


Keep S3 gateway

See Keep service hints for more background.

High level design

Each remote storage service (e.g., an S3 bucket) in use at a given Arvados installation is supported by one keep server process, running with a flag like -volume=s3:/mappings:bucketname:s3credentials instead of -volumes=/tmp/1,/tmp/2.
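As a rough illustration, a flag value of that shape could be split into its parts as sketched below. The field meanings (a local directory for the mappings, a bucket name, a credentials reference) are assumptions read off the example value, and s3VolumeSpec and parseS3VolumeSpec are hypothetical names, not existing keepstore code.

```go
package main

import (
	"fmt"
	"strings"
)

// s3VolumeSpec holds the parts of a value like
// "s3:/mappings:bucketname:s3credentials".
// The interpretation of each field is an assumption.
type s3VolumeSpec struct {
	MappingDir  string // assumed: local directory holding the hash mappings
	Bucket      string // assumed: remote bucket name
	Credentials string // assumed: reference to the S3 credentials
}

// parseS3VolumeSpec splits a colon-separated spec into its four parts
// and rejects anything that does not start with "s3".
func parseS3VolumeSpec(spec string) (s3VolumeSpec, error) {
	parts := strings.Split(spec, ":")
	if len(parts) != 4 || parts[0] != "s3" {
		return s3VolumeSpec{}, fmt.Errorf("expected s3:mappingdir:bucket:credentials, got %q", spec)
	}
	for _, p := range parts[1:] {
		if p == "" {
			return s3VolumeSpec{}, fmt.Errorf("empty field in %q", spec)
		}
	}
	return s3VolumeSpec{MappingDir: parts[1], Bucket: parts[2], Credentials: parts[3]}, nil
}

func main() {
	spec, err := parseS3VolumeSpec("s3:/mappings:bucketname:s3credentials")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", spec)
}
```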

Specifics

Some parts of keepproxy and keepstore should probably be refactored to share code more effectively. Current and proposed responsibilities:
  • keepstore logs & answers client queries, verifies hashes, answers index/status queries, reads/writes data blocks on disk, enforces per-disk mutexes.
  • keepproxy logs & answers client queries, verifies hashes, connects to other keep services.
  • keepgw logs & answers client queries, verifies hashes, answers index/status queries, reads/writes a local {hash, remote object} index, connects to remote services.
Possibilities:
  • Refactor the keepstore command to consist of just the "unix volume" code; move everything else into packages like keep_server and hash_checking_reader. Create a new keepgw-s3 command.
  • Extend the keepstore command to use backing-store modules like -volume=unix:/foo and -volume=s3:bucketid.
  • Extend the keepproxy command to use backing-store modules like S3 as an alternative to keep disk services.
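Whichever option is chosen, the shared part is a server layer that verifies hashes and delegates storage to a pluggable backend. A minimal sketch of that split follows. The Volume interface, memVolume, putVerified and getVerified are hypothetical names; the in-memory backend stands in for a unix or S3 volume; and MD5 is assumed as the block hash.

```go
package main

import (
	"crypto/md5"
	"errors"
	"fmt"
)

// Volume is a hypothetical backing-store interface. A unix-directory
// volume and an S3 volume would each implement it.
type Volume interface {
	Get(hash string) ([]byte, error)
	Put(hash string, data []byte) error
}

// memVolume is an in-memory stand-in for a real backend.
type memVolume struct {
	blocks map[string][]byte
}

func newMemVolume() *memVolume {
	return &memVolume{blocks: make(map[string][]byte)}
}

func (m *memVolume) Get(hash string) ([]byte, error) {
	data, ok := m.blocks[hash]
	if !ok {
		return nil, errors.New("block not found: " + hash)
	}
	return append([]byte(nil), data...), nil
}

func (m *memVolume) Put(hash string, data []byte) error {
	m.blocks[hash] = append([]byte(nil), data...)
	return nil
}

// hashOf returns the hex MD5 digest of data (assumed hash function).
func hashOf(data []byte) string {
	return fmt.Sprintf("%x", md5.Sum(data))
}

// putVerified is backend-independent: it checks the hash, then stores.
func putVerified(v Volume, hash string, data []byte) error {
	if hashOf(data) != hash {
		return errors.New("hash mismatch on write: " + hash)
	}
	return v.Put(hash, data)
}

// getVerified reads from the backend and re-checks the hash, so a
// changed or corrupted object is detected regardless of backend.
func getVerified(v Volume, hash string) ([]byte, error) {
	data, err := v.Get(hash)
	if err != nil {
		return nil, err
	}
	if hashOf(data) != hash {
		return nil, errors.New("hash mismatch on read: " + hash)
	}
	return data, nil
}

func main() {
	v := newMemVolume()
	data := []byte("example block")
	h := hashOf(data)
	if err := putVerified(v, h, data); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	got, err := getVerified(v, h)
	fmt.Println(string(got), err)
}
```

With this shape, adding S3 support means writing one more Volume implementation rather than a separate server.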
The {hash, remote object} mapping can be stored in the local filesystem.
  • A given hash can map to more than one remote object. It's worth remembering all such remote objects: if one disappears or changes, a different one should be attempted next. Suggestion: For each hash, we have a text file with one line per remote data object matching the hash.
  • When remote objects are bigger than 64 MiB, the mapping will actually be {hash, remote object segment}. This should be easy to manage if remote object references are always stored as "offset:length:remote_object_path".
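The per-hash text file described above could be read as sketched here, assuming one "offset:length:remote_object_path" reference per line. segmentRef, parseSegmentRef and parseMappingFile are hypothetical names. The path is taken as everything after the second colon, so paths that themselves contain colons still parse.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// segmentRef identifies a byte range within one remote object.
type segmentRef struct {
	Offset int64
	Length int64
	Path   string
}

// parseSegmentRef parses one "offset:length:remote_object_path" line.
func parseSegmentRef(line string) (segmentRef, error) {
	parts := strings.SplitN(line, ":", 3)
	if len(parts) != 3 || parts[2] == "" {
		return segmentRef{}, fmt.Errorf("malformed reference %q", line)
	}
	offset, err := strconv.ParseInt(parts[0], 10, 64)
	if err != nil || offset < 0 {
		return segmentRef{}, fmt.Errorf("bad offset in %q", line)
	}
	length, err := strconv.ParseInt(parts[1], 10, 64)
	if err != nil || length < 0 {
		return segmentRef{}, fmt.Errorf("bad length in %q", line)
	}
	return segmentRef{Offset: offset, Length: length, Path: parts[2]}, nil
}

// parseMappingFile parses the contents of one per-hash file. Each
// non-blank line is a candidate source for the same block, to be
// tried in order until one yields data matching the hash.
func parseMappingFile(content string) ([]segmentRef, error) {
	var refs []segmentRef
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" {
			continue
		}
		ref, err := parseSegmentRef(line)
		if err != nil {
			return nil, err
		}
		refs = append(refs, ref)
	}
	return refs, scanner.Err()
}

func main() {
	content := "0:67108864:bucket/big.dat\n67108864:1024:s3://bucket/big.dat\n"
	refs, err := parseMappingFile(content)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range refs {
		fmt.Printf("%+v\n", r)
	}
}
```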

Related changes

When using local filesystems as data stores, keepstore should accept -volume=/tmp/foo -volume=/tmp/bar (in addition to -volumes=/tmp/foo,/tmp/bar for backward compatibility). See https://golang.org/src/flag/example_test.go for an example of a user-defined flag type.
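A repeatable flag can be built with the standard flag.Value interface. The sketch below accepts both forms; volumeList and parseVolumes are hypothetical names, and appending the legacy -volumes entries after the -volume ones is an arbitrary choice made here.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// volumeList collects every -volume argument. It implements flag.Value.
type volumeList []string

func (v *volumeList) String() string {
	return strings.Join(*v, ",")
}

// Set is called once per occurrence of the flag and accumulates values.
func (v *volumeList) Set(value string) error {
	*v = append(*v, value)
	return nil
}

// parseVolumes accepts repeated -volume flags plus the legacy
// comma-separated -volumes flag, and returns the combined list.
func parseVolumes(args []string) ([]string, error) {
	fs := flag.NewFlagSet("keepstore", flag.ContinueOnError)
	var volumes volumeList
	var legacy string
	fs.Var(&volumes, "volume", "volume to use (may be given more than once)")
	fs.StringVar(&legacy, "volumes", "", "comma-separated list of volumes (legacy form)")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	if legacy != "" {
		volumes = append(volumes, strings.Split(legacy, ",")...)
	}
	return volumes, nil
}

func main() {
	vols, err := parseVolumes([]string{"-volume=/tmp/foo", "-volume=/tmp/bar"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(vols)
}
```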

Updated by Tom Clegg about 9 years ago · 1 revisions