Federated collections » History » Version 3

Peter Amstutz, 08/01/2018 08:23 PM

h1. Federated collections

* Fetch collection record by uuid
** Use the federated record retrieval strategy, which has already been developed.
* Fetch collection record by PDH
** No location hint. Distribute the request to all federated clusters and pick one response to return.
** Read-only: only the GET operation needs to be supported.
* The result can be cached by PDH.
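
The PDH lookup described above (fan the request out to every federated cluster, take one answer, cache it by PDH) could be sketched as follows. This is only an illustration under stated assumptions: @fetch_by_pdh@, the @remotes@ mapping of cluster id to a fetch callable, and the plain dict cache are hypothetical names, not part of arvados-controller.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_by_pdh(pdh, remotes, cache):
    """Return (cluster_id, record) from the first remote that has the PDH.

    `remotes` maps a cluster id to a callable that takes a PDH and returns
    a collection record (dict) or None. These callables are hypothetical
    stand-ins for real GET requests to the federated clusters.
    """
    if pdh in cache:
        return cache[pdh]
    with ThreadPoolExecutor(max_workers=max(1, len(remotes))) as pool:
        futures = {pool.submit(fetch, pdh): cid for cid, fetch in remotes.items()}
        for fut in as_completed(futures):
            try:
                record = fut.result()
            except Exception:
                continue  # one unreachable cluster should not fail the lookup
            if record is not None:
                # Remember which cluster answered, keyed by PDH.
                cache[pdh] = (futures[fut], record)
                return cache[pdh]
    return None  # no federated cluster had the collection
```

Returning the answering cluster id alongside the record matters later in the design, because that id is what the block hints are built from.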

The record will have a manifest with signed blocks. However, these blocks will be signed for the origin cluster.

The client needs to be able to fetch blocks from the remote cluster.

arvados-controller could add block hints, using an existing feature in the Python and Go SDKs:

* Blocks in a manifest can include a hint in the form "+K@zzzzz". The Python SDK will attempt to fetch the block from "https://keep.zzzzz.arvadosapi.com/".
** The remote cluster must conform to a particular DNS naming scheme.
** Could be generalized by looking up the cluster in "remote_hosts" and using the "keep_services.accessible" API.
** Every block will be requested from the remote cluster every time, because the client contacts the remote server directly, so there is limited opportunity for edge caching.

* A hint can also be the uuid of a "local gateway service". This instructs the client to use a specific service from the keep_services table (indicated by a "service_type" of "gateway:").
** Directs requests through a specific service.
** Does not encode which remote cluster to pull a block from.
** The gateway service could search for blocks by sending a request to every federated cluster.
** The gateway service can cache blocks so they don't need to be re-fetched from the remote cluster.
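
To make the two hint forms concrete, here is a minimal sketch that classifies the "+K@" hints on a block locator. It assumes a cluster hint is a five-character id (as in "zzzzz") and that a service uuid has the usual Arvados shape of three dash-separated groups of 5, 5 and 15 characters. The function name @classify_hints@ is hypothetical and this is not the SDK's own parsing code.

```python
import re

# Assumed shapes: "K@" plus a 5-character cluster id, or "K@" plus a uuid.
CLUSTER_HINT = re.compile(r"^K@([a-z0-9]{5})$")
SERVICE_HINT = re.compile(r"^K@([a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15})$")


def classify_hints(locator):
    """Return a list of (kind, value) pairs for the +K@ hints in a locator.

    kind is "cluster" (value is the keep URL derived from the DNS scheme)
    or "gateway" (value is the uuid of a keep_services entry).
    """
    results = []
    # The first two '+'-separated fields are hash and size; hints follow.
    for part in locator.split("+")[1:]:
        m = CLUSTER_HINT.match(part)
        if m:
            results.append(("cluster", "https://keep.%s.arvadosapi.com/" % m.group(1)))
            continue
        m = SERVICE_HINT.match(part)
        if m:
            results.append(("gateway", m.group(1)))
    return results
```

Note that only the cluster form carries enough information to tell which remote cluster holds the block, which is the trade-off the bullets above describe.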

Both "hint" schemes are slightly inelegant because they require repeating the "+K@" hint for every block in the manifest.

We probably want an architecture that makes block caching possible, even if the first-pass implementation doesn't support it. That implies a gateway / proxy service rather than contacting the remote cluster directly. Architecturally, this is also more in line with the arvados-controller design of acting as an intermediary, as opposed to adding federation features to the client.

Proposal:

arvados-controller decorates blocks with "+K@zzzzz" hints, but the implementation changes so that instead of contacting the remote host, the client contacts the local gateway service and requests the block with the cluster hint and the block signature (which is returned by the remote cluster).

The local gateway service requests the block from the appropriate cluster and returns the result.

A simple caching strategy would be to copy the block to local keep storage and maintain a mapping from the remote signature(s) to a local signature. If a request comes in for a block that was recently fetched but carries a remote signature that has not been seen before, the gateway can issue a HEAD request to the remote cluster to verify the signature and then remember it.

Fetching collection flow:

# Running on cluster aaaaa
# Client sends a request to arvados-controller by PDH
# arvados-controller searches the local database and comes up empty
# arvados-controller sends a request for the collection by PDH (with salted token) out to federated clusters bbbbb and ccccc
# ccccc returns a result
# arvados-controller decorates the returned record with "+K@ccccc" block hints
# arvados-controller returns the record to the client
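
The decoration step in the flow above could look roughly like this. It is a sketch, assuming the manifest is plain text in which block locators are space-separated tokens that start with a 32-digit hex hash. @add_cluster_hint@ is a hypothetical helper, not an existing arvados-controller function.

```python
import re

# A block locator token: 32 hex digits followed by '+'-separated fields
# (size, signature, other hints). File tokens such as "0:6:foo.txt" and
# stream names such as "." do not match.
BLOCK_LOCATOR = re.compile(r"^[0-9a-f]{32}(\+\S+)*$")


def add_cluster_hint(manifest_text, cluster_id):
    """Append "+K@<cluster_id>" to every block locator in a manifest."""
    hint = "K@" + cluster_id
    out = []
    for line in manifest_text.splitlines():
        tokens = []
        for tok in line.split(" "):
            # Skip tokens that already carry this hint so the call is idempotent.
            if BLOCK_LOCATOR.match(tok) and hint not in tok.split("+"):
                tok = tok + "+" + hint
            tokens.append(tok)
        out.append(" ".join(tokens) + "\n")
    return "".join(out)
```

The existing signature field is left untouched, since the gateway needs to present it to the remote cluster unchanged.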

Fetching block flow:

# Client wishes to read a file
# Client has a signed block locator with a "+K@ccccc" hint
# Client sends the request to the "gateway" Keep service
# Gateway Keep service contacts keepproxy on cluster ccccc and requests the block
# keepproxy on ccccc returns the block content to the gateway
# Gateway returns the block content to the client

Fetching block, with caching:

# Client wishes to read a file
# Client has a signed block locator with a "+K@ccccc" hint
# Client sends the request to the "gateway" Keep service
# Gateway service looks up the block in memory / local database
## If found, check whether the block signature is cached
## If the block signature isn't cached, send a HEAD request to ccccc
## If the signature checks out, fetch the block from the aaaaa local keepstore and return it
## Otherwise fail (because the HEAD request must have failed)
# If not found, the gateway Keep service contacts keepproxy on cluster ccccc and requests the block
# keepproxy on ccccc returns the block content to the gateway
# Gateway saves the block to aaaaa local keep and records a mapping of remote block+signature to local block+signature (could be in memory, or in a local database such as sqlite)
# Gateway returns the block content to the client
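
The cached flow above can be sketched as a small class. Everything here is an assumption made for illustration: the class name @CachingGateway@, the four injected callables standing in for remote keepproxy GET/HEAD requests and local keepstore PUT/GET, and the in-memory dict and set used for the mapping (the text notes a local database such as sqlite would also work).

```python
class CachingGateway:
    """In-memory sketch of the gateway's cached block fetch."""

    def __init__(self, remote_get, remote_head, local_put, local_get):
        # Hypothetical callables:
        #   remote_get(cluster_id, signed_locator) -> bytes
        #   remote_head(cluster_id, signed_locator) -> bool (signature accepted)
        #   local_put(data) -> local signed locator
        #   local_get(local_locator) -> bytes
        self.remote_get = remote_get
        self.remote_head = remote_head
        self.local_put = local_put
        self.local_get = local_get
        self.local_by_hash = {}  # block hash -> local signed locator
        self.verified = set()    # remote signed locators already checked

    def get(self, cluster_id, signed_locator):
        block_hash = signed_locator.split("+", 1)[0]
        local = self.local_by_hash.get(block_hash)
        if local is not None:
            # Block is cached locally; make sure this signature is valid.
            if signed_locator not in self.verified:
                if not self.remote_head(cluster_id, signed_locator):
                    raise PermissionError("remote cluster rejected the signature")
                self.verified.add(signed_locator)
            return self.local_get(local)
        # Not cached: fetch from the remote cluster, then store locally.
        data = self.remote_get(cluster_id, signed_locator)
        self.local_by_hash[block_hash] = self.local_put(data)
        self.verified.add(signed_locator)
        return data
```

A successful remote GET is treated as proof that the signature is valid, so a repeat request with the same signed locator is served from local storage without contacting ccccc. This sketch never expires entries; a real gateway would need to account for signature expiry.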