Peter Amstutz, 08/01/2018 06:58 PM

h1. Federated collections

* Fetch collection record by uuid
** Use the federated record retrieval strategy, which has already been developed.
* Fetch collection record by PDH
** There is no location hint, so the request must be sent to all federated clusters.
** Read-only: only the GET operation needs to be supported.
* The result can be cached by PDH.
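
The PDH lookup outlined in the list above could be sketched roughly as follows. This is only an illustration, not the arvados-controller implementation: the @PDHLookup@ class, its @fetch@ callback and the cluster ids are hypothetical names introduced here. It sends the request to every federated cluster, returns the first record found, and caches it under its PDH.

```python
from concurrent.futures import ThreadPoolExecutor


class PDHLookup:
    """Hypothetical helper: find a collection record by PDH across clusters."""

    def __init__(self, cluster_ids, fetch):
        # fetch(cluster_id, pdh) is an assumed callback that performs the GET
        # against one cluster and returns a record, or None if not found.
        self.cluster_ids = list(cluster_ids)
        self.fetch = fetch
        self.cache = {}

    def _try(self, cluster_id, pdh):
        # A failing or unreachable cluster must not abort the whole search.
        try:
            return self.fetch(cluster_id, pdh)
        except Exception:
            return None

    def get(self, pdh):
        # Serve repeated lookups from the cache, keyed by PDH.
        if pdh in self.cache:
            return self.cache[pdh]
        # No location hint: ask every federated cluster concurrently.
        workers = max(1, len(self.cluster_ids))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(lambda c: self._try(c, pdh), self.cluster_ids))
        for record in results:
            if record is not None:
                self.cache[pdh] = record
                return record
        return None
```

Only successful lookups are cached in this sketch, so a collection that appears on a remote cluster later can still be found on a subsequent request.
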

The record will have a manifest with signed blocks. However, these blocks will be signed for the origin cluster.

The client needs to be able to fetch blocks from the remote cluster.

arvados-controller could add block hints, using an existing feature in the Python and Go SDKs:

* Blocks in a manifest can include a hint in the form "+K@zzzzz". The Python SDK will attempt to fetch the block from "https://keep.zzzzz.arvadosapi.com/".
** Must conform to a particular DNS naming scheme.
** Could be generalized by looking up the cluster in "remote_hosts" and using the "keep_services.accessible" API.
** Every block will be requested from the remote cluster every time, because the client contacts the remote server directly; there is limited opportunity for edge caching.

* A hint can also be the uuid of a "local gateway service". This instructs the client to use a specific service from the keep_services table (indicated by a "service_type" of "gateway:").
** Directs requests through a specific service.
** Does not encode which remote cluster to pull a block from.
** The gateway service could search for blocks by sending a request to every federated cluster.
** The gateway service can cache blocks so they don't need to be re-fetched from the remote cluster.
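
To make the two kinds of hint described above concrete, here is a minimal sketch of how a client might tell them apart. It is not the SDK's actual code: the @classify_hints@ function is a hypothetical name, and the assumption that a gateway service hint is written as "+K@" followed by a full service uuid (five-character cluster id, five-character type code, fifteen-character identifier) is made here for illustration only.

```python
import re

# Assumed shapes: a bare five-character cluster id, or a full Arvados-style uuid.
CLUSTER_ID = re.compile(r"[a-z0-9]{5}")
SERVICE_UUID = re.compile(r"[a-z0-9]{5}-[a-z0-9]{5}-[a-z0-9]{15}")


def classify_hints(locator):
    """Return (kind, value) pairs for the K@ hints found in a block locator."""
    found = []
    # The first "+"-separated field is the block hash; the rest are size and hints.
    for hint in locator.split("+")[1:]:
        if not hint.startswith("K@"):
            continue  # size field, signature, or some other hint
        target = hint[2:]
        if CLUSTER_ID.fullmatch(target):
            # Remote cluster hint: relies on the fixed DNS naming scheme.
            url = "https://keep.{}.arvadosapi.com/".format(target)
            found.append(("remote_cluster", url))
        elif SERVICE_UUID.fullmatch(target):
            # Gateway hint: the uuid must be looked up in keep_services.
            found.append(("gateway_service", target))
    return found
```

For a locator ending in "+K@zzzzz" this yields the URL built from the naming scheme, while a locator carrying a service uuid yields the uuid itself, which says nothing about the remote cluster that actually holds the block.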