Multi-cluster user database » History » Version 13

Tom Clegg, 08/07/2019 05:35 PM

h1. Multi-cluster user database

It is sometimes desirable to share a single user database across multiple Arvados clusters. For example:
* Clusters aaaaa, bbbbb, ccccc, ddddd, eeeee are on different continents, but they use the same upstream authentication providers (ldap/google) and a given user either has access to all clusters, or none.
* A down/unreachable cluster should not prevent any user from using _other_ clusters in the group -- even if the down/unreachable cluster is the one where the user's account was initially created.

This requires some changes to login and token validation. (Currently, any given user account has a single "home cluster" that can issue or validate tokens for it.)

h2. Logging in

Each user should be able to log in to their account using any cluster, regardless of where/whether they have logged in previously.

To achieve this (without depending on real-time communication between clusters), the participating clusters need to agree on a mapping of upstream authentication results to Arvados user UUIDs. For example, if the upstream authentication result is @"foo@bar.example"@ ("an upstream auth provider assures us this user is foo@bar.example"):
# Generate a UUID "eeeee-tpzed-${sha1part(upstream)}" (where eeeee is a common prefix used by all participating clusters and sha1part() is the first 15 chars of base-36-encoded sha1())
# If it doesn't already exist, add a row to the users table with this UUID
# If another row exists in the users table with the same upstream (or same identity_url) but a different UUID, [offer to] merge the old account's data/objects/permissions into the new account (it isn't possible to log into the old account any more, but we know it belongs to the same person as the new account).
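A minimal sketch of step 1, to make the UUID mapping concrete. The text does not pin down the encoding details, so the following are assumptions: the upstream string is hashed as UTF-8, the digest is read as a big-endian integer, base-36 digits are lowercase, and the result is zero-padded to 31 digits before the first 15 are taken. The names @base36@ and @user_uuid@ are hypothetical. Every participating cluster would have to make the same choices, or the generated UUIDs will not match.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36(n: int) -> str:
    """Encode a non-negative integer in base 36 (lowercase digits)."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(BASE36_DIGITS[rem])
    return "".join(reversed(out))


def sha1part(upstream: str) -> str:
    """First 15 characters of the base-36-encoded SHA-1 of `upstream`.

    Assumed details: UTF-8 input, big-endian integer, and zero-padding to
    31 digits (the longest a 160-bit value can be in base 36) so that the
    prefix is taken from a fixed-width string.
    """
    digest = hashlib.sha1(upstream.encode("utf-8")).digest()
    encoded = base36(int.from_bytes(digest, "big")).rjust(31, "0")
    return encoded[:15]


def user_uuid(prefix: str, upstream: str) -> str:
    """Build the shared user UUID, e.g. eeeee-tpzed-<15 base-36 chars>."""
    return f"{prefix}-tpzed-{sha1part(upstream)}"
```

Because the function depends only on the shared prefix and the upstream result, two clusters that have never communicated compute the same UUID for @foo@bar.example@. Whether the upstream string should be normalized first (for example, lowercased) is not addressed here.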

This also makes upstream authentication providers equivalent: as long as they report the same IDs (email addresses), users/sites can switch upstream providers on the fly without having to merge or migrate accounts.

Notes
* The "upstream" field is similar to identity_url as initially conceived. Since #4601, identity_url has been an opaque SSO-generated UUID with no information about the upstream identity. We can still rely on it to detect "same upstream as an old account that needs to be migrated", but we cannot use it to generate the same user UUID as other clusters, hence the need for a new "upstream" field.
* "Remote" accounts (the kind that we already have in the users table with foreign UUIDs) have a null identity_url field, and will also have a null upstream field.

|uuid                        |upstream        |identity_url                |significance               |
|eeeee-tpzed-012340123401234 |foo@bar.example |login-tpzed-aaaaaaaaaaaaaaa |Newly created user account |
|aaaaa-tpzed-aaaaaaaaaaaaaaa |NULL            |login-tpzed-aaaaaaaaaaaaaaa |Old user account (can't log in to this any more - contents should be migrated to eeeee-*) |
|ooooo-tpzed-ooooooooooooooo |NULL            |NULL                        |Remote user from cluster ooooo (not part of our multi-cluster group) |
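The login steps, applied to rows like those in the table above, might look roughly like this. It is a sketch only: the users table is modelled as an in-memory dict, @login@ and its return shape are hypothetical, and the hashing details (UTF-8, big-endian, zero-padded lowercase base 36) are assumptions. The merge itself is left to the caller, matching the "[offer to]" wording.

```python
import hashlib


def sha1part(upstream):
    """First 15 base-36 chars of sha1(upstream); encoding details assumed."""
    n = int.from_bytes(hashlib.sha1(upstream.encode("utf-8")).digest(), "big")
    digits = ""
    while n:
        n, rem = divmod(n, 36)
        digits = "0123456789abcdefghijklmnopqrstuvwxyz"[rem] + digits
    return digits.rjust(31, "0")[:15]


def login(users, prefix, upstream, identity_url):
    """Find or create the shared account; report old accounts to merge.

    `users` maps uuid -> {"upstream": ..., "identity_url": ...}.
    Returns (uuid, merge_candidates).
    """
    uuid = f"{prefix}-tpzed-{sha1part(upstream)}"
    # Add the row only if it does not exist yet.
    users.setdefault(uuid, {"upstream": upstream, "identity_url": identity_url})
    # Other rows with the same upstream or identity_url belong to the same
    # person. Remote accounts have NULL in both fields and must never match.
    merge_candidates = sorted(
        other
        for other, row in users.items()
        if other != uuid
        and (
            row["upstream"] == upstream
            or (identity_url is not None and row["identity_url"] == identity_url)
        )
    )
    return uuid, merge_candidates


users = {
    "aaaaa-tpzed-aaaaaaaaaaaaaaa": {
        "upstream": None,
        "identity_url": "login-tpzed-aaaaaaaaaaaaaaa",
    },
    "ooooo-tpzed-ooooooooooooooo": {"upstream": None, "identity_url": None},
}
uuid, old_accounts = login(
    users, "eeeee", "foo@bar.example", "login-tpzed-aaaaaaaaaaaaaaa"
)
```

Here @old_accounts@ contains only the aaaaa account, and the remote ooooo account is left alone. The generated UUID will not literally be eeeee-tpzed-012340123401234, which is a placeholder in the table.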

h2. Configuration

Each cluster needs to know:
* the UUID prefix to use when creating a new account, e.g., "eeeee" (this will be the initial "master" cluster -- see below)

The master cluster needs to know:
* which other clusters are authorized to issue tokens for "eeeee-tpzed-*" users

<pre><code class="yaml">
Clusters:
  eeeee:
    Login:
      AssignUUIDPrefix: eeeee
    RemoteClusters:
      bbbbb:
        Proxy: true
        Authenticate:
          bbbbb: {} # (implied)
          eeeee: {} # accept tokens issued by bbbbb for users with uuid eeeee-*
</code></pre>

Example: aaaaa needs to validate a token issued by bbbbb.
* Do a callback to bbbbb (or check the JWT signature) to confirm bbbbb really issued this token and get the relevant user UUID (result: yes, user UUID is eeeee-tpzed-012340123401234).
* Fetch eeeee's config, because the user UUID's prefix names eeeee as the cluster that decides who may issue tokens for this user.
* If RemoteClusters.bbbbb.Authenticate.eeeee is present, accept the token. Otherwise, reject the token.
* If the token is accepted, update the local cache of the user record from eeeee.
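The accept/reject decision in the example above can be sketched as a lookup in the master cluster's config. This assumes the config has already been fetched and parsed into nested dicts mirroring the YAML. The function name @accepts_token@ and the @ccccc@ entry are hypothetical, added only for contrast. Treating the owning cluster as always trusted for its own users is also an assumption, based on the current "home cluster" behaviour.

```python
def accepts_token(master_config, issuer, user_uuid):
    """Return True if `issuer` may issue tokens for `user_uuid`.

    `master_config` is the Clusters.<prefix> section of the cluster named
    by the user UUID's prefix (eeeee in the example).
    """
    owner = user_uuid.split("-", 1)[0]
    if issuer == owner:
        # Assumption: a cluster can always issue tokens for its own users.
        return True
    remote = master_config.get("RemoteClusters", {}).get(issuer, {})
    # Presence of the key is what matters; its value is an empty mapping.
    return owner in remote.get("Authenticate", {})


EEEEE_CONFIG = {
    "Login": {"AssignUUIDPrefix": "eeeee"},
    "RemoteClusters": {
        "bbbbb": {"Proxy": True, "Authenticate": {"bbbbb": {}, "eeeee": {}}},
        # Hypothetical: ccccc is known but not trusted for eeeee-* users.
        "ccccc": {"Proxy": True, "Authenticate": {"ccccc": {}}},
    },
}

USER = "eeeee-tpzed-012340123401234"
accepted = accepts_token(EEEEE_CONFIG, "bbbbb", USER)
rejected = accepts_token(EEEEE_CONFIG, "ccccc", USER)
```

With this config, a token from bbbbb for the eeeee user is accepted, while one from ccccc (or from any cluster not listed at all) is rejected.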

h2. Validating tokens

(...even when the issuing cluster is unreachable)

Each cluster should be able to validate a token that was issued by a different, currently unreachable, cluster. This contrasts with the current setup, where aaaaa validates tokens issued by bbbbb by doing a callback to bbbbb.

This seems easy enough: instead of random strings, tokens can be [like] "JSON Web Tokens":https://jwt.io/, signed by a private key whose public part is known by all clusters. (This would also be more efficient than callbacks, benefiting the mutually-untrusted cluster scenario too.)
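To illustrate validating a token without a callback, here is a stdlib-only sketch of a JWT-shaped token. Note the substitution: it signs with a shared-secret HMAC (HS256), purely because the Python standard library has no asymmetric signing. The proposal above calls for a private key whose public part is distributed, which would mean an algorithm such as RS256 or ES256, so that validating clusters cannot mint tokens themselves. The claim names and the functions @issue_token@ and @validate_token@ are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(key: bytes, signing_input: str) -> bytes:
    return hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(key: bytes, issuer: str, user_uuid: str, ttl: int = 3600) -> str:
    """Build header.claims.signature, each part base64url-encoded."""
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"iss": issuer, "sub": user_uuid, "exp": int(time.time()) + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return signing_input + "." + _b64url(_sign(key, signing_input))


def validate_token(key: bytes, token: str, now=None):
    """Return the claims if signature and expiry check out, else None.

    No network access is needed, so this works while the issuer is down.
    """
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        expected = _sign(key, f"{header_b64}.{claims_b64}")
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        # Wrong number of parts, bad base64, bad JSON, or non-ASCII input.
        return None
    if header.get("alg") != "HS256":
        return None
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        return None
    return claims
```

The returned @iss@ and @sub@ claims would then feed the same "is this issuer allowed for this user UUID" check as the callback-based flow.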

h2. "Master" cluster

The authoritative place to store/load per-user information (preferences, and the "this email is just an alternate way to log in to a different account" marker) is:
* ...for callers outside the "eeeee" group of clusters: the "eeeee" cluster
* ...for callers inside the "eeeee" group of clusters, for now: a manually designated "master" cluster (probably "eeeee")
* ...for callers inside the "eeeee" group of clusters, in future: a group-wide distributed database whose default/initial "master" is eeeee

Until a distributed database is implemented, each non-master cluster can update its cached user record (if stale at login or token validation time) from the master cluster, and proxy update requests to the master.
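The interim behaviour of a non-master cluster could be sketched as a small cache in front of the master. Everything here is hypothetical: the class name, the @fetch_from_master@ and @push_to_master@ callables, and the staleness threshold. Serving a stale record when the master is unreachable is an assumption that follows from the goal that a down cluster should not block users elsewhere.

```python
import time


class UserCache:
    """Local cache of user records whose authoritative copy is on the master."""

    def __init__(self, fetch_from_master, push_to_master, max_age=300.0,
                 clock=time.monotonic):
        self._fetch = fetch_from_master   # callable: uuid -> record dict
        self._push = push_to_master       # callable: (uuid, changes) -> record
        self._max_age = max_age           # seconds before a record is stale
        self._clock = clock
        self._entries = {}                # uuid -> (fetched_at, record)

    def get(self, uuid):
        """Return the user record, refreshing from the master if stale."""
        now = self._clock()
        cached = self._entries.get(uuid)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]
        try:
            record = self._fetch(uuid)
        except ConnectionError:
            # Master unreachable: fall back to the stale copy if there is one.
            if cached is not None:
                return cached[1]
            raise
        self._entries[uuid] = (now, record)
        return record

    def update(self, uuid, changes):
        """Proxy an update to the master and cache the record it returns."""
        record = self._push(uuid, changes)
        self._entries[uuid] = (self._clock(), record)
        return record
```

A login or token validation would call @get()@, and any preference change would go through @update()@ so that the master stays the single writer.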

h2. Migration

The admin migration tool should, for each record in the user database:
* Check whether the UUID has been generated from the email address as described above (if so, do nothing)
* Check whether another user record exists with the generated UUID (if not, create one)
* [Prompt and] change existing references to the old UUID to the new generated UUID (_if the existing email field is trusted,_ this should include tokens, SSH keys, etc.)
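The per-record checks above can be sketched as a planning pass that only reports what would change, leaving the "[Prompt and]" step and the actual rewrite of references to the operator. The user database is modelled as a dict of UUID to email, @plan_migration@ is a hypothetical name, and the hashing details are assumptions. Skipping records without an email is also an assumption, since the text does not say which records are in scope.

```python
import hashlib


def sha1part(upstream):
    """First 15 base-36 chars of sha1(upstream); encoding details assumed."""
    n = int.from_bytes(hashlib.sha1(upstream.encode("utf-8")).digest(), "big")
    digits = ""
    while n:
        n, rem = divmod(n, 36)
        digits = "0123456789abcdefghijklmnopqrstuvwxyz"[rem] + digits
    return digits.rjust(31, "0")[:15]


def plan_migration(users, prefix):
    """Return (to_create, rewrites) without modifying `users`.

    to_create: generated uuid -> email, for records that need to be created.
    rewrites:  old uuid -> generated uuid, for references to be changed.
    """
    to_create = {}
    rewrites = {}
    for uuid, email in users.items():
        if email is None:
            continue  # assumption: nothing to derive a UUID from
        target = f"{prefix}-tpzed-{sha1part(email)}"
        if uuid == target:
            continue  # already generated from the email address
        if target not in users:
            to_create[target] = email
        rewrites[uuid] = target
    return to_create, rewrites


users = {
    "aaaaa-tpzed-aaaaaaaaaaaaaaa": "foo@bar.example",
    "ooooo-tpzed-ooooooooooooooo": None,
}
to_create, rewrites = plan_migration(users, "eeeee")
```

Running the plan a second time after the new record exists yields an empty @to_create@, which makes the tool safe to re-run if it is interrupted.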