Revision 20 (Peter Amstutz, 06/21/2017 03:58 PM) → Revision 21/22 (Tom Clegg, 10/16/2017 07:13 PM)

h1. Federated identity 

 See 
 * #11453 
 * #11874 

 A person should be able to create an account and get a token from a single identity provider, and use that token to access private/protected resources on multiple Arvados clusters. 

 Motivating use cases: 
 * A user on cluster B shares a project with a user on cluster A. 
 * A container running on cluster A reads and writes data on cluster B. 
 * A user logged in to Workbench A can search/view/download/upload collections at cluster B. 

 Configuration examples: 
 * An organization has 5 clusters, but only one of them has user accounts and roles in its database. 
 * An on-premise cluster runs containers that use public data stored in the cloud (without mirroring the data locally). 

 h2. Relevant principles 

 Cluster "bbbbb" is authoritative for objects whose UUIDs start with "bbbbb-". This applies to both the object's state and the set of user/group UUIDs that are allowed to read/write the object. 
 * This implies that permission links whose head_uuid starts with "bbbbb-" exist only on cluster bbbbb. (If they exist elsewhere, they should be ignored.) 
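
 The prefix rule above can be sketched as a small filter. This is only an illustration: the helper names @cluster_id@ and @effective_links@ are hypothetical, the sample UUIDs are made up, and links are modelled as plain dicts rather than real API records.

```python
def cluster_id(uuid: str) -> str:
    """Return the cluster ID, i.e. the part of the UUID before the first '-'."""
    return uuid.split("-", 1)[0]


def effective_links(links, local_cluster):
    """Keep only permission links whose head_uuid belongs to the local cluster.

    Links that point at objects owned by another cluster are ignored, because
    that other cluster is authoritative for them.
    """
    return [link for link in links if cluster_id(link["head_uuid"]) == local_cluster]


# Made-up sample data: one link about a bbbbb object, one about a ccccc object.
links = [
    {"tail_uuid": "aaaaa-tpzed-000000000000001", "head_uuid": "bbbbb-4zz18-000000000000002"},
    {"tail_uuid": "aaaaa-tpzed-000000000000001", "head_uuid": "ccccc-4zz18-000000000000003"},
]

# Evaluated on cluster bbbbb, only the first link counts.
kept = effective_links(links, "bbbbb")
```

 Note that the grantee (tail_uuid) may belong to any cluster; only the head_uuid prefix decides where the link is honored.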

 h2. Design sketch 

 Each Arvados client must be able to prove to cluster B that it is authorized by cluster A to act on behalf of a user account which is controlled by cluster A. The proof must not give cluster B enough information to act on behalf of the user account itself. For example, the client cannot simply give cluster B its cluster A token so that cluster B can do a canary query: doing so would allow cluster B to exercise the client's authority on clusters C, D, and E as well. 

 h2. Protocol ideas 

 "Salted tokens": instead of passing its literal token, the client passes the token UUID and @HMAC(token, "bbbbb")@ when sending a request to cluster B (where "bbbbb" is cluster B's cluster ID / UUID prefix). Cluster B validates the request by passing those two parameters untouched to a "verify request" ("no-op") endpoint at cluster A. 
 * API server hands out tokens in the form "tokenUUID <delimiter> secret" instead of just the secret part. 
 * Cluster B figures out cluster A's API endpoint by looking at the "site ID prefix" of the token UUID. 
 * Cluster B can be configured with a lookup table (clusterID → apiHost) to override the implicit {id}.arvadosapi.com 
 * Cluster B can be configured to _only_ use the lookup table, i.e., to never use implicit {id}.arvadosapi.com endpoints 
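
 A minimal sketch of the salted-token idea and the endpoint lookup, under several assumptions that the notes above leave open: the delimiter is taken to be "/", the HMAC hash is taken to be SHA-256, and the issuing cluster is simply told which cluster ID was used as the salt. The function names (@salt_token@, @verify_salted@, @api_host@) and the sample token are hypothetical.

```python
import hashlib
import hmac

DELIMITER = "/"  # assumption: the delimiter is not specified in these notes


def salt_token(token: str, cluster_id: str) -> str:
    """Client side: build the credential to send to cluster `cluster_id`.

    `token` has the form "tokenUUID <delimiter> secret". The secret is the
    HMAC key and the remote cluster ID is the message (hash choice assumed).
    """
    token_uuid, secret = token.split(DELIMITER, 1)
    digest = hmac.new(secret.encode(), cluster_id.encode(), hashlib.sha256).hexdigest()
    return token_uuid + DELIMITER + digest


def verify_salted(salted: str, cluster_id: str, secrets_by_uuid: dict) -> bool:
    """Issuing cluster side: recompute the HMAC from the stored secret and compare."""
    token_uuid, digest = salted.split(DELIMITER, 1)
    secret = secrets_by_uuid.get(token_uuid)
    if secret is None:
        return False
    expected = hmac.new(secret.encode(), cluster_id.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, digest)


def api_host(token_uuid: str, table=None, table_only: bool = False):
    """Remote cluster side: find the issuing cluster's API host.

    The lookup table overrides the implicit {id}.arvadosapi.com host. With
    table_only=True, unknown cluster IDs are rejected (None is returned).
    """
    issuer = token_uuid.split("-", 1)[0]
    table = table or {}
    if issuer in table:
        return table[issuer]
    if table_only:
        return None
    return issuer + ".arvadosapi.com"


# Made-up token issued by cluster aaaaa, salted for cluster bbbbb.
token = "aaaaa-gj3su-000000000000001" + DELIMITER + "s3cret"
salted_for_b = salt_token(token, "bbbbb")
```

 Because the salt is the remote cluster's ID, a credential captured by cluster B does not verify when it is presented as if it had been salted for cluster C, which is the property the design sketch asks for.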

 "Cluster-scoped tokens": the client contacts cluster A to get a scoped token which only allows "GET /users/current" on cluster A but is accepted by cluster B as an unrestricted (scope [all]) token for that user. 

 h2. Adding permissions 

 There are a few permission-granting cases to consider. 

 |grantor|grantee|object|notes| 
 |user on site A|user on site A|object on site A|(existing permission system)| 
 |user on site A|group on site A|object on site A|(existing permission system)| 
 |user on site A|user or group on site A|object on site B|Client creates a link at site B. Site B asks site A whether the grantee user/group is visible to user A.| 
 |user on site A|user or group on site B|object on site B|Client creates a link at site B. Site B asks site A for a list of groups user A can see, then checks whether (possibly via one of those groups) user A can read the grantee user/group according to site B's local database.| 
 |user on site A|user or group on site B|object on site A|Client creates a link at site A. Site A generates a salted token and uses it to ask site B whether user A can read the grantee user/group.| 

 In all of these cases, "user on site A" has a UUID starting with "aaaaa-" and therefore uses a token issued by cluster A (see "protocol ideas" above). 

 When site B connects to site A in the course of processing request R, it uses the token provided by the client in request R. 
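
 The table above can be read as a decision procedure keyed on the cluster prefixes of the grantor, grantee, and object UUIDs. The sketch below only encodes the rows listed in the table; the function names are hypothetical, the returned strings are descriptions rather than real API calls, and combinations the table does not list raise an error instead of guessing.

```python
def cluster_of(uuid: str) -> str:
    """Return the cluster ID prefix of a UUID."""
    return uuid.split("-", 1)[0]


def plan_grant(grantor_uuid: str, grantee_uuid: str, object_uuid: str):
    """Return (site where the link is created, check that site performs)."""
    grantor = cluster_of(grantor_uuid)
    grantee = cluster_of(grantee_uuid)
    obj = cluster_of(object_uuid)

    if grantee == grantor and obj == grantor:
        # Everything is local: rows 1 and 2 of the table.
        return (obj, "existing permission system")
    if grantee == grantor:
        # Grantee lives with the grantor, object lives elsewhere: row 3.
        return (obj, f"{obj} asks {grantor} whether the grantee is visible to the grantor")
    if grantee == obj:
        # Grantee and object share a remote cluster: row 4.
        return (obj, f"{obj} asks {grantor} for the groups the grantor can see, "
                     f"then checks its local database")
    if obj == grantor:
        # Object is local, grantee is remote: row 5.
        return (obj, f"{obj} uses a salted token to ask {grantee} whether "
                     f"the grantor can read the grantee")
    raise ValueError("combination not covered by the table")


# Made-up UUIDs: user on aaaaa shares an object on bbbbb with a group on bbbbb.
site, check = plan_grant(
    "aaaaa-tpzed-000000000000001",
    "bbbbb-j7d0g-000000000000002",
    "bbbbb-4zz18-000000000000003",
)
```

 In every listed case the link is created at the cluster that owns the object, which matches the principle that permission links live only on the authoritative cluster.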

 

 h2. TODO 

 Things to address 

 * how to sync groups 
 * diagrams 
 * mnemonic cluster names / more concrete examples (including who is reachable on the internet) 
 * [how] do you get a list of users/groups you can share stuff with? 
 * clarify what UUIDs look like (some people have A uuids, some have B uuids) 
 * [[Cross-cluster delegation]] 
 * [[Routing multi cluster requests]]