Federated identity


A person should be able to create an account and get a token from a single identity provider, and use that token to access private/protected resources on multiple Arvados clusters.

Motivating use cases:
  • A user on cluster B shares a project with a user on cluster A.
  • A container running on cluster A reads and writes data on cluster B.
  • A user logged in to Workbench A can search/view/download/upload collections at cluster B.

Configuration examples:
  • An organization has 5 clusters, but only one of them has user accounts and roles in its database.
  • An on-premise cluster runs containers that use public data stored in the cloud (without mirroring the data locally).

Relevant principles

Cluster "bbbbb" is authoritative for objects whose UUIDs start with "bbbbb-". This applies to both the object's state and the set of user/group UUIDs that are allowed to read/write the object.
  • This implies that permission links whose head_uuid starts with "bbbbb-" exist only on cluster bbbbb. (If they exist elsewhere, they should be ignored.)
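
A minimal sketch of this rule, assuming permission links are available as dicts with a head_uuid key. The function names and the sample UUIDs in the usage note are hypothetical, not part of any existing API.

```python
def cluster_id_of(uuid: str) -> str:
    """Return the cluster ID prefix of a UUID, e.g. 'bbbbb' for 'bbbbb-...'."""
    return uuid.split("-", 1)[0]


def is_authoritative(cluster_id: str, uuid: str) -> bool:
    """True if the given cluster is authoritative for the object with this UUID."""
    return cluster_id_of(uuid) == cluster_id


def effective_links(cluster_id: str, links: list) -> list:
    """Keep only the permission links this cluster is authoritative for.

    Links whose head_uuid belongs to another cluster are ignored,
    as described above.
    """
    return [link for link in links if is_authoritative(cluster_id, link["head_uuid"])]
```

For example, on cluster bbbbb, a stray link whose head_uuid is "aaaaa-j7d0g-000000000000002" (a made-up UUID) would be dropped by effective_links("bbbbb", links), while a link to "bbbbb-j7d0g-000000000000001" would be kept.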

Design sketch

Each Arvados client must be able to prove to cluster B that cluster A has authorized it to act on behalf of a user account controlled by cluster A. The proof must not give cluster B enough information to act on behalf of that user account itself. For example, the client cannot simply hand cluster B its cluster A token so that cluster B can make a canary query: doing so would also let cluster B exercise the client's authority on clusters C, D, and E.

Protocol ideas

"Salted tokens": instead of passing its literal token, the client passes the token UUID and HMAC(token, "bbbbb") when sending a request to cluster B (where "bbbbb" is cluster B's cluster ID / UUID prefix). Cluster B validates the request by passing those two parameters untouched to a "verify request" ("no-op") endpoint at cluster A.
  • API server hands out tokens in the form "v2 <delimiter> tokenUUID <delimiter> secret" instead of just the secret part.
  • Cluster B figures out cluster A's API endpoint by looking at the "site ID prefix" of the token UUID.
  • Cluster B can be configured with a lookup table (clusterID→apiHost) to override the implicit {id}
  • Cluster B can be configured to only use the lookup table, i.e., to never use implicit {id} endpoints
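
The salted-token idea can be sketched as follows. Several details are assumptions made for illustration only: "/" as the delimiter, SHA-256 as the hash, the secret as the HMAC key with the cluster ID as the message, and all function names. The implicit endpoint pattern is passed in as a template by the caller; passing None models the "lookup table only" configuration.

```python
import hashlib
import hmac

DELIMITER = "/"  # assumption: the note only says "<delimiter>"


def parse_v2_token(token: str):
    """Split 'v2 <delimiter> tokenUUID <delimiter> secret' into (uuid, secret)."""
    version, token_uuid, secret = token.split(DELIMITER, 2)
    if version != "v2":
        raise ValueError("not a v2 token")
    return token_uuid, secret


def _mac(secret: str, cluster_id: str) -> str:
    # Assumption: secret is the key, remote cluster ID is the message.
    return hmac.new(secret.encode(), cluster_id.encode(), hashlib.sha256).hexdigest()


def salt_token(token: str, remote_cluster_id: str):
    """What the client sends to the remote cluster instead of its literal token."""
    token_uuid, secret = parse_v2_token(token)
    return token_uuid, _mac(secret, remote_cluster_id)


def issuer_accepts(stored_secret: str, requesting_cluster_id: str, presented_mac: str) -> bool:
    """Hypothetical check at the issuing cluster's "verify request" endpoint."""
    expected = _mac(stored_secret, requesting_cluster_id)
    return hmac.compare_digest(expected, presented_mac)


def issuing_cluster(token_uuid: str) -> str:
    """The "site ID prefix" of the token UUID names the issuing cluster."""
    return token_uuid.split("-", 1)[0]


def api_host(cluster_id: str, lookup: dict, implicit_template=None) -> str:
    """Resolve a cluster's API host: the lookup table overrides the implicit endpoint.

    implicit_template is a caller-supplied pattern containing '{id}';
    None means "never use implicit endpoints".
    """
    if cluster_id in lookup:
        return lookup[cluster_id]
    if implicit_template is None:
        raise LookupError("no endpoint configured for cluster " + cluster_id)
    return implicit_template.format(id=cluster_id)
```

Because the cluster ID is mixed into the HMAC, the value sent to cluster B differs from the value that cluster C would need, so cluster B cannot replay it elsewhere, and the secret itself never leaves the client and cluster A.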

"Cluster-scoped tokens": the client contacts cluster A to get a scoped token which only allows "GET /users/current" on cluster A but is accepted by cluster B as an [all] token for that user.

Adding permissions

There are a few permission-granting cases to consider.

grantor        | grantee                 | object           | notes
user on site A | user on site A          | object on site A | (existing permission system)
user on site A | group on site A         | object on site A | (existing permission system)
user on site A | user or group on site A | object on site B | Client creates a link at site B. Site B asks site A whether the grantee user/group is visible to user A.
user on site A | user or group on site B | object on site B | Client creates a link at site B. Site B asks site A for a list of groups user A can see, then checks whether (possibly via one of those groups) user A can read the grantee user/group according to site B's local database.
user on site A | user or group on site B | object on site A | Client creates a link at site A. Site A generates a salted token and uses it to ask site B whether user A can read the grantee user/group.

In all of these cases, "user on site A" has a UUID starting with "aaaaa-" and therefore uses a token issued by cluster A (see "protocol ideas" above).

When site B connects to site A in the course of processing request R, it uses the token provided by the client in request R.
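
The cases in the table above can be summarized in a small decision function. This is only a sketch: the function name and return shape are hypothetical, and it assumes the site of a user, group, or object can be read from its UUID prefix. In every case the link is created at the site that owns the object.

```python
def site_of(uuid: str) -> str:
    """Return the site (cluster ID) prefix of a UUID."""
    return uuid.split("-", 1)[0]


def plan_grant(grantor_uuid: str, grantee_uuid: str, object_uuid: str):
    """Return (site where the link is created, description of the visibility check)."""
    grantor_site = site_of(grantor_uuid)
    grantee_site = site_of(grantee_uuid)
    object_site = site_of(object_uuid)
    link_site = object_site  # the object's site is authoritative for its permissions

    if grantor_site == grantee_site == object_site:
        check = "existing permission system"
    elif grantee_site == grantor_site:
        # Object lives elsewhere; its site asks the grantor's site.
        check = (object_site + " asks " + grantor_site
                 + " whether the grantee is visible to the grantor")
    elif grantee_site == object_site:
        # Grantee and object are local to the same site; only group info is remote.
        check = (object_site + " asks " + grantor_site
                 + " for the grantor's groups, then checks its local database")
    elif object_site == grantor_site:
        # Grantee lives elsewhere; the object's site asks with a salted token.
        check = (object_site + " uses a salted token to ask " + grantee_site
                 + " whether the grantor can read the grantee")
    else:
        # Three distinct sites: not covered by the table.
        raise NotImplementedError("grantor, grantee and object on three different sites")
    return link_site, check
```

For instance, plan_grant with a grantor and grantee on "aaaaa" and an object on "bbbbb" places the link at "bbbbb", matching the third row of the table.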


Things to address

  • how to sync groups
  • diagrams
  • mnemonic cluster names / more concrete examples (including who is reachable on the internet)
  • [how] do you get a list of users/groups you can share stuff with?
  • clarify what UUIDs look like (some people have A uuids, some have B uuids)
  • Cross-cluster delegation
  • Routing multi-cluster requests

Updated by Tom Clegg over 6 years ago · 22 revisions