Feature #13493
closedFederated record retrieval
Added by Peter Amstutz over 6 years ago. Updated about 6 years ago.
Description
From Federation implementation roadmap
Implementation notesWhen retrieving a workflow record for arvados-cwl-runner, API server notices workflow UUID is remote, and fetches it with salted token (instead of looking in local DB)
- This is a feature of the new (Go) API server (#13497)
- Implement as http handler middleware ("proxy to remote cluster if applicable, otherwise fall through to next handler")
- Add configs (similar to Rails API's remote_hosts and remote_hosts_via_dns) to cluster config object in source:sdk/go/arvados/config.go
Updated by Peter Amstutz over 6 years ago
- Status changed from New to In Progress
Updated by Tom Clegg over 6 years ago
- Related to Idea #9053: Port API server to Go added
Updated by Tom Clegg over 6 years ago
- Blocked by Idea #13497: [API] Initial "arvados-controller" server that proxies API endpoints to Rails server added
Updated by Peter Amstutz over 6 years ago
- Status changed from In Progress to New
Updated by Tom Morris over 6 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 2.0
Updated by Tom Morris over 6 years ago
- Target version changed from Arvados Future Sprints to 2018-07-03 Sprint
Updated by Tom Clegg over 6 years ago
- Doesn't cover all possible permutations of {issuer of client auth, uuid of object being retrieved, cluster receiving proxyable request}. In the context of using arvados-cwl-runner on cluster A, loading a workflow from cluster B:
- Works if your token was issued by cluster A (i.e., your user account is also on cluster A)
- Works if your token was issued by cluster B, and has not been restricted to only work on cluster A (afaik there is no way to get a UI to offer this pair of host/token env vars)
- Doesn't work if your token was issued by cluster B and has been restricted to only work on cluster A (e.g., by logging in to Workbench B, following a link to Workbench A, and using "current token" in the nav menu)
- Doesn't work if your token was issued by cluster C (to make this work, API server needs to start issuing v2 tokens)
- Doesn't support object types other than workflow -- the idea is to use federation to host workflows on cluster A and use them on clusters B..E, even though we haven't yet dealt with all the complications/implications of proxying other kinds of requests
The test suite is a bit counterintuitive: unlike most tests, "zzzzz" is the remote cluster, and "zhome" is the local cluster whose proxy behavior is being tested, so that the remote cluster can be hooked up to the Rails server provided by run-tests.sh, which is always "zzzzz".
Updated by Peter Amstutz over 6 years ago
- Missing support for looking up remote clusters by DNS.
- SaltToken() returns ErrObsoleteToken for "plain" tokens, so it depends on the API server giving out "v2/" tokens (or the client converting them to 'v2' tokens) but I don't think the API server does that yet.
- Decides if a token is salted or not based on only the token length, which seems likely to cause problems in the future. Could we tweak the v2 token format so there is a way to unambiguously indicate a token is salted?
- Going a bit further, did we ever investigate using something like JSON Web Tokens (JWT) for Arvados tokens? This would provide a standard structured format for token payload and metadata.
- It appears that if the token was provided in the query string as
?api_token=...
it is not filtered out, and will be passed along in addition to the salted token in the Authorization header.
Updated by Tom Clegg over 6 years ago
Peter Amstutz wrote:
- Missing support for looking up remote clusters by DNS.
Not sure whether/when we'll need this, but that's right, it's not included here.
- SaltToken() returns ErrObsoleteToken for "plain" tokens, so it depends on the API server giving out "v2/" tokens (or the client converting them to 'v2' tokens) but I don't think the API server does that yet.
That's right, that's where the "cluster C" restriction above comes from.
- Decides if a token is salted or not based on only the token length, which seems likely to cause problems in the future. Could we tweak the v2 token format so there is a way to unambiguously indicate a token is salted?
We could, although length==40 seems unambiguous. What kind of problems do you have in mind?
- Going a bit further, did we ever investigate using something like JSON Web Tokens (JWT) for Arvados tokens? This would provide a standard structured format for token payload and metadata.
We could investigate that. But this is probably not the best place to open a discussion. Wiki?
- It appears that if the token was provided in the query string as
?api_token=...
it is not filtered out, and will be passed along in addition to the salted token in the Authorization header.
Indeed. Will fix.
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-07-03 Sprint to 2018-07-18 Sprint
Updated by Tom Clegg over 6 years ago
- removes ?api_token=...
- cleanup tests, add "mocked remote" for testing request details like the query string
Updated by Peter Amstutz over 6 years ago
Tom Clegg wrote:
Peter Amstutz wrote:
- Missing support for looking up remote clusters by DNS.
Not sure whether/when we'll need this, but yes, it's missing.
The API server supports it already, so why wouldn't we do it now?
- SaltToken() returns ErrObsoleteToken for "plain" tokens, so it depends on the API server giving out "v2/" tokens (or the client converting them to 'v2' tokens) but I don't think the API server does that yet.
That's right, that's where the "cluster C" restriction above comes from.
That doesn't address my comment, if you get a "classic" token from cluster A, you still can't use it for federation unless some as-yet-unwritten client code converts it to "v2/" format first. So it won't actually work?
- Decides if a token is salted or not based on only the token length, which seems likely to cause problems in the future. Could we tweak the v2 token format so there is a way to unambiguously indicate a token is salted?
We could, although length==40 seems unambiguous. What kind of problems do you have in mind?
Because we don't specify the plain token is required to have a certain length that != 40, it would be legal to have an unsalted token that is 40 characters long. If we're going to rely on a particular behavior we should document it.
- Going a bit further, did we ever investigate using something like JSON Web Tokens (JWT) for Arvados tokens? This would provide a standard structured format for token payload and metadata.
We could investigate that.
I'll do a little research and see if it makes sense for us.
Updated by Peter Amstutz over 6 years ago
Also: please add a documentation page under "Architecture" that describes the current state of federation features.
Updated by Peter Amstutz over 6 years ago
Also there should be a documentation page describing how to install & configure arvados-controller (even if it is marked EXPERIMENTAL).
Updated by Peter Amstutz over 6 years ago
Comments about testing:
- Looks like we test GET and PATCH, what about POST, DELETE, or weird methods like OPTIONS?
- It looks like TestRemoteError tests for connection error, how about a test where the remote returns 404?
- Maybe errors talking to clusters be returned as "Bad Gateway" not "Internal Server Error" ?
Updated by Tom Clegg over 6 years ago
Peter Amstutz wrote:
- Missing support for looking up remote clusters by DNS.
The API server supports it already, so why wouldn't we do it now?
The API server supports accepting and validating tokens issued by *.arvadosapi.com. Here, a wildcard would support proxying requests to *.arvadosapi.com. Both are "looking up by DNS" but they are different features.
I guess the reason not to do it now would be that we have other things to do now. It might be better to think about this in the context of #13619.
- SaltToken() returns ErrObsoleteToken for "plain" tokens, so it depends on the API server giving out "v2/" tokens (or the client converting them to 'v2' tokens) but I don't think the API server does that yet.
That's right, that's where the "cluster C" restriction above comes from.
That doesn't address my comment, if you get a "classic" token from cluster A, you still can't use it for federation unless some as-yet-unwritten client code converts it to "v2/" format first. So it won't actually work?
Aha. I thought "it" in your comment referred to SaltToken(), I didn't know you were thinking of the "token issued by A" scenario.
Indeed, for that to work, we need to either fix the FIXME in saltAuthToken(), or start issuing v2 tokens (and make sure Workbench recognizes/salts them properly).
if err == auth.ErrObsoleteToken {
// FIXME: If the token exists in our own database,
// salt it for the remote. Otherwise, assume it was
// issued by the remote, and pass it through
// unmodified.
token = creds.Tokens[0]
If we're going to rely on a particular behavior we should document it.
Agreed. Added API token format
Updated by Tom Clegg over 6 years ago
Peter Amstutz wrote:
Also there should be a documentation page describing how to install & configure arvados-controller (even if it is marked EXPERIMENTAL).
Yes, this is happening on #13497. Waiting for ops feedback about Installing controller service.
- Looks like we test GET and PATCH, what about POST, DELETE, or weird methods like OPTIONS?
Will add.
- It looks like TestRemoteError tests for connection error, how about a test where the remote returns 404?
TestNoAccess tests this (get object with no read access = 404).
- Maybe errors talking to clusters be returned as "Bad Gateway" not "Internal Server Error" ?
Sounds good.
Updated by Tom Clegg over 6 years ago
- upgrade locally-issued v1 tokens to v2 so they can be salted for proxying (it's still desirable to move to issuing v2 tokens, but I don't want that and the associated Workbench updates to block this -- also, even after we do that, the transparent upgrade will allow old tokens to work as well as new tokens)
- test OPTIONS method (and updated apiserver to return an empty body instead of "-")
Updated by Peter Amstutz over 6 years ago
s.remoteServer.Server.Handler.ServeHTTP(rec, req) // direct to remote -- can't proxy a create req because no uuid
You could look for workflow.owner_uuid
and use the the cluster id from that. (Not going to block the branch on this, but we should think about it.)
err = db.QueryRowContext(req.Context(), `SELECT uuid FROM api_client_authorizations WHERE api_token=$1 AND (expires_at IS NULL OR expires_at > current_timestamp) LIMIT 1`, aca.APIToken).Scan(&aca.UUID)
This line is really long
def empty - render text: "-" + render text: "" end
This is awesome.
So far this looks good. I'm going to do some manual testing / playing around with arvbox.
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint
Updated by Peter Amstutz over 6 years ago
I'm happy to say after a bunch or arvbox hacking (adding support for arvados-controller) I was able to do a federated query between two arvbox instances! Both the "arv" command line and workbench were able to retrieve a record from a federated instance.
LGTM!
Updated by Tom Clegg over 6 years ago
- Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint
Updated by Tom Clegg over 6 years ago
Peter Amstutz wrote:
The architecture section seems like a good place to explain how federation works under the hood, but would it be more appropriate to put the "feature is available" info in a usage context?Also: please add a documentation page under "Architecture" that describes the current state of federation features.
- "Running registered workflows at the command line" (https://doc.arvados.org/user/tutorials/writing-cwl-workflow.html)
- "GET workflow API" (https://doc.arvados.org/api/methods/workflows.html)
I suppose these also have a dependency on docs for setting up the relevant parts of the cluster config and Rails API config.
Updated by Peter Amstutz over 6 years ago
- Related to Feature #13993: [API] Fetch remote-hosted collection by UUID added
Updated by Tom Clegg over 6 years ago
- Assigned To deleted (
Tom Clegg) - Target version changed from 2018-08-15 Sprint to 2018-09-05 Sprint
Updated by Peter Amstutz over 6 years ago
- Target version changed from 2018-09-05 Sprint to 2018-09-19 Sprint
Updated by Peter Amstutz over 6 years ago
13493-document-federation @ 5d8eb6ead9e2ca32f424a8385979485a902cf09f
First pass at documenting the mechanics of federation: cluster ids, salted tokens, proxying record & keep block requests.
Updated by Tom Clegg over 6 years ago
There's a lot of information here about how things have been implemented under the hood, which is fine if we can keep it up to date.
We should probably mention that most of the federation story isn't actually implemented/supported yet -- just some specific parts, and with certain simplifying assumptions -- and that the behaviors described here should not be interpreted as a stable API.
This seems like a good place to put some (badly needed) sensible rules for choosing a cluster ID.- For throw-away testing purposes, use "z****"
- For experimental/local clusters that will never be referenced by the outside world, use "x****"
- For long-lived clusters, contact us to get an ID that won't conflict with anyone else in the future
Updated by Peter Amstutz over 6 years ago
Tom Clegg wrote:
There's a lot of information here about how things have been implemented under the hood, which is fine if we can keep it up to date.
We should probably mention that most of the federation story isn't actually implemented/supported yet -- just some specific parts, and with certain simplifying assumptions -- and that the behaviors described here should not be interpreted as a stable API.
This seems like a good place to put some (badly needed) sensible rules for choosing a cluster ID.
- For throw-away testing purposes, use "z****"
- For experimental/local clusters that will never be referenced by the outside world, use "x****"
- For long-lived clusters, contact us to get an ID that won't conflict with anyone else in the future
13493-document-federation @ 898cd195f8b5916ed9c3a83dec212e858bfa610f updated taking these comments into account, also adds missing diagram
Updated by Tom Morris over 6 years ago
- Target version changed from 2018-09-19 Sprint to 2018-10-03 Sprint
Updated by Peter Amstutz over 6 years ago
- Status changed from In Progress to Resolved