Feature #20640
closedAPI for admin to query materialized permissions
Description
Requested by user:
They would like to synchronize permissions set in Arvados to another system. This system does not support OpenID, users will log in with LDAP and they can do a mapping between username and arvados uuid. However it is not convenient to generate and use Arvados tokens in order to do permission lookups on the fly.
Proposed solution is to have an admin API where the client can send a list of users and/or projects and get back the permissions associated with each user and/or project. This would make it possible to write a policy plug-in that queries permissions for a given user or project from Arvados on demand applies them to the 3rd party system.
Design sketch:
Implement a "list" API that behaves similarly to other list APIs, supporting our filter syntax. It return a list of items with the user uuid, target uuid, and permission level.
The permission level is numeric internally but it might make sense to transform it to can_read, can_write or can_manage for the external API -- because if we ever want to tweak the implementation or introduce new types of permission, it would be better not to be committed to a specific numeric representation of permission.
Since it is read only and rows don't have standalone identifiers, the endpoint would only support "list" method and not "get" or anything else.
It should be usable by admins only and regular users.
Admins have unrestricted access.
Regular users are limited to permissions for targets which they have "manage" access (this includes "self" so they can always see their own permissions), implemented as a self-join on the permission table.
Detailed design¶
endpoint: /arvados/v1/permissions
User accessing the endpoint must be admin, otherwise returns 403.
Fields¶
Field | Type | Description |
user_uuid | string | user account that has permission |
target_uuid | string | target of the permission |
perm_level | string | one of "can_read", "can_write" or "can_manage" |
There is an additional field 'traverse_owned' which records whether permission should be transitive through this target. This is important for table updates but clients shouldn't be doing any reasoning about permissions, so I don't know if this should be returned or not.
Methods¶
list
HTTP GET with query parameters (in the URL or body, as supported by every other endpoint).
All other methods (PUT, POST etc) return method not supported. Any paths under the endpoint return 404.
As with other endpoints, GET with no parameters yields default behavior (query result with default limit, count, order, select, with no filters). It should be possible to enumerate the entire table if you really want to (but this could be millions of entries, which is why you'd normally use a filter).
Argument | Type | Description |
limit | integer | same as other API methods |
count | string | default 'none', only accepts 'none' |
order | array | same as other API methods |
select | array | same as other API methods |
filters | array | same as other API methods |
- Returns "permission" objects in "items" same as other list APIs
- If a (user, target) combination isn't listed that would have otherwise matched the filters (and all pages have been fetched) no permission exists.
- perm_level isn't indexed, including it explicitly in a query may cause a table scan, we should disallow filtering or ordering on it.
- I'm omitting "offset" because it is not needed for keyset paging
- several fields are of limited use, but I propose including them here so that the Python keyset_list_all method can be used to query permissions.
- keyset_list_all does assume the the record has a 'uuid' field, it should be tweaked to allow specifying an alternate tiebreaker field
Providing this list method gives the client the flexibility to query permission of individual users and projects, get all permissions of a single user, or get all permissions on a specific project. This also allows us to use our existing code to construct the query.
Even admin-only, the ability to get a list of every user granted access to a specific project is something that would be useful to add to Workbench. As a future extension it could allow regular users with can_manage access to a project to query the list of users with access to the project.
Updated by Peter Amstutz 9 months ago
- Target version changed from Future to Development 2024-05-08 sprint
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-06-05 sprint
Updated by Peter Amstutz 9 months ago
- Target version changed from Development 2024-06-05 sprint to Development 2024-05-08 sprint
Updated by Tom Clegg 8 months ago
Should we call the API endpoint "implicit_permissions" or something like that rather than just "permissions", to make a more obvious distinction from the explicit permissions (links) elsewhere in the API? ("materialized permissions" is our de facto internal term for this, but not a great external-facing choice)
Updated by Peter Amstutz 8 months ago
- Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Updated by Peter Amstutz 8 months ago
- Related to Feature #18182: Parameter on get_permissions API to return every user and group with permissions added
Updated by Peter Amstutz 8 months ago
- Target version changed from Development 2024-05-22 sprint to Development 2024-06-05 sprint
Updated by Moritz Gilsdorf 7 months ago
Hi Peter,
We reviewed this design in the team and think it will fully serve our use case.
Thanks
Updated by Peter Amstutz 7 months ago
- Assigned To set to Tom Clegg
- Subject changed from API for admin to query materialized permissions to API for admin to query materialized permissions
Updated by Tom Clegg 7 months ago
20640-computed-permissions-api @ df9503c818bb07f86669ef20f014647203d77d6f -- developer-run-tests: #4262
Updated by Tom Clegg 7 months ago
20640-computed-permissions-api @ ce87d7b43cc2192a6ca13e3eaef10457c1715f7c -- developer-run-tests: #4264
A sanity check might be good at this point, but the branch won't really be ready to review/merge until it has- API documentation
- More tests for select, order, etc. (the code paths being different from generic models/controllers)
- More tests for actual content
The fact that permissions don't have UUIDs does make keyset paging a bit more complicated. The solution I've got here is for keyset_list_all()
to change its tie-break behavior automatically when the returned items have "target_uuid" but no "uuid" field. It assumes the caller specifies order_key="user_uuid"
. Perhaps it would be fine to just error out when processing the first page if order_key
is anything else (or unspecified)?
Updated by Peter Amstutz 7 months ago
- Target version changed from Development 2024-06-05 sprint to Development 2024-06-19 sprint
Updated by Tom Clegg 6 months ago
20640-computed-permissions-api @ f4d2b1be4d6f724c4a4e68875782219343603674 -- developer-run-tests: #4285
(failure is unrelated to this branch, and also seems not to have anything to do with that same test failing on debian12)
20640-computed-permissions-api @ f4d2b1be4d6f724c4a4e68875782219343603674 -- developer-run-tests: #4287
- All agreed upon points are implemented / addressed.
- ✅
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described
- ✅ automated
- Documentation has been updated.
- ✅
- Behaves appropriately at the intended scale (describe intended scale).
- Not expecting any scaling issues given that we have a btree index on the default/expected sort columns (user_uuid, target_uuid) and we already rely on materialized_permissions queries to be fast to support all other APIs.
- Considered backwards and forwards compatibility issues between client and server.
- The list_all helpers in old versions of the Python SDK will not work with the new API. ([Where] should this be documented?)
- Follows our coding standards and GUI style guidelines.
- ✅
Updated by Peter Amstutz 6 months ago
- Target version changed from Development 2024-06-19 sprint to Development 2024-07-03 sprint
Updated by Brett Smith 6 months ago
Tom Clegg wrote in #note-27:
20640-computed-permissions-api @ f4d2b1be4d6f724c4a4e68875782219343603674 -- developer-run-tests: #4287
There is a small suite of tests for keyset_list_all
in test_util.py
. One way or another, it would be good to mirror this suite for the new endpoint too. Because there's one bug I'm sure it would catch: if the user passes in a select
argument that doesn't include the "seen keys," it'll misbehave.
The current code already detects this for uuid
and automatically adds it to select
as needed. It would be nice to have analogous behavior for this case. Off the top of my head, I admit I don't know a way to do that without actually calling the list function. If you're interested, I'd be happy to investigate and see if we can find a reliable way to introspect it.
My only other comment is a tiny little readability suggestion in services/api/app/controllers/arvados/v1/schema_controller.rb
:
discovery[:resources]['computed_permissions'][:methods][:list][:parameters].select! do |param|
![:cluster_id, :bypass_federation, :offset].include?(param)
end
discovery[:schemas]['ComputedPermission'].delete(:uuidPrefix)
discovery[:schemas]['ComputedPermission'][:properties].select! do |prop|
![:uuid, :etag].include?(prop)
end
What you wrote is totally fine but I think it would be just a smidge easier to read written with reject!
. That leading !
in the blocks feels easy to overlook/forget/misunderstand.
- Considered backwards and forwards compatibility issues between client and server.
- The list_all helpers in old versions of the Python SDK will not work with the new API. ([Where] should this be documented?)
I don't think it needs to be called out. I feel like this is our "standard policy" with the SDKs and should just be treated as a given. That said, I do wonder if maybe that policy in general could be better-documented.
Thanks.
Updated by Brett Smith 6 months ago
[Brett wrote a bad suggestion here that doesn't work. If you saw it please disregard.]
Updated by Tom Clegg 6 months ago
if the user passes in a select argument that doesn't include the "seen keys," it'll misbehave
I've added a test that confirms this issue by failing, but I'm not sure how to fix it. We could try introspecting the "get page" func, but that seems very icky -- it wouldn't handle a case where the paging func is anything but the stock API method. For example, this test case.
I've also thought about getting the first page using params appropriate for the generic case, then inspecting the response to check whether it's a computed_permissions response, and if so, throwing out that first response and starting over with params appropriate for computed_permissions. If this only happened in the specific case where the caller provided an insufficient select
param, maybe the extra wasted API call would be OK.
Or, given that all three of the computed_permissions fields are unique to computed_permissions...
...and selecting zero fields seems nearly completely useless -- the only examples I can think of being- filter by exact match on all 3 fields, because we only need to check whether that exact record exists, and len(items) works even if we don't select any fields
- filter with a mix of "in" and exact matches on all 3 fields, because we only need to check whether all of a set of exact records exist (e.g., "users a, b, and c all have permission x on object y") and if not, for some reason we don't care which of a, b, and c do have permission, we just want to check
len(items)==3
(I can't think of why anyone would want this but it does seem theoretically possible and in such a case select=[] would be the most efficient way to get that result)
- selecting any of {user_uuid, target_uuid, perm_level} means the callable fn is expected to return a page of computed_permissions, so keyset_list_all should add {user_uuid, target_uuid} to the select arg
- in the two contrived situations described above -- caller tries to use keyset_list_all with computed_permissions and an empty
select
-- it fails and that's OK because we documented it and it's easy for the caller to fix by providingselect=['user_uuid']
instead.
Fixed the double-negative in the Rails code.
20640-computed-permissions-api @ 5f39a0b2fa48f3a78022da38212d682604b07954
Updated by Brett Smith 6 months ago
Okay, so like, points for creativity. I respect this solution. The next question that jumps to mind is, do we need a test in API server to make sure that computed permissions is the only table with these fields, so that later changes to the database schema don't accidentally break the new logic in keyset_list_all
?
Alternatively, may I make another suggestion? I propose that instead of having keyset_list_all
become a giant function with a lot of internal branching that tries to automagically do everything for the user, we just… have the user say what they want? How about:
- We add a new argument to
keyset_list_all
likekey_fields: Sequence[str]=('uuid',)
. - The
select
munging logic near the top of the function replacesselect.add('uuid')
withselect.update(key_fields)
. - The inner loop is updated to use
key_fields
instead of the current hardcoded'uuid'
.operator.itemgetter(*key_fields)
should make it easy to implement this. - Of course it's mean to force users to memorize that they need to pass
key_fields=('user_uuid', 'target_uuid')
every time they want to iterate compute permissions. No problem, let's just add a functioniter_computed_permissions
that does that for them. (It should be callediter_computed_permissions
for #19817 futureproofing. For the record, I have been doing Arvados development for one additional year and I still don't know what "keyset" means.)
Yes, there is a downside to telling users "if you want to use this specific API, you have to use this other specific function instead." But that's straightforward enough to document, and since this is an admin-only API, I don't think most users will run into it anyway. And in exchange we get a lot more maintainability room and a lot less entanglement between the implementation of the function and the details of the API. What do you think?
Updated by Tom Clegg 6 months ago
Yes to relying on the caller to specify different behavior when needed (and making it easy for them).
20640-computed-permissions-api @ b1f991b0343ba6cba2a3433fc7ca606668e75fdb -- developer-run-tests: #4311
- add
key_fields
arg - add
iter_computed_permissions
wrapper withorder_key
andkey_fields
defaults suitable for that API
Updated by Brett Smith 6 months ago
Tom Clegg wrote in #note-33:
20640-computed-permissions-api @ b1f991b0343ba6cba2a3433fc7ca606668e75fdb -- developer-run-tests: #4311
- This version of
iter_computed_permissions
does not pass through any of the arguments it's called with besidesfn
andkwargs
. It needs to pass through all of them. Ideally there would be tests for this. - The
iter_computed_permissions
docstring refers toarv.computed_permissions().list
. However, there's nothing to tell the reader whatarv
is. Suggest insteadarvados.api_resources.ArvadosAPIClient.computed_permissions.list
so it links to that method in the API reference. I understand this leaves open the possibility that readers try to pass in the unbound instance methodcomputed_permissions.list
, but until feedback tells me that's a real confusion I think it's the better trade-off overall. - My bad: the type of
key_fields
would be better written asContainer[str]
instead ofSequence[str]
, since the order doesn't matter. Sorry. if len(tiebreak_keys) == 0
- Per PEP 8, preferif not tiebreak_keys
kwargs['select'] = list(select | set(key_fields) | {order_key})
- If you prefer this style, suggest you just go all the way with it and write:
This avoids the overhead of multiple set allocations when we really only need to build one.try: kwargs['select'] = list({*kwargs['select'], *key_fields, order_key}) except KeyError: pass
- I had to spend a couple minutes thinking through the updated test parameterization. I think mixing itertools and comprehensions at different levels like this is pretty rough. I suggest we just tip the scales and go full comprehensions. Consider:
Don't get me wrong, I wouldn't exactly call this "readable," but I think it's an improvement at least. It also removes the need for the inner "skip test" logic.@parameterized.parameterized.expand( (fake_item, key_fields, order_key, select) for (fake_item, key_fields) in [ (_SELECT_FAKE_ITEM, ('uuid',)), (_FAKE_COMPUTED_PERMISSIONS_ITEM, ('user_uuid', 'target_uuid')), ] for order_key in fake_item if order_key != 'perm_level' for count in range(len(fake_item) + 1) for select in itertools.combinations(fake_item, count) )
Thanks.
Updated by Tom Clegg 6 months ago
20640-computed-permissions-api @ fe6de2473eae9c979cdfcc638d35bd3cc62f08fd -- developer-run-tests: #4314
Incorporates all points in #note-34.
Updated by Brett Smith 6 months ago
Tom Clegg wrote in #note-35:
20640-computed-permissions-api @ fe6de2473eae9c979cdfcc638d35bd3cc62f08fd -- developer-run-tests: #4314
Incorporates all points in #note-34.
Thanks, just one nit. In sdk/python/tests/test_util.py
, this:
if num_retries != self.expect_num_retries:
raise Exception(f"KeysetTestHelper called with num_retries={num_retries} but expected {expect_num_retries}")
would be better written as:
assert num_retries == self.expect_num_retries, f"KeysetTestHelper called with num_retries={num_retries} but expected {expect_num_retries}"
A failed assertion just raises AssertionError, so it's doing the same thing with a more specific exception. At least sometimes that'll be reported as a test failure rather than a test error, which is a reporting improvement, and even in contexts where it's not we don't lose anything.
Please merge. Thanks.
Updated by Tom Clegg 6 months ago
Ugh, yes, I followed nearby existing code instead of fixing it. Changed both to assert.
After merging main:
20640-computed-permissions-api @ a8bb14d1dc42a316677f6435a281230a60db569d -- developer-run-tests: #4321
Updated by Brett Smith 6 months ago
Tom Clegg wrote in #note-37:
20640-computed-permissions-api @ a8bb14d1dc42a316677f6435a281230a60db569d -- developer-run-tests: #4321
Looks good, please merge. Thanks.
Updated by Tom Clegg 6 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|896d7bb9650b07d60494faee969aa70a6e5c26a2.