Project

General

Profile

Actions

Feature #20640

closed

API for admin to query materialized permissions

Added by Peter Amstutz over 1 year ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Story points:
-
Release:
Release relationship:
Auto

Description

Requested by user:

They would like to synchronize permissions set in Arvados to another system. This system does not support OpenID, users will log in with LDAP and they can do a mapping between username and arvados uuid. However it is not convenient to generate and use Arvados tokens in order to do permission lookups on the fly.

Proposed solution is to have an admin API where the client can send a list of users and/or projects and get back the permissions associated with each user and/or project. This would make it possible to write a policy plug-in that queries permissions for a given user or project from Arvados on demand applies them to the 3rd party system.

Design sketch:

Implement a "list" API that behaves similarly to other list APIs, supporting our filter syntax. It return a list of items with the user uuid, target uuid, and permission level.

The permission level is numeric internally but it might make sense to transform it to can_read, can_write or can_manage for the external API -- because if we ever want to tweak the implementation or introduce new types of permission, it would be better not to be committed to a specific numeric representation of permission.

Since it is read only and rows don't have standalone identifiers, the endpoint would only support "list" method and not "get" or anything else.

It should be usable by admins only and regular users.

Admins have unrestricted access.

Regular users are limited to permissions for targets which they have "manage" access (this includes "self" so they can always see their own permissions), implemented as a self-join on the permission table.

Detailed design

endpoint: /arvados/v1/permissions

User accessing the endpoint must be admin, otherwise returns 403.

Fields

Field Type Description
user_uuid string user account that has permission
target_uuid string target of the permission
perm_level string one of "can_read", "can_write" or "can_manage"

There is an additional field 'traverse_owned' which records whether permission should be transitive through this target. This is important for table updates but clients shouldn't be doing any reasoning about permissions, so I don't know if this should be returned or not.

Methods

list

HTTP GET with query parameters (in the URL or body, as supported by every other endpoint).

All other methods (PUT, POST etc) return method not supported. Any paths under the endpoint return 404.

As with other endpoints, GET with no parameters yields default behavior (query result with default limit, count, order, select, with no filters). It should be possible to enumerate the entire table if you really want to (but this could be millions of entries, which is why you'd normally use a filter).

Argument Type Description
limit integer same as other API methods
count string default 'none', only accepts 'none'
order array same as other API methods
select array same as other API methods
filters array same as other API methods
  • Returns "permission" objects in "items" same as other list APIs
  • If a (user, target) combination isn't listed that would have otherwise matched the filters (and all pages have been fetched) no permission exists.
  • perm_level isn't indexed, including it explicitly in a query may cause a table scan, we should disallow filtering or ordering on it.
  • I'm omitting "offset" because it is not needed for keyset paging
  • several fields are of limited use, but I propose including them here so that the Python keyset_list_all method can be used to query permissions.
    • keyset_list_all does assume the the record has a 'uuid' field, it should be tweaked to allow specifying an alternate tiebreaker field

Providing this list method gives the client the flexibility to query permission of individual users and projects, get all permissions of a single user, or get all permissions on a specific project. This also allows us to use our existing code to construct the query.

Even admin-only, the ability to get a list of every user granted access to a specific project is something that would be useful to add to Workbench. As a future extension it could allow regular users with can_manage access to a project to query the list of users with access to the project.


Subtasks 1 (0 open1 closed)

Task #21824: Review 20640-computed-permissions-apiResolvedTom Clegg07/01/2024Actions

Related issues

Related to Arvados - Feature #18182: Parameter on get_permissions API to return every user and group with permissionsNewActions
Actions #2

Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz 10 months ago

  • Description updated (diff)
Actions #4

Updated by Peter Amstutz 10 months ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz 10 months ago

  • Description updated (diff)
Actions #6

Updated by Peter Amstutz 8 months ago

  • Target version changed from Future to Development 2024-05-08 sprint
Actions #7

Updated by Peter Amstutz 8 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-06-05 sprint
Actions #8

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-06-05 sprint to Development 2024-05-08 sprint
Actions #9

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #10

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #11

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #12

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #13

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #14

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #15

Updated by Peter Amstutz 7 months ago

  • Description updated (diff)
Actions #16

Updated by Tom Clegg 7 months ago

Should we call the API endpoint "implicit_permissions" or something like that rather than just "permissions", to make a more obvious distinction from the explicit permissions (links) elsewhere in the API? ("materialized permissions" is our de facto internal term for this, but not a great external-facing choice)

Actions #17

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
Actions #18

Updated by Peter Amstutz 7 months ago

  • Release set to 70
Actions #19

Updated by Peter Amstutz 7 months ago

  • Related to Feature #18182: Parameter on get_permissions API to return every user and group with permissions added
Actions #20

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-05-22 sprint to Development 2024-06-05 sprint
Actions #21

Updated by Moritz Gilsdorf 6 months ago

Hi Peter,

We reviewed this design in the team and think it will fully serve our use case.

Thanks

Actions #22

Updated by Peter Amstutz 6 months ago

  • Assigned To set to Tom Clegg
  • Subject changed from API for admin to query materialized permissions to API for admin to query materialized permissions
Actions #23

Updated by Tom Clegg 6 months ago

Actions #24

Updated by Tom Clegg 6 months ago

  • Status changed from New to In Progress
Actions #25

Updated by Tom Clegg 6 months ago

20640-computed-permissions-api @ ce87d7b43cc2192a6ca13e3eaef10457c1715f7c -- developer-run-tests: #4264

A sanity check might be good at this point, but the branch won't really be ready to review/merge until it has
  • API documentation
  • More tests for select, order, etc. (the code paths being different from generic models/controllers)
  • More tests for actual content

The fact that permissions don't have UUIDs does make keyset paging a bit more complicated. The solution I've got here is for keyset_list_all() to change its tie-break behavior automatically when the returned items have "target_uuid" but no "uuid" field. It assumes the caller specifies order_key="user_uuid". Perhaps it would be fine to just error out when processing the first page if order_key is anything else (or unspecified)?

Actions #26

Updated by Peter Amstutz 6 months ago

  • Target version changed from Development 2024-06-05 sprint to Development 2024-06-19 sprint
Actions #27

Updated by Tom Clegg 5 months ago

20640-computed-permissions-api @ f4d2b1be4d6f724c4a4e68875782219343603674 -- developer-run-tests: #4285

(failure is unrelated to this branch, and also seems not to have anything to do with that same test failing on debian12)

20640-computed-permissions-api @ f4d2b1be4d6f724c4a4e68875782219343603674 -- developer-run-tests: #4287

  • All agreed upon points are implemented / addressed.
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • ✅ automated
  • Documentation has been updated.
  • Behaves appropriately at the intended scale (describe intended scale).
    • Not expecting any scaling issues given that we have a btree index on the default/expected sort columns (user_uuid, target_uuid) and we already rely on materialized_permissions queries to be fast to support all other APIs.
  • Considered backwards and forwards compatibility issues between client and server.
    • The list_all helpers in old versions of the Python SDK will not work with the new API. ([Where] should this be documented?)
  • Follows our coding standards and GUI style guidelines.
Actions #28

Updated by Peter Amstutz 5 months ago

  • Target version changed from Development 2024-06-19 sprint to Development 2024-07-03 sprint
Actions #29

Updated by Brett Smith 5 months ago

Tom Clegg wrote in #note-27:

20640-computed-permissions-api @ f4d2b1be4d6f724c4a4e68875782219343603674 -- developer-run-tests: #4287

There is a small suite of tests for keyset_list_all in test_util.py. One way or another, it would be good to mirror this suite for the new endpoint too. Because there's one bug I'm sure it would catch: if the user passes in a select argument that doesn't include the "seen keys," it'll misbehave.

The current code already detects this for uuid and automatically adds it to select as needed. It would be nice to have analogous behavior for this case. Off the top of my head, I admit I don't know a way to do that without actually calling the list function. If you're interested, I'd be happy to investigate and see if we can find a reliable way to introspect it.

My only other comment is a tiny little readability suggestion in services/api/app/controllers/arvados/v1/schema_controller.rb:

discovery[:resources]['computed_permissions'][:methods][:list][:parameters].select! do |param|
  ![:cluster_id, :bypass_federation, :offset].include?(param)
end
discovery[:schemas]['ComputedPermission'].delete(:uuidPrefix)
discovery[:schemas]['ComputedPermission'][:properties].select! do |prop|
  ![:uuid, :etag].include?(prop)
end

What you wrote is totally fine but I think it would be just a smidge easier to read written with reject!. That leading ! in the blocks feels easy to overlook/forget/misunderstand.

  • Considered backwards and forwards compatibility issues between client and server.
    • The list_all helpers in old versions of the Python SDK will not work with the new API. ([Where] should this be documented?)

I don't think it needs to be called out. I feel like this is our "standard policy" with the SDKs and should just be treated as a given. That said, I do wonder if maybe that policy in general could be better-documented.

Thanks.

Actions #30

Updated by Brett Smith 5 months ago

[Brett wrote a bad suggestion here that doesn't work. If you saw it please disregard.]

Actions #31

Updated by Tom Clegg 5 months ago

if the user passes in a select argument that doesn't include the "seen keys," it'll misbehave

I've added a test that confirms this issue by failing, but I'm not sure how to fix it. We could try introspecting the "get page" func, but that seems very icky -- it wouldn't handle a case where the paging func is anything but the stock API method. For example, this test case.

I've also thought about getting the first page using params appropriate for the generic case, then inspecting the response to check whether it's a computed_permissions response, and if so, throwing out that first response and starting over with params appropriate for computed_permissions. If this only happened in the specific case where the caller provided an insufficient select param, maybe the extra wasted API call would be OK.

Or, given that all three of the computed_permissions fields are unique to computed_permissions...

...and selecting zero fields seems nearly completely useless -- the only examples I can think of being
  • filter by exact match on all 3 fields, because we only need to check whether that exact record exists, and len(items) works even if we don't select any fields
  • filter with a mix of "in" and exact matches on all 3 fields, because we only need to check whether all of a set of exact records exist (e.g., "users a, b, and c all have permission x on object y") and if not, for some reason we don't care which of a, b, and c do have permission, we just want to check len(items)==3 (I can't think of why anyone would want this but it does seem theoretically possible and in such a case select=[] would be the most efficient way to get that result)
...then
  • selecting any of {user_uuid, target_uuid, perm_level} means the callable fn is expected to return a page of computed_permissions, so keyset_list_all should add {user_uuid, target_uuid} to the select arg
  • in the two contrived situations described above -- caller tries to use keyset_list_all with computed_permissions and an empty select -- it fails and that's OK because we documented it and it's easy for the caller to fix by providing select=['user_uuid'] instead.

Fixed the double-negative in the Rails code.

20640-computed-permissions-api @ 5f39a0b2fa48f3a78022da38212d682604b07954

Actions #32

Updated by Brett Smith 5 months ago

Okay, so like, points for creativity. I respect this solution. The next question that jumps to mind is, do we need a test in API server to make sure that computed permissions is the only table with these fields, so that later changes to the database schema don't accidentally break the new logic in keyset_list_all?

Alternatively, may I make another suggestion? I propose that instead of having keyset_list_all become a giant function with a lot of internal branching that tries to automagically do everything for the user, we just… have the user say what they want? How about:

  • We add a new argument to keyset_list_all like key_fields: Sequence[str]=('uuid',).
  • The select munging logic near the top of the function replaces select.add('uuid') with select.update(key_fields).
  • The inner loop is updated to use key_fields instead of the current hardcoded 'uuid'. operator.itemgetter(*key_fields) should make it easy to implement this.
  • Of course it's mean to force users to memorize that they need to pass key_fields=('user_uuid', 'target_uuid') every time they want to iterate compute permissions. No problem, let's just add a function iter_computed_permissions that does that for them. (It should be called iter_computed_permissions for #19817 futureproofing. For the record, I have been doing Arvados development for one additional year and I still don't know what "keyset" means.)

Yes, there is a downside to telling users "if you want to use this specific API, you have to use this other specific function instead." But that's straightforward enough to document, and since this is an admin-only API, I don't think most users will run into it anyway. And in exchange we get a lot more maintainability room and a lot less entanglement between the implementation of the function and the details of the API. What do you think?

Actions #33

Updated by Tom Clegg 5 months ago

Yes to relying on the caller to specify different behavior when needed (and making it easy for them).

20640-computed-permissions-api @ b1f991b0343ba6cba2a3433fc7ca606668e75fdb -- developer-run-tests: #4311

  • add key_fields arg
  • add iter_computed_permissions wrapper with order_key and key_fields defaults suitable for that API
Actions #34

Updated by Brett Smith 5 months ago

Tom Clegg wrote in #note-33:

20640-computed-permissions-api @ b1f991b0343ba6cba2a3433fc7ca606668e75fdb -- developer-run-tests: #4311

  • This version of iter_computed_permissions does not pass through any of the arguments it's called with besides fn and kwargs. It needs to pass through all of them. Ideally there would be tests for this.
  • The iter_computed_permissions docstring refers to arv.computed_permissions().list. However, there's nothing to tell the reader what arv is. Suggest instead arvados.api_resources.ArvadosAPIClient.computed_permissions.list so it links to that method in the API reference. I understand this leaves open the possibility that readers try to pass in the unbound instance method computed_permissions.list, but until feedback tells me that's a real confusion I think it's the better trade-off overall.
  • My bad: the type of key_fields would be better written as Container[str] instead of Sequence[str], since the order doesn't matter. Sorry.
  • if len(tiebreak_keys) == 0 - Per PEP 8, prefer if not tiebreak_keys
  • kwargs['select'] = list(select | set(key_fields) | {order_key}) - If you prefer this style, suggest you just go all the way with it and write:
    try:
        kwargs['select'] = list({*kwargs['select'], *key_fields, order_key})
    except KeyError:
        pass
    
    This avoids the overhead of multiple set allocations when we really only need to build one.
  • I had to spend a couple minutes thinking through the updated test parameterization. I think mixing itertools and comprehensions at different levels like this is pretty rough. I suggest we just tip the scales and go full comprehensions. Consider:
    @parameterized.parameterized.expand(
        (fake_item, key_fields, order_key, select)
        for (fake_item, key_fields) in [
                (_SELECT_FAKE_ITEM, ('uuid',)),
                (_FAKE_COMPUTED_PERMISSIONS_ITEM, ('user_uuid', 'target_uuid')),
        ]
        for order_key in fake_item
        if order_key != 'perm_level'
        for count in range(len(fake_item) + 1)
        for select in itertools.combinations(fake_item, count)
    )
    Don't get me wrong, I wouldn't exactly call this "readable," but I think it's an improvement at least. It also removes the need for the inner "skip test" logic.

Thanks.

Actions #35

Updated by Tom Clegg 5 months ago

20640-computed-permissions-api @ fe6de2473eae9c979cdfcc638d35bd3cc62f08fd -- developer-run-tests: #4314

Incorporates all points in #note-34.

Actions #36

Updated by Brett Smith 5 months ago

Tom Clegg wrote in #note-35:

20640-computed-permissions-api @ fe6de2473eae9c979cdfcc638d35bd3cc62f08fd -- developer-run-tests: #4314

Incorporates all points in #note-34.

Thanks, just one nit. In sdk/python/tests/test_util.py, this:

if num_retries != self.expect_num_retries:
    raise Exception(f"KeysetTestHelper called with num_retries={num_retries} but expected {expect_num_retries}")

would be better written as:

assert num_retries == self.expect_num_retries, f"KeysetTestHelper called with num_retries={num_retries} but expected {expect_num_retries}"

A failed assertion just raises AssertionError, so it's doing the same thing with a more specific exception. At least sometimes that'll be reported as a test failure rather than a test error, which is a reporting improvement, and even in contexts where it's not we don't lose anything.

Please merge. Thanks.

Actions #37

Updated by Tom Clegg 5 months ago

Ugh, yes, I followed nearby existing code instead of fixing it. Changed both to assert.

After merging main:

20640-computed-permissions-api @ a8bb14d1dc42a316677f6435a281230a60db569d -- developer-run-tests: #4321

Actions #38

Updated by Brett Smith 5 months ago

Tom Clegg wrote in #note-37:

20640-computed-permissions-api @ a8bb14d1dc42a316677f6435a281230a60db569d -- developer-run-tests: #4321

Looks good, please merge. Thanks.

Actions #39

Updated by Tom Clegg 5 months ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF