Project

General

Profile

Actions

Bug #20831

closed

User unsetup method does not consistently remove permissions in a federation

Added by Brett Smith 9 months ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Story points:
-
Release relationship:
Auto

Description

In a federation with a login cluster, if you call a user's unsetup method on the login cluster, they will still have permission links on other clusters in the federation. This is undesirable for all the same reasons we started removing them originally in #19501. Permission links should be removed across the federation.


Subtasks 1 (0 open1 closed)

Task #21179: Review 20831-user-table-locksResolvedPeter Amstutz11/13/2023Actions

Related issues

Related to Arvados - Bug #19501: unsetup should remove all sharing permissions for the deactivated userResolvedTom Clegg10/17/2022Actions
Actions #1

Updated by Brett Smith 9 months ago

  • Related to Bug #19501: unsetup should remove all sharing permissions for the deactivated user added
Actions #3

Updated by Peter Amstutz 6 months ago

  • Target version changed from Future to Development 2023-11-29 sprint
Actions #4

Updated by Peter Amstutz 6 months ago

As part of the user account synchronization ApiClientAuthorization.validate, it will unsetup the user:

        if user.is_invited && !remote_user['is_invited']
          # Remote user is not "invited" state, they should be unsetup, which
          # also makes them inactive.
          user.unsetup
        else

However, the UsersController.batch_update method doesn't do this.

I think what we want is a new method like User.handle_remote_record is used by both the token validation and batch_update so that the logic for things like user activation/deactivation and usernames is consistent.

Actions #5

Updated by Peter Amstutz 6 months ago

  • Release set to 67
Actions #6

Updated by Peter Amstutz 6 months ago

  • Assigned To set to Lucas Di Pentima
Actions #7

Updated by Peter Amstutz 6 months ago

  • Assigned To changed from Lucas Di Pentima to Peter Amstutz
Actions #8

Updated by Peter Amstutz 6 months ago

20831-user-table-locks @ 282562ff358c549980a48ccca41944039f86483a

developer-run-tests: #3901

  • All agreed upon points are implemented / addressed.
    • Behavior for updating user records that belong to remote clusters now uses the same code path for both token validate and batch_update
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • passes existing tests, which are pretty thorough
  • Documentation has been updated.
    • n/a
  • Behaves appropriately at the intended scale (describe intended scale).
    • now takes a row lock on the user record before updating it. this shouldn't affect scaling but it does mean updates that would have raced (believed to be the cause of #21059) should now be serialized.
  • Considered backwards and forwards compatibility issues between client and server.
    • when the remote cluster indicates an account has been deactivated, it should now be correctly deactivated on the satellite cluster, which was always the intended behavior but does represent a minor behavior change.
  • Follows our coding standards and GUI style guidelines.
    • yes

This change is pretty straightforward, the logic for batch update and user update from token validate has been combined into a single method. It also now takes a row lock on the user record, which is intended to solve multiple-activation email bug #21059.

Actions #9

Updated by Peter Amstutz 6 months ago

  • Status changed from New to In Progress
Actions #10

Updated by Peter Amstutz 5 months ago

20831-user-table-locks @ e42bf8fbe66f822066e13c08b346005a52e1aa4a

developer-run-tests: #3906

  • All agreed upon points are implemented / addressed.
    • Behavior for updating user records that belong to remote clusters now uses the same code path for both token validate and batch_update
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • n/a
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • passes existing tests, including controller tests, which are pretty thorough
  • Documentation has been updated.
    • n/a
  • Behaves appropriately at the intended scale (describe intended scale).
    • now takes a row lock on the user record before updating it. this shouldn't affect scaling but it does mean updates that would have raced (believed to be the cause of #21059) should now be serialized.
  • Considered backwards and forwards compatibility issues between client and server.
    • when the remote cluster indicates an account has been deactivated, it should now be correctly deactivated on the satellite cluster, which was always the intended behavior but does represent a minor behavior change.
    • it turns out there was an issue, the is_admin and is_invited fields were not always returned; the Go SDK was turning them into "false" in this case which caused it to incorrectly deactivate the user. The User.IsAdmin and User.IsInvited fields are now pointers. The code using those fields (only a handful of places) has been updated.
    • Since it handles situations where is_admin and is_invited are present or not present, it should be backwards compatible. However, to get the new user deactivation behavior, you need to update the Login cluster so that it starts sending those fields in situations where it currently isn't.
  • Follows our coding standards and GUI style guidelines.
    • yes

This change is pretty straightforward, the logic for batch update and user update from token validate has been combined into a single method. It also now takes a row lock on the user record, which is intended to solve multiple-activation email bug #21059.

Actions #11

Updated by Tom Clegg 5 months ago

As discussed offline, I still think changing IsAdmin to a pointer (implicitly using nil to mean "field not populated") is not appropriate. We've been eliminating the ad-hoc mix of pointers and values by sticking with / changing to values. Changing that strategy to edit-warring individual fields as needed seems like the worst side to land on, especially since this is an SDK, purportedly usable by code other than arvados itself. Sync-groups etc. (see failing tests) aren't the only things that will break.

But given that the problem here is that the correct way to interpret a false value depends on the remote cluster's API version, I think the right solution is probably to check the remote cluster's API version using the discovery doc.

Actions #12

Updated by Peter Amstutz 5 months ago

Tom Clegg wrote in #note-11:

As discussed offline, I still think changing IsAdmin to a pointer (implicitly using nil to mean "field not populated") is not appropriate. We've been eliminating the ad-hoc mix of pointers and values by sticking with / changing to values. Changing that strategy to edit-warring individual fields as needed seems like the worst side to land on, especially since this is an SDK, purportedly usable by code other than arvados itself. Sync-groups etc. (see failing tests) aren't the only things that will break.

But given that the problem here is that the correct way to interpret a false value depends on the remote cluster's API version, I think the right solution is probably to check the remote cluster's API version using the discovery doc.

It's more complicated than that, because sometimes the is_admin and is_invited fields are returned, and when that does happen, it is better if we don't ignore them (because the whole point of this ticket was to correctly "unsetup" users when we find out they have been disabled upstream).

So now it (a) it checks the discovery document and assumes valid is_admin and is_invited for newer API server revisions and (b) for backwards compatibility, it checks for the conditions where we expect valid is_admin and is_invited to be returned.

My main reservation about this solution is that if there is a case I didn't account for where it expects the fields to be returned but they are not sent, users will be accidentally deactivated because the logic is based on expected behavior and not ground truth. I guess that risk is acceptable to avoid introducing pointers in the User struct.

20831-user-table-locks @ 135bada0fe08de2b678ede684d43a155c4351ed3

developer-run-tests: #3909

Actions #13

Updated by Lucas Di Pentima 5 months ago

There are failing tests, so I'm assigning the review task back to you so I don't get notified until it's really ready.

Actions #15

Updated by Peter Amstutz 5 months ago

0b0a1bec011d8fb19205f53d78848fee800f2d88

developer-run-tests: #3916

Passing tests, now checks the discover doc revision and current user to decide whether to treat is_admin and is_invited fields as valid or not, as discussed in note 11.

Actions #16

Updated by Lucas Di Pentima 5 months ago

Just a few comments:

  • arvados/lib/controller/federation/conn.go
    • L708: Should that fmt.Printf() call be passed to the logging facility instead?
    • L728: Shouldn't err be checked too?
  • Do you think User.update_remote_user() would benefit of having tests written for it?
Actions #17

Updated by Peter Amstutz 5 months ago

Lucas Di Pentima wrote in #note-16:

Just a few comments:

  • arvados/lib/controller/federation/conn.go
    • L708: Should that fmt.Printf() call be passed to the logging facility instead?

That was a debug printf that was left in by mistake, I removed it.

  • L728: Shouldn't err be checked too?

Yes, added an error check.

  • Do you think User.update_remote_user() would benefit of having tests written for it?

api/test/integration/remote_user_test.rb and functional/arvados/v1/users_controller_test.rb indirectly invoke User.update_remote_user() with a variety of remote user cases, I don't know what additional tests I would write that are not covered by the existing tests.

2c0bf8219eb3ff5f978d147bff7ae6c6a73e8188

developer-run-tests: #3921

Actions #18

Updated by Tom Clegg 5 months ago

Not that anyone asked me, but
  • in rpc.Conn, we might not want to cache the discovery doc forever -- maybe refresh after 1h?
  • rpc.Conn should be goroutine-safe, so discoveryDocument access should be protected with a mutex
  • in federation, conn.chooseBackend(conn.cluster.ClusterID) should be just conn.local (both in the new code and in the VocabularyGet method above which it was probably copied from)
Actions #19

Updated by Peter Amstutz 5 months ago

Tom Clegg wrote in #note-18:

Not that anyone asked me, but
  • in rpc.Conn, we might not want to cache the discovery doc forever -- maybe refresh after 1h?
  • rpc.Conn should be goroutine-safe, so discoveryDocument access should be protected with a mutex
  • in federation, conn.chooseBackend(conn.cluster.ClusterID) should be just conn.local (both in the new code and in the VocabularyGet method above which it was probably copied from)

9f48c431a829cfa26e2a20ab306b37613d445fa1

developer-run-tests: #3922

Actions #22

Updated by Lucas Di Pentima 5 months ago

LGTM, thanks.

Actions #23

Updated by Peter Amstutz 5 months ago

  • Status changed from In Progress to Resolved
Actions

Also available in: Atom PDF