Bug #18076
closedProperly handle cached user records that don't exist anymore on federation scenarios
Added by Lucas Di Pentima about 3 years ago. Updated about 3 years ago.
Description
If for some reason a user's UUID on the LoginCluster changes (or the user record is removed), the cached copies on the federated clusters get stale and probably cause username validation issues when trying to cache the new record.
The federated controllers use the batch_update
internal API to refresh the user list cache but it doesn't take into account that some users may not exist anymore so cache records aren't properly cleaned/handled.
Related issues
Updated by Peter Amstutz about 3 years ago
- Target version set to 2021-09-15 sprint
Updated by Lucas Di Pentima about 3 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima about 3 years ago
- Related to Bug #18094: Remove unused Users#update_uuid endpoint added
Updated by Lucas Di Pentima about 3 years ago
Updates at 7aa1380 - branch 18076-stale-cached-users-handling
Test run: developer-run-tests: #2670
If there's a stale cached user record creating the username
field collision, set it to nil
before retrying the update. There're two scenarios this is covering:
- The stale user record belongs to an existing user on LoginCluster. Another user took its username, so the new username is coming in the
/arvados/v1/users/batch_update
operation -- it's ok to have it set tonil
temporarily. - The stale user record doesn't exist on LoginCluster anymore, so resetting it to
nil
isn't harmful and avoids future collisions.
Updated by Tom Clegg about 3 years ago
The fix looks great.
Just some nits about the test:Limit: math.MaxInt64
can also be spelledLimit: -1
(probably easier to read in logs)- The test first checks whether all clusters use z1111 as their login cluster, but all it really needs to be effective is for z1111 and z3333 to use z1111 as login cluster. (I figure eventually we'll add another cluster to our integration test suite with a different config, and at that point it will be nice to have fewer test failures to fix.)
for userNr := range []int{0, 1}
is sort of misleading in that userNr is the array index, not the array element --[]int{9,3}
would have the same result. Idiomatic Go is to sayfor userNr := 0; userNr < 2; userNr++
if you want a sequence of numbers, orfor _, userNr := range ...
if you want specific int values. This is a very nitpicky nitpick though!
LGTM, thanks!
Updated by Lucas Di Pentima about 3 years ago
Thanks for the suggestions! Updated at c534e74c5 and merged.
Updated by Lucas Di Pentima about 3 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|726f65153228eb81aa526df078f4fca6045e1779.
Updated by Lucas Di Pentima about 3 years ago
- Status changed from Resolved to In Progress
Updated by Lucas Di Pentima about 3 years ago
Once Jenkins deployed this on ce8i5, I started to see the following:
$ arv user list Error: error updating local user records: //railsapi.internal/arvados/v1/users/batch_update: 422 Unprocessable Entity: #<ActiveRecord::RecordNotSaved: Failed to save the record> (req-vr2et8tnawgndjcxocwb)
...and in ce8i5's logs:
[req-vr2et8tnawgndjcxocwb] cached username 'ward2' collision with user 'ce8i5-tpzed-249xu4osx4us5lg' - resetting #<ActiveRecord::RecordNotSaved: Failed to save the record> Error 1630934673+e96dc935: 422 {"method":"PATCH","path":"/arvados/v1/users/batch_update","format":"html","controller":"Arvados::V1::UsersController","action":"batch_update","status":422,"duration":74.54,"view":0.32,"db":22.33,"request_id":null,"client_ipaddr":"127.0.0.1","client_auth":"tordo-gj3su-000000000000000","exception":"#\u003cActiveRecord::RecordNotSaved: Failed to save the record\u003e","exception_backtrace":"/var/www/arvados-api/shared/vendor_bundle/ruby/2.5.0/gems/activerecord-5.2.6/lib/active_record/persistence.rb:308:in `save!'\n/var/www/arvados-api/shared/vendor_bundle/ruby/2.5.0/gems/activerecord-5.2.6/lib/active_record/validations.rb:52:in `save!'\n/var/www/arvados-api/shared/vendor_bundle/ruby/2.5.0/gems/activereco[...]
From reading the Rails docs, the ActiveRecord::RecordNotSaved
exception is raised when a before_*
callback fails when calling save!()
, so I thought there was some issue with resetting the user's username... and when trying to do so manually, I got:
$ arv user update -b --uuid ce8i5-tpzed-249xu4osx4us5lg --user='{"username": null}' Error: //railsapi.internal/arvados/v1/users/ce8i5-tpzed-249xu4osx4us5lg: 422 Unprocessable Entity: Username can't be unset when the user owns repositories (req-wt8kut96pkj91l187pjn)
So I guess this is what's triggering the problem on the users model:
before_update :verify_repositories_empty, :if => Proc.new {
username.nil? and username_changed?
}
Updated by Lucas Di Pentima about 3 years ago
My approach would be to set the conflicting username to something random (maybe a random suffix?) instead of nil
, as it seems that this wouldn't cause any issues with preexisting repositories.
Updated by Tom Clegg about 3 years ago
Ugh! Yeah, a random string/suffix sounds like a decent workaround.
Updated by Lucas Di Pentima about 3 years ago
Fix at 7edaf0605 - branch 18076-user-cache-with-repository
Test run: developer-run-tests: #2676
WB1 integration test re-run: developer-run-tests-apps-workbench-integration: #2837
- Sets a "
conflictNNNNNNN
" random suffix to the conflicting user's username instead ofnil
. - Expands the previous test.
Updated by Lucas Di Pentima about 3 years ago
- % Done changed from 50 to 100
- Status changed from In Progress to Resolved
Applied in changeset arvados-private:commit:arvados|f860ff683634d73edd63df32c284aba42f1bbf0c.