Project

General

Profile

Actions

Bug #18004

closed

Cached federated token creation race condition on RailsAPI

Added by Lucas Di Pentima over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Story points:
-
Release relationship:
Auto

Description

We got a report that some users were sometimes seeing 502 errors on wb2 when accessing collections, nginx error log show a primary key violation on insert:

App 18334 stderr: [ 2021-08-06 09:54:55.4629 18396/0x00000000022f0380(Worker 1) utils.rb:87 ]: *** Exception ActiveRecord::RecordNotUnique in Rack application object (PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "index_api_client_authorizations_on_api_token" 
App 18334 stderr: DETAIL:  Key (api_token)=(TOKEN) already exists.
App 18334 stderr: : INSERT INTO "api_client_authorizations" ("api_token", "api_client_id", "user_id", "expires_at", "created_at", "updated_at", "uuid") VALUES ($1, $2, $3, $4, $5, $6, $7) RETURNING "id") (process 18396, thread 0x00000000022f0380(Worker 1)):
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
[...]
App 18334 stderr: from /var/www/arvados-api/current/app/models/arvados_model.rb:145:in `create'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:85:in `block in create'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:281:in `scoping'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:85:in `create'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:164:in `find_or_create_by'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/querying.rb:8:in `find_or_create_by'
App 18334 stderr: from /var/www/arvados-api/current/app/models/api_client_authorization.rb:367:in `block in validate'
App 18334 stderr: from /var/www/arvados-api/current/lib/current_api_client.rb:138:in `block in act_as_system_user'
App 18334 stderr: from /var/www/arvados-api/current/lib/current_api_client.rb:149:in `act_as_user'
App 18334 stderr: from /var/www/arvados-api/current/lib/current_api_client.rb:137:in `act_as_system_user'
App 18334 stderr: from /var/www/arvados-api/current/app/models/api_client_authorization.rb:312:in `validate'
App 18334 stderr: from /var/www/arvados-api/current/app/middlewares/arvados_api_token.rb:52:in `block in call'
App 18334 stderr: from /var/www/arvados-api/current/app/middlewares/arvados_api_token.rb:49:in `each'
App 18334 stderr: from /var/www/arvados-api/current/app/middlewares/arvados_api_token.rb:49:in `call'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/railties-5.2.6/lib/rails/engine.rb:524:in `call'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/railties-5.2.6/lib/rails/railtie.rb:190:in `public_send'
App 18334 stderr: from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/railties-5.2.6/lib/rails/railtie.rb:190:in `method_missing'
App 18334 stderr: from /usr/share/ruby/vendor_ruby/phusion_passenger/rack/thread_handler_extension.rb:97:in `process_request'
App 18334 stderr: from /usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:152:in `accept_and_process_next_request'
App 18334 stderr: from /usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:113:in `main_loop'
App 18334 stderr: from /usr/share/ruby/vendor_ruby/phusion_passenger/request_handler.rb:416:in `block (3 levels) in start_threads'
App 18334 stderr: from /usr/share/ruby/vendor_ruby/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'
[ 2021-08-06 09:54:55.4633 18288/7f15cbfff700 age/Cor/Con/InternalUtils.cpp:112 ]: [Client 2-787] Sending 502 response: application did not send a complete response

It seems the problem is related to a race when the remote token isn't cached on the cluster and many request for the same token are in flight, some of them will try to create the same cached token producing a collision.


Subtasks 1 (0 open1 closed)

Task #18025: Review 18004-cached-token-race-conditionResolvedTom Clegg08/17/2021Actions
Actions

Also available in: Atom PDF