Bug #18004 (closed): Cached federated token creation race condition on RailsAPI
Status:
Resolved
Priority:
Normal
Assigned To:
Category:
API
Target version:
Story points:
-
Release:
Release relationship:
Auto
Description
We got a report that some users were intermittently seeing 502 errors on wb2 when accessing collections; the nginx error log shows a primary key violation on insert:
App 18334 stderr: [ 2021-08-06 09:54:55.4629 18396/0x00000000022f0380(Worker 1) utils.rb:87 ]: *** Exception ActiveRecord::RecordNotUnique in Rack application object (PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "index_api_client_authorizations_on_api_token"
DETAIL: Key (api_token)=(TOKEN) already exists.
: INSERT INTO "api_client_authorizations" ("api_token", "api_client_id", "user_id", "expires_at", "created_at", "updated_at", "uuid") VALUES ($1, $2, $3, $4, $5, $6, $7) RETURNING "id") (process 18396, thread 0x00000000022f0380(Worker 1)):
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/connection_adapters/postgresql_adapter.rb:611:in `async_exec_params'
  [...]
  from /var/www/arvados-api/current/app/models/arvados_model.rb:145:in `create'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:85:in `block in create'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:281:in `scoping'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:85:in `create'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/relation.rb:164:in `find_or_create_by'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/activerecord-5.2.6/lib/active_record/querying.rb:8:in `find_or_create_by'
  from /var/www/arvados-api/current/app/models/api_client_authorization.rb:367:in `block in validate'
  from /var/www/arvados-api/current/lib/current_api_client.rb:138:in `block in act_as_system_user'
  from /var/www/arvados-api/current/lib/current_api_client.rb:149:in `act_as_user'
  from /var/www/arvados-api/current/lib/current_api_client.rb:137:in `act_as_system_user'
  from /var/www/arvados-api/current/app/models/api_client_authorization.rb:312:in `validate'
  from /var/www/arvados-api/current/app/middlewares/arvados_api_token.rb:52:in `block in call'
  from /var/www/arvados-api/current/app/middlewares/arvados_api_token.rb:49:in `each'
  from /var/www/arvados-api/current/app/middlewares/arvados_api_token.rb:49:in `call'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/railties-5.2.6/lib/rails/engine.rb:524:in `call'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/railties-5.2.6/lib/rails/railtie.rb:190:in `public_send'
  from /var/www/arvados-api/shared/vendor_bundle/ruby/2.7.0/gems/railties-5.2.6/lib/rails/railtie.rb:190:in `method_missing'
  from /usr/share/ruby/vendor_ruby/phusion_passenger/rack/thread_handler_extension.rb:97:in `process_request'
  from /usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:152:in `accept_and_process_next_request'
  from /usr/share/ruby/vendor_ruby/phusion_passenger/request_handler/thread_handler.rb:113:in `main_loop'
  from /usr/share/ruby/vendor_ruby/phusion_passenger/request_handler.rb:416:in `block (3 levels) in start_threads'
  from /usr/share/ruby/vendor_ruby/phusion_passenger/utils.rb:113:in `block in create_thread_and_abort_on_exception'
[ 2021-08-06 09:54:55.4633 18288/7f15cbfff700 age/Cor/Con/InternalUtils.cpp:112 ]: [Client 2-787] Sending 502 response: application did not send a complete response
The problem appears to be a race when a remote token isn't yet cached on the cluster: when many requests for the same token are in flight, each one calls find_or_create_by, which is a SELECT followed by a separate INSERT and is not atomic. Several requests can all see no cached row and all attempt the INSERT; the unique index on api_token rejects every insert after the first, and the unhandled RecordNotUnique surfaces as a 502.
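The usual fix for this pattern is to rescue the uniqueness violation and retry the lookup, since losing the insert race means the row now exists (newer Rails encapsulates this as create_or_find_by). Below is a minimal self-contained sketch of that retry logic; it is not Arvados code, and it uses an in-memory store with a uniqueness check standing in for the database table and its unique index on api_token:

```ruby
# Hypothetical stand-in for the api_client_authorizations table,
# whose unique index rejects duplicate api_token values.
class UniqueViolation < StandardError; end

class TokenStore
  def initialize
    @mutex = Mutex.new
    @rows = {}
  end

  def find(token)
    @mutex.synchronize { @rows[token] }
  end

  def insert(token)
    @mutex.synchronize do
      # Mimics the database raising on a duplicate key.
      raise UniqueViolation, token if @rows.key?(token)
      @rows[token] = { api_token: token }
    end
  end

  # Race-safe find-or-create: a plain find-then-insert is not atomic,
  # so a concurrent caller may win the insert between our two steps.
  # Rescuing the uniqueness error and re-reading returns the winner's
  # row instead of failing the whole request.
  def find_or_create(token)
    find(token) || begin
      insert(token)
    rescue UniqueViolation
      find(token)
    end
  end
end

store = TokenStore.new
# Simulate many in-flight requests for the same uncached remote token.
results = 8.times.map do
  Thread.new { store.find_or_create("remote-token") }
end.map(&:value)

puts results.uniq.length
```

Every thread ends up with the same cached row and none of them raises, which is the behavior the API server needs when concurrent requests present the same uncached federated token.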