Project

General

Profile

Actions

Bug #14966

closed

[API] Fix hanging test - suspect permission changes

Added by Tom Clegg about 5 years ago. Updated almost 5 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Story points:
-
Release relationship:
Auto

Description

test console https://ci.curoverse.com/job/run-tests-services-api/2605/console

...
12:55:16 Arvados::V1::KeepDisksControllerTest#test_refuse_to_add_keep_disk_without_admin_token = 0.01 s = .
12:55:16 Arvados::V1::KeepDisksControllerTest#test_search_keep_services_with_'any'_operator = 0.03 s = .
[hangs here]

ps

postgres 15393  0.4  1.1 329040 91640 ?        Ss   16:49   0:18 postgres: 9.6/main: ci_test_user arvados_test_api ::1(40670) REFRESH MATERIALIZED VIEW waiting
postgres 15519  0.2  1.0 318256 82320 ?        Ss   16:50   0:08 postgres: 9.6/main: ci_test_user arvados_test_api ::1(40706) idle in transaction
postgres 15521  0.2  1.2 336216 92800 ?        Ss   16:50   0:08 postgres: 9.6/main: ci_test_user arvados_test_api ::1(40708) PARSE waiting

Suspicious of a race condition in the async-permission-update changes recently merged in #13593.


Subtasks 1 (0 open1 closed)

Task #14970: Review 14966-api-test-hang-fixResolvedPeter Amstutz03/13/2019Actions

Related issues

Related to Arvados - Bug #13593: [API] Sequence of "create group" requests runs slowly, and can crash API serverResolvedLucas Di Pentima03/12/2019Actions
Actions #1

Updated by Tom Clegg about 5 years ago

  • Related to Bug #13593: [API] Sequence of "create group" requests runs slowly, and can crash API server added
Actions #2

Updated by Lucas Di Pentima about 5 years ago

  • Assigned To set to Lucas Di Pentima
  • Target version set to 2019-03-27 Sprint
Actions #3

Updated by Lucas Di Pentima about 5 years ago

  • Status changed from New to In Progress
Actions #4

Updated by Lucas Di Pentima about 5 years ago

Updates at 0e9aa62c7 - branch 14966-api-test-hang-fix
Test run: https://ci.curoverse.com/job/developer-run-tests/1123/

  • Facts
  • Hypothesis
    • When a permission-changing operation is made within a test, we’re getting nested transactions that share the same database connection. This is usually fine because both run within the same thread, but for this special case that we expect the inner transaction to run concurrently in its own thread, it doesn’t happen because the connection is already occupied and sits waiting to be released before continuing.
  • Fix
    • Although I already tried to manually ask for a new connection using ActiveRecord::Base.connection_pool.with_connection do ... end on refresh_permission_view(), it seems there’s no obvious way to ask for a new connection so instead the proposed fix would be to move this special test on its own test case class, disabling transactional fixtures.
    • The “cons” that this has is that the changes made on this test would leak to other tests, so we have to make sure to clean up before finishing.
Actions #5

Updated by Peter Amstutz about 5 years ago

Lucas Di Pentima wrote:

Updates at 0e9aa62c7 - branch 14966-api-test-hang-fix
Test run: https://ci.curoverse.com/job/developer-run-tests/1123/

  • Facts
  • Hypothesis
    • When a permission-changing operation is made within a test, we’re getting nested transactions that share the same database connection. This is usually fine because both run within the same thread, but for this special case that we expect the inner transaction to run concurrently in its own thread, it doesn’t happen because the connection is already occupied and sits waiting to be released before continuing.
  • Fix
    • Although I already tried to manually ask for a new connection using ActiveRecord::Base.connection_pool.with_connection do ... end on refresh_permission_view(), it seems there’s no obvious way to ask for a new connection so instead the proposed fix would be to move this special test on its own test case class, disabling transactional fixtures.
    • The “cons” that this has is that the changes made on this test would leak to other tests, so we have to make sure to clean up before finishing.

I'm not sure I totally understand the explanation, but agree that transactions / connection pooling could be causing trouble. Since it is an integration test it seems reasonable to run the test without the an outer transaction. Couple notes:

The test should be wrapped in a begin ... ensure .. end.

To reset the database, suggest using the database reset endpoint:

    post '/database/reset', {}, admin_auth
    assert_response :success

typo "accesible" -> "accessible"

Actions #6

Updated by Lucas Di Pentima about 5 years ago

Updates at 0145b3654
Test run: https://ci.curoverse.com/job/developer-run-tests/1126/

Instead of resetting the database inside an ensure block, do it at test case's teardown.

Actions #7

Updated by Lucas Di Pentima about 5 years ago

  • Status changed from In Progress to Resolved
  • % Done changed from 0 to 100
Actions #8

Updated by Tom Morris almost 5 years ago

  • Release set to 15
Actions

Also available in: Atom PDF