Bug #14878

[API] Fix unreliable test

Added by Tom Clegg 10 months ago. Updated 6 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Tests
Target version:
Start date:
05/28/2019
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-

Description

"Update priority" tests fail occasionally.

From https://ci.curoverse.com/job/developer-run-tests-services-api/1095/console

  1) Failure:
UpdatePriorityTest#test_priority_0_but_should_be_>0 [/home/ci-jenkins/.jenkins-slave/workspace/developer-run-tests-services-api/services/api/test/unit/update_priority_test.rb:20]:
Expected 0 to be < 0.

Possible explanation: if a background update task (UpdatePriority.run_update_thread(), started from an after_commit callback in a previous test) is still running when the test case does an explicit inline call to UpdatePriority.update_priority(), the inline call fails to get the lock, and returns immediately. But the background task's updates do not necessarily incorporate the latest priority updates, and even if they do, the resulting updates are not necessarily committed before the test case checks for them.

Possible ways to fix:
  • Don't start the background update thread in the test environment.
  • Pass a "block if needed" flag to update_priority(), so the test case works even when background tasks are running.

Subtasks

Task #14955: Review 14878-priority-raceResolvedTom Clegg

Associated revisions

Revision 3549e648
Added by Tom Clegg 6 months ago

Merge branch '14878-priority-race'

fixes #14878

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Tom Clegg 10 months ago

  • Description updated (diff)

#2 Updated by Tom Morris 9 months ago

  • Target version set to 2019-03-27 Sprint

#3 Updated by Lucas Di Pentima 9 months ago

  • Assigned To set to Lucas Di Pentima

#4 Updated by Tom Morris 9 months ago

  • Target version changed from 2019-03-27 Sprint to 2019-04-10 Sprint

#5 Updated by Lucas Di Pentima 8 months ago

  • Target version changed from 2019-04-10 Sprint to 2019-04-24 Sprint

#6 Updated by Lucas Di Pentima 8 months ago

  • Target version changed from 2019-04-24 Sprint to 2019-05-08 Sprint

#7 Updated by Lucas Di Pentima 7 months ago

  • Target version changed from 2019-05-08 Sprint to 2019-05-22 Sprint

#8 Updated by Lucas Di Pentima 7 months ago

  • Target version changed from 2019-05-22 Sprint to 2019-06-05 Sprint

#9 Updated by Tom Clegg 6 months ago

  • Assigned To changed from Lucas Di Pentima to Tom Clegg

#10 Updated by Tom Clegg 6 months ago

On second thought, disabling the background thread (and updating synchronously instead) in test cases seems undesirable: it would prevent all tests (even other components' integration tests) from experiencing async behavior.

A blocking flock() doesn't work: the background task sometimes gets the lock, then does a postgresql statement that blocks to wait for the main thread to commit/rollback. If the main thread then waits for the lock before (instead of) committing, everyone deadlocks in a way that postgresql can't detect.

Instead, solved by just skipping the lock when running "update priority" from the test case.

14878-priority-race @ 435a5df3e505dfbf67467bd02073f97e63c4c61d -- https://ci.curoverse.com/view/Developer/job/developer-run-tests/1262/

#11 Updated by Tom Clegg 6 months ago

  • Status changed from New to In Progress

#12 Updated by Lucas Di Pentima 6 months ago

I tried to make master fail by running update_priority_test.rb repeatedly on my local machine but wasn't able to reproduce the flaky behavior, even running the entire unit test suite several times.

I think it would be useful to add a comment on update_priority() explaining noblock's use, for posterity. Apart from that, it LGTM.

#13 Updated by Tom Clegg 6 months ago

FWIW, my dev box reproduced the failure about 30% of the time on master, and passed 10 times in a row with this change.

#14 Updated by Tom Clegg 6 months ago

  • Status changed from In Progress to Resolved

Also available in: Atom PDF