Bug #20909

closed

PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12 with a "real" $TMPDIR filesystem

Added by Brett Smith over 1 year ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Tests
Target version:
Story points:
0.5
Release relationship:
Auto

Description

This test fails consistently on my Debian 12 system running Python 3.11 (from the Debian package) or Python 3.8 (built from source):

======================================================================
FAIL: test_disk_cache_cap (tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/Curii/arvados/sdk/python/.eggs/mock-3.0.5-py3.11.egg/mock/mock.py", line 1330, in patched
    return func(*args, **keywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brett/Curii/arvados/sdk/python/tests/test_keep_client.py", line 1700, in test_disk_cache_cap
    self.assertFalse(os.path.exists(os.path.join(self.disk_cache_dir, self.locator[0:3], self.locator+".keepcacheblock")))
AssertionError: True is not false

This might be specific to my system and a non-issue but looking at the test I'm skeptical. My first guess is that something is changing the ordering of things somewhere such that KeepBlockCache removes the more recent block, not the first one.


Files

pip-freeze.log (1000 Bytes), Brett Smith, 02/13/2025 09:52 PM
apt-installed.log (88.1 KB), Brett Smith, 02/13/2025 09:52 PM

Subtasks 1 (0 open, 1 closed)

Task #22584: Review 20909-keep-disk-cache-test-mtime, Resolved, Tom Clegg, 02/14/2025
Actions #1

Updated by Brett Smith over 1 year ago

  • Subject changed from Failing PySDK test on Debian 12/Python 3.11 to Failing PySDK test on Debian 12

This test also fails even if you build your own Python 3.8 and run the tests with it.

Actions #2

Updated by Brett Smith over 1 year ago

  • Description updated (diff)
Actions #3

Updated by Peter Amstutz about 1 year ago

  • Target version changed from To be scheduled to Future
Actions #4

Updated by Brett Smith about 1 year ago

  • Subject changed from Failing PySDK test on Debian 12 to PySDK tests.test_keep_client.KeepDiskCacheTestCase fails on Debian 12
Actions #5

Updated by Brett Smith about 1 year ago

  • Subject changed from PySDK tests.test_keep_client.KeepDiskCacheTestCase fails on Debian 12 to PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12
Actions #6

Updated by Tom Clegg 9 months ago

The entire sdk/python test suite passes for me on debian 12, stock python 3.11.2. Also tried running this test 32x, and there were no failures.

Actions #7

Updated by Tom Clegg 5 months ago

On my new dev box (i.e., not the same hardware as #note-6), my new debian 12 VM with stock python 3.11.2 fails this test on 29 of 30 attempts.

Actions #8

Updated by Peter Amstutz 3 months ago

  • Target version changed from Future to Development 2025-02-12
Actions #9

Updated by Peter Amstutz 2 months ago

  • Target version changed from Development 2025-02-12 to Development 2025-02-26
Actions #10

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2025-02-26 to Development 2025-03-19
Actions #11

Updated by Peter Amstutz about 2 months ago

  • Target version changed from Development 2025-03-19 to Development 2025-02-26
Actions #12

Updated by Brett Smith about 2 months ago

I cannot currently reproduce this in my Debian 12 VM. I created a completely fresh run-tests tempdir with both Python 3.8 and Python 3.11 and ran 10 test sdk/python for both. Everything passed.

Actions #13

Updated by Peter Amstutz about 2 months ago

  • Assigned To set to Brett Smith
Actions #14

Updated by Brett Smith about 2 months ago

It seems plausible that the work on #22420 fixed this.

I would say: see if Tom can reproduce it. If he can't either, let's call it good.

Actions #15

Updated by Tom Clegg about 2 months ago

Still failing 30/30 times for me at 7301a282a5. Starting with a fresh run-tests tempdir doesn't help, still fails 30/30 times. Same VM as #note-7, same stock python 3.11.

$ dpkg-query --show python3.11
python3.11      3.11.2-6+deb12u5

Actions #16

Updated by Brett Smith about 2 months ago

I made a new VM and using the same stock Python as Tom I still can't reproduce the failure after multiple attempts.

At this point I'm gonna just start digging into the code but for posterity I've attached all the versions of Debian packages installed on the system as well as PyPI packages installed in the test VENV3DIR.

Actions #17

Updated by Brett Smith about 2 months ago

I am putting together a matrix but the short update is I have figured out that the difference is down to the underlying filesystem of $TMPDIR. It mostly passes if $TMPDIR is tmpfs, and mostly fails if $TMPDIR is btrfs or ext4.

Actions #18

Updated by Brett Smith about 2 months ago

btrfs TMPDIR - Failed 50/50 times but I have seen it pass occasionally
ext4 TMPDIR - Failed 50/50 times, I have never seen it pass
tmpfs TMPDIR - Failed 0/50 times

Current hypothesis is that the test is expecting a particular order of operations from the kernel and it gets that on tmpfs but doesn't on other filesystems (maybe with more recent kernels?).
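For anyone who wants to gather the same kind of pass/fail tally on their own system, here is a minimal sketch of a repeat harness. It is not what was used to produce the numbers above. The function name `count_failures` is made up for illustration, and the command you pass in is your own choice: this thread does not record the exact test invocation, so the example at the bottom only runs a trivial placeholder.

```python
import os
import subprocess
import sys


def count_failures(command, tmpdir, runs=50):
    """Run `command` `runs` times with TMPDIR set to `tmpdir`.

    Returns how many runs exited with a nonzero status.
    """
    # Copy the current environment and override only TMPDIR for the child.
    env = dict(os.environ, TMPDIR=tmpdir)
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            command,
            env=env,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    # Placeholder command: substitute the real test invocation for your checkout.
    placeholder = [sys.executable, "-c", "pass"]
    tmpdir = os.environ.get("TMPDIR", "/tmp")
    runs = 5
    failed = count_failures(placeholder, tmpdir, runs=runs)
    print(f"{tmpdir} - Failed {failed}/{runs} times")
```

Pointing `tmpdir` at directories on different filesystems (tmpfs, ext4, btrfs) is the only variable that changes between rows of the matrix.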

Actions #19

Updated by Brett Smith about 2 months ago

  • Status changed from New to In Progress

You can get the test to pass if you insert time.sleep(1) in between the two file creations at the top of the test method. Current theory is that on typical deployments of real disk filesystems, mtimes are crushed a little bit to reduce disk wear. When this happens, the code under test will see them as equal, and may choose to delete the opposite of the intended cache file.
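To illustrate the theory, here is a small sketch that writes two files back to back, reports whether their mtimes tie, and then pins explicit mtimes with os.utime so that "oldest" is unambiguous without sleeping. This is an assumption about one way to make such a test deterministic. It is not the change on the 20909-keep-disk-cache-test-mtime branch, whose diff is not shown in this thread, and the helpers `write_block`, `pin_mtime_order`, and `oldest` are hypothetical names rather than KeepBlockCache APIs.

```python
import os
import tempfile

CACHE_SUFFIX = ".keepcacheblock"


def write_block(directory, name):
    """Create a one-byte stand-in for a cache block and return its path."""
    path = os.path.join(directory, name + CACHE_SUFFIX)
    with open(path, "wb") as f:
        f.write(b"x")
    return path


def pin_mtime_order(older, newer, gap_seconds=10):
    """Force `newer` to have an mtime `gap_seconds` later than `older`."""
    base_ns = os.stat(older).st_mtime_ns
    later_ns = base_ns + gap_seconds * 10**9
    # os.utime takes (atime_ns, mtime_ns) through the keyword-only `ns` argument.
    os.utime(older, ns=(base_ns, base_ns))
    os.utime(newer, ns=(later_ns, later_ns))


def oldest(paths):
    """Pick the path with the smallest mtime; ties fall back to list order."""
    return min(paths, key=lambda p: os.stat(p).st_mtime_ns)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        first = write_block(workdir, "first")
        second = write_block(workdir, "second")
        tied = os.stat(first).st_mtime_ns == os.stat(second).st_mtime_ns
        print("mtimes tied after back-to-back writes:", tied)

        pin_mtime_order(first, second)
        print("oldest after pinning:", os.path.basename(oldest([second, first])))
```

Whether the first print shows a tie depends on the filesystem behind the temporary directory, which is the point of the theory. After pinning, the choice of oldest file no longer depends on it.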

Actions #20

Updated by Brett Smith about 2 months ago

20909-keep-disk-cache-test-mtime @ 16b7169b5a96d369c3da1520d3ff5f19aca230cf - developer-run-tests: #4665

  • All agreed upon points are implemented / addressed.
    • Yes
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See above. On my system this specific test now passes 50/50 runs when backed by an ext4 $TMPDIR where it previously failed 50/50.
  • Documentation has been updated.
    • N/A, pure test change
  • Behaves appropriately at the intended scale (describe intended scale).
    • No change
  • Considered backwards and forwards compatibility issues between client and server.
    • No change
  • Follows our coding standards and GUI style guidelines.
    • Yes
Actions #21

Updated by Brett Smith about 2 months ago

  • Subject changed from PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12 to PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12 with a "real" $TMPDIR filesystem
Actions #22

Updated by Brett Smith about 2 months ago

  • Subtask #22584 added
Actions #23

Updated by Tom Clegg about 2 months ago

LGTM. On my VM just now (ext4):
  • main passed 1/50
  • 20909-keep-disk-cache-test-mtime passed 50/50
Actions #24

Updated by Brett Smith about 2 months ago

  • Status changed from In Progress to Resolved
Actions #25

Updated by Peter Amstutz about 1 month ago

  • Release set to 75