Bug #20909
PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12 with a "real" $TMPDIR filesystem
Description
This test fails consistently on my Debian 12 system running Python 3.11 (from the Debian package) or Python 3.8 (built from source):
======================================================================
FAIL: test_disk_cache_cap (tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/brett/Curii/arvados/sdk/python/.eggs/mock-3.0.5-py3.11.egg/mock/mock.py", line 1330, in patched
    return func(*args, **keywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brett/Curii/arvados/sdk/python/tests/test_keep_client.py", line 1700, in test_disk_cache_cap
    self.assertFalse(os.path.exists(os.path.join(self.disk_cache_dir, self.locator[0:3], self.locator+".keepcacheblock")))
AssertionError: True is not false
This might be specific to my system and a non-issue, but looking at the test I'm skeptical. My first guess is that something changes the ordering somewhere such that KeepBlockCache removes the more recent block instead of the first one.
Files
Updated by Brett Smith over 1 year ago
- Subject changed from Failing PySDK test on Debian 12/Python 3.11 to Failing PySDK test on Debian 12
This test also fails even if you build your own Python 3.8 and run the tests with it.
Updated by Peter Amstutz about 1 year ago
- Target version changed from To be scheduled to Future
Updated by Brett Smith about 1 year ago
- Subject changed from Failing PySDK test on Debian 12 to PySDK tests.test_keep_client.KeepDiskCacheTestCase fails on Debian 12
Updated by Brett Smith about 1 year ago
- Subject changed from PySDK tests.test_keep_client.KeepDiskCacheTestCase fails on Debian 12 to PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12
Updated by Peter Amstutz 3 months ago
- Target version changed from Future to Development 2025-02-12
Updated by Peter Amstutz 2 months ago
- Target version changed from Development 2025-02-12 to Development 2025-02-26
Updated by Peter Amstutz about 2 months ago
- Target version changed from Development 2025-02-26 to Development 2025-03-19
Updated by Peter Amstutz about 2 months ago
- Target version changed from Development 2025-03-19 to Development 2025-02-26
Updated by Brett Smith about 2 months ago
I cannot currently reproduce this in my Debian 12 VM. I created a completely fresh run-tests tempdir with both Python 3.8 and Python 3.11 and ran 10 test sdk/python for both. Everything passed.
Updated by Brett Smith about 2 months ago
It seems plausible that the work on #22420 fixed this.
I would say: see if Tom can reproduce it. If he can't either, let's call it good.
Updated by Tom Clegg about 2 months ago
Still failing 30/30 times for me at 7301a282a5. Starting with a fresh run-tests tempdir doesn't help, still fails 30/30 times. Same VM as #note-7, same stock python 3.11.
$ dpkg-query --show python3.11
python3.11 3.11.2-6+deb12u5
Updated by Brett Smith about 2 months ago
- File pip-freeze.log added
- File apt-installed.log added
I made a new VM, and using the same stock Python as Tom, I still can't reproduce the failure after multiple attempts.
At this point I'm gonna just start digging into the code, but for posterity I've attached all the versions of Debian packages installed on the system as well as PyPI packages installed in the test VENV3DIR.
Updated by Brett Smith about 2 months ago
I am putting together a matrix, but the short update is that I have figured out the difference is down to the underlying filesystem of $TMPDIR. It mostly passes if $TMPDIR is tmpfs, and mostly fails if $TMPDIR is btrfs or ext4.
Updated by Brett Smith about 2 months ago
btrfs TMPDIR - Failed 50/50 times but I have seen it pass occasionally
ext4 TMPDIR - Failed 50/50 times, I have never seen it pass
tmpfs TMPDIR - Failed 0/50 times
Current hypothesis is that the test is expecting a particular order of operations from the kernel and it gets that on tmpfs but doesn't on other filesystems (maybe with more recent kernels?).
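One observable thing that could differ between the filesystems in the matrix above is how file timestamps behave when two files are written back to back. The sketch below is a hypothetical probe, not part of the Arvados test suite: `back_to_back_mtimes` is a made-up helper name, and it only assumes the standard library. It writes pairs of files in quick succession under a directory and counts how many pairs end up with identical modification times.

```python
import os
import tempfile


def back_to_back_mtimes(parent_dir, pairs=50):
    """Write file pairs back to back; return how many pairs share an mtime.

    Hypothetical probe: the name and approach are illustrative only.
    """
    ties = 0
    # Work in a scratch directory on the filesystem that holds parent_dir.
    with tempfile.TemporaryDirectory(dir=parent_dir) as workdir:
        for n in range(pairs):
            paths = [os.path.join(workdir, f"{n}-{suffix}") for suffix in ("a", "b")]
            for path in paths:
                with open(path, "wb") as f:
                    f.write(b"x")
            first, second = (os.stat(p).st_mtime_ns for p in paths)
            if first == second:
                ties += 1
    return ties


if __name__ == "__main__":
    tmpdir = tempfile.gettempdir()  # honors $TMPDIR when it is set
    pairs = 50
    ties = back_to_back_mtimes(tmpdir, pairs)
    print(f"{ties}/{pairs} pairs had identical mtimes under {tmpdir}")
```

Running it once with $TMPDIR on tmpfs and once on ext4 or btrfs would show whether the two cases differ in this respect on a given machine; the result depends on the kernel and mount options.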
Updated by Brett Smith about 2 months ago
- Status changed from New to In Progress
You can get the test to pass if you insert time.sleep(1) in between the two file creations at the top of the test method. Current theory is that on typical deployments of real disk filesystems, mtimes are coarsened a little to reduce disk wear. When this happens, the code under test sees the two mtimes as equal, and may choose to delete the opposite of the intended cache file.
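If the theory above holds, one way to remove the timing dependence without sleeping is to give each fixture file an explicit mtime. The following is a minimal sketch under that assumption. It is not necessarily what the fix branch does, and `write_with_mtime` and `eviction_order` are hypothetical helpers that only stand in for "sort cache files oldest first".

```python
import os
import tempfile


def write_with_mtime(path, data, mtime_ns):
    """Write data to path, then pin its atime and mtime to mtime_ns."""
    with open(path, "wb") as f:
        f.write(data)
    os.utime(path, ns=(mtime_ns, mtime_ns))


def eviction_order(directory):
    """Return file names sorted oldest-first by mtime.

    Hypothetical stand-in for a cache that evicts the oldest file first.
    When two mtimes are equal, the order falls back to whatever
    os.scandir() yields, which is the ambiguity the theory describes.
    """
    with os.scandir(directory) as entries:
        return [e.name for e in sorted(entries, key=lambda e: e.stat().st_mtime_ns)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        base_ns = 1_700_000_000 * 10**9  # arbitrary fixed timestamp
        # Create the "newer" file first to show that creation order no
        # longer matters once the mtimes are set explicitly.
        write_with_mtime(os.path.join(cache_dir, "newer.keepcacheblock"), b"b", base_ns + 10 * 10**9)
        write_with_mtime(os.path.join(cache_dir, "older.keepcacheblock"), b"a", base_ns)
        print(eviction_order(cache_dir))
```

Compared with time.sleep(1), pinning the timestamps keeps the test fast and does not depend on how the filesystem assigns mtimes to files written in quick succession.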
Updated by Brett Smith about 2 months ago
20909-keep-disk-cache-test-mtime @ 16b7169b5a96d369c3da1520d3ff5f19aca230cf - developer-run-tests: #4665
- All agreed upon points are implemented / addressed.
- Yes
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described
- See above. On my system this specific test now passes 50/50 runs when backed by an ext4 $TMPDIR where it previously failed 50/50.
- Documentation has been updated.
- N/A, pure test change
- Behaves appropriately at the intended scale (describe intended scale).
- No change
- Considered backwards and forwards compatibility issues between client and server.
- No change
- Follows our coding standards and GUI style guidelines.
- Yes
Updated by Brett Smith about 2 months ago
- Subject changed from PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12 to PySDK tests.test_keep_client.KeepDiskCacheTestCase.test_disk_cache_cap fails on Debian 12 with a "real" $TMPDIR filesystem
Updated by Tom Clegg about 2 months ago
- main passed 1/50
- 20909-keep-disk-cache-test-mtime passed 50/50
Updated by Brett Smith about 2 months ago
- Status changed from In Progress to Resolved