Idea #21207 (closed)

Decide and implement a modern way to run Python tests

Added by Brett Smith about 1 year ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assigned To: Brett Smith
Category: SDKs
Start date: 05/10/2024
Due date:
Story points: -
Release: 70
Release relationship: Auto

Description

Running setup.py test is deprecated. We should decide and implement a new way to run tests.

Probably the most straightforward migration would be to set up tox. The simplest configuration could just install our test dependencies and run python -m unittest.

We may get benefits like nicer test reporting if we switch to a front-end like pytest. At least some of these front-ends natively support unittest-style tests.
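
As a rough illustration of that compatibility (a made-up test, not one from our tree), a plain unittest.TestCase runs unchanged under both python -m unittest and pytest:

import unittest

class KeepLocatorExample(unittest.TestCase):
    """Hypothetical unittest-style test; pytest collects TestCase subclasses as-is."""

    def test_locator_digest_length(self):
        # A Keep locator starts with a 32-character hex MD5 digest, then "+size".
        locator = 'd41d8cd98f00b204e9800998ecf8427e+0'
        digest, _, size = locator.partition('+')
        self.assertEqual(len(digest), 32)
        self.assertEqual(size, '0')

if __name__ == '__main__':
    unittest.main()  # still works with python -m unittest; pytest finds it too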


Files

21207-killed-retrying.txt (20.1 KB) - Tom Clegg, 05/06/2024 08:43 PM

Subtasks (0 open, 1 closed)

Task #21733: Review 21207-pytest (Resolved, Brett Smith, 05/10/2024)

Related issues

Related to Arvados - Idea #20723: Stop running setup.py in our build+test infrastructure (New)
#1

Updated by Brett Smith about 1 year ago

  • Related to Idea #20723: Stop running setup.py in our build+test infrastructure added
#2

Updated by Brett Smith 7 months ago

  • Target version changed from Future to Development 2024-05-08 sprint
  • Assigned To set to Brett Smith
#3

Updated by Brett Smith 7 months ago

  • Status changed from New to In Progress

It's gonna be pytest. The project is super-mature and can run unittest tests mostly out of the box. https://docs.pytest.org/en/8.2.x/how-to/unittest.html

I ran it on the Python SDK tests, and the only issue was some arv-put tests that use a fake Keep block signature. Under pytest the fake signature seems to make it to the real test API server, where it gets validated and fails, while under the current unittest runner some part of that chain is stubbed out.

But I think that's pretty good all things considered. And if we go this route we get things like built-in parameterization and the ability to test with plain assert statements, so it seems worth a little trouble for the long-term ergonomic benefit.
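
For a sense of what those ergonomics look like (a hypothetical example, not code from this branch), pytest-style tests can use plain assert statements and @pytest.mark.parametrize instead of hand-rolled loops or subTest:

import pytest

@pytest.mark.parametrize('locator,expected_size', [
    ('d41d8cd98f00b204e9800998ecf8427e+0', 0),
    ('acbd18db4cc2f85cedef654fccc4a4d8+3', 3),
])
def test_locator_size(locator, expected_size):
    # Plain assert statements; pytest rewrites them to show the values on failure.
    _, _, size = locator.partition('+')
    assert int(size) == expected_size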

#4

Updated by Brett Smith 7 months ago

Brett Smith wrote in #note-3:

I ran it on the Python SDK tests, and the only issue was some arv-put tests that use a fake Keep block signature. Under pytest the fake signature seems to make it to the real test API server, where it gets validated and fails, while under the current unittest runner some part of that chain is stubbed out.

It's not even that: one of the test suites is contaminating global state. From there, I guess pytest just runs tests in a different order where this actually causes a failure. It's a one-line fix, and good to do anyway.
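
Purely for illustration (a made-up sketch, not the actual contamination or fix in the branch), this is the general shape of the problem and the kind of one-line cleanup that makes the result order-independent:

import unittest

class fake_config:
    # Hypothetical stand-in for a module-level global that one suite mutates.
    DEFAULT_TIMEOUT = 60

class TimeoutTest(unittest.TestCase):
    def setUp(self):
        saved = fake_config.DEFAULT_TIMEOUT
        fake_config.DEFAULT_TIMEOUT = 1
        # The one-line fix: restore the original value after each test, so other
        # suites see the default no matter which order the runner picks.
        self.addCleanup(setattr, fake_config, 'DEFAULT_TIMEOUT', saved)

    def test_short_timeout(self):
        self.assertEqual(fake_config.DEFAULT_TIMEOUT, 1)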

#5

Updated by Brett Smith 7 months ago

21207-pytest @ b8751a1a220ea451d44303268ea9207c2152aed2 - developer-run-tests: #4205

Tests failed only on #21635 and #21660, which is a little convenient in a twisted way: now you can see what the new pytest output looks like compared to the old unittest output, both for success and failure. There are options to customize this if we want.

  • All agreed upon points are implemented / addressed.
    • Yes
  • Anything not implemented (discovered or discussed during work) has a follow-up story.
    • N/A - there's lots we could do from here but there should probably be rough team agreement on what that is
  • Code is tested and passing, both automated and manual, what manual testing was done is described
    • See above
  • Documentation has been updated.
    • There are maybe some development wiki pages that need updating but that would be better done post-merge.
  • Behaves appropriately at the intended scale (describe intended scale).
    • No significant change in scale expected. On the one hand, pytest does more, so you might expect it to be correspondingly slower. On the other hand, it has gotten a lot more developer attention over the years, so it may be better optimized. There's no huge glaring difference in the test run, and any difference would probably be swamped by improving the speed of our own tests rather than the test runner.
  • Considered backwards and forwards compatibility issues between client and server.
    • N/A
  • Follows our coding standards and GUI style guidelines.
    • Yes
#6

Updated by Tom Clegg 7 months ago

LGTM.

One doc thing that could be updated in the branch itself: in source:build/run-tests.sh, there's a usage example sdk/python_test="--test-suite tests.test_keep_locator.py" which (with an update to refer to a test file that still exists) is now spelled sdk/python_test=tests/test_keep_client.py

While trying this out, I encountered "services/fuse tests killed -- retrying" frequently. This is a feature in run-tests.sh, but I don't recall seeing it at all lately, so I'm wondering: is there something about pytest that makes the SIGKILL outcome more frequent, or is this a coincidence?

Edit: This is a coincidence / just a difference in how the crash/retry is reported, although the pre-pytest runner made it easier to see what led to SIGKILL:

2024-05-06 15:46:12 arvados.arv-mount[3443850] WARNING: Mount.__exit__: llfuse thread still alive 60.000000s after umount -- abandoning and exiting anyway
Mount.__exit__: llfuse thread still alive 60.000000s after umount -- abandoning and exiting anyway
IntegrationTest.mount: llfuse thread still alive after umount -- killing test suite to avoid deadlock
./build/run-tests.sh: line 659: 3443850 Killed                  python setup.py ${short:+--short-tests-only} test ${testargs[$1]}

I suppose this is an inevitable side effect of pytest buffering/hiding logs until tests fail. Perhaps it's possible to do something like "fail test, abort test run, let pytest display the errors, then sigkill after that" -- or perhaps finally solve the mystery of how to reliably unmount a fuse mount.

Perhaps the most expedient solution is to update the "killed -- retrying" message to say "see #21207#note-6 or run with flags XYZ to put pytest in unbuffered mode so you can see the cause"...?
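
For what it's worth, flags like that do exist (this is general pytest behavior, and the test path below is just an example): -s, shorthand for --capture=no, disables output capturing so warnings like the llfuse ones above stream to the terminal as they happen, e.g. when driving pytest from Python:

import pytest

# Run one FUSE test file with capturing disabled ('-s' == '--capture=no') so
# log output appears immediately instead of only after a failure.
pytest.main(['-s', 'tests/test_cache.py'])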

#8

Updated by Brett Smith 7 months ago

Tom Clegg wrote in #note-6:

One doc thing that could be updated in the branch itself: in source:build/run-tests.sh, there's a usage example sdk/python_test="--test-suite tests.test_keep_locator.py" which (with an update to refer to a test file that still exists) is now spelled sdk/python_test=tests/test_keep_client.py

Thank you for catching that, done in 6e3b86c8e0c90d71a7920d595e1087ffe465130e. Longer reference

While trying this out, I encountered "services/fuse tests killed -- retrying" frequently. This is a feature in run-tests.sh, but I don't recall seeing it at all lately, so I'm wondering: is there something about pytest that makes the SIGKILL outcome more frequent, or is this a coincidence?

I have not been able to reproduce this in 20 tries so I suspect something else is going on. I would be curious if you can still reproduce it after you sudo rm -rf all of your Python build directories (causing the wheel build to fail in your logs), your test VENV3DIR, and ~/.cache/arvados/keep. It might also have to do with the version of something else; I have Python 3.8.19 which is different from your setup (installed from source so I can intentionally test with the oldest Python we support).

I suppose this is an inevitable side effect of pytest buffering/hiding logs until tests fail. Perhaps it's possible to do something like "fail test, abort test run, let pytest display the errors, then sigkill after that" -- or perhaps finally solve the mystery of how to reliably unmount a fuse mount.

Added dcdaf7450fb66da2bc904a21f29660ac75e5864a to do all of that except the SIGKILL part. To me it's not clear whether SIGKILL is actually necessary to get the mount to stop (in which case this might need more work) or if it was just the most expedient way we knew to stop testing with the old unittest runner.
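
Roughly, the approach (a sketch under assumptions, not the actual code in dcdaf7450fb66da2bc904a21f29660ac75e5864a; the helper name and timeout are made up) is to give the llfuse thread a bounded join and then call pytest.exit(), which ends the session cleanly and lets pytest print its captured output before the process exits with the requested status:

import os
import threading

import pytest

def abort_if_thread_outlives_test(thread: threading.Thread, timeout: float = 60):
    """Hypothetical helper: wait for the llfuse thread, then abort the whole
    pytest session instead of deadlocking or relying on SIGKILL."""
    thread.join(timeout)
    if thread.is_alive():
        # pytest.exit() raises _pytest.outcomes.Exit, which stops the run and
        # reports whatever output pytest has captured so far.
        pytest.exit('llfuse thread outlived test', returncode=os.EX_TEMPFAIL)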

#9

Updated by Brett Smith 7 months ago

Brett Smith wrote in #note-8:

I have not been able to reproduce this in 20 tries so I suspect something else is going on.

Never mind, I was able to reproduce it with 100 test services/fuse:py3 tests/test_cache.py. But only 9 of those 100 runs failed, so while I don't remember noticing this before, it doesn't feel to me like pytest is increasing the rate. It also shows that the new code does end the tests as desired, with improved reporting:

========================================================== test session starts ==========================================================
platform linux -- Python 3.8.19, pytest-8.2.0, pluggy-1.5.0                                                                              
rootdir: /home/brett/Curii/arvados/services/fuse                                                                                         
configfile: pytest.ini                                                                                                                   
collected 1 item                                                                                                                         

tests/test_cache.py Sent SIGTERM to 226351 (/home/brett/Curii/arvados/tmp/keep0.pid)                                                     
Sent SIGTERM to 226370 (/home/brett/Curii/arvados/tmp/keep1.pid)                                                                         

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! _pytest.outcomes.Exit: llfuse thread outlived test !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=================================================== no tests ran in 63.11s (0:01:03) ====================================================
======= services/fuse tests -- FAILED                                                                                                    
======= test services/fuse -- 65s                                                                                                        

… except the exit code in this case is 2, even though I specified EX_TEMPFAIL, and that works fine in a smaller test. Blegh.
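
For context, and purely as general pytest behavior rather than anything specific to this branch: pytest documents its session exit statuses in pytest.ExitCode, and 2 is INTERRUPTED, the default when a run is aborted via pytest.exit() -- that may be what ends up overriding the requested EX_TEMPFAIL here, though that's a guess:

import pytest

# pytest's documented session exit codes (pytest.ExitCode):
#   OK=0, TESTS_FAILED=1, INTERRUPTED=2, INTERNAL_ERROR=3,
#   USAGE_ERROR=4, NO_TESTS_COLLECTED=5
assert pytest.ExitCode.INTERRUPTED == 2
assert pytest.ExitCode.NO_TESTS_COLLECTED == 5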

#10

Updated by Tom Clegg 7 months ago

Brett Smith wrote in #note-8:

To me it's not clear whether SIGKILL is actually necessary to get the mount to stop (in which case this might need more work) or if it was just the most expedient way we knew to stop testing with the old unittest runner.

Based on my recollection, the code, the commit comment, and the redmine comments (#9986#note-3), it was just an expedient way to stop the test runner in a way that was easy for run-tests.sh to recognize.

#11

Updated by Peter Amstutz 7 months ago

  • Target version changed from Development 2024-05-08 sprint to Development 2024-05-22 sprint
#12

Updated by Brett Smith 7 months ago

Now at d7eff3ca2b97bd1729e85c3c85547c9519957b51 - developer-run-tests: #4210

This gets the retry functionality going again by standardizing on exit code 2 (see discussion in comments) and moves all the log information to the exit message to keep it visible. I repeated FUSE tests until they failed to confirm the retry works as intended. I believe this retains all the existing functionality while switching to a clean exit rather than SIGKILL.

#13

Updated by Tom Clegg 6 months ago

LGTM, works here too.

Just one more thought: after running "test services/fuse:py3 tests/test_cache.py" a few dozen times and noticing that it always either succeeds in ~2s or does the fail-retry thing, I'm thinking our unmount timeout could be 10s instead of 60s. Although admittedly, with 60s it's very clear that it was never going to work no matter how much longer we waited. So either changing it or leaving it is fine with me.

#14

Updated by Brett Smith 6 months ago

Tom Clegg wrote in #note-13:

I'm thinking our unmount timeout could be 10s instead of 60s.

I had a similar thought, but digging through the history I noticed the timeout was recently increased from 10 to 60 seconds in b9fde93b6e24b0575ce81a964be6884231647ee4. There's no explicit rationale, but I don't care enough to get into an edit war about it.

#15

Updated by Brett Smith 6 months ago

  • Status changed from In Progress to Resolved
#16

Updated by Peter Amstutz 3 months ago

  • Release set to 70