CI Run free-threaded test suite with pytest-run-parallel #32023
Conversation
Force-pushed from 5e0ee8f to 47b7d33.
@@ -1316,6 +1316,7 @@ def _check_stop_words_consistency(estimator):
     return estimator._check_stop_words_consistency(stop_words, preprocess, tokenize)


+@pytest.mark.thread_unsafe
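For context, pytest-run-parallel provides this marker to opt a single test out of parallel execution; a minimal sketch (the test body and reason string are illustrative, not from the PR):

```python
import pytest

# pytest-run-parallel runs each test in multiple threads by default;
# this marker tells the plugin to run the test single-threaded instead.
@pytest.mark.thread_unsafe(reason="relies on the process-wide warnings registry")
def test_warns_once():
    import warnings
    with pytest.warns(UserWarning):
        warnings.warn("inconsistent stop_words", UserWarning)
```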
Do you know why this is needed? I thought that the warnings module was made thread-safe in Python 3.14.
I got an error when running the test locally, but I did not investigate this one more closely:
❯ PYTHON_GIL=1 pytest sklearn/feature_extraction/tests/test_text.py --parallel-threads 4 --iterations 100 -k inconsistent -vl
===== test session starts =====
platform linux -- Python 3.14.0rc1, pytest-8.4.1, pluggy-1.6.0 -- /home/lesteve/micromamba/envs/py314t/bin/python
cachedir: .pytest_cache
rootdir: /home/lesteve/dev/alt-scikit-learn
configfile: pyproject.toml
plugins: run-parallel-0.6.1
collected 131 items / 130 deselected / 1 selected
Collected 128 items to run in parallel
sklearn/feature_extraction/tests/test_text.py::test_vectorizer_stop_words_inconsistent PARALLEL FAILED [100%]
===== ERRORS =====
_____ ERROR at call of test_vectorizer_stop_words_inconsistent _____
    def test_vectorizer_stop_words_inconsistent():
        lstr = r"\['and', 'll', 've'\]"
        message = (
            "Your stop_words may be inconsistent with your "
            "preprocessing. Tokenizing the stop words generated "
            "tokens %s not in stop_words." % lstr
        )
        for vec in [CountVectorizer(), TfidfVectorizer(), HashingVectorizer()]:
            vec.set_params(stop_words=["you've", "you", "you'll", "AND"])
>           with pytest.warns(UserWarning, match=message):
E           Failed: DID NOT WARN. No warnings of type (<class 'UserWarning'>,) were emitted.
E           Emitted warnings: [].

lstr       = "\\['and', 'll', 've'\\]"
message    = "Your stop_words may be inconsistent with your preprocessing. Tokenizing the stop words generated tokens \\['and', 'll', 've'\\] not in stop_words."
vec        = HashingVectorizer(stop_words=["you've", 'you', "you'll", 'AND'])

sklearn/feature_extraction/tests/test_text.py:1329: Failed
----- Captured stdout setup -----
I: Seeding RNGs with 1478313413
***** pytest-run-parallel report *****
3 tests were not run in parallel because of use of thread-unsafe functionality, to list the tests that were not run in parallel, re-run while setting PYTEST_RUN_PARALLEL_VERBOSE=1 in your shell environment
===== short test summary info =====
PARALLEL FAILED sklearn/feature_extraction/tests/test_text.py::test_vectorizer_stop_words_inconsistent - Failed: DID NOT WARN. No warnings of type (<class 'UserWarning'>,) were emitted.
===== 130 deselected, 860 warnings, 1 error in 1.84s =====
Looking a bit at it, the test passes if I add a uuid.uuid1() or threading.get_ident() to the warning message. Maybe a bug in pytest or in the default warnings "once" strategy 🤔
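A hedged sketch of that deduplication behaviour (standard library only; the message text is made up): with the default filter, a repeated identical warning from the same source line is recorded only once, while appending a unique suffix such as uuid.uuid1() changes the registry key and defeats the deduplication:

```python
import uuid
import warnings

# With the "default" action, warnings are deduplicated per
# (message text, category, source line) via the module's
# __warningregistry__, so a repeat from the same line is swallowed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")
    for _ in range(2):
        warnings.warn("stop_words may be inconsistent", UserWarning)
n_same_message = len(caught)  # second identical warning is suppressed

# A unique message per emission (e.g. a uuid) changes the registry
# key every time, so both warnings are recorded.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("default")
    for _ in range(2):
        warnings.warn(f"stop_words may be inconsistent {uuid.uuid1()}", UserWarning)
n_unique_message = len(caught)
```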
Actually the warnings semantics w.r.t. thread-safety are different depending on flags that have different values on free-threading and regular builds:
https://docs.python.org/3.14/whatsnew/3.14.html#free-threaded-mode
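Those build-dependent semantics are surfaced as interpreter flags; a small runtime probe (flag names taken from the 3.14 what's-new linked above; the getattr fallbacks guard interpreters that predate them):

```python
import sys
import sysconfig

# On free-threaded 3.14 builds these flags default to 1, on regular
# builds to 0, which changes how warnings filtering and catch_warnings
# interact with threads.
free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
context_aware = getattr(sys.flags, "context_aware_warnings", None)
inherit_context = getattr(sys.flags, "thread_inherit_context", None)
print(free_threaded, context_aware, inherit_context)
```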
Yes, I have seen this, but I don't have a good understanding of the implications yet ... it seems that's at least part of the reason behind the remaining failures, but this needs more investigation.
… the case in Python 3.14 free-threading
…nto free-threaded-pytest-run-parallel
… around _check_stop_words_consistency and avoid a weird side effect
…314t package is on conda-forge
@@ -1329,18 +1329,19 @@ def test_vectorizer_stop_words_inconsistent():
         vec.fit_transform(["hello world"])
         # reset stop word validation
         del vec._stop_words_id
         assert _check_stop_words_consistency(vec) is False
         with pytest.warns(UserWarning, match=message):
This is the actual fix, see previous discussion in #32023 (comment).
My guess right now is that the first call to _check_stop_words_consistency was raising a warning, and that the subsequent pytest.warns was sometimes failing because of the default warnings "once" strategy plus some unfavourable thread ordering. By putting the call inside pytest.warns we avoid the side effect and the warning is always issued.
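A sketch of why moving the call inside pytest.warns helps (standard library stand-ins for the scikit-learn helpers; pytest.warns internally enters catch_warnings with an "always" filter, which side-steps the once-per-location registry):

```python
import warnings

def check():  # stand-in for _check_stop_words_consistency
    warnings.warn("inconsistent stop_words", UserWarning)
    return False

warnings.simplefilter("default")
check()  # first emission registers the warning in __warningregistry__
# A bare second call could now be deduplicated away, but entering
# catch_warnings (as pytest.warns does) resets the filter state,
# so the warning is reliably re-issued and captured:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert check() is False
assert len(caught) == 1
```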
Edit: a scipy free-threaded Python 3.14 package was added to conda-forge, so the following 2 points are not a problem anymore:

- conda-lock has a bug with rc Python versions; I generated the lock-file with a local work-around. I opened conda/conda-lock#837 (Handle Python release candidates in PyPI solver) about this.
- conda-lock does not pick the right free-threaded pip wheel. This was an issue because there was no scipy conda package for free-threaded Python 3.14 yet. I manually tweaked the wheel URL in the lock-file. See conda/conda-lock#754 (free-threaded wheel not picked in pip dependencies) for the conda-lock bug.