
MAINT replace enable_slep006 fixture by @config_context(enable_metadata_routing=True) #30038


Merged
1 commit merged into scikit-learn:main on Oct 10, 2024

Conversation

@ogrisel (Member) commented Oct 9, 2024

This is a first step towards making it possible to run the tests with thread-based parallelism, so that we can test against the new free-threading mode of Python 3.13.

Discussed in #30007 (comment).

In short: sklearn.config_context stores the active config in thread-local variables. However, pytest fixtures typically run in the main thread, while the tests themselves run in a different thread when using one of the recent pytest plugins dedicated to running tests in parallel with a thread pool (to detect GIL-related compatibility issues).

One could argue that this is a design bug of those plugins, but more pragmatically, we can simplify our test suite by using sklearn.config_context directly as a function decorator instead of relying on a dedicated fixture. The decorator is executed in the same thread as the test function itself.
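To make the change concrete, here is a minimal sketch of the before/after pattern (the enable_slep006 fixture name comes from this PR's title; the test function name is made up for illustration):

import pytest
from sklearn import config_context

# Before: an autouse fixture wraps every test in the module, but the
# fixture itself runs in the main thread under thread-based plugins.
@pytest.fixture(autouse=True)
def enable_slep006():
    with config_context(enable_metadata_routing=True):
        yield

# After: decorate each test directly. config_context is built with
# contextlib.contextmanager, so it also works as a decorator and is
# executed in the same thread as the test body.
@config_context(enable_metadata_routing=True)
def test_metadata_routing_smoke():
    pass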

github-actions bot commented Oct 9, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 925b8c7.

@ogrisel (Member, Author) commented Oct 9, 2024

/cc @lesteve @adrinjalali

@glemaitre (Member) left a comment

LGTM.

@ogrisel added the "Quick Review" label (for PRs that are quick to review) on Oct 10, 2024
@ogrisel (Member, Author) commented Oct 10, 2024

/cc @StefanieSenger just to let you know about this PR for future PRs related to metadata routing.

@lesteve (Member) commented Oct 10, 2024

Let's merge this one, thanks!

@lesteve merged commit ff02e17 into scikit-learn:main on Oct 10, 2024
42 of 43 checks passed
@ogrisel deleted the drop-enable_slep006-fixture branch on October 10, 2024 at 10:13
BenJourdan pushed a commit to gregoryschwartzman/scikit-learn that referenced this pull request Oct 11, 2024
@StefanieSenger (Contributor) commented:

I am wondering why the @config_context(enable_metadata_routing=True) decorator is placed below the parametrization of the tests, compared to the fixture that was used before.
This way, the context is set (and unset) many more times than would be necessary.

@@ -62,13 +62,6 @@
my_other_weights = rng.rand(N)


@pytest.fixture(autouse=True)
A reviewer (Member) commented:
@ogrisel was there no other way in the files where this needs to be always active? I don't like that every single test has it here.

@lesteve (Member) commented Oct 15, 2024

I personally lean slightly towards being explicit, even if that means more repeated code, but yeah, I do get your point.

I guess if you really wanted to, you could do something like the snippet below. I am not sure I would recommend it, though; it feels slightly too magical. One caveat: the for loop needs to be at the very bottom of the file, otherwise it will not decorate all the test functions.

def decorator(f):
    # Toy stand-in for config_context(...): wraps the test function so
    # the wrapper runs in the same thread as the test body.
    def wrapped(*args, **kwargs):
        print('decorator')
        return f(*args, **kwargs)
    return wrapped

def test_1():
    pass

def test_2():
    pass

# At module level, locals() is the module namespace, so rebinding these
# names decorates every test function defined above this point.
my_locals = [(name, obj) for name, obj in locals().items()
             if name.startswith('test_') and callable(obj)]

for name, obj in my_locals:
    locals()[name] = decorator(obj)

A reviewer (Member) replied:

Or we simply use a thread-safe version of a pytest fixture?

A reviewer (Member) replied:

I wish "simply" applied here, but unfortunately it's not the case 😉

One of the issues is that the fixture runs in the main thread while the test function (through a pytest plugin like pytest-run-parallel) runs in a different thread. So if you change scikit-learn config settings in the fixture, you're changing the config of the main thread, and that does not affect the config seen by the test function.

See #30007 (comment) for more details.
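For a self-contained illustration of the underlying issue (hypothetical code, not scikit-learn's actual internals, though they rely on the same thread-local mechanism):

import threading

# Thread-local storage: each thread only sees the attributes it set itself.
_config = threading.local()

def set_flag():
    # Runs in the main thread, like a fixture would.
    _config.enable_metadata_routing = True

def read_flag():
    # Runs in a worker thread, like the test body would: the value set
    # by the main thread is not visible here.
    print(getattr(_config, "enable_metadata_routing", "unset"))

set_flag()
worker = threading.Thread(target=read_flag)
worker.start()
worker.join()  # prints "unset"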

@ogrisel (Member, Author) commented Oct 15, 2024

Answering @StefanieSenger's remarks:

I am wondering why the @config_context(enable_metadata_routing=True) decorator is put after the parametrization of the tests compared to the fixture that was used before.

I don't think it matters, but I thought I would put the non-pytest decorator closer to the function definition to better convey the intention that we want this config context to be set right around the function call, even if the different test parametrizations are not run consecutively, and even more so if they are run concurrently.

This way, the context is set (and unset) many more times than it would be necessary.

The context is set and unset at each function call in either case (whatever the relative ordering of the decorators). This can be checked by adding print statements inside the body of config_context (before and after the yield line) and running pytest with the -s command line flag to disable output capture, so that the print statements are visible.

diff --git a/sklearn/_config.py b/sklearn/_config.py
index 05549c88a9..0518a796c0 100644
--- a/sklearn/_config.py
+++ b/sklearn/_config.py
@@ -357,6 +357,7 @@ def config_context(
     ValueError: Input contains NaN...
     """
     old_config = get_config()
+    new_config = {k: locals().get(k) for k in old_config.keys()}
     set_config(
         assume_finite=assume_finite,
         working_memory=working_memory,
@@ -369,8 +370,13 @@ def config_context(
         enable_metadata_routing=enable_metadata_routing,
         skip_parameter_validation=skip_parameter_validation,
     )
-
+    filtered_new_config = {k: v for k, v in new_config.items() if v is not None}
+    print(f"set_config with temporary values: {filtered_new_config}")
     try:
         yield
     finally:
         set_config(**old_config)
+        filtered_old_config = {
+            k: old_config[k] for k, v in new_config.items() if v is not None
+        }
+        print(f"set_config with original values {filtered_old_config}")

If we run pytest for a function with the double decoration (@pytest.mark.parametrize + @config_context), we get:

$ pytest -s -v -k test_get_metadata_routing_without_fit sklearn/ensemble/tests/test_voting.py
...
collected 39 items / 37 deselected / 2 selected                                                                                                                         

sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingClassifier-ConsumingClassifier] I: Seeding RNGs with 189435274
set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED
sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingRegressor-ConsumingRegressor] set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED

As expected.

Then we can swap the order of the decorators for a specific test and rerun the pytest command for that test function:

diff --git a/sklearn/ensemble/tests/test_stacking.py b/sklearn/ensemble/tests/test_stacking.py
index e944ecc4ab..e2a78219da 100644
--- a/sklearn/ensemble/tests/test_stacking.py
+++ b/sklearn/ensemble/tests/test_stacking.py
@@ -921,6 +921,7 @@ def test_routing_passed_metadata_not_supported(Estimator, Child):
         )
 
 
+@config_context(enable_metadata_routing=True)
 @pytest.mark.parametrize(
     "Estimator, Child",
     [
@@ -928,7 +929,6 @@ def test_routing_passed_metadata_not_supported(Estimator, Child):
         (StackingRegressor, ConsumingRegressor),
     ],
 )
-@config_context(enable_metadata_routing=True)
 def test_get_metadata_routing_without_fit(Estimator, Child):
     # Test that metadata_routing() doesn't raise when called before fit.
     est = Estimator([("sub_est", Child())])

we get the same behavior:

$ pytest -s -v -k test_get_metadata_routing_without_fit sklearn/ensemble/tests/test_voting.py
...
collected 39 items / 37 deselected / 2 selected                                                                                                                         

sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingClassifier-ConsumingClassifier] I: Seeding RNGs with 189435274
set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED
sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingRegressor-ConsumingRegressor] set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED

But this is what we want: we need the config_context to be configured and reconfigured for each individual test to keep test executions isolated, even if they run out of order (with other unrelated test functions in between two invocations of test_get_metadata_routing_without_fit, for instance with pytest-random-order) or concurrently via threads.

Furthermore, the per-call overhead of the config_context decoration is very cheap (probably less than 100 microseconds), while the test functions themselves are typically more expensive to execute (often more than 1 millisecond), so it's not a problem to use fine-grained @config_context decoration.
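As a rough sanity check of that overhead claim (a quick sketch; the exact numbers are machine-dependent):

import timeit
from sklearn import config_context

def noop():
    pass

# config_context also works as a decorator, so this measures the
# enter/exit cost of the context manager per call.
decorated = config_context(enable_metadata_routing=True)(noop)

n = 10_000
per_call = timeit.timeit(decorated, number=n) / n
print(f"{per_call * 1e6:.1f} microseconds per decorated call")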

@StefanieSenger (Contributor) replied:

@ogrisel Thank you for your comment on my question. It helped me understand that the context gets set the same number of times either way (I am still a bit puzzled as to why, but that's okay; it's not what matters here).

But this is what we want: we need the config_context to be configured and reconfigured for each individual test to make their execution isolated, even if they run out of order (with other unrelated test functions in between two invocations of test_get_metadata_routing_without_fit for instance with pytest-random-order) or via thread-based concurrency.

Happy that this is the case. :)

Labels
No Changelog Needed · Quick Review (for PRs that are quick to review)

5 participants