
MAINT replace enable_slep006 fixture by @config_context(enable_metadata_routing=True) #30038


Merged
1 commit merged into scikit-learn:main on Oct 10, 2024

Conversation

@ogrisel (Member) commented Oct 9, 2024

This is a first step towards making it possible to run the tests with thread-based parallelism, so that we can test against the new free-threading mode of Python 3.13.

Discussed in #30007 (comment).

In short: sklearn.config_context stores the active config in thread-local variables. However, pytest fixtures typically run in the main thread, while the tests themselves run in a different thread when using one of the recent pytest plugins dedicated to running tests in parallel with a thread pool (to detect GIL-related compatibility issues).

One could argue that this is a design bug of those plugins, but more pragmatically, we can simplify our test suite by using sklearn.config_context directly as a function decorator instead of relying on a dedicated fixture. The decorator is executed in the same thread as the test function itself.
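To make the change concrete, here is a minimal sketch of the before/after pattern (the enable_slep006 fixture name comes from this PR's title; the test function name is made up for illustration):

import pytest
from sklearn import config_context

# Before: an autouse fixture wraps every test in the module, but the
# fixture itself runs in the main thread under thread-based plugins.
@pytest.fixture(autouse=True)
def enable_slep006():
    with config_context(enable_metadata_routing=True):
        yield

# After: decorate each test directly. config_context is built with
# contextlib.contextmanager, so it also works as a decorator and is
# executed in the same thread as the test body.
@config_context(enable_metadata_routing=True)
def test_metadata_routing_smoke():
    pass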

github-actions bot commented Oct 9, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 925b8c7.

@ogrisel (Member, Author) commented Oct 9, 2024

/cc @lesteve @adrinjalali

@glemaitre (Member) left a comment

LGTM.

@ogrisel added the "Quick Review" label (for PRs that are quick to review) on Oct 10, 2024
@ogrisel (Member, Author) commented Oct 10, 2024

/cc @StefanieSenger just to let you know about this PR for future PRs related to metadata routing.

@lesteve (Member) commented Oct 10, 2024

Let's merge this one, thanks!

@lesteve merged commit ff02e17 into scikit-learn:main on Oct 10, 2024
42 of 43 checks passed
@ogrisel deleted the drop-enable_slep006-fixture branch on October 10, 2024 at 10:13
BenJourdan pushed a commit to gregoryschwartzman/scikit-learn that referenced this pull request Oct 11, 2024
@StefanieSenger (Contributor) commented:

I am wondering why the @config_context(enable_metadata_routing=True) decorator is placed below the parametrization of the tests, compared to the fixture that was used before.
This way, the context is set (and unset) many more times than would be necessary.

@@ -62,13 +62,6 @@
my_other_weights = rng.rand(N)


@pytest.fixture(autouse=True)
A reviewer (Member) commented:
@ogrisel was there no other way in the files where this needs to be always active? I don't like that every single test has it here.

@lesteve (Member) commented Oct 15, 2024

I personally lean slightly towards being explicit, even if that means more repeated code, but yeah, I do get your point.

I guess if you really wanted to, you could do something like the snippet below. I am not sure I would recommend it, though; it feels slightly too magical. One caveat: the for loop needs to be at the very bottom of the file, otherwise it will not decorate all the test functions.

def decorator(f):
    # Toy stand-in for config_context(...): wraps the test function so
    # the wrapper runs in the same thread as the test body.
    def wrapped(*args, **kwargs):
        print('decorator')
        return f(*args, **kwargs)
    return wrapped

def test_1():
    pass

def test_2():
    pass

# At module level, locals() is the module namespace, so rebinding these
# names decorates every test function defined above this point.
my_locals = [(name, obj) for name, obj in locals().items()
             if name.startswith('test_') and callable(obj)]

for name, obj in my_locals:
    locals()[name] = decorator(obj)

A reviewer (Member) replied:

Or we simply use a thread-safe version of a pytest fixture?

A reviewer (Member) replied:

I wish "simply" applied here, but unfortunately it's not the case 😉

One of the issues is that the fixture runs in the main thread while the test function (through a pytest plugin like pytest-run-parallel) runs in a different thread. So if you change scikit-learn config settings in the fixture, you're changing the config of the main thread, and that does not affect the config seen by the test function.

See #30007 (comment) for more details.
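For a self-contained illustration of the underlying issue (hypothetical code, not scikit-learn's actual internals, though they rely on the same thread-local mechanism):

import threading

# Thread-local storage: each thread only sees the attributes it set itself.
_config = threading.local()

def set_flag():
    # Runs in the main thread, like a fixture would.
    _config.enable_metadata_routing = True

def read_flag():
    # Runs in a worker thread, like the test body would: the value set
    # by the main thread is not visible here.
    print(getattr(_config, "enable_metadata_routing", "unset"))

set_flag()
worker = threading.Thread(target=read_flag)
worker.start()
worker.join()  # prints "unset"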

@ogrisel (Member, Author) commented Oct 15, 2024

Answering @StefanieSenger's remarks:

I am wondering why the @config_context(enable_metadata_routing=True) decorator is put after the parametrization of the tests compared to the fixture that was used before.

I don't think it matters, but I thought I would put the non-pytest decorator closer to the function definition to better convey the intention that we want this config context to be set right around the function call, even if the different test parametrizations are not run consecutively, and even more so if they are run concurrently.

This way, the context is set (and unset) many more times than it would be necessary.

The context is set and unset at each function call in either case (whatever the relative ordering of the decorators). This can be checked by adding print statements inside the body of config_context (before and after the yield line) and running pytest with the -s command line flag to disable output capture, so that the print statements are visible.

diff --git a/sklearn/_config.py b/sklearn/_config.py
index 05549c88a9..0518a796c0 100644
--- a/sklearn/_config.py
+++ b/sklearn/_config.py
@@ -357,6 +357,7 @@ def config_context(
     ValueError: Input contains NaN...
     """
     old_config = get_config()
+    new_config = {k: locals().get(k) for k in old_config.keys()}
     set_config(
         assume_finite=assume_finite,
         working_memory=working_memory,
@@ -369,8 +370,13 @@ def config_context(
         enable_metadata_routing=enable_metadata_routing,
         skip_parameter_validation=skip_parameter_validation,
     )
-
+    filtered_new_config = {k: v for k, v in new_config.items() if v is not None}
+    print(f"set_config with temporary values: {filtered_new_config}")
     try:
         yield
     finally:
         set_config(**old_config)
+        filtered_old_config = {
+            k: old_config[k] for k, v in new_config.items() if v is not None
+        }
+        print(f"set_config with original values {filtered_old_config}")

If we run pytest for a function with the double decoration (@pytest.mark.parametrize + @config_context), we get:

$ pytest -s -v -k test_get_metadata_routing_without_fit sklearn/ensemble/tests/test_voting.py
...
collected 39 items / 37 deselected / 2 selected                                                                                                                         

sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingClassifier-ConsumingClassifier] I: Seeding RNGs with 189435274
set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED
sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingRegressor-ConsumingRegressor] set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED

As expected.

Then we can swap the order of the decorators for a specific test and rerun the pytest command for that test function:

diff --git a/sklearn/ensemble/tests/test_stacking.py b/sklearn/ensemble/tests/test_stacking.py
index e944ecc4ab..e2a78219da 100644
--- a/sklearn/ensemble/tests/test_stacking.py
+++ b/sklearn/ensemble/tests/test_stacking.py
@@ -921,6 +921,7 @@ def test_routing_passed_metadata_not_supported(Estimator, Child):
         )
 
 
+@config_context(enable_metadata_routing=True)
 @pytest.mark.parametrize(
     "Estimator, Child",
     [
@@ -928,7 +929,6 @@ def test_routing_passed_metadata_not_supported(Estimator, Child):
         (StackingRegressor, ConsumingRegressor),
     ],
 )
-@config_context(enable_metadata_routing=True)
 def test_get_metadata_routing_without_fit(Estimator, Child):
     # Test that metadata_routing() doesn't raise when called before fit.
     est = Estimator([("sub_est", Child())])

we get the same behavior:

$ pytest -s -v -k test_get_metadata_routing_without_fit sklearn/ensemble/tests/test_voting.py
...
collected 39 items / 37 deselected / 2 selected                                                                                                                         

sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingClassifier-ConsumingClassifier] I: Seeding RNGs with 189435274
set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED
sklearn/ensemble/tests/test_voting.py::test_get_metadata_routing_without_fit[VotingRegressor-ConsumingRegressor] set_config with temporary values: {'enable_metadata_routing': True}
set_config with original values {'enable_metadata_routing': False}
PASSED

But this is what we want: we need the config_context to be configured and reconfigured for each individual test to keep test executions isolated, even if they run out of order (with other unrelated test functions in between two invocations of test_get_metadata_routing_without_fit, for instance with pytest-random-order) or concurrently via threads.

Furthermore, the per-call overhead of the config_context decoration is very cheap (probably less than 100 microseconds), while the test functions themselves are typically more expensive to execute (often more than 1 millisecond), so it's not a problem to use fine-grained @config_context decoration.
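As a rough sanity check of that overhead claim (a quick sketch; the exact numbers are machine-dependent):

import timeit
from sklearn import config_context

def noop():
    pass

# config_context also works as a decorator, so this measures the
# enter/exit cost of the context manager per call.
decorated = config_context(enable_metadata_routing=True)(noop)

n = 10_000
per_call = timeit.timeit(decorated, number=n) / n
print(f"{per_call * 1e6:.1f} microseconds per decorated call")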

@StefanieSenger (Contributor) replied:

@ogrisel Thank you for your comment on my question. It helped me understand that the context gets set the same number of times either way (I am still a bit puzzled as to why, but that's okay; it's not what matters here).

But this is what we want: we need the config_context to be configured and reconfigured for each individual test to make their execution isolated, even if they run out of order (with other unrelated test functions in between two invocations of test_get_metadata_routing_without_fit for instance with pytest-random-order) or via thread-based concurrency.

Happy that this is the case. :)

Labels
No Changelog Needed · Quick Review (for PRs that are quick to review)

5 participants