
FIX pass explicit configuration to delayed #25290


Closed
wants to merge 11 commits

Conversation

glemaitre
Member

Working alternative to #25242
closes #25242
closes #25239

This is an alternative to #25242, which does not work if the thread that imports scikit-learn is different from the thread making the call to Parallel.

Here, we instead pass explicitly the configuration obtained in the thread that makes the Parallel call.

We raise a warning when the config is not passed explicitly. That makes sure it will turn into an error if we forget to pass the config to delayed. The code will keep working if joblib eventually provides a way to supply a context and a config.
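
A minimal sketch of the proposed pattern (simplified for illustration; the actual change lives in sklearn/utils/fixes.py and may differ in its details):

    import functools
    import warnings

    from sklearn import config_context, get_config


    def delayed(function, config=None):
        # If no config is passed explicitly, fall back to the config of the
        # current thread and warn, so forgetting to pass it gets noticed.
        if config is None:
            warnings.warn(
                "`delayed` should be called with an explicit scikit-learn "
                "config; falling back to the current thread's config.",
                UserWarning,
            )
            config = get_config()

        @functools.wraps(function)
        def delayed_function(*args, **kwargs):
            return _FuncWrapper(function, config), args, kwargs

        return delayed_function


    class _FuncWrapper:
        # Restores the captured configuration around the actual call, so the
        # function sees the caller's config even in a worker thread/process.
        def __init__(self, function, config):
            self.function = function
            self.config = config
            functools.update_wrapper(self, self.function)

        def __call__(self, *args, **kwargs):
            with config_context(**self.config):
                return self.function(*args, **kwargs)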

@glemaitre glemaitre marked this pull request as ready for review January 4, 2023 17:02
Member

@ogrisel ogrisel left a comment

I think this approach is a good trade-off between verbosity and non-magicness.

I think the tests should be expanded to check the thread-safety of the config management in conjunction with joblib calls, both when using the loky backend and the threading backend.
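
For illustration, a hedged sketch of such a backend-parametrized test (hypothetical test code, not part of this diff):

    import pytest
    import sklearn
    from joblib import Parallel
    from sklearn.utils.fixes import delayed


    def get_working_memory():
        return sklearn.get_config()["working_memory"]


    @pytest.mark.parametrize("backend", ["loky", "threading"])
    def test_config_propagated_to_workers(backend):
        # Capture the config in the calling thread and pass it explicitly,
        # as proposed in this PR, so all workers see the same configuration.
        with sklearn.config_context(working_memory=123):
            config = sklearn.get_config()
            results = Parallel(n_jobs=2, backend=backend)(
                delayed(get_working_memory, config=config)() for _ in range(10)
            )
        assert results == [123] * 10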

@@ -840,8 +841,12 @@ def evaluate_candidates(candidate_params, cv=None, more_results=None):
)
)

# Capture the config of the current thread here instead of inside the
# generator expression. The generator expression can be consumed by
# an auxiliary thread in joblib.
Member

This comment was written only once, for the purposes of this PR. Repeating it everywhere might be too verbose. Not sure what to do...

Maybe the content of the warning message is explicit enough.
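
To make the intent concrete, a short hedged illustration of the pattern (the names `parallel`, `task`, and `args` are placeholders):

    # Wrong: get_config() is evaluated lazily, when the generator is
    # consumed, possibly in joblib's auxiliary dispatch thread where the
    # caller's config_context is not active.
    out = parallel(delayed(task, config=get_config())(arg) for arg in args)

    # Right: capture the config eagerly in the calling thread, then pass it.
    config = get_config()
    out = parallel(delayed(task, config=config)(arg) for arg in args)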

)

-    assert_array_equal(results, [123] * 2)
+    assert_array_equal(results, [123] * 10)
Member

Maybe you could extend this test to show that this pattern also works from other threads, e.g. with something like the following (untested):

    results = []
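    # (Assumed from the surrounding test module: `threading`, `sklearn`,
    # `Parallel` from joblib, and `delayed` from sklearn.utils.fixes are
    # imported, and `get_working_memory` / `n_iter` are defined earlier.)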

    def parallel_inspect_config():
        with sklearn.config_context(working_memory=123):
            config = sklearn.get_config()
            results.extend(
                Parallel(n_jobs=2, pre_dispatch=4)(
                    delayed(get_working_memory, config=config)() for _ in range(n_iter)
                )
            )

    other_thread = threading.Thread(target=parallel_inspect_config)
    other_thread.start()
    other_thread.join()

    assert results == [123] * n_iter

It would even be better to have a test with ThreadPoolExecutor that checks that concurrently running threads calling joblib Parallel with different contexts do not end up with mixed-up configurations.
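
One hedged, untested sketch of such a test (helper names are illustrative):

    from concurrent.futures import ThreadPoolExecutor

    import sklearn
    from joblib import Parallel
    from sklearn.utils.fixes import delayed


    def get_working_memory():
        return sklearn.get_config()["working_memory"]


    def run_parallel(working_memory):
        # Each thread sets its own config and passes it explicitly to delayed.
        with sklearn.config_context(working_memory=working_memory):
            config = sklearn.get_config()
            return Parallel(n_jobs=2, pre_dispatch=4)(
                delayed(get_working_memory, config=config)() for _ in range(10)
            )


    # Two threads run concurrently with different configurations; neither
    # should ever observe the other's value.
    with ThreadPoolExecutor(max_workers=2) as executor:
        results = list(executor.map(run_parallel, [123, 456]))

    assert results == [[123] * 10, [456] * 10]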

Member Author

I added this test.

Member

@thomasjpfan thomasjpfan left a comment

I agree this implementation is less magical compared to #25242.

@@ -107,22 +109,39 @@ def _eigh(*args, **kwargs):


# remove when https://github.com/joblib/joblib/issues/1071 is fixed
-def delayed(function):
+def delayed(function, config=None):
Member

I think the original bug is big enough for us to have a private _delayed(function, config) so it can be backported to 1.2.1, to avoid adding new API in a bug-fix release. I'll be okay if this is done in a separate PR.

Unfortunately, developers using joblib will need to update their code to work correctly with globally setting transform_output="pandas". Moreover, developers will need to depend on utils.fixes.delayed for a while. This suggests to me that we need to render the docs for utils.fixes.delayed and properly document it.

@ogrisel
Member

ogrisel commented Jan 9, 2023

Thinking more about it, we could also make this more automatic by subclassing joblib.Parallel as sklearn.fixes.Parallel to override the Parallel.__call__ method: automatically call sklearn.get_config there, then rewrap the generator args of Parallel.__call__ to call delayed_object.set_config(config) on each task.

That would mandate using the sklearn.fixes.Parallel subclass everywhere though.

And indeed, maybe we should consider those tools (Parallel and delayed) semi-public with proper docstrings to explain how they extend the joblib equivalent to propagate scikit-learn specific configuration to worker threads and processes.
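
A hedged sketch of that subclassing idea (hypothetical code, not part of this PR; it assumes tasks are the (function, args, kwargs) triples produced by delayed, and that attaching a `config` attribute is the hook the task wrapper reads):

    import joblib
    from sklearn import get_config


    class Parallel(joblib.Parallel):
        def __call__(self, iterable):
            # Capture the config once, in the thread that calls Parallel.
            config = get_config()
            iterable_with_config = (
                (_with_config(delayed_func, config), args, kwargs)
                for delayed_func, args, kwargs in iterable
            )
            return super().__call__(iterable_with_config)


    def _with_config(delayed_func, config):
        # Hypothetical hook: store the captured config on the task wrapper so
        # it can restore it (e.g. via config_context) when run in a worker.
        delayed_func.config = config
        return delayed_func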


Successfully merging this pull request may close these issues.

ColumnTransformers don't honor set_config(transform_output="pandas") when multiprocessing with n_jobs>1