FIX pass explicit configuration to delayed #25290
Conversation
I think this approach is a good trade-off between verbosity and non-magicness.
I think the tests should be expanded to check the thread-safety of the config management in conjunction with joblib calls, both when using the loky backend and the threading backend.
```diff
@@ -840,8 +841,12 @@ def evaluate_candidates(candidate_params, cv=None, more_results=None):
                 )
             )

+            # Capture the config of the current thread here instead of inside
+            # the generator expression. The generator expression can be
+            # consumed by an auxiliary thread in joblib.
```
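The pitfall that comment guards against can be sketched without joblib at all: a generator expression reads thread-local state only when it is consumed, so if an auxiliary thread consumes it, that thread sees its own (default) configuration. This toy example uses a hand-rolled thread-local config, not scikit-learn's actual machinery:

```python
import threading

_config = threading.local()

def get_config():
    # This thread's config, with a default for threads that never set one.
    return getattr(_config, "value", {"working_memory": 1024})

def make_tasks_lazy():
    # BUG: get_config() only runs when the generator is consumed.
    return (get_config() for _ in range(2))

def make_tasks_eager():
    # FIX: capture the caller's config once, outside the generator.
    config = get_config()
    return (config for _ in range(2))

_config.value = {"working_memory": 123}
lazy, eager = make_tasks_lazy(), make_tasks_eager()

results = {}

def consume():
    # Stands in for joblib's auxiliary dispatch thread (no config set here).
    results["lazy"] = [c["working_memory"] for c in lazy]
    results["eager"] = [c["working_memory"] for c in eager]

t = threading.Thread(target=consume)
t.start()
t.join()
```

The lazy variant observes the auxiliary thread's default (1024), while the eager variant carries the caller's value (123), which is exactly why the config is captured before building the generator.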
This comment was written only once, for the purposes of this PR. Repeating it everywhere might be too verbose; I am not sure what to do. Maybe the content of the warning message is explicit enough.
sklearn/utils/tests/test_parallel.py (outdated):

```diff
     )

-    assert_array_equal(results, [123] * 2)
+    assert_array_equal(results, [123] * 10)
```
Maybe you could extend this test to show that this pattern also works from other threads, e.g. with something like the following (untested):

```python
results = []

def parallel_inspect_config():
    with sklearn.config_context(working_memory=123):
        config = sklearn.get_config()
        results.extend(
            Parallel(n_jobs=2, pre_dispatch=4)(
                delayed(get_working_memory, config=config)()
                for _ in range(n_iter)
            )
        )

other_thread = threading.Thread(target=parallel_inspect_config)
other_thread.start()
other_thread.join()

assert results == [123] * n_iter
```
It would be even better to have a test with `ThreadPoolExecutor` that checks that concurrently running threads calling joblib's `Parallel` with different contexts do not end up with mixed-up configurations.
I added this test.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
I agree this implementation is less magical compared to #25242
```diff
@@ -107,22 +109,39 @@ def _eigh(*args, **kwargs):

 # remove when https://github.com/joblib/joblib/issues/1071 is fixed
-def delayed(function):
+def delayed(function, config=None):
```
I think the original bug is big enough for us to have a private `_delayed(function, config)` so it can be backported to 1.2.1, to avoid introducing new API in a bug fix release. I would be okay if this is done in a separate PR.

Unfortunately, developers using joblib will need to update their code to work correctly with globally setting `transform="pandas"`. Moreover, developers will need to depend on `utils.fixes.delayed` for a while. This suggests to me that we need to render the docs for `utils.fixes.delayed` and properly document it.
Thinking more about it, we could also make this more automatic by subclassing … That would mandate using the … And indeed, maybe we should consider those tools (…
Working alternative to #25242
closes #25242
closes #25239
This is an alternative to #25242, which does not work if the thread that imports scikit-learn is different from the thread making the call to `Parallel`. Here, we pass explicitly the configuration that is obtained by the thread that makes the `Parallel` call. We raise a warning if the config is not passed; this makes sure it will turn into an error if we forget to pass the config to `delayed`. The code will still work if joblib decides to provide a way to supply a `context` and a `config`.
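The overall pattern can be sketched as follows. This is an illustrative toy, not scikit-learn's actual implementation: the stand-in config is a module-level dict rather than real thread-local state, and `delayed` only mimics joblib's `(function, args, kwargs)` contract:

```python
import functools
import warnings

_config = {"working_memory": 1024}  # stand-in for scikit-learn's config

def get_config():
    return dict(_config)

def set_config(**kwargs):
    _config.update(kwargs)

def delayed(function, config=None):
    """Capture an explicit config and restore it around the call."""
    if config is None:
        # The warning nudges callers toward passing the config explicitly;
        # in the PR this is meant to eventually become an error.
        warnings.warn(
            "delayed() called without an explicit config; the worker "
            "may not see the caller's configuration."
        )
        config = get_config()

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        set_config(**config)  # restore the caller's config in the worker
        return function(*args, **kwargs)

    def delayed_function(*args, **kwargs):
        # joblib consumes (function, args, kwargs) triples from delayed.
        return wrapper, args, kwargs

    return delayed_function

# Caller side: capture the config explicitly and hand it to delayed().
set_config(working_memory=123)
func, args, kwargs = delayed(get_config, config=get_config())()
```

Calling `func(*args, **kwargs)` from any worker then sees `working_memory=123`, because the config travels with the task instead of relying on which thread happens to run it.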