ColumnTransformers don't honor set_config(transform_output="pandas") when multiprocessing with n_jobs>1 #25239
Comments
I suspect that this is related to #17634 and joblib/joblib#1071. I thought that we fixed the issue.
So there is something weird. The setting got changed in the middle of the processing. I did try 3
Taking the minimal example in #17634, we can reproduce the issue, so we need to have more tasks than
In [1]: from joblib import Parallel
   ...: from sklearn.utils.fixes import delayed
   ...: from sklearn import get_config, config_context
   ...:
   ...: def get_working_memory():
   ...:     return get_config()['working_memory']
   ...:
   ...: with config_context(working_memory=123):
   ...:     results = Parallel(n_jobs=2)(
   ...:         delayed(get_working_memory)() for _ in range(10)
   ...:     )
   ...:
   ...: results
Out[1]: [123, 123, 123, 123, 1024, 1024, 1024, 1024, 1024, 1024]
And the fun part is that things start to go sideways for the task
with
Yep but you might blow up your memory :)
I assume that it could be linked to this comment: We did not associate the config with the jobs that are not consumed by the main thread.
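To make the suspected cause concrete, here is a minimal standard-library sketch of the pattern. It is a simplified model, not scikit-learn's or joblib's actual code: the names `delayed_lazy`, `delayed_eager`, and `consume_in_other_thread` are hypothetical, and `get_config` / `config_context` are stand-ins that only mimic a thread-local configuration. The assumption is that tasks are created lazily from a generator, and that part of the generator is consumed on a thread other than the one that set the config.

```python
import threading
from contextlib import contextmanager

_DEFAULT = {"working_memory": 1024}
_local = threading.local()


def get_config():
    # Stand-in for a thread-local config: each thread starts from the defaults.
    if not hasattr(_local, "config"):
        _local.config = dict(_DEFAULT)
    return dict(_local.config)


@contextmanager
def config_context(**overrides):
    # Temporarily override the config of the *current* thread only.
    old = get_config()
    _local.config = {**old, **overrides}
    try:
        yield
    finally:
        _local.config = old


def get_working_memory():
    return get_config()["working_memory"]


def delayed_lazy(func):
    # Hypothetical: captures the config of whichever thread creates the task.
    config = get_config()

    def task():
        with config_context(**config):
            return func()

    return task


def delayed_eager(func, config):
    # Hypothetical: uses a config captured once, up front, by the caller.
    def task():
        with config_context(**config):
            return func()

    return task


def consume_in_other_thread(task_iter):
    # Pull the remaining tasks from the lazy iterator on a helper thread.
    results = []

    def worker():
        for task in task_iter:
            results.append(task())

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return results


def run_lazy():
    with config_context(working_memory=123):
        tasks = (delayed_lazy(get_working_memory) for _ in range(3))
        first = next(tasks)()  # task created on the calling thread
        rest = consume_in_other_thread(tasks)  # tasks created elsewhere
    return [first] + rest


def run_eager():
    with config_context(working_memory=123):
        config = get_config()  # captured before the generator is handed off
        tasks = (delayed_eager(get_working_memory, config) for _ in range(3))
        first = next(tasks)()
        rest = consume_in_other_thread(tasks)
    return [first] + rest


print(run_lazy())   # [123, 1024, 1024]
print(run_eager())  # [123, 123, 123]
```

In this model, only the tasks created on the calling thread see the overridden value; capturing the config once before the generator leaves that thread keeps every task consistent.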
How did you print from within the transformer? I am having trouble debugging while multiprocessing. I only get one print.
I assume that you get this issue: #22849
That was it! Thanks
Describe the bug
I'm trying to do a grid search with n_jobs=-1, working with pandas output, and it fails despite set_config(transform_output="pandas").
I have to manually call .set_output(transform="pandas") on the ColumnTransformer for it to work.
Steps/Code to Reproduce
Preparation
This WORKS (n_jobs=1):
This FAILS (n_jobs=-1):
This WORKS again (n_jobs=-1 and force output):
Actual Results
AssertionError: Fit failed
Versions