scores = parallel(
    func(rfe, self.estimator, X, y, train, test, scorer)
    for train, test in cv.split(X, y, groups)
)
Here the estimator is passed as-is to the fit function. Since `fit` modifies the object in place without copying it, this is prone to race conditions (see example below).
Contrast this with `BaseSearchCV`, where the estimator is properly cloned.
On my system, with parameter `n_jobs=-1`, I got the following error:
5 fits failed with the following error:
Traceback (most recent call last):
  File ".../site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 723, in fit
    scores = parallel(
  File ".../site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File ".../site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File ".../site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File ".../site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File ".../site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File ".../site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 37, in _rfe_single_fit
    return rfe._fit(
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 327, in _fit
    self.scores_.append(step_score(self.estimator_, features))
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 40, in <lambda>
    lambda estimator, features: _score(
  File ".../site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File ".../site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File ".../site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File ".../site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File ".../site-packages/sklearn/ensemble/_forest.py", line 835, in predict
    return self.classes_.take(np.argmax(proba, axis=1), axis=0)
AttributeError: 'list' object has no attribute 'take'
I think we should be cloning `self.estimator` in the parallel call.
We usually do that. Indeed, we don't want to mutate `self.estimator` if we follow our own API.
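To illustrate the cloning idea, here is a minimal sketch using only the standard library. `ToyEstimator`, `toy_clone`, and `fit_per_fold` are hypothetical names made up for this example and are not scikit-learn code; in scikit-learn itself, `sklearn.base.clone` is the function that builds a fresh, unfitted estimator with the same parameters.

```python
class ToyEstimator:
    """Hypothetical estimator with a scikit-learn-like get_params() and fit()."""

    def __init__(self, depth=3):
        self.depth = depth

    def get_params(self):
        return {"depth": self.depth}

    def fit(self, X, y):
        # Fitted state is stored on the instance itself.
        self.classes_ = sorted(set(y))
        return self


def toy_clone(estimator):
    """Build a fresh, unfitted estimator with the same hyperparameters."""
    return type(estimator)(**estimator.get_params())


def fit_per_fold(estimator, folds):
    # Each fold fits its own copy, so the caller's estimator is never mutated.
    return [toy_clone(estimator).fit(X, y) for X, y in folds]


base = ToyEstimator(depth=5)
folds = [([[0], [1]], ["a", "b"]), ([[2], [3]], ["c", "d"])]
fitted = fit_per_fold(base, folds)

print(hasattr(base, "classes_"))         # False: the original stays unfitted
print([est.classes_ for est in fitted])  # [['a', 'b'], ['c', 'd']]
print(fitted[0].depth)                   # 5: hyperparameters are carried over
```

Because every fold owns its estimator, it no longer matters in which order the fits run or whether they overlap.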
No reproducer could be made in #23560, and I couldn't make one either.
It is possible that whatever it was, it was fixed as a side effect of #30176.
With that in mind, I'm closing this issue. Feel free to reopen and provide a reproducer if you think that the bug still exists.
The RFECV call discussed above is at scikit-learn/sklearn/feature_selection/_rfe.py, lines 723 to 726 in fb3ed90; the `BaseSearchCV` code it is contrasted with is at scikit-learn/sklearn/model_selection/_search.py, lines 823 to 833 in fb3ed90.
The traceback was generated from a snippet run with `n_jobs=-1`.
The error appears to happen because `n_outputs_` is not constant between runs. The error does not happen without parallelism.
is not constant between runs. The error does not happen without parallelism.The text was updated successfully, but these errors were encountered: