Closed
Description
In RFECV, at
scikit-learn/sklearn/feature_selection/_rfe.py
Lines 723 to 726 in fb3ed90
the estimator is passed as-is to the fit function. Since
fit
modifies the object without copying, this is prone to race conditions (see example below).
Contrast this to BaseSearchCV, where the estimator is properly cloned:
scikit-learn/sklearn/model_selection/_search.py
Lines 823 to 833 in fb3ed90
On my system, with parameter `n_jobs=-1`, I got the following error:
5 fits failed with the following error:
Traceback (most recent call last):
File ".../site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File ".../site-packages/sklearn/feature_selection/_rfe.py", line 723, in fit
scores = parallel(
File ".../site-packages/joblib/parallel.py", line 1056, in __call__
self.retrieve()
File ".../site-packages/joblib/parallel.py", line 935, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "/usr/lib/python3.10/multiprocessing/pool.py", line 771, in get
raise self._value
File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File ".../site-packages/joblib/_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File ".../site-packages/joblib/parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File ".../site-packages/joblib/parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File ".../site-packages/sklearn/utils/fixes.py", line 117, in __call__
return self.function(*args, **kwargs)
File ".../site-packages/sklearn/feature_selection/_rfe.py", line 37, in _rfe_single_fit
return rfe._fit(
File ".../site-packages/sklearn/feature_selection/_rfe.py", line 327, in _fit
self.scores_.append(step_score(self.estimator_, features))
File ".../site-packages/sklearn/feature_selection/_rfe.py", line 40, in <lambda>
lambda estimator, features: _score(
File ".../site-packages/sklearn/model_selection/_validation.py", line 767, in _score
scores = scorer(estimator, X_test, y_test)
File ".../site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
return self._score(
File ".../site-packages/sklearn/metrics/_scorer.py", line 261, in _score
y_pred = method_caller(estimator, "predict", X)
File ".../site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
return getattr(estimator, method)(*args, **kwargs)
File ".../site-packages/sklearn/ensemble/_forest.py", line 835, in predict
return self.classes_.take(np.argmax(proba, axis=1), axis=0)
AttributeError: 'list' object has no attribute 'take'
It is generated from the following snippet:
rf = RandomForestClassifier()
rfecv = RFECV(rf, scoring='accuracy', n_jobs=-1)
rfecv.fit(X_train, y_train)
The error appears to happen because n_outputs_
is not constant between runs. The error does not happen without parallelism.