RFECV race condition on estimator #23533

Closed
cpiber opened this issue Jun 3, 2022 · 3 comments

cpiber commented Jun 3, 2022

In RFECV, at

scores = parallel(
    func(rfe, self.estimator, X, y, train, test, scorer)
    for train, test in cv.split(X, y, groups)
)

the estimator is passed as-is to the fit function. Since fit modifies the object without copying, this is prone to race conditions (see example below).
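
The hazard can be illustrated without scikit-learn. Below is a minimal sketch, assuming a hypothetical `ToyEstimator` whose `fit` writes its fitted state onto `self`, and a hypothetical `fit_fold` helper that receives the estimator without copying it. It runs sequentially on purpose: the point is only that all folds end up writing to one object, which is the precondition for a race once the folds run concurrently and share that object.

```python
import copy


class ToyEstimator:
    """Hypothetical stand-in for an estimator whose fit() mutates self."""

    def fit(self, y):
        # Fitted state is written onto the object itself, not returned.
        self.classes_ = sorted(set(y))
        self.n_classes_ = len(self.classes_)
        return self


def fit_fold(estimator, y_fold):
    # Hypothetical helper: the estimator is used as-is, no copy is made.
    return estimator.fit(y_fold)


folds = [[0, 1, 1], [0, 1, 2, 2]]

# Shared object: every fold writes to the same instance.
shared = ToyEstimator()
shared_results = [fit_fold(shared, y) for y in folds]

# One independent copy per fold: each fold owns its fitted state.
template = ToyEstimator()
copied_results = [fit_fold(copy.deepcopy(template), y) for y in folds]

print(shared_results[0] is shared_results[1])  # True
print([r.n_classes_ for r in shared_results])  # [3, 3]
print([r.n_classes_ for r in copied_results])  # [2, 3]
```

With the shared instance, the state from the first fold is overwritten by the second; with one copy per fold, each result keeps its own state and `template` stays unfitted.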

Contrast this to BaseSearchCV, where the estimator is properly cloned:

delayed(_fit_and_score)(
    clone(base_estimator),
    X,
    y,
    train=train,
    test=test,
    parameters=parameters,
    split_progress=(split_idx, n_splits),
    candidate_progress=(cand_idx, n_candidates),
    **fit_and_score_kwargs,
)
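
For reference, here is a short sketch of what `sklearn.base.clone` provides: a new, unfitted estimator with the same parameters. The synthetic data from `make_classification` and the small `n_estimators` value are assumptions chosen only to keep the example quick.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, assumed here purely for illustration.
X, y = make_classification(n_samples=100, n_features=8, random_state=0)

base = RandomForestClassifier(n_estimators=10, random_state=0)

# clone() copies the parameters but none of the fitted attributes.
fold_estimator = clone(base)
fold_estimator.fit(X, y)

print(fold_estimator is base)                            # False
print(hasattr(fold_estimator, "classes_"))               # True
print(hasattr(base, "classes_"))                         # False
print(fold_estimator.get_params() == base.get_params())  # True
```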


On my system, with parameter `n_jobs=-1`, I got the following error:
5 fits failed with the following error:
Traceback (most recent call last):
  File ".../site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 723, in fit
    scores = parallel(
  File ".../site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File ".../site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File ".../site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File ".../site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File ".../site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File ".../site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 37, in _rfe_single_fit
    return rfe._fit(
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 327, in _fit
    self.scores_.append(step_score(self.estimator_, features))
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 40, in <lambda>
    lambda estimator, features: _score(
  File ".../site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File ".../site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File ".../site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File ".../site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File ".../site-packages/sklearn/ensemble/_forest.py", line 835, in predict
    return self.classes_.take(np.argmax(proba, axis=1), axis=0)
AttributeError: 'list' object has no attribute 'take'

It is generated from the following snippet:

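  from sklearn.ensemble import RandomForestClassifier
  from sklearn.feature_selection import RFECV

  # X_train and y_train are the training data (not shown here)
  rf = RandomForestClassifier()
  rfecv = RFECV(rf, scoring='accuracy', n_jobs=-1)
  rfecv.fit(X_train, y_train)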

The error appears to happen because n_outputs_ is not constant between runs. The error does not happen without parallelism.

@glemaitre
Member

I think we should be cloning self.estimator in the parallel call.
We usually do that. Indeed, we don't want to mutate self.estimator if we follow our own API.
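
The pattern suggested here can be sketched generically with joblib. This is not the actual change to `_rfe.py`: `fit_and_score_fold` is a hypothetical helper, and the synthetic data, `StratifiedKFold` splitter, and thread-based backend are assumptions made to keep the example self-contained.

```python
from joblib import Parallel, delayed
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold


def fit_and_score_fold(estimator, X, y, train, test):
    # Hypothetical helper: fits the estimator it is given, in place.
    estimator.fit(X[train], y[train])
    return estimator.score(X[test], y[test])


# Synthetic data, assumed here purely for illustration.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
base = RandomForestClassifier(n_estimators=10, random_state=0)
cv = StratifiedKFold(n_splits=5)

# clone(base) gives each task its own unfitted copy, so no two tasks
# write fitted attributes to the same object.
scores = Parallel(n_jobs=2, prefer="threads")(
    delayed(fit_and_score_fold)(clone(base), X, y, train, test)
    for train, test in cv.split(X, y)
)

print(len(scores))                # 5
print(hasattr(base, "classes_"))  # False
```

Because every task works on its own clone, `base` is never mutated, which matches the API convention mentioned above.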

@jeremiedbb
Member

No reproducer could be made in #23560, and I couldn't make one either.
It is possible that whatever it was, it was fixed as a side effect of #30176.

With that in mind, I'm closing this issue. Feel free to reopen and provide a reproducer if you think that the bug still exists.

@cpiber
Author

cpiber commented Apr 13, 2025

Yes, I think that was it, thanks!
