RFECV race condition on estimator #23533

Closed
@cpiber

Description

In RFECV, at

scores = parallel(
    func(rfe, self.estimator, X, y, train, test, scorer)
    for train, test in cv.split(X, y, groups)
)

the estimator is passed as-is to the fit function. Because fit mutates the object in place instead of working on a copy, parallel jobs can overwrite each other's state, which makes this prone to race conditions (see the traceback below).
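To make the shared-state problem concrete, here is a minimal sketch that uses only the standard library. `ToyEstimator` and `run_folds` are hypothetical names invented for this illustration, not scikit-learn code, and `copy.deepcopy` stands in for cloning. The folds run sequentially so the outcome is deterministic; with parallel jobs the same overwrite could happen in the middle of a fit.

```python
import copy


class ToyEstimator:
    """Hypothetical stand-in for an estimator whose fit() mutates self."""

    def fit(self, labels):
        # State is written onto the instance, as a real fit() would do.
        self.classes_ = sorted(set(labels))
        return self


def run_folds(estimator, folds, clone_first):
    """Fit one estimator per fold, optionally copying the template first."""
    fitted = []
    for labels in folds:
        est = copy.deepcopy(estimator) if clone_first else estimator
        fitted.append(est.fit(labels))
    return fitted


folds = [["a", "b"], ["b", "c"]]

# Without copying, every fold writes to the same object.
shared = run_folds(ToyEstimator(), folds, clone_first=False)

# With copying, each fold keeps its own fitted state.
cloned = run_folds(ToyEstimator(), folds, clone_first=True)
```

In the shared case both list entries are the same object and hold only the last fold's `classes_`; in the copied case each fold's result survives independently.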

Contrast this with BaseSearchCV, where the estimator is cloned before each fit:

delayed(_fit_and_score)(
    clone(base_estimator),
    X,
    y,
    train=train,
    test=test,
    parameters=parameters,
    split_progress=(split_idx, n_splits),
    candidate_progress=(cand_idx, n_candidates),
    **fit_and_score_kwargs,
)
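As a small illustration of what `clone` provides in the call above, the sketch below fits a cloned estimator and checks that the original template is left unfitted. The synthetic data and the variable names (`template`, `worker_copy`) are assumptions made for this example; it is not a proposed patch for RFECV.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier

# Small synthetic data set, invented for this illustration only.
rng = np.random.RandomState(0)
X = rng.rand(40, 4)
y = (X[:, 0] > 0.5).astype(int)

template = RandomForestClassifier(n_estimators=5, random_state=0)

# clone() builds a new, unfitted estimator with the same hyperparameters.
worker_copy = clone(template)
worker_copy.fit(X, y)

# Fitting the clone leaves the template without fitted attributes.
template_untouched = not hasattr(template, "classes_")
same_params = template.get_params() == worker_copy.get_params()
```

Handing each parallel job its own clone in this way means no job can observe another job's half-written fitted attributes. Whether the estimator, the `rfe` object, or both need cloning in RFECV is left open here.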


On my system, with parameter `n_jobs=-1`, I got the following error:
5 fits failed with the following error:
Traceback (most recent call last):
  File ".../site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 723, in fit
    scores = parallel(
  File ".../site-packages/joblib/parallel.py", line 1056, in __call__
    self.retrieve()
  File ".../site-packages/joblib/parallel.py", line 935, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 771, in get
    raise self._value
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File ".../site-packages/joblib/_parallel_backends.py", line 595, in __call__
    return self.func(*args, **kwargs)
  File ".../site-packages/joblib/parallel.py", line 262, in __call__
    return [func(*args, **kwargs)
  File ".../site-packages/joblib/parallel.py", line 262, in <listcomp>
    return [func(*args, **kwargs)
  File ".../site-packages/sklearn/utils/fixes.py", line 117, in __call__
    return self.function(*args, **kwargs)
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 37, in _rfe_single_fit
    return rfe._fit(
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 327, in _fit
    self.scores_.append(step_score(self.estimator_, features))
  File ".../site-packages/sklearn/feature_selection/_rfe.py", line 40, in <lambda>
    lambda estimator, features: _score(
  File ".../site-packages/sklearn/model_selection/_validation.py", line 767, in _score
    scores = scorer(estimator, X_test, y_test)
  File ".../site-packages/sklearn/metrics/_scorer.py", line 219, in __call__
    return self._score(
  File ".../site-packages/sklearn/metrics/_scorer.py", line 261, in _score
    y_pred = method_caller(estimator, "predict", X)
  File ".../site-packages/sklearn/metrics/_scorer.py", line 71, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File ".../site-packages/sklearn/ensemble/_forest.py", line 835, in predict
    return self.classes_.take(np.argmax(proba, axis=1), axis=0)
AttributeError: 'list' object has no attribute 'take'

It is generated from the following snippet:

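  from sklearn.ensemble import RandomForestClassifier
  from sklearn.feature_selection import RFECV
  # X_train, y_train: training data (not included in this report)
  rf = RandomForestClassifier()
  rfecv = RFECV(rf, scoring='accuracy', n_jobs=-1)
  rfecv.fit(X_train, y_train)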

The error appears to occur because n_outputs_ is not constant between runs. It does not occur without parallelism.
