
Broken estimator_ attribute on some ensemble models #25588

Closed
BenjaminBossan opened this issue Feb 10, 2023 · 5 comments · Fixed by #25668

@BenjaminBossan
Contributor

BenjaminBossan commented Feb 10, 2023

Describe the bug

Several ensemble models raise an error when trying to access the existing estimator_ attribute.

The problem is that this property tries to access self._estimator, which is set by sklearn.ensemble.BaseEnsemble._validate_estimator, but that method is not called by all subclasses.

def _validate_estimator(self, default=None):
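
The failure mode described above can be reproduced without scikit-learn. The following is a minimal sketch, not the library's actual code: the class names `BaseEnsembleSketch`, `CallsValidate` and `SkipsValidate`, the helper `try_estimator_`, and the `"default-tree"` placeholder are all hypothetical. It only assumes what the report states, namely that the property reads a private attribute that is set in place by `_validate_estimator`.

```python
class BaseEnsembleSketch:
    """Hypothetical stand-in for an ensemble base class (not sklearn's code)."""

    def __init__(self, estimator=None):
        self.estimator = estimator

    def _validate_estimator(self, default=None):
        # Sets the private attribute in place; a subclass has to call this in fit.
        self._estimator = self.estimator if self.estimator is not None else default

    @property
    def estimator_(self):
        # Raises AttributeError when _validate_estimator was never called.
        return self._estimator


class CallsValidate(BaseEnsembleSketch):
    def fit(self, X, y):
        self._validate_estimator(default="default-tree")
        return self


class SkipsValidate(BaseEnsembleSketch):
    def fit(self, X, y):
        # Never calls _validate_estimator, so self._estimator is never set.
        return self


def try_estimator_(model):
    """Return the value of estimator_, or the name of the exception raised."""
    try:
        return model.estimator_
    except AttributeError as exc:
        return type(exc).__name__


X, y = [[1], [2]], [0, 1]
ok = try_estimator_(CallsValidate().fit(X, y))
broken = try_estimator_(SkipsValidate().fit(X, y))
print(ok, broken)
```

In this sketch the inherited property exists on both subclasses, but it only works on the one whose `fit` populates the private attribute.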

For VotingClassifier and VotingRegressor, it's understandable IMO, but the error message could be better. For gradient boosting, estimator_ could return something useful.

More as a reminder to myself: _validate_estimator is being rewritten in #24250 to return the estimator instead of setting it in place.

Steps/Code to Reproduce

import sklearn.ensemble
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

estimators = [
    sklearn.ensemble.AdaBoostClassifier(),
    sklearn.ensemble.AdaBoostRegressor(),
    sklearn.ensemble.BaggingClassifier(),
    sklearn.ensemble.BaggingRegressor(),
    sklearn.ensemble.ExtraTreesClassifier(),
    sklearn.ensemble.ExtraTreesRegressor(),
    sklearn.ensemble.GradientBoostingClassifier(),
    sklearn.ensemble.GradientBoostingRegressor(),
    sklearn.ensemble.HistGradientBoostingClassifier(),
    sklearn.ensemble.HistGradientBoostingRegressor(),
    sklearn.ensemble.RandomForestClassifier(),
    sklearn.ensemble.RandomForestRegressor(),
    sklearn.ensemble.VotingClassifier([('5', KNeighborsClassifier(5)), ('10', KNeighborsClassifier(10))]),
    sklearn.ensemble.VotingRegressor([('5', KNeighborsRegressor(5)), ('10', KNeighborsRegressor(10))]),
]

X, y = [[1], [2]], [0, 1]

msg = "Got {} error when trying to access .estimator_ in {}"
for estimator in estimators:
    estimator.fit(X, y)
    try:
        estimator.estimator_
    except Exception as e:
        print(msg.format(e.__class__.__name__, estimator.__class__.__name__))

Expected Results

No error is printed.

Actual Results

Got AttributeError error when trying to access .estimator_ in GradientBoostingClassifier
Got AttributeError error when trying to access .estimator_ in GradientBoostingRegressor
Got AttributeError error when trying to access .estimator_ in HistGradientBoostingClassifier
Got AttributeError error when trying to access .estimator_ in HistGradientBoostingRegressor
Got AttributeError error when trying to access .estimator_ in VotingClassifier
Got AttributeError error when trying to access .estimator_ in VotingRegressor

Versions

System:
    python: 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:20:04) [GCC 11.3.0]
executable: /home/name/anaconda3/envs/skops/bin/python
   machine: Linux-5.15.0-60-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.2.0
          pip: 22.3.1
   setuptools: 65.5.1
        numpy: 1.23.5
        scipy: 1.9.3
       Cython: None
       pandas: 1.5.3
   matplotlib: 3.6.3
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/name/anaconda3/envs/skops/lib/libopenblasp-r0.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Haswell
    num_threads: 8

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/name/anaconda3/envs/skops/lib/libgomp.so.1.0.0
        version: None
    num_threads: 8

BenjaminBossan added the Bug and Needs Triage labels Feb 10, 2023

@glemaitre
Member

@BenjaminBossan what would you like to return for gradient boosting?

Since we use a trailing _ to show that something has been fitted, I am not sure which individual estimator from the boosted chain would actually be interesting to report. If we are interested in the original tree that will be used for the boosted chain, then it does not make sense to have an _, because that tree does not need to be fitted on any data.

@BenjaminBossan
Contributor Author

I agree that there is no specific estimator that should be expected for gradient boosting or voting. One solution could be to remove the estimator_ property from BaseEnsemble and add it only to those subclasses that really need it. Another option could be to override estimator_ on the gradient boosting and voting models and raise a sensible error.
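
Both options can be sketched side by side. This is an illustration under stated assumptions, not the change that was eventually merged: `EnsembleBaseSketch`, `BaggingSketch` and `VotingSketch` are hypothetical names, the `"default-tree"` placeholder stands in for a real estimator, and the error message wording is invented for the example.

```python
class EnsembleBaseSketch:
    """Hypothetical base class that defines no estimator_ property (option 1)."""

    def __init__(self, estimator=None):
        self.estimator = estimator


class BaggingSketch(EnsembleBaseSketch):
    """A subclass that really has a single base estimator defines the property itself."""

    def fit(self, X, y):
        self._estimator = (
            self.estimator if self.estimator is not None else "default-tree"
        )
        return self

    @property
    def estimator_(self):
        return self._estimator


class VotingSketch(EnsembleBaseSketch):
    """A subclass with no single base estimator raises a clear error (option 2)."""

    def fit(self, X, y):
        return self

    @property
    def estimator_(self):
        # Message wording is an assumption made for this sketch.
        raise AttributeError(
            f"{type(self).__name__} has no single base estimator, "
            "so estimator_ is not available."
        )


X, y = [[1], [2]], [0, 1]
bagging_value = BaggingSketch().fit(X, y).estimator_
voting_has_attr = hasattr(VotingSketch().fit(X, y), "estimator_")

try:
    VotingSketch().fit(X, y).estimator_
except AttributeError as exc:
    voting_message = str(exc)

print(bagging_value, voting_has_attr, voting_message)
```

Raising `AttributeError` rather than another exception type keeps `hasattr` and `getattr` with a default working as callers would expect.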

@glemaitre
Member

A solution could be to remove the estimator_ property from BaseEnsemble and only add it to those subclasses that really need it

At first glance, this seems like a good solution, since the attribute is not shared between ensemble methods.

@jeremiedbb
Member

This is a side effect of the deprecation of base_estimator. It will resolve itself in 1.4, when the deprecation cycle ends. I opened #25668 to fix the error message in the meantime.

@BenjaminBossan
Contributor Author

Thanks for the update.

thomasjpfan added the module:ensemble label and removed the Needs Triage label Feb 24, 2023