
Enable mixed ensembles with estimators that do & don't accept the sample_weight fit_param #20167

Closed
ajcallegari opened this issue May 30, 2021 · 2 comments

@ajcallegari
ajcallegari commented May 30, 2021

I need to make a VotingRegressor ensemble with some estimators that accept sample weights during fitting and some that don't. Currently, mixed ensembles raise an exception:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, VotingRegressor
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression()
weights = abs(y)

rgr = VotingRegressor(estimators=[('LR', LinearRegression()),
                                  ('KNN', KNeighborsRegressor()),
                                  ('RF', RandomForestRegressor())])

rgr.fit(X, y, sample_weight=weights)

result:
TypeError: Underlying estimator KNeighborsRegressor does not support sample weights.

A possible solution would be to have the ensemble classes (e.g. VotingRegressor, VotingClassifier, StackingRegressor, StackingClassifier) read the fit() signatures of their estimators and not pass sample_weight to estimators that don't accept it. Or, more realistically, they could catch the exceptions raised when fit() is called with sample_weight and fall back to calling fit() without it. This behavior could be the default, or it could be activated by a flag such as "enable_mixed_sample_weight" in the ensemble class's __init__ method. If it's important to notify the user when an estimator doesn't accept sample_weight, the notification and the exception currently in place could be enabled with a flag such as "enforce_sample_weight".
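
To make the signature-reading idea concrete, here is a minimal sketch using only the standard library. It is not scikit-learn's implementation: `accepts_sample_weight`, `fit_mixed`, `WeightedMean`, and `PlainMean` are hypothetical names made up for illustration, and the two toy classes only stand in for real estimators.

```python
import inspect


def accepts_sample_weight(estimator):
    """Return True if estimator.fit declares a sample_weight parameter."""
    return "sample_weight" in inspect.signature(estimator.fit).parameters


def fit_mixed(estimators, X, y, sample_weight=None):
    """Fit each (name, estimator) pair, passing sample_weight only where
    fit() accepts it. Returns the names of estimators fitted without weights."""
    skipped = []
    for name, est in estimators:
        if sample_weight is not None and accepts_sample_weight(est):
            est.fit(X, y, sample_weight=sample_weight)
        else:
            est.fit(X, y)
            if sample_weight is not None:
                skipped.append(name)
    return skipped


# Hypothetical stand-ins for real estimators.
class WeightedMean:
    def fit(self, X, y, sample_weight=None):
        w = sample_weight if sample_weight is not None else [1.0] * len(y)
        self.mean_ = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        return self


class PlainMean:
    def fit(self, X, y):  # no sample_weight parameter
        self.mean_ = sum(y) / len(y)
        return self


X = [[0.0], [1.0], [2.0]]
y = [1.0, 2.0, 6.0]
weights = [1.0, 1.0, 2.0]
models = [("weighted", WeightedMean()), ("plain", PlainMean())]
skipped = fit_mixed(models, X, y, sample_weight=weights)
```

Returning the skipped names gives the caller a way to warn or raise, which is roughly what an "enforce_sample_weight" flag would control. One caveat: a plain signature check cannot see a sample_weight that is accepted through `**kwargs`. scikit-learn also ships `sklearn.utils.validation.has_fit_parameter` for the same kind of check.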

As a workaround I'm using the Ensemble class from the pipecaster library (https://github.com/ajcallegari/pipecaster) which allows mixed ensembles by catching exceptions caused by fit() and then defaulting to a fit() call without the sample_weight parameter. This Ensemble class has the scikit-learn interface and supports classification, regression, voting, and model stacking.

@adrinjalali
Member

This will be fixed once we have sample props implemented, ref: #20350

@adrinjalali
Member

Now supported with metadata routing.
