[MRG] ENH: Make StackingRegressor support Multioutput #27704
@OmarManzoor would you maybe have time to have a look here?
Thank you for the PR @hmasdev. A few comments. I think we will also need a changelog entry for this.
Thank you for the changes. Here are a few more suggestions. Also, I think we are still not handling the case where an estimator/regressor that does not support multioutput is specified. Or do we not need to worry about such a case?
# NOTE: In this case the estimator can predict almost exactly the target
assert_allclose(
    y_pred,
    # NOTE: when the target is 2D but with a single output,
    # the predictions are 1D because of column_or_1d
# NOTE: In this case the estimator can predict almost exactly the target
assert_allclose(
    y_pred,
    # NOTE: when the target is 2D but with a single output,
    # the predictions are 1D because of column_or_1d

# NOTE: In this case the estimator can predict almost exactly the target.
# When the target is 2D but with a single output the predictions are 1D
# because of column_or_1d
assert_allclose(
    y_pred,
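The note about `column_or_1d` in the comment above can be checked in isolation. The following is a minimal sketch, not part of the PR: the array names are hypothetical, and it only assumes that `column_or_1d` is importable from `sklearn.utils.validation`, as the traceback later in this thread indicates.

```python
import numpy as np
from sklearn.utils.validation import column_or_1d

# A 2D target with a single output column (hypothetical data).
y_single_column = np.arange(5.0).reshape(-1, 1)

# column_or_1d ravels an (n_samples, 1) array into (n_samples,).
y_flat = column_or_1d(y_single_column)
flat_shape = y_flat.shape

# A genuinely multi-output target is rejected instead of being flattened.
y_two_columns = np.zeros((10, 2))
try:
    column_or_1d(y_two_columns)
    multi_error = None
except ValueError as exc:
    multi_error = str(exc)
```

This is why a test with a single-column 2D target has to compare against 1D predictions, while a target with two or more columns takes a different code path.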
    rtol=acceptable_relative_tolerance,
    atol=acceptable_aboslute_tolerance,
)
# transform
# transform
)

reg.fit(X_train, y_train)
# predict
# predict


def test_stacking_regressor_multioutput_with_passthrough():
    """Check that a stacking regressor with multioutput works"""
"""Check that a stacking regressor with multioutput works"""
"""Check that a stacking regressor with passthrough works with multioutput"""
    rtol=acceptable_relative_tolerance,
    atol=acceptable_aboslute_tolerance,
)
# transform
# transform


def test_stacking_regressor_multioutput():
    """Check that a stacking regressor with multioutput works"""
"""Check that a stacking regressor with multioutput works"""
"""Check that a stacking regressor works with multioutput"""
@OmarManzoor
Actually, I don't have a good idea yet on how to handle an estimator that does not support multiple outputs when it is used in a multi-output problem. Do you know such an API?
Python 3.10.13 (main, Feb 22 2024, 10:50:12) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from sklearn.ensemble import StackingRegressor
>>> from sklearn.svm import SVR
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()
>>> svr = SVR()
>>> model = StackingRegressor(estimators=[('lr', lr), ('svr', svr)])
>>> import numpy as np
>>> X = np.random.randn(10, 2)
>>> Y = X ** 2
>>> model.fit(X, Y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/workspace/scikit-learn/sklearn/ensemble/_stacking.py", line 973, in fit
return super().fit(X, y, sample_weight)
File "/root/workspace/scikit-learn/sklearn/base.py", line 1473, in wrapper
return fit_method(estimator, *args, **kwargs)
File "/root/workspace/scikit-learn/sklearn/ensemble/_stacking.py", line 224, in fit
self.estimators_ = Parallel(n_jobs=self.n_jobs)(
File "/root/workspace/scikit-learn/sklearn/utils/parallel.py", line 67, in __call__
return super().__call__(iterable_with_config)
File "/root/workspace/scikit-learn/sklearn-env/lib/python3.10/site-packages/joblib/parallel.py", line 1918, in __call__
return output if self.return_generator else list(output)
File "/root/workspace/scikit-learn/sklearn-env/lib/python3.10/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
res = func(*args, **kwargs)
File "/root/workspace/scikit-learn/sklearn/utils/parallel.py", line 129, in __call__
return self.function(*args, **kwargs)
File "/root/workspace/scikit-learn/sklearn/ensemble/_base.py", line 40, in _fit_single_estimator
estimator.fit(X, y, **fit_params)
File "/root/workspace/scikit-learn/sklearn/base.py", line 1473, in wrapper
return fit_method(estimator, *args, **kwargs)
File "/root/workspace/scikit-learn/sklearn/svm/_base.py", line 190, in fit
X, y = self._validate_data(
File "/root/workspace/scikit-learn/sklearn/base.py", line 650, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "/root/workspace/scikit-learn/sklearn/utils/validation.py", line 1282, in check_X_y
y = _check_y(y, multi_output=multi_output, y_numeric=y_numeric, estimator=estimator)
File "/root/workspace/scikit-learn/sklearn/utils/validation.py", line 1303, in _check_y
y = column_or_1d(y, warn=True)
File "/root/workspace/scikit-learn/sklearn/utils/validation.py", line 1370, in column_or_1d
raise ValueError(
ValueError: y should be a 1d array, got an array of shape (10, 2) instead.
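For context on the failure above: `SVR` only accepts a 1D target, which is what triggers the `ValueError`. One user-side workaround, independent of whatever `StackingRegressor` ends up doing, is to wrap such an estimator in `MultiOutputRegressor`, which fits one clone per output column. The sketch below only demonstrates the wrapper on its own; the seeded random data is an assumption, and it does not claim that the PR applies this wrapping automatically.

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Same shapes as in the session above, with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
X = rng.standard_normal((10, 2))
Y = X ** 2  # two regression targets

# SVR().fit(X, Y) would raise; the wrapper fits one SVR per column of Y.
wrapped_svr = MultiOutputRegressor(SVR())
wrapped_svr.fit(X, Y)

n_fitted = len(wrapped_svr.estimators_)
pred_shape = wrapped_svr.predict(X).shape
```

Whether the stacking estimator should raise a clearer error for such base estimators or leave the wrapping to the user is the open question in this thread.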
Note that
>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.ensemble import StackingClassifier
>>> from sklearn.multioutput import MultiOutputClassifier
>>> X = np.random.randn(10, 2)
>>> Y = X > 0 # multilabel classification
>>> model = StackingClassifier(estimators=[('lr', MultiOutputClassifier(LogisticRegression(C=1e3))), ('lr2', MultiOutputClassifier(LogisticRegression(C=1e3)))], final_estimator=MultiOutputClassifier(LogisticRegression(C=1e3)))
>>> model.fit(X, Y)
StackingClassifier(estimators=[('lr',
MultiOutputClassifier(estimator=LogisticRegression(C=1000.0))),
('lr2',
MultiOutputClassifier(estimator=LogisticRegression(C=1000.0)))],
final_estimator=MultiOutputClassifier(estimator=LogisticRegression(C=1000.0)))
>>> model.predict(X)[:3]
array([[ True, True],
[False, True],
[False, False]])
>>> model.predict_proba(X)[:3]
array([[1.55247883e-03, 7.05602027e-04],
[9.99983741e-01, 2.31839536e-03],
[9.99873471e-01, 9.99473360e-01]])
>>> Z = np.random.choice(range(3), size=X.shape) # multiclass-multioutput classification
>>> model.fit(X, Z)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/root/workspace/scikit-learn/sklearn/ensemble/_stacking.py", line 669, in fit
self._label_encoder = LabelEncoder().fit(y)
File "/root/workspace/scikit-learn/sklearn/preprocessing/_label.py", line 97, in fit
y = column_or_1d(y, warn=True)
File "/root/workspace/scikit-learn/sklearn/utils/validation.py", line 1370, in column_or_1d
raise ValueError(
ValueError: y should be a 1d array, got an array of shape (10, 2) instead.
Ref. https://scikit-learn.org/stable/modules/multiclass.html
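The two targets in the session above fall into different categories in scikit-learn's terminology, which may explain why one fits and the other does not. A small sketch using `type_of_target` from `sklearn.utils.multiclass` shows the distinction; the arrays here are fixed, hypothetical stand-ins for the random `Y` and `Z` above.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Binary indicator matrix, analogous to Y = X > 0 above.
Y_multilabel = np.array([[True, False], [False, True], [True, True]])

# Two output columns with three classes each, analogous to Z above.
Z_multiclass_multioutput = np.array([[0, 1], [2, 0], [1, 2]])

kind_y = type_of_target(Y_multilabel)
kind_z = type_of_target(Z_multiclass_multioutput)
```

Here `kind_y` is `"multilabel-indicator"` and `kind_z` is `"multiclass-multioutput"`; the traceback above shows the latter reaching `LabelEncoder`, which requires a 1D target.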
Reference Issues/PRs
Related to #25597
Similar to #8547
Similar to #19223
What does this implement/fix? Explain your changes.
Any other comments?
I am concerned about the following: