
PLSRegression not working with VotingRegressor #26549


Closed

michaelsimmler opened this issue Jun 8, 2023 · 5 comments · Fixed by #26602

@michaelsimmler

michaelsimmler commented Jun 8, 2023

Hi,

Unfortunately, PLSRegression does not work inside VotingRegressor. This is likely because PLSRegression returns predictions with shape (n_samples, 1) instead of (n_samples,), unlike other single-target regressors.

thanks!

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import VotingRegressor

r1 = LinearRegression()
r2 = PLSRegression()

X = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36]])
y = np.array([2, 6, 12, 20, 30, 42])

r1.fit(X, y).predict(X).shape
# (6,)
r2.fit(X, y).predict(X).shape
# (6, 1)

er = VotingRegressor([('lr', r1), ('plsr', r2)])
er.fit(X, y).predict(X)
Traceback (most recent call last):
  Cell In[25], line 1
    er.fit(X, y).predict(X)
  File ~/code/scikit-learn/sklearn/ensemble/_voting.py:625 in predict
    return np.average(self._predict(X), axis=1, weights=self._weights_not_none)
  File ~/code/scikit-learn/sklearn/ensemble/_voting.py:69 in _predict
    return np.asarray([est.predict(X) for est in self.estimators_]).T
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 6) + inhomogeneous part.
github-actions bot added the Needs Triage label on Jun 8, 2023
@Shreesha3112
Contributor

PLSRegression supports multi-target regression, hence the prediction is a 2d array.

As a proposed solution, I suggest creating a custom subclass of PLSRegression, which we can call PLS1D. This subclass overrides the predict method of PLSRegression to ensure it returns a 1-dimensional array, making it compatible with the VotingRegressor. This will effectively handle the case where n_components=1.

Here is the relevant code for the proposed PLS1D subclass:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import VotingRegressor


class PLS1D(PLSRegression):
    """PLSRegression restricted to a single component, returning 1d predictions."""

    def __init__(self, n_components=1, scale=True, max_iter=500, tol=1e-06, copy=True):
        if n_components > 1:
            raise ValueError("PLS1D can only handle n_components=1")
        super().__init__(n_components=n_components, scale=scale, max_iter=max_iter, tol=tol, copy=copy)

    def predict(self, X):
        # Flatten the (n_samples, 1) output so it matches other single-target regressors.
        y_pred = super().predict(X)
        return y_pred.ravel()


r1 = LinearRegression()
r2 = PLS1D()

X = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36]])
y = np.array([2, 6, 12, 20, 30, 42])

er = VotingRegressor([('lr', r1), ('plsr', r2)])
print(er.fit(X, y).predict(X))

print(r1.fit(X, y).predict(X).shape)  # (6,)
print(r2.fit(X, y).predict(X).shape)  # (6,)

Output
[ 1.21007429 6.09332284 12.55111269 20.58344384 30.1903163 41.37173005]

@ogrisel
Member

ogrisel commented Jun 16, 2023

I think we might want PLSRegression to behave consistently with LinearRegression and Ridge: when y is provided with shape (n_samples, 1) at fit time, predict should output that same shape, but when y is passed with shape (n_samples,), predict should automatically ravel its output.

I think multi-output regressors such as RandomForestRegressor and other tree-based regressors should do the same.
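For illustration, a minimal sketch of that proposed behavior as a user-side wrapper (the class name RaveledPLSRegression and the _y_was_1d attribute are hypothetical, not part of scikit-learn):

import numpy as np
from sklearn.cross_decomposition import PLSRegression


class RaveledPLSRegression(PLSRegression):
    # Hypothetical wrapper: remember whether y was 1d at fit time and
    # ravel predictions accordingly, mimicking LinearRegression / Ridge.

    def fit(self, X, y):
        # Record the dimensionality of the training target.
        self._y_was_1d = np.asarray(y).ndim == 1
        return super().fit(X, y)

    def predict(self, X):
        y_pred = super().predict(X)
        # Only flatten when the estimator was fitted with a 1d target.
        return y_pred.ravel() if self._y_was_1d else y_pred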

ogrisel removed the Needs Triage label on Jun 16, 2023
@ogrisel
Member

ogrisel commented Jun 16, 2023

We might also accept a PR to make VotingRegressor do such raveling automatically when the underlying estimators return a mix of (n_samples,) and (n_samples, 1) predictions.
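Something along these lines inside the stacking step could do it; this is only an illustrative sketch, not the actual VotingRegressor code, and the helper name is made up:

import numpy as np


def _ravel_single_target(predictions):
    # Hypothetical helper: normalize a mix of (n_samples,) and
    # (n_samples, 1) predictions to (n_samples,) before stacking,
    # so np.average(..., axis=1) works as in VotingRegressor.predict.
    flattened = []
    for pred in predictions:
        pred = np.asarray(pred)
        if pred.ndim == 2 and pred.shape[1] == 1:
            # Single-target prediction returned as a column vector.
            pred = pred.ravel()
        flattened.append(pred)
    # Result has shape (n_samples, n_estimators).
    return np.asarray(flattened).T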

@Charlie-XIAO
Contributor

I would like to make a PR for the first suggestion (i.e., automatically ravel the prediction when Y.ndim == 1 at fit time). I have often needed to ravel manually when using scikit-learn, so I think this is desirable, but I do think it is a breaking change.
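For reference, the manual workaround with the current behavior looks like this (using the data from the original report):

import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36]])
y = np.array([2, 6, 12, 20, 30, 42])

# Current behavior: predict returns shape (6, 1); ravel by hand to get (6,).
y_pred = PLSRegression().fit(X, y).predict(X).ravel()
print(y_pred.shape)  # (6,)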

@Charlie-XIAO
Contributor

/take
