
PLSRegression not working with VotingRegressor #26549


Closed

michaelsimmler opened this issue Jun 8, 2023 · 5 comments · Fixed by #26602

@michaelsimmler

michaelsimmler commented Jun 8, 2023

Hi,

Unfortunately, PLSRegression does not work inside VotingRegressor. This is likely because PLSRegression returns predictions with shape (n_samples, 1) instead of (n_samples,), unlike other single-target regressors.

thanks!

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import VotingRegressor

r1 = LinearRegression()
r2 = PLSRegression()

X = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36]])
y = np.array([2, 6, 12, 20, 30, 42])

r1.fit(X, y).predict(X).shape
# (6,)
r2.fit(X, y).predict(X).shape
# (6, 1)

er = VotingRegressor([('lr', r1), ('plsr', r2)])
er.fit(X, y).predict(X)
Traceback (most recent call last):
  Cell In[25], line 1
    er.fit(X, y).predict(X)
  File ~/code/scikit-learn/sklearn/ensemble/_voting.py:625 in predict
    return np.average(self._predict(X), axis=1, weights=self._weights_not_none)
  File ~/code/scikit-learn/sklearn/ensemble/_voting.py:69 in _predict
    return np.asarray([est.predict(X) for est in self.estimators_]).T
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (2, 6) + inhomogeneous part.
github-actions bot added the Needs Triage label on Jun 8, 2023
@Shreesha3112
Contributor

PLSRegression supports multi-target regression, hence the prediction is a 2d array.

As a proposed solution, I suggest creating a custom subclass of PLSRegression, which we can call PLS1D. This subclass overrides the predict method of PLSRegression to ensure it returns a 1-dimensional array, making it compatible with the VotingRegressor. This will effectively handle the case where n_components=1.

Here is the relevant code for the proposed PLS1D subclass:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import VotingRegressor


class PLS1D(PLSRegression):
    """PLSRegression restricted to a single component, returning 1d predictions."""

    def __init__(self, n_components=1, scale=True, max_iter=500, tol=1e-06, copy=True):
        if n_components > 1:
            raise ValueError("PLS1D can only handle n_components=1")
        super().__init__(n_components=n_components, scale=scale, max_iter=max_iter, tol=tol, copy=copy)

    def predict(self, X):
        # Flatten the (n_samples, 1) output so it matches other single-target regressors.
        y_pred = super().predict(X)
        return y_pred.ravel()


r1 = LinearRegression()
r2 = PLS1D()

X = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36]])
y = np.array([2, 6, 12, 20, 30, 42])

er = VotingRegressor([('lr', r1), ('plsr', r2)])
print(er.fit(X, y).predict(X))

print(r1.fit(X, y).predict(X).shape)  # (6,)
print(r2.fit(X, y).predict(X).shape)  # (6,)

Output
[ 1.21007429 6.09332284 12.55111269 20.58344384 30.1903163 41.37173005]

@ogrisel
Member

ogrisel commented Jun 16, 2023

I think we might want PLSRegression to behave consistently with LinearRegression and Ridge: when y is provided with shape (n_samples, 1) at fit time, predict should output that same shape, but when y is passed with shape (n_samples,), predict should automatically ravel its output.

I think multi-output regressors such as RandomForestRegressor and other tree-based regressors should do the same.
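For illustration, a minimal sketch of that proposed behavior as a user-side wrapper (the class name RaveledPLSRegression and the _y_was_1d attribute are hypothetical, not part of scikit-learn):

import numpy as np
from sklearn.cross_decomposition import PLSRegression


class RaveledPLSRegression(PLSRegression):
    # Hypothetical wrapper: remember whether y was 1d at fit time and
    # ravel predictions accordingly, mimicking LinearRegression / Ridge.

    def fit(self, X, y):
        # Record the dimensionality of the training target.
        self._y_was_1d = np.asarray(y).ndim == 1
        return super().fit(X, y)

    def predict(self, X):
        y_pred = super().predict(X)
        # Only flatten when the estimator was fitted with a 1d target.
        return y_pred.ravel() if self._y_was_1d else y_pred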

ogrisel removed the Needs Triage label on Jun 16, 2023
@ogrisel
Member

ogrisel commented Jun 16, 2023

We might also accept a PR to make VotingRegressor do such raveling automatically when the underlying estimators return a mix of (n_samples,) and (n_samples, 1) predictions.
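Something along these lines inside the stacking step could do it; this is only an illustrative sketch, not the actual VotingRegressor code, and the helper name is made up:

import numpy as np


def _ravel_single_target(predictions):
    # Hypothetical helper: normalize a mix of (n_samples,) and
    # (n_samples, 1) predictions to (n_samples,) before stacking,
    # so np.average(..., axis=1) works as in VotingRegressor.predict.
    flattened = []
    for pred in predictions:
        pred = np.asarray(pred)
        if pred.ndim == 2 and pred.shape[1] == 1:
            # Single-target prediction returned as a column vector.
            pred = pred.ravel()
        flattened.append(pred)
    # Result has shape (n_samples, n_estimators).
    return np.asarray(flattened).T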

@Charlie-XIAO
Contributor

I would like to make a PR for the first suggestion (i.e., automatically ravel the prediction when Y.ndim == 1 at fit time). I have often needed to ravel manually when using scikit-learn, so I think this is desirable, but I do think it is a breaking change.
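For reference, the manual workaround with the current behavior looks like this (using the data from the original report):

import numpy as np
from sklearn.cross_decomposition import PLSRegression

X = np.array([[1, 1], [2, 4], [3, 9], [4, 16], [5, 25], [6, 36]])
y = np.array([2, 6, 12, 20, 30, 42])

# Current behavior: predict returns shape (6, 1); ravel by hand to get (6,).
y_pred = PLSRegression().fit(X, y).predict(X).ravel()
print(y_pred.shape)  # (6,)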

@Charlie-XIAO
Contributor

/take
