Bias correction for TransformedTargetRegressor #15881


Open
lorentzenchr opened this issue Dec 13, 2019 · 12 comments
Labels: Enhancement, help wanted, Moderate, module:compose


@lorentzenchr (Member)

Description

If one is interested in predicting the expected value of a target y conditional on X (i.e. the conditional mean), then TransformedTargetRegressor should use a bias corrected inverse transform.

It would be nice to have an option for bias correction in TransformedTargetRegressor. At least, I would mention this in the user guide.
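
For intuition: if $Z$ is the transformed target and $f$ the inverse transform, then in general
$$E[Y|X] = E[f(Z)|X] \neq f(E[Z|X]),$$
so simply applying the inverse transform to a prediction of $E[Z|X]$ is biased whenever $f$ is nonlinear.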

References

https://robjhyndman.com/hyndsight/backtransforming/

@lorentzenchr (Member Author)

Here is an example based on plot_transformed_target.html for the Boston house prices dataset. For simplicity, I used LinearRegression instead of RidgeCV and a PowerTransformer(method='box-cox') instead of QuantileTransformer, so that one can use the bias corrected back-transformation of the Box-Cox transformation given in the link referenced above:

import numpy as np

from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def BC_inverse(y, lam, sig):
    """Back transform Box-Cox with Bias correction.
    See https://robjhyndman.com/hyndsight/backtransforming
    """
    if lam == 0:
        return np.exp(y) * (1 + 0.5 * sig**2)
    else:
        res = np.power(lam*y + 1., 1/lam)
        res *= (1 + 0.5 * sig**2 * (1 - lam) / (lam*y + 1.)**2)
        return res

dataset = load_boston()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123, test_size=0.25)

model = LinearRegression().fit(X_train, y_train)
model_trans = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PowerTransformer(method='box-cox', standardize=False)).fit(X_train, y_train)

lam = model_trans.transformer_.lambdas_[0]
# residual standard deviation on the transformed (Box-Cox) scale,
# used as the variance estimate sig**2 in the bias correction
z_train = model_trans.transformer_.transform(y_train[:, np.newaxis]).ravel()
z_predict = model_trans.regressor_.predict(X_train)
sig = np.sum((z_train - z_predict)**2)
sig = np.sqrt(sig / (len(y_train) - len(model_trans.regressor_.coef_) - 1))

# simple calibration factor, as a Linear Model with intercept should
# have sum(predicted) = sum(observed) on training set
corr_fac = y_train.sum() / model_trans.predict(X_train).sum()

d = {"untransformed": np.sum(y_train - model.predict(X_train)),
     "transformed": np.sum(y_train - model_trans.predict(X_train)),
     "bias corrected": np.sum(y_train - BC_inverse(model_trans.regressor_.predict(X_train), lam, sig)),
     "rescaled": np.sum(y_train - corr_fac * model_trans.predict(X_train))
}
print("Model calibration on training set (perfect is 0):")
print(d)

d = {"untransformed": r2_score(y_test, model.predict(X_test)),
     "transformed": r2_score(y_test, model_trans.predict(X_test)),
     "bias corrected": r2_score(y_test, BC_inverse(model_trans.regressor_.predict(X_test), lam, sig)),
     "rescaled": r2_score(y_test, corr_fac * model_trans.predict(X_test))
    }
print("r2 score on test set (perfect is 1):")
print(d)

Result
Model calibration on training set (perfect is 0):

  • 'untransformed': 9.414691248821327e-12
  • 'transformed': 115.6547264252301
  • 'bias corrected': 5.4043223163621334
  • 'rescaled': 7.673861546209082e-13

R2 score on test set (perfect is 1):

  • 'untransformed': 0.6862448857295753
  • 'transformed': 0.7405742064643839
  • 'bias corrected': 0.7423703905632908
  • 'rescaled': 0.7424282009454282

Summary
The bias corrected results are better calibrated (on the training set) and even have a higher R2 score (on the test set) than the plain transformed model.

@lorentzenchr (Member Author)

lorentzenchr commented May 10, 2020

What about adding an argument bias_correction to TransformedTargetRegressor with the options:

  • None: No bias correction (as is).
  • rescale: Acts like a post-fit step. After fitting, calculates c = sum(y)/sum(predict(X)) and then multiplies every prediction by c.
  • recenter: Acts like a post-fit step. After fitting, calculates c = mean(y - predict(X)) and then adds c to every prediction.
  • transformer: Uses the bias correction of the transformer if it has one implemented.

Edit:
This must be adapted if the aim is not the expectation (mean) but, for example, the median or some other property of the target distribution.
Note: Quantiles are invariant (they don't change) under monotone (increasing) transformations and therefore don't need a bias correction.

@glemaitre @jnothman @amueller might be interested (as you did #9041)
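
For illustration, a minimal sketch of what the rescale option could do (the class and attribute names are made up for this sketch, not an existing scikit-learn API; a real implementation would rather be a parameter of TransformedTargetRegressor itself):

import numpy as np
from sklearn.compose import TransformedTargetRegressor


class RescaledTTR(TransformedTargetRegressor):
    """TransformedTargetRegressor with a multiplicative post-fit rescaling,
    mimicking the proposed bias_correction="rescale" option."""

    def fit(self, X, y, **fit_params):
        super().fit(X, y, **fit_params)
        # calibration factor c = sum(y) / sum(predict(X)) on the training set
        self.correction_factor_ = np.sum(y) / np.sum(super().predict(X))
        return self

    def predict(self, X):
        return self.correction_factor_ * super().predict(X)

With the Boston example above, RescaledTTR(regressor=LinearRegression(), transformer=PowerTransformer(method='box-cox', standardize=False)) should reproduce the "rescaled" numbers.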

@lorentzenchr (Member Author)

Is this something scikit-learn would consider implementing, and is a PR worth the effort? A short comment from a core developer would be very welcome.

@jnothman (Member)

jnothman commented Jul 5, 2020 via email

@lorentzenchr (Member Author)

If your goal is to predict the expectation E[Y|X] and for some reason transforming Y before fitting is favorable, then I'd say it is always useful and even strongly advised from a statistical point of view.

In the example above (which I should maybe replace with the diamonds dataset), the bias corrected versions don't show any disadvantage that I'm aware of.

@thomasjpfan (Member)

Is there a reference comparing the different options for bias correction?

@lorentzenchr (Member Author)

Unfortunately, I'm not a domain expert here, nor do I know a reference. Maybe, if we ask kindly enough, @robjhyndman or @mayer79 could have a look and help out?

My understanding is that the bias corrected inverse Box-Cox transformation has at least a theoretical advantage over the simple global correction factor approach: it is a true second order correction, potentially different for every sample. The disadvantage is that you need an analytic formula (the second derivative) for every transformation, which is not too difficult.
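
To make the "second order correction" concrete (a quick check of the formula used in BC_inverse above): for $\lambda \neq 0$ the inverse Box-Cox transform is $f(z) = (\lambda z + 1)^{1/\lambda}$, hence
$$f^{\prime\prime}(z) = (1 - \lambda)\,(\lambda z + 1)^{1/\lambda - 2},$$
and the corrected back-transformation becomes
$$f(z) + \frac{\sigma^2}{2} f^{\prime\prime}(z) = (\lambda z + 1)^{1/\lambda}\left(1 + \frac{\sigma^2 (1 - \lambda)}{2 (\lambda z + 1)^2}\right),$$
evaluated at each sample's predicted $z$.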

@robjhyndman
It can also be done numerically. This is how bias correction is handled in fabletools (for R) so that any transformation can be used: https://github.com/tidyverts/fabletools/blob/master/R/transform.R#L102

@lorentzenchr (Member Author)

@robjhyndman Thank you for pointing to the numerical solution. If I understand correctly, this is an additive correction, i.e. the bias corrected prediction is predict_biased + bias_correction.
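
A minimal sketch of such a numerical, additive correction (the helper and its argument names are hypothetical; a central finite difference stands in for the numerical second derivative, and sigma2 is an estimate of the residual variance on the transformed scale):

import numpy as np


def numeric_bias_correction(inverse_transform, z_pred, sigma2, eps=1e-3):
    """Approximate E[f(Z)] by f(z) + 0.5 * sigma2 * f''(z), with the second
    derivative of the inverse transform estimated by a central finite difference."""
    f0 = inverse_transform(z_pred)
    second_deriv = (
        inverse_transform(z_pred + eps) - 2 * f0 + inverse_transform(z_pred - eps)
    ) / eps**2
    # additive correction: predict_biased + bias_correction
    return f0 + 0.5 * sigma2 * second_deriv

For the Box-Cox example above, numeric_bias_correction(lambda z: np.power(lam * z + 1., 1 / lam), model_trans.regressor_.predict(X_test), sig**2) should give results very close to BC_inverse.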

@lorentzenchr added the help wanted and Moderate labels on Mar 19, 2021
@lorentzenchr (Member Author)

lorentzenchr commented May 7, 2023

Is this something we want for scikit-learn? An implementation in line with fabletools seems like a small addition with great impact on the modelling side.

@lorentzenchr (Member Author)

lorentzenchr commented Aug 21, 2023

Summary

Let's say we want to model $\mu_Y = E[Y|X]$, transform so that $Y = f(Z)$ (with inverse transform $f$), actually model $\mu_Z = E[Z|X]$, and finally want to transform back. Then there are two options:

  1. Use the (second order Taylor series) approximation $\mu_Y = E[f(Z)|X] \approx f(\mu_Z) + \frac{\sigma_Z^2}{2}f^{\prime\prime}(\mu_Z)$ with $\sigma_Z^2 = Var[Z|X]$.
  • Advantage: The bias correction $\frac{\sigma_Z^2}{2}f^{\prime\prime}(\mu_Z)$ is $X$-dependent.
  • Disadvantage:
    • We usually do not know/model the conditional variance $\sigma_Z^2$. Gaussian processes may be the exception. We could approximate it by the unconditional variance, which is easily computed on the training sample.
    • The second derivative $f^{\prime\prime}$ might require a numerical approach rather than an analytical one.
  2. Use an

    • additive $E[f(Z)|X] \approx c + f(\mu_Z)$
    • or multiplicative $E[f(Z)|X] \approx c \cdot f(\mu_Z)$

    approach.

  • Advantage: $c$ can be easily calculated on the training sample.
  • Disadvantage: It does not capture the $X$-dependence.

Recommendation

Because the conditional variance is not available in most cases, I favor the second approach with a constant correction, where the user can specify whether it is additive or multiplicative.
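
For concreteness, on a training sample $(y_i)_{i=1}^n$ with back-transformed predictions $\hat{y}_i = f(\hat{\mu}_{Z,i})$, the constant would be $c = \frac{1}{n}\sum_i (y_i - \hat{y}_i)$ for the additive variant (predict $c + f(\hat{\mu}_Z)$) and $c = \sum_i y_i \big/ \sum_i \hat{y}_i$ for the multiplicative variant (predict $c \, f(\hat{\mu}_Z)$), the latter being exactly the corr_fac used in the example further up.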

@AhmedThahir
Any updates?
