Bias correction for TransformedTargetRegressor #15881


Open
lorentzenchr opened this issue Dec 13, 2019 · 12 comments
Labels: Enhancement, help wanted, Moderate, module:compose


@lorentzenchr (Member)

Description

If one is interested in predicting the expected value of a target y conditional on X (i.e. the conditional mean), then TransformedTargetRegressor should use a bias corrected inverse transform.

It would be nice to have an option for bias correction in TransformedTargetRegressor. At least, I would mention this in the user guide.
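
For intuition: if $Z$ is the transformed target and $f$ the inverse transform, then in general
$$E[Y|X] = E[f(Z)|X] \neq f(E[Z|X]),$$
so simply applying the inverse transform to a prediction of $E[Z|X]$ is biased whenever $f$ is nonlinear.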

References

https://robjhyndman.com/hyndsight/backtransforming/

@lorentzenchr (Member Author)

Here is an example based on plot_transformed_target.html for the Boston house prices dataset. For simplicity, I used LinearRegression instead of RidgeCV and a PowerTransformer(method='box-cox') instead of QuantileTransformer, so that one can use the bias corrected back-transformation of the Box-Cox transformation given in the link referenced above:

import numpy as np

from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def BC_inverse(y, lam, sig):
    """Back transform Box-Cox with Bias correction.
    See https://robjhyndman.com/hyndsight/backtransforming
    """
    if lam == 0:
        return np.exp(y) * (1 + 0.5 * sig**2)
    else:
        res = np.power(lam*y + 1., 1/lam)
        res *= (1 + 0.5 * sig**2 * (1 - lam) / (lam*y + 1.)**2)
        return res

dataset = load_boston()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123, test_size=0.25)

model = LinearRegression().fit(X_train, y_train)
model_trans = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PowerTransformer(method='box-cox', standardize=False)).fit(X_train, y_train)

lam = model_trans.transformer_.lambdas_[0]
# residual standard deviation on the transformed (Box-Cox) scale,
# used as the variance estimate sig**2 in the bias correction
z_train = model_trans.transformer_.transform(y_train[:, np.newaxis]).ravel()
z_predict = model_trans.regressor_.predict(X_train)
sig = np.sum((z_train - z_predict)**2)
sig = np.sqrt(sig / (len(y_train) - len(model_trans.regressor_.coef_) - 1))

# simple calibration factor, as a Linear Model with intercept should
# have sum(predicted) = sum(observed) on training set
corr_fac = y_train.sum() / model_trans.predict(X_train).sum()

d = {"untransformed": np.sum(y_train - model.predict(X_train)),
     "transformed": np.sum(y_train - model_trans.predict(X_train)),
     "bias corrected": np.sum(y_train - BC_inverse(model_trans.regressor_.predict(X_train), lam, sig)),
     "rescaled": np.sum(y_train - corr_fac * model_trans.predict(X_train))
}
print("Model calibration on training set (perfect is 0):")
print(d)

d = {"untransformed": r2_score(y_test, model.predict(X_test)),
     "transformed": r2_score(y_test, model_trans.predict(X_test)),
     "bias corrected": r2_score(y_test, BC_inverse(model_trans.regressor_.predict(X_test), lam, sig)),
     "rescaled": r2_score(y_test, corr_fac * model_trans.predict(X_test))
    }
print("r2 score on test set (perfect is 1):")
print(d)

Result
Model calibration on training set (perfect is 0):

  • 'untransformed': 9.414691248821327e-12
  • 'transformed': 115.6547264252301
  • 'bias corrected': 5.4043223163621334
  • 'rescaled': 7.673861546209082e-13

R2 score on test set (perfect is 1):

  • 'untransformed': 0.6862448857295753
  • 'transformed': 0.7405742064643839
  • 'bias corrected': 0.7423703905632908
  • 'rescaled': 0.7424282009454282

Summary
The bias corrected results are better calibrated (on the training set) and even have a higher R2 score (on the test set) than the plain transformed model.

@lorentzenchr (Member Author)

lorentzenchr commented May 10, 2020

What about adding an argument bias_correction to TransformedTargetRegressor with the options:

  • None: No bias correction (as is).
  • rescale: Acts like a post-fit step. After fitting, calculates c = sum(y)/sum(predict(X)) and then multiplies every prediction by c.
  • recenter: Acts like a post-fit step. After fitting, calculates c = mean(y - predict(X)) and then adds c to every prediction.
  • transformer: Uses the bias correction of the transformer if it has one implemented.

Edit:
This must be adapted if the aim is not the expectation (mean) but, for example, the median or some other property of the target distribution.
Note: Quantiles are invariant (they don't change) under monotone (increasing) transformations and therefore don't need a bias correction.

@glemaitre @jnothman @amueller might be interested (as you did #9041)
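
For illustration, a minimal sketch of what the rescale option could do (the class and attribute names are made up for this sketch, not an existing scikit-learn API; a real implementation would rather be a parameter of TransformedTargetRegressor itself):

import numpy as np
from sklearn.compose import TransformedTargetRegressor


class RescaledTTR(TransformedTargetRegressor):
    """TransformedTargetRegressor with a multiplicative post-fit rescaling,
    mimicking the proposed bias_correction="rescale" option."""

    def fit(self, X, y, **fit_params):
        super().fit(X, y, **fit_params)
        # calibration factor c = sum(y) / sum(predict(X)) on the training set
        self.correction_factor_ = np.sum(y) / np.sum(super().predict(X))
        return self

    def predict(self, X):
        return self.correction_factor_ * super().predict(X)

With the Boston example above, RescaledTTR(regressor=LinearRegression(), transformer=PowerTransformer(method='box-cox', standardize=False)) should reproduce the "rescaled" numbers.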

@lorentzenchr (Member Author)

Is this something scikit-learn would consider implementing, and is a PR worth the effort? A short comment from a core developer would be very welcome.

@jnothman (Member)

jnothman commented Jul 5, 2020 via email

@lorentzenchr (Member Author)

If your goal is to predict the expectation E[Y|X] and for some reason transforming Y before fitting is favorable, then I'd say it is always useful and even strongly advised from a statistical point of view.

In the example above (which I should maybe replace with the diamonds dataset), the bias corrected versions don't show any disadvantage that I'm aware of.

@thomasjpfan (Member)

Is there a reference comparing the different options for bias correction?

@lorentzenchr (Member Author)

Unfortunately, I'm not a domain expert here, nor do I know a reference. Maybe, if we ask kindly enough, @robjhyndman or @mayer79 could have a look and help out?

My understanding is that the bias corrected inverse Box-Cox transformation has at least a theoretical advantage over the simple global correction factor approach: it is a true second order correction, potentially different for every sample. The disadvantage is that you need an analytic formula (the second derivative) for every transformation, which is not too difficult.
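
To make the "second order correction" concrete (a quick check of the formula used in BC_inverse above): for $\lambda \neq 0$ the inverse Box-Cox transform is $f(z) = (\lambda z + 1)^{1/\lambda}$, hence
$$f^{\prime\prime}(z) = (1 - \lambda)\,(\lambda z + 1)^{1/\lambda - 2},$$
and the corrected back-transformation becomes
$$f(z) + \frac{\sigma^2}{2} f^{\prime\prime}(z) = (\lambda z + 1)^{1/\lambda}\left(1 + \frac{\sigma^2 (1 - \lambda)}{2 (\lambda z + 1)^2}\right),$$
evaluated at each sample's predicted $z$.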

@robjhyndman
It can also be done numerically. This is how bias correction is handled in fabletools (for R) so that any transformation can be used: https://github.com/tidyverts/fabletools/blob/master/R/transform.R#L102

@lorentzenchr (Member Author)

@robjhyndman Thank you for pointing to the numerical solution. If I understand correctly, this is an additive correction, i.e. the bias corrected prediction is predict_biased + bias_correction.
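
A minimal sketch of such a numerical, additive correction (the helper and its argument names are hypothetical; a central finite difference stands in for the numerical second derivative, and sigma2 is an estimate of the residual variance on the transformed scale):

import numpy as np


def numeric_bias_correction(inverse_transform, z_pred, sigma2, eps=1e-3):
    """Approximate E[f(Z)] by f(z) + 0.5 * sigma2 * f''(z), with the second
    derivative of the inverse transform estimated by a central finite difference."""
    f0 = inverse_transform(z_pred)
    second_deriv = (
        inverse_transform(z_pred + eps) - 2 * f0 + inverse_transform(z_pred - eps)
    ) / eps**2
    # additive correction: predict_biased + bias_correction
    return f0 + 0.5 * sigma2 * second_deriv

For the Box-Cox example above, numeric_bias_correction(lambda z: np.power(lam * z + 1., 1 / lam), model_trans.regressor_.predict(X_test), sig**2) should give results very close to BC_inverse.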

@lorentzenchr added the help wanted and Moderate labels on Mar 19, 2021
@lorentzenchr (Member Author)

lorentzenchr commented May 7, 2023

Is this something we want for scikit-learn? An implementation in line with fabletools seems like a small addition with great impact on the modelling side.

@lorentzenchr (Member Author)

lorentzenchr commented Aug 21, 2023

Summary

Let's say we want to model $\mu_Y = E[Y|X]$, transform so that $Y = f(Z)$ (with inverse transform $f$), actually model $\mu_Z = E[Z|X]$, and finally want to transform back. Then there are two options:

  1. Use the (second order Taylor series) approximation $\mu_Y = E[f(Z)|X] \approx f(\mu_Z) + \frac{\sigma_Z^2}{2}f^{\prime\prime}(\mu_Z)$ with $\sigma_Z^2 = Var[Z|X]$.
  • Advantage: The bias correction $\frac{\sigma_Z^2}{2}f^{\prime\prime}(\mu_Z)$ is $X$-dependent.
  • Disadvantage:
    • We usually do not know/model the conditional variance $\sigma_Z^2$. Gaussian processes may be the exception. We could approximate it by the unconditional variance, which is easily computed on the training sample.
    • The second derivative $f^{\prime\prime}$ might require a numerical approach rather than an analytical one.
  2. Use an

    • additive $E[f(Z)|X] \approx c + f(\mu_Z)$
    • or multiplicative $E[f(Z)|X] \approx c \cdot f(\mu_Z)$

    approach.

  • Advantage: $c$ can be easily calculated on the training sample.
  • Disadvantage: It does not capture the $X$-dependence.

Recommendation

Because the conditional variance is not available in most cases, I favor the second approach with a constant correction, where the user can specify whether it is additive or multiplicative.
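
For concreteness, on a training sample $(y_i)_{i=1}^n$ with back-transformed predictions $\hat{y}_i = f(\hat{\mu}_{Z,i})$, the constant would be $c = \frac{1}{n}\sum_i (y_i - \hat{y}_i)$ for the additive variant (predict $c + f(\hat{\mu}_Z)$) and $c = \sum_i y_i \big/ \sum_i \hat{y}_i$ for the multiplicative variant (predict $c \, f(\hat{\mu}_Z)$), the latter being exactly the corr_fac used in the example further up.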

@AhmedThahir
Any updates?
