Bias correction for TransformedTargetRegressor #15881
Comments
Here is an example based on plot_transformed_target.html for the Boston house prices dataset. For simplicity, I used the following:

```python
import numpy as np
import pandas as pd
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import PowerTransformer
from sklearn.metrics import r2_score, median_absolute_error
from sklearn.model_selection import train_test_split

def BC_inverse(y, lam, sig):
    """Back transform Box-Cox with bias correction.

    See https://robjhyndman.com/hyndsight/backtransforming
    """
    if lam == 0:
        return np.exp(y) * (1 + 0.5 * sig**2)
    else:
        res = np.power(lam * y + 1., 1 / lam)
        res *= (1 + 0.5 * sig**2 * (1 - lam) / (lam * y + 1.)**2)
        return res
dataset = load_boston()
X = dataset.data
y = dataset.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123, test_size = 0.25)
model = LinearRegression().fit(X_train, y_train)
model_trans = TransformedTargetRegressor(
    regressor=LinearRegression(),
    transformer=PowerTransformer(method='box-cox', standardize=False),
).fit(X_train, y_train)
lam = model_trans.transformer_.lambdas_[0]
# residual standard deviation on the transformed (Box-Cox) scale
z_train = model_trans.transformer_.transform(y_train[:, np.newaxis]).ravel()
z_predict = model_trans.regressor_.predict(X_train)
sig = np.sum((z_train - z_predict)**2)
sig = np.sqrt(sig / (len(y_train) - len(model_trans.regressor_.coef_) - 1))
# simple calibration factor, as a Linear Model with intercept should
# have sum(predicted) = sum(observed) on training set
corr_fac = y_train.sum() / model_trans.predict(X_train).sum()
d = {"untransformed": np.sum(y_train - model.predict(X_train)),
"transformed": np.sum(y_train - model_trans.predict(X_train)),
"bias corrected": np.sum(y_train - BC_inverse(model_trans.regressor_.predict(X_train), lam, sig)),
"rescaled": np.sum(y_train - corr_fac * model_trans.predict(X_train))
}
print("Model calibration on training set (perfect is 0):")
print(d)
d = {"untransformed": r2_score(y_test, model.predict(X_test)),
"transformed": r2_score(y_test, model_trans.predict(X_test)),
"bias corrected": r2_score(y_test, BC_inverse(model_trans.regressor_.predict(X_test), lam, sig)),
"rescaled": r2_score(y_test, corr_fac * model_trans.predict(X_test))
}
print("r2 score on test set (perfect is 1):")
print(d)
```

Result: the script prints the calibration sums on the training set (perfect is 0) and the R2 scores on the test set (perfect is 1) for the untransformed, transformed, bias corrected and rescaled variants.

What about adding an argument to TransformedTargetRegressor that enables this kind of bias correction?

Edit: @glemaitre @jnothman @amueller might be interested (as you did #9041)
Is this something scikit-learn would consider implementing and is a PR worth the effort? A short comment from a core developer would be very welcome.
I don't feel very confident to speak to this without more research or experimentation. When would you say this is likely to be useful / not?
If your goal is to predict the expectation E[Y|X] and for some reason transforming Y before fitting is favorable, then I'd say it is always useful and even strongly advised from a statistical point of view. In the example above (which I should maybe replace with the diamonds dataset), the bias corrected versions don't show any disadvantage that I'm aware of.
Is there a reference comparing the different options for bias correction?
Unfortunately, I'm not a domain expert here, nor do I know a reference. Maybe, if we ask kindly enough, @robjhyndman or @mayer79 could have a look and help out. My understanding is that the bias corrected inverse Box-Cox transformation has at least a theoretical advantage over the simple global correction factor approach: it is a true second order correction, potentially different for every sample. The disadvantage is that you need an analytic formula (the second derivative) for every transformation, which is not too difficult.
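For concreteness, my reading of the second order correction is a Taylor expansion of the inverse transform around the prediction on the transformed scale, with f the inverse transform and mu, sigma as in the code example above (a sketch, not a formal reference):

$$
\mathrm{E}[f(Z)] \approx f(\mu) + \tfrac{1}{2}\sigma^2 f''(\mu), \qquad Z \sim \mathcal{N}(\mu, \sigma^2).
$$

For the inverse Box-Cox transform $f(z) = (\lambda z + 1)^{1/\lambda}$ one gets $f''(z) = (1-\lambda)(\lambda z + 1)^{1/\lambda - 2}$, hence

$$
f(z) + \tfrac{1}{2}\sigma^2 f''(z) = f(z)\left(1 + \frac{\sigma^2 (1-\lambda)}{2\,(\lambda z + 1)^2}\right),
$$

which is the correction factor used in BC_inverse above.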
It can also be done numerically. This is how bias correction is handled in fabletools (for R) so that any transformation can be used: https://github.com/tidyverts/fabletools/blob/master/R/transform.R#L102 |
@robjhyndman Thank you for pointing to the numerical solution. If I understand correctly, this is an additive correction, i.e. the bias corrected prediction is f(y_pred) + 0.5 * sigma**2 * f''(y_pred), where f is the inverse transform, sigma**2 the residual variance on the transformed scale, and the second derivative f'' is evaluated numerically.
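A minimal sketch of what such a numerical correction could look like on top of TransformedTargetRegressor (the helper bias_adjust and the commented usage lines are illustrative assumptions, not the fabletools or scikit-learn API):

```python
def bias_adjust(inverse_transform, z_pred, sigma2, eps=1e-4):
    """Additive second order bias correction using a numerical second
    derivative of the inverse transform (illustrative sketch).

    inverse_transform maps predictions on the transformed scale back to
    the original scale; sigma2 is the residual variance on the
    transformed scale.
    """
    f = inverse_transform
    # central finite difference approximation of f''(z_pred)
    d2 = (f(z_pred + eps) - 2 * f(z_pred) + f(z_pred - eps)) / eps**2
    return f(z_pred) + 0.5 * sigma2 * d2

# Possible usage with the fitted model_trans and sig from the example above:
# inv = lambda z: model_trans.transformer_.inverse_transform(z.reshape(-1, 1)).ravel()
# z_pred = model_trans.regressor_.predict(X_test)
# y_pred_corrected = bias_adjust(inv, z_pred, sig**2)
```

Because the second derivative is computed numerically, this works for any invertible target transformation without deriving an analytic formula per transform.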
Is this something we want for scikit-learn? An implementation in line with fabletools seems like a small addition with great impact on the modelling side. |
Summary

Let's say we want to model the conditional expectation E[Y|X] and the target is transformed before fitting. Two ways to correct the back-transformed predictions came up here: an (approximate) second order correction of the inverse transform, which needs the residual variance on the transformed scale, and a simple global calibration constant on top of the naive back-transform.
Recommendation

Because in most cases the variance is not available, I favor the second approach with a multiplicative or additive constant, where the user can specify whether it is additive or multiplicative.
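For illustration, such a global calibration constant could be computed along these lines (a sketch only; calibration_constant is a hypothetical helper, not an existing scikit-learn option):

```python
def calibration_constant(y_true, y_pred, method="multiplicative"):
    """Global calibration constant fitted on the training set
    (illustrative helper, not part of scikit-learn).

    multiplicative: c such that sum(c * y_pred) == sum(y_true)
    additive:       c such that sum(y_pred + c) == sum(y_true)
    """
    if method == "multiplicative":
        return y_true.sum() / y_pred.sum()
    if method == "additive":
        return (y_true.sum() - y_pred.sum()) / len(y_true)
    raise ValueError("method must be 'multiplicative' or 'additive'")

# Possible usage with the example above:
# c = calibration_constant(y_train, model_trans.predict(X_train), method="additive")
# y_pred_calibrated = model_trans.predict(X_test) + c
```

The multiplicative variant is exactly the corr_fac used in the example code above.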
Any updates?
Description

If one is interested in predicting the (conditional on X) expected value of a target y, aka the mean, then TransformedTargetRegressor should use a bias corrected inverse transform. It would be nice to have an option for bias correction in TransformedTargetRegressor. At least, I would mention this in the user guide.

References
https://robjhyndman.com/hyndsight/backtransforming/
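To see the bias in the simplest case, the log transform (Box-Cox with lam = 0): if the model is unbiased on the log scale, the naive back-transform exp(prediction) underestimates the conditional mean. A small simulation sketch (numbers and variable names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 0.5          # mean and std on the log scale

# lognormal target: E[Y] = exp(mu + 0.5 * sigma**2), not exp(mu)
y = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

naive = np.exp(mu)                              # naive inverse transform
corrected = np.exp(mu) * (1 + 0.5 * sigma**2)   # second order correction (BC_inverse with lam=0)
exact = np.exp(mu + 0.5 * sigma**2)             # exact lognormal mean

print(f"sample mean {y.mean():.3f}  naive {naive:.3f}  "
      f"corrected {corrected:.3f}  exact {exact:.3f}")
```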