
FEA post-fit calibration option in HGBT #22435

@lorentzenchr

Description

Describe the workflow you want to enable

Histogram gradient boosted decision trees usually do not fulfil the so-called balance property on the training data, i.e. sum(predictions) == sum(observations) (sum of predicted probabilities for classifiers). A simple "post-fit" step could ensure this condition. This will usually decrease the in-sample performance, but I have observed in practice that it is quite beneficial for the out-of-sample performance, in particular for non-canonical link-loss combinations such as the Gamma deviance with log link (observed with XGBoost and LightGBM, as this combination is not yet available in scikit-learn). The main advantage, however, is better calibrated models, in-sample as well as out-of-sample.

model = HistGradientBoostingRegressor()  # could also be a classifier
model.fit(X, y)
# Post-fit recalibration to fulfil the balance property: for the log link,
# multiplying all predictions by a constant c amounts to adding link(c) to
# the raw baseline prediction (use predict_proba for a classifier).
model._baseline_prediction += link_function(np.mean(y) / np.mean(model.predict(X)))

For quantiles, this would be slightly different.

Describe your proposed solution

Add a new option post_fit_calibration (better name?!):

model = HistGradientBoostingRegressor(post_fit_calibration=True)  # could also be a classifier
model.fit(X, y)

Describe alternatives you've considered, if relevant

Alternatively, this could be implemented as a meta-estimator.

Additional context

No response
