
ENH Replace loss module HGBT #20811


Merged · 155 commits · Jan 11, 2022

Conversation

@lorentzenchr (Member) commented Aug 22, 2021

Reference Issues/PRs

Follow-up of #20567.

What does this implement/fix? Explain your changes.

This PR replaces the losses of HGBT in sklearn/ensemble/_hist_gradient_boosting with the new common loss module of #20567.

Any other comments?

Similar to #19089, but only HGBT.
This PR is based on 7bee26b.
Edit: #20567 was merged 🚀

@lorentzenchr (Member, Author) commented Dec 5, 2021

I can't reproduce the CircleCI failure locally: /home/circleci/project/doc/whats_new/v1.1.rst:338: WARNING: Title underline too short.

@lorentzenchr lorentzenchr added the High Priority High priority issues and pull requests label Dec 11, 2021
@lorentzenchr (Member, Author):

Marking as high priority because it enables #21800 very easily.

if self._loss.constant_hessian:
    self._loss.gradient(
        y_true=y_train,
        raw_prediction=raw_predictions.T,
@thomasjpfan (Member) commented Dec 18, 2021

Interesting, I thought there would have been some CPU cache performance issue with either "C" or "F" order (depending on which axis prange iterates over).
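The cache question above can be checked directly: NumPy's transpose is a zero-copy view that only swaps strides, so `raw_predictions.T` by itself costs nothing, and what matters for cache behaviour is which axis the inner loop walks. A minimal sketch (the shapes and array name are illustrative, not the actual HGBT internals):

```python
import numpy as np

# Illustrative shapes only, not the real HGBT arrays.
raw_predictions = np.zeros((3, 1000), order="C")  # C-contiguous

transposed = raw_predictions.T  # zero-copy view with swapped strides
print(transposed.base is raw_predictions)  # True: no data was copied
print(transposed.flags["F_CONTIGUOUS"])    # True: the view is F-ordered
```

So "C" versus "F" order is a property of the underlying buffer; passing a transposed view changes the indexing, not the memory layout that the inner loop sees.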

@thomasjpfan (Member) left a comment:

Overall looks good. There is some overhead in remembering that raw_predictions has shape (n_trees_per_iteration, n_samples) while gradients and hessians have shape (n_samples, n_trees_per_iteration).

Was there a reason for BaseLoss to prefer (n_samples, n_trees_per_iteration) over the reverse?

@lorentzenchr (Member, Author) commented Dec 18, 2021

Overall looks good. There is some overhead in remembering that raw_predictions has shape (n_trees_per_iteration, n_samples) while gradients and hessians have shape (n_samples, n_trees_per_iteration).

Was there a reason for BaseLoss to prefer (n_samples, n_trees_per_iteration) over the reverse?

TBH, that's just my preferred way of looking at it. This way, raw_predictions, X, y, predict() and predict_proba all have samples on the first axis (axis=0).

I could try to make it consistent and change all HGBT functions to use raw_prediction.shape=(n_samples, n_trees_per_iteration).

Edit: 27e818f is an attempt to do so. @thomasjpfan what do you think?
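The samples-on-axis-0 convention argued for above can be illustrated with a toy gradient computation (a sketch with made-up shapes, not the actual sklearn BaseLoss API): with raw_prediction of shape (n_samples, n_trees_per_iteration), it broadcasts naturally against y of shape (n_samples,), just like X and the output of predict_proba.

```python
import numpy as np

# Toy sketch of the (n_samples, n_trees_per_iteration) convention; the
# names mirror the discussion but this is not the sklearn BaseLoss API.
n_samples, n_trees_per_iteration = 4, 3
rng = np.random.default_rng(0)
y_train = rng.normal(size=n_samples)  # samples on axis 0, like X and y
raw_prediction = rng.normal(size=(n_samples, n_trees_per_iteration))

# Squared-error gradient, broadcast per sample along axis 0:
gradient = raw_prediction - y_train[:, None]
print(gradient.shape)  # (4, 3)
```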

@thomasjpfan (Member):
Edit: 27e818f is an attempt to do so. @thomasjpfan what do you think?

Yea, I think everything is much nicer now. (The diff is surprisingly smaller than I thought it would be.)

@thomasjpfan (Member) left a comment:

LGTM

@jjerphan (Member) left a comment:

Thank you for this PR, @lorentzenchr.

Here are a few comments.

@jjerphan (Member) left a comment:

LGTM.

Edit: #22173 has been opened for a follow-up.

@lorentzenchr (Member, Author):

2 approvals 🎉
It seems customary that a reviewer merges. Is there a passage in the developer guidelines that I missed, or is it just a custom?

@thomasjpfan thomasjpfan changed the title [MRG] ENH Replace loss module HGBT ENH Replace loss module HGBT Jan 11, 2022
@thomasjpfan (Member) left a comment:

I did one more pass. This looks good for merging. Thank you for working on this!

@thomasjpfan thomasjpfan merged commit 4e974e0 into scikit-learn:main Jan 11, 2022
@thomasjpfan (Member):

Is there a passage in the developer guidelines that I missed, or is it just a custom?

I think it's mostly a custom. I do not see it documented anywhere.

5 participants