
[MRG] Monotonic constraints for GBDT #15582


Merged: 63 commits into scikit-learn:master on Mar 24, 2020

Conversation

@NicolasHug (Member) commented Nov 9, 2019

This PR adds support for monotonic constraints to the histogram-based GBDTs.

Addresses #6656

(see https://xgboost.readthedocs.io/en/latest/tutorials/monotonic.html)

The API is to pass e.g. HistGradientBoostingRegressor(monotonic_cst=[-1, 1])
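A minimal usage sketch of that API (toy data; at the time of this PR the estimator is still experimental, so it needs the enable_hist_gradient_boosting import):

```python
# Minimal sketch of the proposed API: one constraint per feature,
# +1 = monotonically increasing, -1 = decreasing, 0 = unconstrained.
import numpy as np
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(1000, 2)
y = 5 * X[:, 0] - 3 * X[:, 1] + 0.1 * rng.randn(1000)

# Predictions must be increasing in feature 0 and decreasing in feature 1.
gbdt = HistGradientBoostingRegressor(monotonic_cst=[1, -1])
gbdt.fit(X, y)
```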


For reviewers: the overall logic is pretty simple, but this still involved a lot of changes because I had to refactor the splitter: all nodes now need a value, whereas previously only leaves had one.

@ogrisel @adrinjalali @thomasjpfan @amueller @glemaitre might be interested in this (after the release of course)

@amueller (Member):

why not make it an init argument for now?

@NicolasHug (Member, Author) commented Nov 10, 2019

This is more of a fit parameter, right? Since the constraints are a property of X.

I'm open to anything, but ideally we wouldn't merge this PR if we knew for sure that the API would change in the future.

Comment on lines 562 to 589
    if gain > best_gain and gain > self.min_gain_to_split:
        found_better_split = True
        best_gain = gain
        best_bin_idx = bin_idx
        best_sum_gradient_left = sum_gradient_left
        best_sum_hessian_left = sum_hessian_left
        best_n_samples_left = n_samples_left

if found_better_split:
    split_info.gain = best_gain
    split_info.bin_idx = best_bin_idx
    # we scan from left to right so missing values go to the right
    split_info.missing_go_to_left = False
    split_info.sum_gradient_left = best_sum_gradient_left
    split_info.sum_gradient_right = sum_gradients - best_sum_gradient_left
    split_info.sum_hessian_left = best_sum_hessian_left
    split_info.sum_hessian_right = sum_hessians - best_sum_hessian_left
    split_info.n_samples_left = best_n_samples_left
    split_info.n_samples_right = n_samples - best_n_samples_left

    # We recompute best values here but it's cheap
    split_info.value_left = compute_value(
        split_info.sum_gradient_left, split_info.sum_hessian_left,
        lower_bound, upper_bound, self.l2_regularization)

    split_info.value_right = compute_value(
        split_info.sum_gradient_right, split_info.sum_hessian_right,
        lower_bound, upper_bound, self.l2_regularization)
@NicolasHug (Member, Author):

For reviewers: apart from the leaf value computation (which is new), the logic here is actually unchanged.

It's mostly just a small optimization: instead of setting all the attributes of the split_info at each iteration, we only do it once at the end.

Comment on lines -641 to -645
gain = negative_loss(sum_gradient_left, sum_hessian_left,
                     l2_regularization)
gain += negative_loss(sum_gradient_right, sum_hessian_right,
                      l2_regularization)
gain -= negative_loss_current_node
@NicolasHug (Member, Author):

For reviewers:

The gain computation has been updated to:

  • first compute the values of the left and right children
  • cap those values according to the bounds imposed by the monotonic constraints
  • discard any split that does not respect left < right (INC) or right < left (DEC)
  • compute the loss reduction from the previously computed (bounded) values

(A rough sketch of this logic is given right after this list.)
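In plain Python (compute_value and loss_from_value only loosely mirror the PR's helpers; the signatures and bound handling here are illustrative, not the actual Cython code):

```python
# Illustrative constrained split gain, following the steps listed above.
# monotonic_cst: +1 for an increasing constraint, -1 for decreasing, 0 for none.
def split_gain(sum_grad_left, sum_hess_left, sum_grad_right, sum_hess_right,
               loss_current_node, monotonic_cst, lower_bound, upper_bound,
               l2_regularization):

    def compute_value(sum_grad, sum_hess):
        # Newton-step value of a child, clipped to the node's allowed bounds.
        value = -sum_grad / (sum_hess + l2_regularization)
        return min(max(value, lower_bound), upper_bound)

    def loss_from_value(value, sum_grad):
        # Loss contribution of a node evaluated at its (possibly clipped) value.
        return sum_grad * value

    value_left = compute_value(sum_grad_left, sum_hess_left)
    value_right = compute_value(sum_grad_right, sum_hess_right)

    # Discard splits that violate the monotonic constraint on this feature.
    if monotonic_cst == +1 and value_left > value_right:
        return float('-inf')
    if monotonic_cst == -1 and value_left < value_right:
        return float('-inf')

    # Loss reduction: both children are evaluated at their bounded values.
    gain = loss_current_node
    gain -= loss_from_value(value_left, sum_grad_left)
    gain -= loss_from_value(value_right, sum_grad_right)
    return gain
```

Returning negative infinity here just means the candidate split is never selected.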

Comment on lines 142 to 143
# Considering the following tree with a monotonic INC constraint, we
# should have:
@NicolasHug (Member, Author):

Note for reviewers: start here

@NicolasHug (Member, Author) left a comment

Thanks a lot for the review!

I addressed most comments and will try a fix tomorrow for the gain computation.

# or all decreasing (or neither) depending on the monotonic constraint.
nodes = predictor.nodes

def get_leaves_values():
@NicolasHug (Member, Author):

For me that would suggest that there is only one leaf and that one leaf has multiple values.

Thoughts @jnothman ?

dfs(node.left_child)
dfs(node.right_child)

dfs(grower.root)
@NicolasHug (Member, Author):

I'm not sure we can do that here because, unlike the predictor object, the grower does not provide an array of the nodes. It only has the root and a finalized_leaves list, which isn't enough for us here.

gain -= _loss_from_value(value_right, sum_gradient_right) # with bounds
# Note that the losses for both children are computed with bounded values,
# while the loss of the current node isn't. It's OK since all the gain
# comparisons will be made with the same loss_current_node anyway.
@NicolasHug (Member, Author):

I'm reasonably confident that lightgbm does it similarly: see https://github.com/microsoft/LightGBM/blob/master/src/treelearner/feature_histogram.hpp#L160, where the gain of the current node to be split will not take constraints into account (follow BeforeNumercal and then GetLeafGain). But I could be missing something.

You're right that it doesn't work well with min_gain_to_split. I guess the correct way would be to pass the current node's value into find_node_split.

I'll try tomorrow and will report back!

NicolasHug and others added 2 commits March 20, 2020 07:23
@NicolasHug (Member, Author):

OK, I pushed something and added a test.

I'm still a little bit hesitant about what we should be doing. I think that in the end, it all boils down to how we define the loss at a given node:

  1. loss = -sum_grad**2 / sum_hessians

or

  2. loss = sum_grad * value, where value = clip(-sum_grad / sum_hessians)

They're both equivalent if no clipping happens (the value is then -sum_grad / sum_hessians, so sum_grad * value reduces to -sum_grad**2 / sum_hessians), as in the XGBoost paper.

In any case, I agree we should be consistent and use the same formula for all nodes. In the previous version (and in LightGBM, as far as I understand), we were using 1 for the current node and 2 for the children. Now we're using 2 for all nodes (see the sketch below).
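For concreteness, a small side-by-side sketch of the two definitions (plain Python, illustrative names only):

```python
def loss_1(sum_grad, sum_hess):
    # Definition 1: the usual (unclipped) XGBoost-style node loss.
    return -sum_grad ** 2 / sum_hess

def loss_2(sum_grad, sum_hess, lower_bound, upper_bound):
    # Definition 2: loss evaluated at the node value, which may be clipped
    # by the monotonic-constraint bounds.
    value = min(max(-sum_grad / sum_hess, lower_bound), upper_bound)
    return sum_grad * value

# Without clipping the two coincide: value = -g / h, so g * value = -g**2 / h.
assert loss_1(-4.0, 2.0) == loss_2(-4.0, 2.0, float('-inf'), float('inf'))
```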

WDYT @ogrisel, good to go?

@ogrisel (Member) left a comment

LGTM. Can you just do a quick benchmark to check that the latest change did not cause a significant performance regression w.r.t. master?

@NicolasHug (Member, Author):

Thanks for the reminder,

There actually was a slow-down due to compute_node_value having lots of Python interactions. I fixed it by moving it into splitting.pyx instead of common.pyx (some Cython magic at work again).

With python benchmarks/bench_hist_gradient_boosting_higgsboson.py --n-trees 500 --subsample 500000 --n-leaf-nodes 255,

I get about 51 sec on the current branch and 47 sec on master now.

So there's still a slow-down, but as far as I can tell the Cython code is just about as fast. It seems we're spending more time in the Python part.

That might be due to the apply_shrinkage() method, which has to go through all the leaves at the end. I tried removing it, but the speed gain didn't seem significant (and it made the code quite a bit uglier).

@ogrisel (Member) commented Mar 22, 2020

Maybe you can use py-spy with native code profiling enabled both on master and on this branch and compare the 2 flamegraphs to help identify the discrepancy?

@NicolasHug (Member, Author):

Great suggestion!

It turns out the grower is spending a significant amount of time setting the bounds of the children. That's where the difference came from. I'm quite surprised because there's no constraint at all in the benchmark. But I guess it adds up in the end since this is done for every single node.
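For illustration, the kind of fast path this suggests (hypothetical names and structure, not the actual change in 83dba40):

```python
from dataclasses import dataclass

@dataclass
class TreeNode:
    # Illustrative stand-in for the grower's node, not the actual class.
    lower_bound: float = float('-inf')
    upper_bound: float = float('inf')

def set_children_bounds(has_monotonic_cst, parent, left, right):
    # Fast path: with no monotonic constraints anywhere in the model, every
    # node keeps (-inf, +inf) bounds, so there is nothing to update per node.
    if not has_monotonic_cst:
        return
    # Otherwise propagate the parent's bounds to both children; a constrained
    # split feature would further tighten them around the split's mid-value
    # (omitted in this sketch).
    for child in (left, right):
        child.lower_bound = parent.lower_bound
        child.upper_bound = parent.upper_bound
```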

I added a fast way in 83dba40. Now both times are similar:

Times in seconds, aggregated over 10 runs per branch:

| stage                 | branch        | count | mean    | std       | min    | 50%     | max    |
|-----------------------|---------------|-------|---------|-----------|--------|---------|--------|
| total                 | master        | 10    | 45.6105 | 0.723786  | 44.832 | 45.4225 | 47.25  |
| total                 | monotonic_cst | 10    | 45.3345 | 0.405736  | 44.625 | 45.32   | 46.034 |
| histogram computation | master        | 10    | 19.1552 | 0.572901  | 18.522 | 18.976  | 20.476 |
| histogram computation | monotonic_cst | 10    | 19.0646 | 0.426279  | 18.43  | 19.047  | 19.878 |
| finding best splits   | master        | 10    | 4.1208  | 0.117993  | 4.003  | 4.1025  | 4.398  |
| finding best splits   | monotonic_cst | 10    | 4.0572  | 0.0953832 | 3.879  | 4.076   | 4.162  |
| applying splits       | master        | 10    | 7.4773  | 0.159384  | 7.289  | 7.428   | 7.809  |
| applying splits       | monotonic_cst | 10    | 7.3875  | 0.0822398 | 7.225  | 7.3965  | 7.496  |
| predicting            | master        | 10    | 0.7342  | 0.0298619 | 0.687  | 0.7285  | 0.781  |
| predicting            | monotonic_cst | 10    | 0.7201  | 0.0145789 | 0.704  | 0.716   | 0.747  |

(I ran them on a different machine from #15582 (comment), in case you're wondering why it's faster in both cases)

@adrinjalali (Member) left a comment

Thanks @NicolasHug, other than the one concern, looks all good.

@ogrisel (Member) commented Mar 24, 2020

Alright, very nice work. Let's merge!

@ogrisel ogrisel merged commit 36ebf3e into scikit-learn:master Mar 24, 2020
@NicolasHug (Member, Author):

Thanks a lot @adrinjalali and @ogrisel for the reviews!
