Add monotonicity parameter to Gradient Boosting and Decision Trees #4950
Conversation
Can you clarify what this is useful for?
Also, a parameter that does not depend on n_samples should be in the init and not the fit.
Do we want that? That seems like it would be applicable to all tree and linear models.
Constraining monotonicity can be a helpful tool for preventing overfitting by incorporating prior information or assumptions about a dataset and its features. My team uses this constraint to prevent overfitting to quirks in our somewhat noisy training data. One significant example: a particular feature has values that are generally well correlated with the label, but there is a class of examples with low labels and high values of this feature. Without a monotonicity constraint, the algorithm puts a strong negative weight on a small range of feature values, but high weights on either side; see the attached example. Because of the nature of this feature, with the monotonicity constraint we learn a smoother curve for the feature that generalizes better to new data.
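Below is a minimal, hedged sketch of the scenario described in that comment. It uses the monotonic_cst parameter of scikit-learn's present-day histogram-based gradient boosting rather than this PR's proposed parameter, and the synthetic data (a broadly increasing feature with a narrow low-label band) is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = X[:, 0] + rng.normal(scale=0.5, size=2000)
# Inject the "quirk": a narrow band of feature values with artificially low labels.
quirk = (X[:, 0] > 4.0) & (X[:, 0] < 4.3)
y[quirk] -= 5.0

unconstrained = HistGradientBoostingRegressor(random_state=0).fit(X, y)
# Require predictions to be non-decreasing in feature 0.
constrained = HistGradientBoostingRegressor(monotonic_cst=[1], random_state=0).fit(X, y)

grid = np.linspace(0, 10, 200).reshape(-1, 1)
# The unconstrained model dips sharply around the quirky band, while the
# constrained model learns a smoother, monotone response curve.
print(unconstrained.predict(grid)[75:90].round(2))
print(constrained.predict(grid)[75:90].round(2))
```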
I've added a commit that incorporates the feedback to move monotonicity from fit to init. Let me know what else I should do.
AFAIK this is sometimes used in insurance (e.g. pricing models, where the age of a car should have a monotonically increasing effect on price). If it doesn't entail any performance penalty, I think it's a great addition.
The nature of the code added to _tree.pyx doesn't lead me to believe that it would cause a slowdown, and I have not noticed a change in performance when testing the algorithm locally. If it would be productive, I could do a more official performance profile and report the results.
I'm a bit sceptical about complicating the tree code too much for a niche case. |
From purely a coding point of view, the actual changes do not look too complicated at the _tree.pyx level. EDIT: It would be productive to provide a profile. Please use some of the standard datasets, and show results building forests that scale in both depth and number of trees. I'm guessing you should get similar results, but it's always better to back up your claims with data.
Sorry for the delay; I'm currently working on profiling the code. I'm not sure how much detail is necessary. I was planning on plotting time vs. depth and number of trees for different datasets. Is more detail required, or is this enough?
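For readers who want to reproduce this kind of measurement, here is a rough sketch of such a timing harness against a released scikit-learn GradientBoostingRegressor; the dataset and the parameter grid are placeholders, and the monotonicity option itself would only be available on the PR branch under test.

```python
import time
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

X, y = fetch_california_housing(return_X_y=True)

def time_fit(n_estimators, max_depth, n_repeats=3):
    """Average fit time (and one-sigma spread) over several repeats."""
    times = []
    for seed in range(n_repeats):
        model = GradientBoostingRegressor(
            n_estimators=n_estimators, max_depth=max_depth, random_state=seed
        )
        start = time.perf_counter()
        model.fit(X, y)
        times.append(time.perf_counter() - start)
    return np.mean(times), np.std(times)

for n_estimators in (50, 100, 200):
    for max_depth in (3, 6):
        mean, std = time_fit(n_estimators, max_depth)
        print(f"n_estimators={n_estimators}, max_depth={max_depth}: "
              f"{mean:.2f}s +/- {std:.2f}s")
```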
I can think of several useful use cases:
and so forth... I would like to see some examples where incorporating this kind of prior causes a strong improvement in prediction, though. Can you show this on something like the house dataset?
Here are some performance plots showing run times on standard datasets for various values of n_estimators and max_depth. The three results being compared are my branch with a monotonicity vector used, my branch without a monotonicity vector, and the master branch; the last two should be nearly identical. All times are averaged over 10 runs and include one-sigma error bars. The result is that my branch with monotonicity enabled takes up to around 2-3x longer than with monotonicity disabled, with the larger differences only becoming apparent for high values of n_estimators. Using my branch without passing in a monotonicity vector does not incur a significant performance difference for any parameter values.
Maybe I'm missing something obvious, but did you plot accuracy/RMS anywhere?
The purpose of my analysis here was just to report the time slowdown rather than accuracy. I was using random monotonicity vectors to train the models, so I wouldn't expect the accuracy to yield any meaningful information. I will construct an example in which monotonicity actually does meaningfully improve accuracy and report that in the future.
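A hedged sketch of the kind of comparison that would demonstrate this, written against the monotonic_cst constraint available in current scikit-learn rather than this PR's parameter; the synthetic data (a monotone ground truth plus label noise) is invented for illustration, and one would expect the constrained model to score at least as well on held-out data.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(1000, 1))
y_true = np.log1p(X[:, 0])                      # monotone ground truth
y = y_true + rng.normal(scale=0.8, size=1000)   # noisy observed labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unconstrained = HistGradientBoostingRegressor(random_state=0).fit(X_train, y_train)
constrained = HistGradientBoostingRegressor(
    monotonic_cst=[1], random_state=0  # predictions non-decreasing in feature 0
).fit(X_train, y_train)

print("unconstrained R^2:", unconstrained.score(X_test, y_test))
print("constrained   R^2:", constrained.score(X_test, y_test))
```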
The appveyor failure is caused by an old […]
If I understand correctly, the gist of this contribution is similar to using spline smoothing as done in http://nerds.airbnb.com/aerosolve/ by AirBNB as explained in the second part of this video: |
If someone is interested, I think it would be relevant to add this feature to the fast histogram-based GBRT introduced in scikit-learn 0.21 instead.
This PR adds a new optional parameter "monotonicity" to the "fit" method of Gradient Boosting models and of Decision Tree models, allowing the caller to pass in an array of length [n_features] specifying the desired monotonicity of the predicted output with respect to each input feature. When a monotonicity is specified, the algorithm only constructs decision trees that obey the desired monotonicity -- possible values are -1 (output must be decreasing), 1 (output must be increasing), and 0 (no constraint for this feature).
This functionality is modeled after the package "GBM" in R, which allows for a monotonicity parameter, and is implemented similarly.
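For readers skimming the thread, here is a hedged illustration of how the interface described above would be invoked on this PR's branch; the monotonicity parameter does not exist in released scikit-learn, and a later commit in the conversation moves it from fit to the constructor.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=200)

model = GradientBoostingRegressor(n_estimators=100)
# One entry per feature, following the PR description: 1 means the predicted
# output must increase with the feature, -1 decrease, 0 no constraint.
# NOTE: the monotonicity keyword is only available on this PR's branch.
model.fit(X, y, monotonicity=np.array([1, 0, -1]))
```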