
Add monotonicity parameter to Gradient Boosting and Decision Trees #4950


Closed
wants to merge 1 commit

Conversation

galvare2

This PR adds a new optional parameter "monotonicity" to the "fit" method of Gradient Boosting models and of Decision Tree models, allowing the caller to pass in an array of length [n_features] specifying the desired monotonicity of the predicted output with respect to each input feature. When a monotonicity is specified, the algorithm only constructs decision trees that obey the desired monotonicity -- possible values are -1 (output must be decreasing), 1 (output must be increasing), and 0 (no constraint for this feature).

This functionality is modeled after the package "GBM" in R, which allows for a monotonicity parameter, and is implemented similarly.
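
A minimal usage sketch of the proposed interface, assuming the parameter name and encoding described above (this runs only on this PR's branch, not on a released scikit-learn):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(200, 3)
# Target increases with feature 0 and decreases with feature 2.
y = 2 * X[:, 0] - X[:, 2] + 0.1 * rng.randn(200)

model = GradientBoostingRegressor(n_estimators=100)
# One entry per feature: 1 = increasing, -1 = decreasing, 0 = unconstrained.
model.fit(X, y, monotonicity=np.array([1, 0, -1]))
```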

@agramfort
Member

agramfort commented Jul 11, 2015 via email

@amueller
Member

Do we want that? That seems like it would be applicable to all tree and linear models.
Having a use-case and a domain where this is commonly used would be good indeed.

@galvare2
Author

Constraining monotonicity can be a helpful tool for preventing overfitting, by incorporating prior information or assumptions about a dataset and its features. My team uses this constraint to avoid overfitting to quirks in our somewhat noisy training data. One significant example: a particular feature is generally well correlated with the label, but there is a class of examples with low labels and high values of this feature. Without a monotonicity constraint, the algorithm puts a strong negative weight on a small range of feature values but high weights on either side; see the attached example. Given the nature of this feature, the monotonicity constraint lets us learn a smoother curve for the feature that generalizes better to new data.

[Image: nonmonotonic-example]

@galvare2
Author

I've added a commit that incorporates the feedback to move monotonicity from fit to init. Let me know what else I should do.
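
For reference, a sketch of the revised placement (again only on this branch; the same constraint array simply moves into the constructor):

```python
from sklearn.ensemble import GradientBoostingRegressor

# Constraint vector now passed at construction time rather than to fit().
model = GradientBoostingRegressor(n_estimators=100, monotonicity=[1, 0, -1])
model.fit(X, y)  # X, y as in the earlier sketch
```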

@pprett
Member

pprett commented Jul 14, 2015

AFAIK this is sometimes used in insurance (e.g. pricing models, where the age of a car should have a monotonically increasing effect on price).

If it doesn't entail any performance penalty, I think it's a great addition.

@galvare2
Author

The nature of the code added to _tree.pyx doesn't suggest it would cause a slowdown, and I have not noticed a change in performance when testing the algorithm locally. If it would be helpful, I could do a more formal performance profile and report the results.

@amueller
Member

I'm a bit sceptical about complicating the tree code too much for a niche case.

@jmschrei
Member

jmschrei commented Aug 3, 2015

From purely a coding point of view, the actual changes at the _tree.pyx level do not look too complicated, and I don't think they would cause a slowdown. However, not having run the code myself, I can't say for certain that there is no slowdown, and I'm also unsure how niche this use case is. Does anyone know if this is a common use case?

EDIT: It would be productive to give a profile. Please use some of the standard datasets, and show results building forests that scale in both depth and number of trees. I'm guessing you should get similar results, but it's always better to back up your claims with data.
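
A rough sketch of the kind of benchmark being requested, runnable against stock scikit-learn; on the PR branch one would additionally time runs that pass a monotonicity vector to fit (the grid values below are illustrative, not prescribed):

```python
import time
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True)

def time_fit(n_estimators, max_depth, n_runs=10, **fit_kwargs):
    """Return mean and std of fit time over n_runs for one parameter setting."""
    times = []
    for _ in range(n_runs):
        model = GradientBoostingRegressor(n_estimators=n_estimators,
                                          max_depth=max_depth)
        start = time.perf_counter()
        model.fit(X, y, **fit_kwargs)
        times.append(time.perf_counter() - start)
    return np.mean(times), np.std(times)

for max_depth in (2, 4, 8):
    for n_estimators in (50, 200, 800):
        mean, std = time_fit(n_estimators, max_depth)
        # On the PR branch, also call time_fit(..., monotonicity=[...]) for comparison.
        print(f"depth={max_depth} trees={n_estimators}: {mean:.3f}s +/- {std:.3f}s")
```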

@galvare2
Author

Sorry for the delay, I'm currently working on profiling the code. I'm not sure how much detail is necessary. I was planning on plotting time vs depth and number of trees for different data sets. Is more detail required or is this enough?

@FedericoV
Contributor

I can think of several useful use cases:

  • House price should never decrease with size (ceteris paribus)
  • Income should always increase with education

and so forth... I would like to see some examples where incorporating this kind of prior causes a strong improvement in prediction, though. Can you show this on something like the housing dataset?

@galvare2
Author

Here are some performance plots showing run times on standard datasets for various values of n_estimators and max_depth. The three results being compared are my branch with a monotonicity vector, my branch without a monotonicity vector, and the master branch; the last two should be nearly identical. All timings are averaged over 10 runs and include one-sigma error bars. With monotonicity enabled, my branch takes up to around 2-3x longer than with it disabled, with the larger differences only becoming apparent for high values of n_estimators. Using my branch but not passing in a monotonicity vector does not incur a significant performance difference for any parameter values.
[Images: fit time vs. max_depth and vs. n_estimators on the Boston, diabetes, and iris datasets]

@FedericoV
Contributor

Maybe I'm missing something obvious, but did you plot accuracy/RMS anywhere?


@galvare2
Author

The purpose of my analysis here was just to report the time slowdown, not accuracy. I was using random monotonicity vectors to train the models, so I wouldn't expect the accuracy numbers to be meaningful. I will construct an example in which monotonicity actually does meaningfully improve accuracy and report it in the future.

@ogrisel
Member

ogrisel commented Sep 1, 2015

The appveyor failure is caused by an old appveyor.yml setup that does not match the current state of the appveyor infrastructure. Please rebase this PR on top of the current master to fix it.

@ogrisel
Member

ogrisel commented Sep 1, 2015

If I understand correctly, the gist of this contribution is similar to using spline smoothing as done in http://nerds.airbnb.com/aerosolve/ by AirBNB as explained in the second part of this video:

https://www.youtube.com/watch?v=Cwwu_AVh9n0

@amueller
Member

amueller commented Sep 8, 2015

It looks like it doesn't add much time complexity. I'm not so sure about code complexity. @glouppe @arjoly do you have opinions?

@shoelsch

shoelsch commented Feb 2, 2016

What's the status on this PR? I'm very interested in enforcing monotonicity. @galvare2 @amueller @glouppe @arjoly

@ogrisel
Member

ogrisel commented Jun 26, 2019

If someone is interested, I think it would be more relevant to add this feature to the fast histogram-based GBRT introduced in scikit-learn 0.21 instead.
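
Monotonic constraints did eventually land for the histogram-based estimators via #15582; a minimal sketch of that later API as I understand it (the monotonic_cst parameter in recent scikit-learn releases, one entry per feature):

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.rand(500, 2)
# True relationship is increasing in feature 0; feature 1 is pure noise.
y = 3 * X[:, 0] + 0.3 * rng.randn(500)

# 1 = increasing, -1 = decreasing, 0 = no constraint.
model = HistGradientBoostingRegressor(monotonic_cst=[1, 0])
model.fit(X, y)
```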

Base automatically changed from master to main January 22, 2021 10:48
@cmarmo cmarmo added the "Needs Decision" (Requires decision) label Feb 14, 2022
@jjerphan
Member

jjerphan commented Mar 9, 2022

It seems that parts of the changes proposed in this PR have been superseded by #15582 and #13649 respectively.

@jjerphan
Member

jjerphan commented Aug 3, 2022

Closing this PR in favour of #15582 and #13649, which have superseded it.

@jjerphan jjerphan closed this Aug 3, 2022