Add monotonicity parameter to Gradient Boosting and Decision Trees #4950
Conversation
Can you clarify what this is useful for?
Also, a parameter that does not depend on n_samples should be in the init and not the fit.
Do we want that? That seems like it would be applicable to all tree and linear models.
Constraining monotonicity can be a helpful tool for preventing overfitting by incorporating prior information or assumptions about a dataset and its features. My team uses this constraint to prevent overfitting to quirks in our somewhat noisy training data. One significant example: a particular feature has values that are generally well correlated with the label, but there is a class of examples with low labels and high values of this feature. Without a monotonicity constraint, the algorithm puts a strong negative weight on a small range of feature values, but high weights on either side; see the attached example. Because of the nature of this feature, with the monotonicity constraint we learn a smoother curve for the feature that generalizes better to new data.
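Below is a minimal, hedged sketch of the scenario described in that comment. It uses the monotonic_cst parameter of scikit-learn's present-day histogram-based gradient boosting rather than this PR's proposed parameter, and the synthetic data (a broadly increasing feature with a narrow low-label band) is invented for illustration.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(2000, 1))
y = X[:, 0] + rng.normal(scale=0.5, size=2000)
# Inject the "quirk": a narrow band of feature values with artificially low labels.
quirk = (X[:, 0] > 4.0) & (X[:, 0] < 4.3)
y[quirk] -= 5.0

unconstrained = HistGradientBoostingRegressor(random_state=0).fit(X, y)
# Require predictions to be non-decreasing in feature 0.
constrained = HistGradientBoostingRegressor(monotonic_cst=[1], random_state=0).fit(X, y)

grid = np.linspace(0, 10, 200).reshape(-1, 1)
# The unconstrained model dips sharply around the quirky band, while the
# constrained model learns a smoother, monotone response curve.
print(unconstrained.predict(grid)[75:90].round(2))
print(constrained.predict(grid)[75:90].round(2))
```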
I've added a commit that incorporates the feedback to move monotonicity from fit to init. Let me know what else I should do.
AFAIK this is sometimes used in insurance (e.g. pricing models, where the age of a car should have a monotonically increasing effect on price). If it doesn't entail any performance penalty, I think it's a great addition.
The nature of the code added to _tree.pyx doesn't lead me to believe that it would cause a slowdown, and I have not noticed a change in performance when testing the algorithm locally. If it would be productive, I could do a more official performance profile and report the results.
I'm a bit sceptical about complicating the tree code too much for a niche case. |
From purely a coding point of view, the actual changes do not look too complicated at the _tree.pyx level. EDIT: It would be productive to provide a profile. Please use some of the standard datasets, and show results building forests that scale in both depth and number of trees. I'm guessing you should get similar results, but it's always better to back up your claims with data.
Sorry for the delay; I'm currently working on profiling the code. I'm not sure how much detail is necessary. I was planning on plotting time vs. depth and number of trees for different datasets. Is more detail required, or is this enough?
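For readers who want to reproduce this kind of measurement, here is a rough sketch of such a timing harness against a released scikit-learn GradientBoostingRegressor; the dataset and the parameter grid are placeholders, and the monotonicity option itself would only be available on the PR branch under test.

```python
import time
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor

X, y = fetch_california_housing(return_X_y=True)

def time_fit(n_estimators, max_depth, n_repeats=3):
    """Average fit time (and one-sigma spread) over several repeats."""
    times = []
    for seed in range(n_repeats):
        model = GradientBoostingRegressor(
            n_estimators=n_estimators, max_depth=max_depth, random_state=seed
        )
        start = time.perf_counter()
        model.fit(X, y)
        times.append(time.perf_counter() - start)
    return np.mean(times), np.std(times)

for n_estimators in (50, 100, 200):
    for max_depth in (3, 6):
        mean, std = time_fit(n_estimators, max_depth)
        print(f"n_estimators={n_estimators}, max_depth={max_depth}: "
              f"{mean:.2f}s +/- {std:.2f}s")
```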
I can think of several useful use cases:
and so forth... I would like to see some examples where incorporating this kind of prior causes a strong improvement in prediction, though. Can you show this on something like the house dataset?
Here are some performance plots showing run times on standard datasets for various values of n_estimators and max_depth. The three results being compared are my branch with a monotonicity vector used, my branch without a monotonicity vector, and the master branch; the last two should be nearly identical. All times are averaged over 10 runs and include one-sigma error bars. The result is that my branch with monotonicity enabled takes up to around 2-3x longer than with monotonicity disabled, with the larger differences only becoming apparent for high values of n_estimators. Using my branch without passing in a monotonicity vector does not incur a significant performance difference for any parameter values.
Maybe I'm missing something obvious, but did you plot accuracy/RMS anywhere?
The purpose of my analysis here was just to report the time slowdown rather than accuracy. I was using random monotonicity vectors to train the models, so I wouldn't expect the accuracy to yield any meaningful information. I will construct an example in which monotonicity actually does meaningfully improve accuracy and report that in the future.
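A hedged sketch of the kind of comparison that would demonstrate this, written against the monotonic_cst constraint available in current scikit-learn rather than this PR's parameter; the synthetic data (a monotone ground truth plus label noise) is invented for illustration, and one would expect the constrained model to score at least as well on held-out data.

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(1000, 1))
y_true = np.log1p(X[:, 0])                      # monotone ground truth
y = y_true + rng.normal(scale=0.8, size=1000)   # noisy observed labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

unconstrained = HistGradientBoostingRegressor(random_state=0).fit(X_train, y_train)
constrained = HistGradientBoostingRegressor(
    monotonic_cst=[1], random_state=0  # predictions non-decreasing in feature 0
).fit(X_train, y_train)

print("unconstrained R^2:", unconstrained.score(X_test, y_test))
print("constrained   R^2:", constrained.score(X_test, y_test))
```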
The appveyor failure is caused by an old […]
If I understand correctly, the gist of this contribution is similar to using spline smoothing as done in http://nerds.airbnb.com/aerosolve/ by AirBNB as explained in the second part of this video: |
If someone is interested, I think it would be relevant to add this feature to the fast histogram-based GBRT introduced in scikit-learn 0.21 instead.
This PR adds a new optional parameter "monotonicity" to the "fit" method of Gradient Boosting models and of Decision Tree models, allowing the caller to pass in an array of length [n_features] specifying the desired monotonicity of the predicted output with respect to each input feature. When a monotonicity is specified, the algorithm only constructs decision trees that obey the desired monotonicity -- possible values are -1 (output must be decreasing), 1 (output must be increasing), and 0 (no constraint for this feature).
This functionality is modeled after the package "GBM" in R, which allows for a monotonicity parameter, and is implemented similarly.
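For readers skimming the thread, here is a hedged illustration of how the interface described above would be invoked on this PR's branch; the monotonicity parameter does not exist in released scikit-learn, and a later commit in the conversation moves it from fit to the constructor.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=200)

model = GradientBoostingRegressor(n_estimators=100)
# One entry per feature, following the PR description: 1 means the predicted
# output must increase with the feature, -1 decrease, 0 no constraint.
# NOTE: the monotonicity keyword is only available on this PR's branch.
model.fit(X, y, monotonicity=np.array([1, 0, -1]))
```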