-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH FEA add interaction constraints to HGBT #21020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH FEA add interaction constraints to HGBT #21020
Conversation
c09bdd9
to
d9b273a
Compare
@thomasjpfan I have two questions right now:
|
Now, I could use some help as I'm stuck with the following error, which only happens for
I don't understand where any of (@thomasjpfan You said I may ping you:blush:) |
I don't understand how a build-time error can be impacted by a runtime value ( |
Me neither. There is nothing I understand with this error. Running
Note the last two lines and the fact that Note that this error also appears in |
015b30e
to
eb1e255
Compare
@thomasjpfan @ogrisel Here comes a detailed bug report based on commit 3ea3829.
System: Python dependencies: Built with OpenMP: True |
@thomasjpfan Is there anything I can do (help, support, trade) to get a LGTM from you? You gave the most thorough review, and I think all of your comments are addressed, no loose ends. |
# %% | ||
# All 4 plots show parallel lines meaning there is no interaction in the model. | ||
# (Note that to see the same with a | ||
# :class:`~sklearn.ensemble.HistGradientBoostingClassifier`, we would need to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think partial dependence plots for HistGradientBoostingClassifier
already plots in the link space:
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.inspection import PartialDependenceDisplay
X, y = make_classification(random_state=0)
clf = HistGradientBoostingClassifier().fit(X, y)
_ = PartialDependenceDisplay.from_estimator(clf, X, [0])
My preference is to remove the statement, so the narrative flows more naturally to the 2D-plot and focusing on the current problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. The statement would be valid for HistGradientBoostingRegressor(loss="poisson")
, but I'll remove it from here. This subject is already discussed in #18309.
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py
Outdated
Show resolved
Hide resolved
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py
Outdated
Show resolved
Hide resolved
sklearn/ensemble/_hist_gradient_boosting/tests/test_gradient_boosting.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My (hopefully) penultimate review.
I still need to go through _compute_interactions
and some tests.
Indices of the interaction sets that have to be applied on splits of | ||
child nodes. The fewer sets the stronger the constraint as fewer sets | ||
contain fewer features. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand the last sentence. One can have two sets containing half features each, right? In this case, the tree structure won't be as constrained as their features' interactions' being defined by a lot of singletons? Or is my understanding wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means the choice of possible features to split on is the smaller, the less entries interaction_cst_indices
has. "Stronger constraint" means having larger impact by only allowing very few features to split on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alles in 🧈! Thank you for your patience, @lorentzenchr.
Here are some last comments.
sklearn/ensemble/_hist_gradient_boosting/tests/test_splitting.py
Outdated
Show resolved
Hide resolved
sklearn/ensemble/_hist_gradient_boosting/tests/test_splitting.py
Outdated
Show resolved
Hide resolved
@jjerphan Thank you so much for your thorough review. Et maitenant, laisson faire notre 🧈 ? (Ça ce dit???)😄 |
Supposedly one of my last remarks: We could cite
In Chapter 2.2, they explain interaction constraints nicely with some diagrams of trees, and refer to the paper that originally proposed interaction constraints:
|
You're welcome. Thank you for proposing the call, this gave me a reason to go through your PR again.
I don't know about this phrase. If I had to pick one, I would say: « Emballé, c'est pesé ! 📪 » |
- add reference Machine Learning Applications to Land and Structure Valuation - add arxiv qualifier for G. Louppe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@jjerphan @thomasjpfan Thank you so much for your time, thoughts and effort while reviewing. It improved this PR a lot. This PR is an important one for me because it enables a lot of really cool use cases of high relevance in practice. |
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
This is so awesome, thank you for all the hard work! |
Reference Issues/PRs
Fixes #19148.
What does this implement/fix? Explain your changes.
This PR introduces the argument
interaction_cst
to the classesHistGradientBoostingRegressor
andHistGradientBoostingClassifier
.It main impact is in
Splitter.find_node_split
.Any other comments?
Might be a good reason to wish for an early 1.1 release 😏
interaction_cst
hgbt(X + X[:, 0]) - hgbt(X)
with and without constraints, see ba78cb9