[MRG] DOC add examples to GradientBoostingClassifier and GradientBoostingRegressor #15151
Conversation
Thanks for the PR @pspachtholz.
I think we usually have much simpler examples for the docstring examples (see e.g. the other GBDT examples). Yours looks like something we would generally have in the user guide.
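For reference, a rough sketch of what a shorter docstring example along those lines could look like (illustrative only: the dataset parameters and seed are assumptions, and the printed outputs are elided or skipped rather than taken from a real run):

>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
>>> clf.predict(X_test[:2])
array([...])
>>> clf.score(X_test, y_test)  # doctest: +SKIP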
>>> X, y = make_classification(n_samples=20, random_state=22)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5,
...     random_state=22)
>>> clf = GradientBoostingClassifier(n_estimators=2, max_depth=1,
Why not just use the default n_estimators and max_depth? 2 and 1 aren't good values in general
I chose these values mainly to demonstrate warm starting and staged prediction. With more estimators the output of staged_predict would have been very confusing.
If we keep the example simpler (which I will do based on your suggestion) then of course these values don't make much sense.
By the way, is there an example about warm starting in general? I found this, but it's quite hidden in there. So if not, it may be worth adding one.
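For concreteness, a small sketch (not the exact diff; values borrowed from the snippet above) of how a tiny n_estimators keeps the staged_predict output short and makes the warm-start step visible:

>>> from sklearn.datasets import make_classification
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(n_samples=20, random_state=22)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5,
...     random_state=22)
>>> clf = GradientBoostingClassifier(n_estimators=2, max_depth=1,
...     warm_start=True, random_state=22)
>>> _ = clf.fit(X_train, y_train)
>>> len(list(clf.staged_predict(X_test)))  # one prediction array per stage
2
>>> # warm start: keep the two fitted trees and add two more
>>> _ = clf.set_params(n_estimators=4).fit(X_train, y_train)
>>> clf.estimators_.shape[0]
4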
There is one in the UG, but I would agree it's quite hard to find: https://scikit-learn.org/dev/modules/ensemble.html#fitting-additional-weak-learners
some comments but looks good
GradientBoostingClassifier()
>>> clf.predict(X_test)
array([...])
>>> clf.score(X_test, y_test)  # doctest: +SKIP
why skip doctest? maybe pass a random seed to train_test_split for the scores to be deterministic
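(To illustrate the determinism point: with a fixed random_state, train_test_split returns the same split on every call, so the fitted model and hence the score become reproducible. A minimal check, with the seed value chosen arbitrarily:)

>>> import numpy as np
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_classification(random_state=0)
>>> split_a = train_test_split(X, y, random_state=22)
>>> split_b = train_test_split(X, y, random_state=22)
>>> all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
True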
thank you for your feedback. made both examples deterministic.
@NicolasHug Tests are failing only on pylatest. Do you have any idea why the output score is non-deterministic there despite setting all random_states, while it is deterministic in the other environments?
I'm quite puzzled here :/ @thomasjpfan you're the CI king now, any idea about the randomness? (Results are consistently the same on my laptop)
@adrinjalali this is still open. There is also #15197, likewise for GradientBoostingRegressor, which was closed because this one was still open.
LGTM
Reference Issues/PRs
towards #3846
What does this implement/fix? Explain your changes.
Adds examples to GradientBoostingClassifier and GradientBoostingRegressor, also demonstrating
warm starting.
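For the regressor side, a minimal sketch of the kind of example added (illustrative only; the exact dataset parameters and printed values in the merged docstring may differ):

>>> from sklearn.datasets import make_regression
>>> from sklearn.ensemble import GradientBoostingRegressor
>>> from sklearn.model_selection import train_test_split
>>> X, y = make_regression(random_state=0)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
>>> reg = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
>>> reg.predict(X_test[:2])
array([...])
>>> reg.score(X_test, y_test)  # doctest: +SKIP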