
[MRG] DOC add examples to GradientBoostingClassifier and GradientBoostingRegressor #15151


Merged
merged 10 commits into scikit-learn:master on Feb 29, 2020

Conversation

pspachtholz (Contributor)

Reference Issues/PRs

towards #3846

What does this implement/fix? Explain your changes.

Adds examples to GradientBoostingClassifier and GradientBoostingRegressor, also demonstrating
warm starting.
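
For illustration, a minimal sketch of the kind of docstring example this PR adds for GradientBoostingRegressor; the exact data and values in the merged docstrings may differ:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Small synthetic regression problem with fixed seeds for reproducibility.
X, y = make_regression(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

reg = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print(reg.predict(X_test[:2]))    # predictions for two held-out samples
print(reg.score(X_test, y_test))  # R^2 on the test set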

@NicolasHug (Member) left a comment


Thanks for the PR @pspachtholz .

I think we usually have much simpler examples in the docstrings (see e.g. the other GBDT examples). Yours looks like something we would generally put in the user guide.

>>> X, y = make_classification(n_samples=20, random_state=22)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5,
... random_state=22)
>>> clf = GradientBoostingClassifier(n_estimators=2, max_depth=1,
@NicolasHug (Member):

Why not just use the default n_estimators and max_depth? 2 and 1 aren't good values in general

@pspachtholz (Contributor, Author) commented Oct 8, 2019:

I chose these values mainly to demonstrate warm starting and staged prediction; with more estimators the output of staged_predict would have been very confusing.
If we keep the example simpler (which I will do, based on your suggestion), then of course these values don't make sense.
By the way, is there an example of warm starting in general? I found this, but it's quite hidden in there. If not, it may be worth adding one.
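
For illustration, a minimal sketch of the point above: with only two estimators, staged_predict yields just two arrays instead of hundreds (data and values here are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=20, random_state=22)
clf = GradientBoostingClassifier(n_estimators=2, max_depth=1,
                                 random_state=22).fit(X, y)

# staged_predict yields the ensemble's predictions after each boosting
# stage; with n_estimators=2 this loop prints only two lines.
for stage, pred in enumerate(clf.staged_predict(X), start=1):
    print(stage, pred[:5])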

@NicolasHug (Member):

There is one in the UG, but I agree it's quite hard to find: https://scikit-learn.org/dev/modules/ensemble.html#fitting-additional-weak-learners
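
A minimal sketch of the warm-starting pattern that user-guide section describes; the data and parameter values here are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=100, random_state=0)

# Fit an initial ensemble of 100 trees.
clf = GradientBoostingClassifier(n_estimators=100, random_state=0,
                                 warm_start=True)
clf.fit(X, y)

# With warm_start=True, raising n_estimators and refitting adds 100 more
# trees on top of the existing ones instead of retraining from scratch.
clf.set_params(n_estimators=200)
clf.fit(X, y)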

@NicolasHug (Member) left a comment

Some comments, but looks good.

GradientBoostingClassifier()
>>> clf.predict(X_test)
array([...])
>>> clf.score(X_test, y_test) # doctest: +SKIP
@NicolasHug (Member):

Why skip the doctest? Maybe pass a random seed to train_test_split so the scores are deterministic.

@pspachtholz (Contributor, Author):

Thank you for your feedback. I made both examples deterministic.
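
A sketch of the deterministic pattern discussed above: fixing random_state in the data generation, the split, and the estimator makes the score reproducible, so the # doctest: +SKIP directive is no longer needed (exact values are illustrative, not the merged docstring):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # identical on every run with these seeds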

@pspachtholz (Contributor, Author) commented Oct 13, 2019:

@NicolasHug Tests are failing only on pylatest. Do you have any idea why the output score is non-deterministic there despite setting all random_states, while it is deterministic in the other environments?

@NicolasHug (Member):

> Tests are failing only on pylatest. Do you have any idea why the output score is non-deterministic there despite setting all random_states, while it is deterministic in the other environments?

I'm quite puzzled here :/

@thomasjpfan you're the CI king now, any idea about the randomness?

(Results are consistently the same on my laptop)

@thomasjpfan self-assigned this Oct 16, 2019
@Malesche (Contributor):

@adrinjalali this is still open

There is also #15197, likewise for GradientBoostingRegressor, which was closed because this one was still open.

@thomasjpfan (Member) left a comment

LGTM

@thomasjpfan merged commit d86f8fd into scikit-learn:master on Feb 29, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020