[MRG] Take sample weights into account in partial dependence computation #13193

Conversation

samronsin
Contributor

Reference Issues/PRs

Fixes #13192.

What does this implement/fix? Explain your changes.

This PR makes `partial_dependence` take sample weights into account by replacing `n_node_samples` with `weighted_n_node_samples` in the partial dependence computation.
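To illustrate the idea, here is a minimal sketch (a hypothetical helper, not the actual scikit-learn implementation): when recursing through a fitted tree, splits that are not on the target feature average the two children, and the fix is to weight that average by `weighted_n_node_samples` rather than the raw `n_node_samples` counts.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def partial_dependence_tree(est, target_feature, grid_value):
    """Hypothetical sketch: partial dependence of one tree at one grid
    value, weighting child branches by weighted_n_node_samples."""
    t = est.tree_

    def recurse(node):
        left, right = t.children_left[node], t.children_right[node]
        if left == -1:  # leaf node
            return t.value[node][0, 0]
        if t.feature[node] == target_feature:
            # Split on the target feature: follow the branch that the
            # grid value dictates.
            child = left if grid_value <= t.threshold[node] else right
            return recurse(child)
        # Split on another feature: average the children, weighted by
        # weighted_n_node_samples (the fix), not by raw sample counts.
        wl = t.weighted_n_node_samples[left]
        wr = t.weighted_n_node_samples[right]
        return (wl * recurse(left) + wr * recurse(right)) / (wl + wr)

    return recurse(0)

rng = np.random.RandomState(0)
X = rng.rand(200, 2)
y = 3 * X[:, 0] + rng.rand(200)
w = rng.rand(200) + 0.5  # non-trivial sample weights
est = DecisionTreeRegressor(max_depth=3).fit(X, y, sample_weight=w)
pd0 = partial_dependence_tree(est, target_feature=0, grid_value=0.5)
```

With uniform weights, `weighted_n_node_samples` equals `n_node_samples` and both variants agree; the difference only shows once weights vary.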

@jnothman
Member

Can a test be added?

@samronsin
Contributor Author

I just added a few! Should I add some more?

@samronsin samronsin changed the title Take sample weights into account in partial dependence computation [MRG] Take sample weights into account in partial dependence computation Feb 28, 2019
@samronsin
Contributor Author

Ping @NicolasHug as discussed during the sprint!

Member

@NicolasHug NicolasHug left a comment


The fix looks correct to me.

Please add a whatsnew entry as bugfix.

Also, it'd be nice to have some kind of functional test. Something along the lines of your example in the original issue would be good: fitting a linear regression on the PDPs should give an r-squared close to 1.

This is fine to merge before or after #12599 BTW; one of us will have to update their PR regarding the tests ;)
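The functional test suggested above could be sketched roughly like this (a hypothetical version, not the PR's actual test, and computing the PDP by brute force to stay API-agnostic): with a target linear in one feature and strongly varying sample weights, the PDP of that feature should itself be close to linear.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.rand(500, 2)
y = 5 * X[:, 0] + rng.rand(500)
# Upweight half the samples so that weighting actually matters.
w = np.where(X[:, 1] > 0.5, 10.0, 1.0)

est = GradientBoostingRegressor(random_state=0).fit(X, y, sample_weight=w)

# Brute-force PDP for feature 0: average the model's predictions over
# the data with feature 0 clamped to each grid point.
grid = np.linspace(0.05, 0.95, 20)
pdp = []
for v in grid:
    Xg = X.copy()
    Xg[:, 0] = v
    pdp.append(est.predict(Xg).mean())
pdp = np.asarray(pdp)

# A linear regression on the PDP should explain almost all variance.
reg = LinearRegression().fit(grid.reshape(-1, 1), pdp)
r2 = reg.score(grid.reshape(-1, 1), pdp)
```

An assertion such as `r2 > 0.9` would then catch regressions where sample weights are silently ignored in the tree recursion.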

```diff
-raise ValueError("left_sample_frac:%d, "
+raise ValueError("left_sample_frac:%f, "
                  "n_samples current: %d, "
                  "n_samples left: %d"
```
Member


IMO this whole `if` block should be removed entirely. It's not tested anywhere as far as I can see, and it's not PDP-related (it's just tree-related).

@jnothman
Member

Tests are still failing. Please add an entry to doc/whats_new/v0.21.rst.

@samronsin
Contributor Author

Thanks for the review @NicolasHug -- I added a functional test based on the example built for the original issue.
@jnothman tests should be fixed now!

Member

@jnothman jnothman left a comment


Yes, much cleaner. LGTM if tests pass.

Member

@NicolasHug NicolasHug left a comment


Last comments but LGTM!

@NicolasHug
Member

Anyone for a quick review? @thomasjpfan maybe?

@NicolasHug
Member

@samronsin Can you please merge master (or trigger the CI with an empty commit) so that the checks go green and we can merge? :)

…into add-sample-weights-gbt-partial-dependency
@NicolasHug NicolasHug merged commit d0747ea into scikit-learn:master Apr 5, 2019
@NicolasHug
Member

Merging, since the failed test is completely unrelated (mlp.test_gradient, likely due to the random state not being set).

Thanks @samronsin!

@samronsin
Contributor Author

Thanks @NicolasHug -- and also @jnothman and @thomasjpfan -- for your help on this PR!

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019
…n for gradient boosting (scikit-learn#13193)

* Replace n_node_samples by weighted_n_node_samples in partial dependence computation

* Add tests for both no-op and real sample weights

* Improve naming and remove useless comment

* Fix small test issues

* Fix test for binary classification

* Add test for regressions based on example from initial issue

* Edit whats_new

* 79

* Simplify test code for regression partial dependence

* PEP8

* Facepalm

* Refer to the public function in whats_new

* Make the sample weight test standalone for further reuse

* Fix PR number

* Testing with L1 relative distance computed as averages

* Testing element-wise

* Fix and simplify unit test for binary classification

* Clarify functional test
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Successfully merging this pull request may close these issues.

partial_dependence ignores sample weights
4 participants