[MRG] Take sample weights into account in partial dependence computation #13193
Conversation
Can a test be added?

I just added a few! Should I add some more?

Ping @NicolasHug as discussed during the sprint!
The fix looks correct to me.

Please add a whatsnew entry as a bugfix.

Also, it'd be nice to have some kind of functional test. Something along the lines of your example in the original issue would be good: fitting a linear regression on the PDPs should give an r-squared close to 1.

This is fine to merge before or after #12599, BTW; one of us will have to update their PR regarding the tests ;)
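A minimal sketch of the suggested functional test, written against the modern `sklearn.inspection.partial_dependence` API (at the time of this PR the function lived under `sklearn.ensemble.partial_dependence`). The dataset, thresholds, and the `grid_values` key (named `values` on scikit-learn < 1.3) are illustrative assumptions, not the test that was actually merged:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import partial_dependence
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(500, 2))
# Target is linear in feature 0, so its PDP should be (nearly) linear too.
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=500)
sample_weight = rng.uniform(0.5, 1.5, size=500)

gbrt = GradientBoostingRegressor(random_state=0)
gbrt.fit(X, y, sample_weight=sample_weight)

pd_result = partial_dependence(gbrt, X, features=[0], grid_resolution=20)
grid = pd_result["grid_values"][0].reshape(-1, 1)  # "values" on sklearn < 1.3
pdp = pd_result["average"][0]

# Regressing the PDP values on the grid should give an R^2 close to 1.
lr = LinearRegression().fit(grid, pdp)
assert lr.score(grid, pdp) > 0.99
```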
raise ValueError("left_sample_frac:%f, " | ||
"n_samples current: %d, " | ||
"n_samples left: %d" | ||
raise ValueError("left_sample_frac:%d, " |
IMO this whole `if` block should be entirely removed. It's not tested anywhere as far as I can see, and it is not PDP-related (it's just tree-related).
Tests still failing. Please add an entry to doc/whats_new/v0.21.rst

Thanks for the review @NicolasHug -- I added a functional test based on the example built for the original issue.
Yes, much cleaner. LGTM if tests pass
Last comments but LGTM!
Anyone for a quick review? @thomasjpfan maybe?

@samronsin Can you please merge master (or trigger the CI with an empty commit) so that the checks go green and we can merge? :)
Merging since the failed test is completely unrelated. Thanks @samronsin!

Thanks @NicolasHug -- and also @jnothman and @thomasjpfan -- for your help on this PR!
…n for gradient boosting (scikit-learn#13193)

* Replace n_node_samples by weighted_n_node_samples in partial dependence computation
* Add tests for both no-op and real sample weights
* Improve naming and remove useless comment
* Fix small test issues
* Fix test for binary classification
* Add test for regressions based on example from initial issue
* Edit whats_new
* 79
* Simplify test code for regression partial dependence
* PEP8
* Facepalm
* Refer to the public function in whats_new
* Make the sample weight test standalone for further reuse
* Fix PR number
* Testing with L1 relative distance computed as averages
* Testing element-wise
* Fix and simplify unit test for binary classification
* Clarify functional test

…mputation for gradient boosting (scikit-learn#13193)" This reverts commit 4350b4e.
Reference Issues/PRs
Fixes #13192.
What does this implement/fix? Explain your changes.
This PR makes `partial_dependence` take sample weights into account by replacing `n_node_samples` by `weighted_n_node_samples` in the partial dependence computation.
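For context, a hypothetical Python sketch of the recursion underlying tree-based partial dependence (the real implementation is in Cython and differs in detail). It assumes `tree` exposes scikit-learn's `Tree` arrays (`children_left`, `children_right`, `feature`, `threshold`, `value`, `weighted_n_node_samples`) and shows where the fix lands: at splits on non-target features, the two children are weighted by `weighted_n_node_samples` (which reflects `sample_weight`) rather than the raw `n_node_samples`:

```python
def weighted_tree_partial_dependence(tree, grid_point, target_features):
    """Hypothetical sketch: partial dependence of one fitted tree at a
    single grid point, honoring sample weights via weighted node counts."""
    target_features = set(target_features)

    def recurse(node, weight):
        left = tree.children_left[node]
        right = tree.children_right[node]
        if left == -1:  # leaf: contribute its value, scaled by the path weight
            return weight * tree.value[node].item()
        feat = tree.feature[node]
        if feat in target_features:
            # Split on a target feature: follow the branch selected by
            # the grid value, keeping the full weight.
            if grid_point[feat] <= tree.threshold[node]:
                return recurse(left, weight)
            return recurse(right, weight)
        # Split on a complementary feature: average over both children.
        # The fix: weight each child by weighted_n_node_samples instead
        # of n_node_samples, so sample weights propagate into the PDP.
        total = tree.weighted_n_node_samples[node]
        return (recurse(left, weight * tree.weighted_n_node_samples[left] / total)
                + recurse(right, weight * tree.weighted_n_node_samples[right] / total))

    return recurse(0, 1.0)
```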