-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH add support for sample_weight in KBinsDiscretizer(strategy="quantile") #24935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… use of already existing functions.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…acing of conditions to avoid repetitions
… statement and adding a test case for the exception raise on this same check
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Merge remote-tracking branch 'upstream/main' into deas-kbins
Merge remote-tracking branch 'upstream/main' into deas-kbins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should come up with a good test for the sample_weight
case.
Somehow, I am thinking that we could come up with an array where we could decide easily the quantile and add some extreme values that could have an impact and pass a corresponding null weight to those samples. We will check in this case that zero-weight will be taken into account in the computation of the quantile.
@@ -233,9 +245,28 @@ def fit(self, X, y=None): | |||
'`subsample` must be used with `strategy="quantile"`.' | |||
) | |||
|
|||
elif self.strategy != "quantile" and sample_weight is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems more logical in this order.
elif self.strategy != "quantile" and sample_weight is not None: | |
elif sample_weight is not None and self.strategy != "quantile": |
@@ -35,6 +50,20 @@ def test_valid_n_bins(): | |||
assert KBinsDiscretizer(n_bins=2).fit(X).n_bins_.dtype == np.dtype(int) | |||
|
|||
|
|||
def test_invalid_sample_weight(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def test_invalid_sample_weight(): | |
@pytest.mark.parametrize("strategy", ["uniform", "kmeans"]) | |
def test_kbinsdiscretizer_wrong_strategy_with_weights(strategy): |
strategy = ["uniform", "kmeans"] | ||
sample_weight = [1, 1, 1, 1] | ||
for s in strategy: | ||
est = KBinsDiscretizer(n_bins=3, strategy=s) | ||
err_msg = ( | ||
"`sample_weight` was provided but it can be only used with" | ||
f"strategy='quantile'. Got strategy={s!r} instead." | ||
) | ||
with pytest.raises(ValueError, match=err_msg): | ||
est.fit_transform(X, sample_weight=sample_weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strategy = ["uniform", "kmeans"] | |
sample_weight = [1, 1, 1, 1] | |
for s in strategy: | |
est = KBinsDiscretizer(n_bins=3, strategy=s) | |
err_msg = ( | |
"`sample_weight` was provided but it can be only used with" | |
f"strategy='quantile'. Got strategy={s!r} instead." | |
) | |
with pytest.raises(ValueError, match=err_msg): | |
est.fit_transform(X, sample_weight=sample_weight) | |
sample_weight = np.ones(size=(X.shape[0])) | |
estimator = KBinsDiscretizer(n_bins=3, strategy=strategy) | |
err_msg = ( | |
"`sample_weight` was provided but it can be only used with" | |
"strategy='quantile'." | |
) | |
with pytest.raises(ValueError, match=err_msg): | |
est.fit(X, sample_weight=sample_weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On line 55 sample_weight = np.ones(size=(X.shape[0]))
:
X is a list, so we can't use the attribute shape.
Also np.ones
has a parameter shape
, not size
.
Maybe then sample_weight = np.ones(shape=len(X))
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is indeed the first line of my suggestion :)
Merge remote-tracking branch 'upstream/main' into deas-kbins
Merge remote-tracking branch 'upstream/main' into deas-kbins
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from these small changes. LGTM.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Merge remote-tracking branch 'upstream/main' into deas-kbins
I did the last changes @glemaitre... Do I need to let somebody know or should I keep waiting for a second reviewer? I was thinking that I just should wait, but I'm not sure. Thank you. |
We have to wait for a second reviewer, I will add a tag to inform other core developers that we request a second review. Thanks, @DeaMariaLeon for the work. |
I should have asked earlier. I was thinking that magically someone was going to see it. 🤓 Thank you @glemaitre for all the teaching. :) |
Indeed, I personally follow-up PR based on email notification that happens only when commenting in PRs (or pushing in it). So if you want to get my attention, you need to explicitly write a comment :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nitpicks and a formatting fix in the error message but other than that, LGTM!
"""Check that we raise an error message when the wrong strategy | ||
is used.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick PEP 257 style (first line should be a one-liner):
"""Check that we raise an error message when the wrong strategy | |
is used.""" | |
"""Check that we raise an error when the wrong strategy is used.""" |
sample_weight = np.ones(shape=(len(X))) | ||
est = KBinsDiscretizer(n_bins=3, strategy=strategy) | ||
err_msg = ( | ||
"`sample_weight` was provided but it can be only used withstrategy='quantile'." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"`sample_weight` was provided but it can be only used withstrategy='quantile'." | |
"`sample_weight` was provided but it can be only used with " | |
"strategy='quantile'." |
@@ -94,6 +146,27 @@ def test_fit_transform_n_bins_array(strategy, expected): | |||
assert bin_edges.shape == (n_bins + 1,) | |||
|
|||
|
|||
@pytest.mark.filterwarnings("ignore: Bins whose width are too small") | |||
def test_kbinsdiscretizer_effect_sample_weight(): | |||
"""Check that we take into account `sample_weight` when computing the quantiles.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Check that we take into account `sample_weight` when computing the quantiles.""" | |
"""Check the impact of `sample_weight` one computed quantiles.""" |
|
||
|
||
def test_kbinsdiscretizer_no_mutating_sample_weight(): | ||
"""Check to make sure that `sample_weight` is no changed in place.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Check to make sure that `sample_weight` is no changed in place.""" | |
"""Make sure that `sample_weight` is not changed in place.""" |
"`sample_weight` was provided but it can be only used with" | ||
f"strategy='quantile'. Got strategy={self.strategy!r} instead." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"`sample_weight` was provided but it can be only used with" | |
f"strategy='quantile'. Got strategy={self.strategy!r} instead." | |
"`sample_weight` was provided but it can only be " | |
f"used with strategy='quantile'. Got strategy=" | |
f"{self.strategy!r} instead." |
Merge remote-tracking branch 'upstream/main' into deas-kbins
Implemented latest corrections. It should be completed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we released 1.2 last week, we need to update the changelog for this feature. Otherwise, still LGTM.
doc/whats_new/v1.2.rst
Outdated
@@ -680,6 +680,13 @@ Changelog | |||
is now deprecated and will be removed in version 1.4. Use `sparse_output` instead. | |||
:pr:`24412` by :user:`Rushil Desai <rusdes>`. | |||
|
|||
- |Enhancement| Added support for `sample_weight` in :class:`preprocessing.KBinsDiscretizer`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just released 1.2. We will need to move this section to the v1.3.rst
file
Contains weight values to be associated with each sample. | ||
Only possible when `strategy` is set to `"quantile"`. | ||
|
||
.. versionadded:: 1.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. versionadded:: 1.2 | |
.. versionadded:: 1.3 |
Merge remote-tracking branch 'upstream/main' into deas-kbins
Changed release news files. However... There is this note in the instructions for documentation contributors that I just saw: Also: "Before submitting your pull request check if your modifications have introduced new sphinx warnings and try to fix them." Thanks |
doc/whats_new/v1.3.rst
Outdated
- |Enhancement| Added support for `sample_weight` in :class:`preprocessing.KBinsDiscretizer`. | ||
This allows specifying the parameter `sample_weight` for each sample to be used while | ||
fitting. The option is only available when `strategy` is set to `quantile`. | ||
:pr:`24935` by :user:`Seladus <seladus>`, | ||
:user:`Guillaume Lemaitre <glemaitre>`, and | ||
:user:`Dea María Léon <deamarialeon>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be less than 80 characters. This would be fine.
- |Enhancement| Added support for `sample_weight` in :class:`preprocessing.KBinsDiscretizer`. | |
This allows specifying the parameter `sample_weight` for each sample to be used while | |
fitting. The option is only available when `strategy` is set to `quantile`. | |
:pr:`24935` by :user:`Seladus <seladus>`, | |
:user:`Guillaume Lemaitre <glemaitre>`, and | |
:user:`Dea María Léon <deamarialeon>`. | |
- |Enhancement| Added support for `sample_weight` in | |
:class:`preprocessing.KBinsDiscretizer`. This allows specifying the parameter | |
`sample_weight` for each sample to be used while fitting. The option is only | |
available when `strategy` is set to `quantile`. | |
:pr:`24935` by :user:`Seladus <seladus>`, :user:`Guillaume Lemaitre <glemaitre>`, and | |
:user:`Dea María Léon <deamarialeon>`. |
Done.. I think. |
All good. Thanks @DeaMariaLeon |
Thank you @glemaitre |
…quantile" (scikit-learn#24935) Co-authored-by: seladus <clement.blancovolle@insa-rouen.fr> Co-authored-by: Seladus <71873495+Seladus@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…quantile" (scikit-learn#24935) Co-authored-by: seladus <clement.blancovolle@insa-rouen.fr> Co-authored-by: Seladus <71873495+Seladus@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Fixes #22048 see also #20758
What does this implement/fix?
Adding support for sample_weight in KBinsDiscretizer(strategy="quantile"). That is to say : I added a parameter in the
fit
method of KBinsDiscretizer. This parameter,sample_weight
can be set to control the weights associated to each sample only when the strategy is set to"quantile"
Additional comments
PR 22048 opened and almost finished by Seladus