Fix spurious warning from type_of_target when called on estimator.classes_ #31584

saskra · 2025-06-18T13:08:45Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR suppresses an unintended warning in _get_response_values, where type_of_target is called on estimator.classes_. Because classes_ holds only the unique class labels rather than sample-level data, every value in it is distinct, so this call can spuriously trigger the warning:

"The number of unique classes is greater than 50% of the number of samples."

This is now avoided by passing suppress_warning=True to type_of_target() at this specific location.

This patch is intentionally minimal and does not affect calls to type_of_target that operate on actual sample labels (y, y_true, etc.).
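To make the cause concrete, here is a minimal plain-Python sketch (no scikit-learn involved) comparing the share of distinct values in a sample-level label vector with that of its unique-label list. The helper `unique_ratio` and the list `classes` are illustrative names, with `classes` standing in for a fitted estimator's `classes_`.

```python
def unique_ratio(labels):
    """Fraction of entries in `labels` that are distinct values."""
    return len(set(labels)) / len(labels)


# 30 classes with 4 samples each: a well-populated target vector.
y = [label for label in range(30) for _ in range(4)]

# Stand-in for `estimator.classes_`: the sorted unique labels of y.
classes = sorted(set(y))

print(unique_ratio(y))        # 0.25
print(unique_ratio(classes))  # 1.0
```

The unique-label list always has a ratio of 1.0, so any check of the form "unique classes exceed 50% of the samples" will fire on it once the list is long enough, regardless of how well populated the real target is.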

Any other comments?

This was first observed while calibrating classifiers with many classes. Although the dataset was large and well-balanced, the warning appeared due to how classes_ was passed into type_of_target.

Apologies in advance if this is already known or intentional – this is my first contribution here, and I appreciate any feedback or corrections.

Thanks for your time and for maintaining this great library!


github-actions · 2025-06-18T13:09:46Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 421728c.

jeremiedbb

Thanks for the PR @saskra. Please add a test for _get_response_values in test_response.py to check that no warning is raised now.

sklearn/utils/multiclass.py

saskra · 2025-07-02T12:31:53Z

> Thanks for the PR @saskra. Please add a test for _get_response_values in test_response.py to check that no warning is raised now.

Like this?

def test_response_values_warn_bug_type_of_target_on_classes_issue():
    """
    Ensure no warning is raised due to incorrect call to `type_of_target(classes_)`
    when using `CalibratedClassifierCV` with many classes.

    This test verifies that using `classes_` instead of `y` in `_get_response_values`
    does not incorrectly trigger the "unique classes > 50% of samples" warning.
    """
    n_samples = 1000
    n_features = 40
    n_classes = 30  # Well below 50% of 1000 samples

    rng = np.random.RandomState(42)
    X = rng.rand(n_samples, n_features)
    y = np.tile(np.arange(n_classes), int(np.ceil(n_samples / n_classes)))[:n_samples]

    base_clf = RandomForestClassifier(n_estimators=10, random_state=0)
    clf = CalibratedClassifierCV(base_clf, method="isotonic", cv=2)
    clf.fit(X, y)

    with warnings.catch_warnings(record=True) as w:
        warnings.simplefilter("always")

        _get_response_values(clf, X, response_method="predict_proba")

        warning_messages = [str(warning.message) for warning in w]

    # This warning was raised due to incorrect handling of classes_
    unexpected = [
        msg
        for msg in warning_messages
        if "number of unique classes is greater than 50%" in msg
    ]

    assert not unexpected, f"Unexpected warning: {unexpected}"

jeremiedbb · 2025-07-02T13:27:20Z

Yes but I think that we can make it a bit simpler

def test_response_values_type_of_target_on_classes_no_warning():
    """
    Ensure that _get_response_values doesn't raise the "unique classes > 50% of samples"
    warning when calling `type_of_target(classes_)`.

    non-regression test for issue #31583.
    """
    X = np.random.RandomState(0).randn(120, 3)
    # 30 classes, less than 50% of number of samples
    y = np.repeat(np.arange(30), 4)

    clf = LogisticRegression().fit(X, y)

    with warnings.catch_warnings():
        warnings.simplefilter("error", UserWarning)

        _get_response_values(clf, X, response_method="predict_proba")
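The simpler test relies on a standard-library pattern: inside `warnings.catch_warnings()`, `simplefilter("error", UserWarning)` turns any `UserWarning` into an exception, so the test fails on its own if the warning comes back. A minimal sketch of that mechanism, using hypothetical helper functions `noisy`, `quiet` and `raises_user_warning` that are not part of scikit-learn:

```python
import warnings


def noisy():
    """Hypothetical function that emits a UserWarning."""
    warnings.warn("something looks off", UserWarning)
    return "done"


def quiet():
    """Hypothetical function that emits no warning."""
    return "done"


def raises_user_warning(func):
    """Return True if calling `func` emits a UserWarning."""
    with warnings.catch_warnings():
        # Escalate UserWarning to an exception for the duration of the block.
        warnings.simplefilter("error", UserWarning)
        try:
            func()
        except UserWarning:
            return True
    return False


print(raises_user_warning(noisy))  # True
print(raises_user_warning(quiet))  # False
```

In a test, the `try`/`except` is unnecessary: the escalated warning simply propagates and fails the test.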


jeremiedbb · 2025-07-07T13:28:13Z

Thanks @saskra, looks good !
I just directly pushed a small fix because np.concat didn't exist in all numpy versions that we support.
Please add a changelog entry.

jeremiedbb

LGTM

jeremiedbb · 2025-07-10T14:43:12Z

ping @lucyleeow or @StefanieSenger for a second review maybe ?

lucyleeow

Thanks @saskra ! Nitpicks only

lucyleeow · 2025-07-11T01:41:23Z

doc/whats_new/upcoming_changes/sklearn.utils/31584.fix.rst

@@ -0,0 +1,3 @@
+- Fixed a spurious warning in :func:`utils.multiclass.type_of_target` that could be triggered


Let's be specific about which warning we are talking about.

I also wonder if it is worth detailing all the public functions this will affect, since the what's new is aimed at users, and type_of_target may be less meaningful to a user than, e.g., "GridSearchCV will no longer give a spurious 'The number of unique classes is greater than 50% of the number of samples' warning when the number of samples is...etc".

How do you find out all the public functions that are affected?

I don't have a clever solution outside of searching in the codebase. Maybe a regular expression search for when classes is passed to type_of_target ? Or something that is not y_* ?
Failing that, just a list of classes/functions that use _get_response_values would be nice.
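One way to approach the regular-expression search suggested above, as a rough sketch only: scan source text for `type_of_target(` calls and report those whose first argument is not named like a target vector (`y`, `y_true`, `y_pred`, ...). The pattern, the helper `find_non_y_calls` and the sample source are assumptions made for illustration; a real search would also need to handle multi-line calls and keyword arguments.

```python
import re

# Hypothetical pattern: capture the first positional argument of each call.
CALL_PATTERN = re.compile(r"type_of_target\(\s*([A-Za-z_][\w.]*)")


def find_non_y_calls(source):
    """Return (line_number, first_argument) for each type_of_target call
    whose first argument is not named y or y_<something>."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL_PATTERN.finditer(line):
            arg = match.group(1)
            if not re.fullmatch(r"y(_\w+)?", arg):
                hits.append((lineno, arg))
    return hits


# Made-up source lines, not copied from scikit-learn.
sample = """\
target_type = type_of_target(y)
target_type = type_of_target(y_true, input_name="y_true")
target_type = type_of_target(classes)
target_type = type_of_target(estimator.classes_)
"""

print(find_non_y_calls(sample))
# [(3, 'classes'), (4, 'estimator.classes_')]
```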

Like this?

- Fixed a spurious warning that could occur when passing ``estimator.classes_`` or similar arrays with many unique values to classification utilities. The warning came from :func:`sklearn.utils.multiclass.type_of_target` and has now been suppressed when the input is not a true target vector. The warning message was: "The number of unique classes is greater than 50% of the number of samples." This could appear in tools that internally validate classification outputs, such as :class:`~sklearn.model_selection.GridSearchCV`, :func:`~sklearn.model_selection.cross_val_score`, :func:`~sklearn.metrics.make_scorer`, :class:`~sklearn.multioutput.MultiOutputClassifier`, and :class:`~sklearn.calibration.CalibratedClassifierCV`. By :user:`Sascha D. Krauss <saskra>`

Given the large number of potentially affected public classes and functions, I'm not sure it's worth trying to list them all. It was just an unexpected warning, not a critical bug. Since type_of_target is a public function, I think it's fine for the changelog to only be about it.

Fair point, what about something like:

- Fixed a spurious warning (about the number of unique classes being greater than 50% of the number of samples) that could occur when passing `classes` to :func:`utils.multiclass.type_of_target`.

Note you don't need the sklearn at the start in whats new entries.

lucyleeow · 2025-07-11T01:45:27Z

sklearn/utils/tests/test_multiclass.py

@@ -302,7 +302,7 @@ def test_type_of_target_too_many_unique_classes():
    We need to check that we don't raise if we have less than 20 samples.
    """

-    y = np.arange(25)
+    y = np.hstack((np.arange(20), [0]))


Could we add a note here to explain why we add the '0' at the end?

And maybe we could keep the original y = np.arange(25) and check that this does not raise a warning?

However, the original y = np.arange(25) now raises a warning because the condition y.shape[0] > classes.shape[0] is not fulfilled, but y.shape[0] == classes.shape[0]. Hence the second zero in the new array in the test, so that you at least have a single class with two samples. The whole point was to prevent the warning from being issued incorrectly if the function is only called with the unique class list instead of the real list of class labels per sample - hence the original idea was also an argument to suppress the warning for this case.

So should we rather adapt the test or treat the case y.shape[0] == classes.shape[0] in type_of_target differently?

> Could we add a note here to explain why we add the '0' at the end?

+1
Let's add a comment to explain that we create an array with almost all classes represented only once except 0 represented twice.

> And maybe we could keep the original y = np.arange(25) and check that this does not raise a warning?

> However, the original y = np.arange(25) now raises a warning because the condition y.shape[0] > classes.shape[0] is not fulfilled, but y.shape[0] == classes.shape[0]

@saskra I think @lucyleeow is correct. Now the np.arange(25) case won't raise the warning because we special cased y.shape[0] == classes.shape[0] so we can check that indeed no warning is raised. Something like the following that we'd put at the end of the test.

# More than 20 samples but only unique classes, no warning should be raised
y = np.arange(25)
with warnings.catch_warnings():
    # Turn warnings into errors so the test fails if one is raised
    warnings.simplefilter("error", UserWarning)
    type_of_target(y)
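The three cases discussed in this thread can be sketched in plain Python. The thresholds (more than 20 samples, unique classes above 50% of samples, and strictly more samples than classes) come from the discussion above; the function name `warns_too_many_classes` and the exact form of the expression are assumptions, not a copy of the scikit-learn source.

```python
def warns_too_many_classes(y):
    """Simplified stand-in for the heuristic discussed in this thread."""
    n_samples = len(y)
    n_classes = len(set(y))
    # Warn only with more than 20 samples, and only when at least one class
    # repeats (n_samples > n_classes), so a bare list of unique classes
    # never triggers the warning.
    return n_samples > 20 and n_samples > n_classes > round(0.5 * n_samples)


only_unique = list(range(25))          # like np.arange(25), or classes_
one_repeat = list(range(20)) + [0]     # like np.hstack((np.arange(20), [0]))
few_samples = list(range(10))          # 20 samples or fewer
balanced = [c for c in range(30) for _ in range(4)]  # 30 classes, 120 samples

print(warns_too_many_classes(only_unique))  # False
print(warns_too_many_classes(one_repeat))   # True
print(warns_too_many_classes(few_samples))  # False
print(warns_too_many_classes(balanced))     # False
```

This is why the test array needs the extra `0`: with 21 samples and 20 classes, one class is represented twice, so the input looks like a real target vector and still exceeds the 50% threshold.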


lucyleeow · 2025-07-11T09:35:45Z

Also #31584 (comment) should help you fix lint issues

jeremiedbb · 2025-07-11T09:37:07Z

@saskra something very strange happened to the test_multiclass.py file. Looks like you changed your default tab/spacing in your IDE or something 😄

saskra · 2025-07-11T09:44:40Z

> @saskra something very strange happened to the test_multiclass.py file. Looks like you changed your default tab/spacing in your IDE or something 😄

PyCharm has been doing a lot of weird standalone things since the developers started focussing on AI only. Is it fixed now?


jeremiedbb · 2025-07-11T10:40:21Z

sklearn/utils/tests/test_multiclass.py

@@ -247,7 +247,6 @@ def _generate_sparse(
    ],
 }

-


please revert unrelated line removal


jeremiedbb

LGTM. Thanks @saskra

lucyleeow

Just a nit about comments


lucyleeow

Thanks for your patience!


Fix spurious warning from type_of_target when called on estimator.classes_

4f5483f

github-actions bot added the module:utils label Jun 18, 2025

Fix line length for E501 and ensure formatting with ruff

0b5cfc5

jeremiedbb reviewed Jul 1, 2025


Fix test_type_of_target_too_many_unique_classes; add test_response_values_type_of_target_on_classes_no_warning; do not add suppress_warning

eb90ddd

saskra force-pushed the fix-type-of-target-warning branch from fdb8624 to eb90ddd Compare July 2, 2025 13:39

use hstack instead

a586649

jeremiedbb added this to the 1.7.1 milestone Jul 7, 2025

Add changelog entry

46bd286

jeremiedbb approved these changes Jul 7, 2025

lucyleeow reviewed Jul 11, 2025

saskra and others added 2 commits July 11, 2025 08:36

Update sklearn/utils/tests/test_response.py

0155642

Co-authored-by: Lucy Liu <jliu176@gmail.com>

Longer changelog entry; third test in test_type_of_target_too_many_unique_classes plus explanatory comments

cd60bf5

saskra added 2 commits July 11, 2025 11:37

Make changelog shorter again

9302e60

Ruff

d4736e9

jeremiedbb approved these changes Jul 11, 2025

saskra and others added 2 commits July 11, 2025 12:44

Update doc/whats_new/upcoming_changes/sklearn.utils/31584.fix.rst

fef3f26

Co-authored-by: Jérémie du Boisberranger <jeremie@probabl.ai>

Reverses unintentional removal of a line

7022e82

jeremiedbb added the To backport label Jul 11, 2025

jeremiedbb approved these changes Jul 11, 2025

lucyleeow reviewed Jul 12, 2025

Update sklearn/utils/tests/test_multiclass.py

5dade50

Co-authored-by: Lucy Liu <jliu176@gmail.com>

Update sklearn/utils/tests/test_multiclass.py

421728c

Co-authored-by: Lucy Liu <jliu176@gmail.com>

lucyleeow approved these changes Jul 14, 2025

lucyleeow merged commit 9b7a86f into scikit-learn:main Jul 14, 2025
36 checks passed

saskra deleted the fix-type-of-target-warning branch July 14, 2025 08:47

StefanieSenger mentioned this pull request Jul 14, 2025

type_of_target misclassifies count/ordinal regression targets as multiclass #31752

Closed

jeremiedbb mentioned this pull request Jul 15, 2025

Release 1.7.1 #31762

Merged

13 tasks
