ENH Array API support for confusion_matrix #30562

StefanieSenger · 2024-12-30T09:49:34Z

Reference Issues/PRs

towards #26024
closes #30440 (supersedes)

This PR is an alternative approach that was discussed in #30440. It accepts array inputs from any namespace, converts them to NumPy arrays right away so that the calculations run in NumPy (which is necessary at least for coo_matrix), and returns the confusion matrix in the same namespace as the input, on a CPU device.

That's what we had discussed. For more details see the discussions on both PRs.
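
To make the flow described above concrete, here is a minimal sketch, not the code in this PR. The helper name confusion_matrix_sketch and its xp argument are hypothetical, np.asarray stands in for a device-aware conversion to NumPy, and np.add.at is swapped in for the coo_matrix accumulation so that the example only depends on NumPy.

```python
import numpy as np


def confusion_matrix_sketch(y_true, y_pred, xp=np):
    """Hypothetical illustration only; not scikit-learn's implementation.

    `xp` is an assumed stand-in for the namespace of the input arrays.
    """
    # Step 1: convert the inputs to NumPy arrays on the CPU right away.
    y_true_np = np.asarray(y_true)
    y_pred_np = np.asarray(y_pred)

    # Step 2: do the whole computation in NumPy.
    # Map every label to an index in 0..n_classes-1.
    labels = np.union1d(y_true_np, y_pred_np)
    true_idx = np.searchsorted(labels, y_true_np)
    pred_idx = np.searchsorted(labels, y_pred_np)

    # Count each (true, predicted) pair. np.add.at accumulates repeated
    # index pairs, which is the role coo_matrix plays in the real code.
    n_classes = labels.size
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    np.add.at(cm, (true_idx, pred_idx), 1)

    # Step 3: hand the result back in the caller's namespace (CPU-backed).
    return xp.asarray(cm)


cm = confusion_matrix_sketch([2, 0, 2, 2, 0, 1], [0, 0, 2, 2, 0, 2])
print(cm)
```

Rows correspond to true labels and columns to predicted labels, both in sorted label order.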

github-actions · 2024-12-30T09:50:53Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5b103e7. Link to the linter CI: here

sklearn/utils/_array_api.py

StefanieSenger · 2024-12-30T11:31:42Z

sklearn/metrics/tests/test_classification.py

+        result = confusion_matrix(y_true, y_pred)
+        xp_result, _ = get_namespace(result)
+        assert _is_numpy_namespace(xp_result)
+
+        # Since the computation always happens with NumPy / SciPy on the CPU, this
+        # function is expected to return an array allocated on the CPU even when it does
+        # not match the input array's device.
+        assert result.device == "cpu"


I have adjusted this test following your suggestions in this comment, @ogrisel. Here, however, the test is narrower because we made confusion_matrix always return a NumPy array on the CPU.

StefanieSenger · 2024-12-30T11:50:08Z

Regarding the question of how to document the return type of confusion_matrix() as a NumPy array, I think that keeping
C : ndarray of shape (n_classes, n_classes) in the docstring should be enough, provided that all the other functions and methods to which we have added array API support document their return type correctly, which is currently not the case.

sklearn/utils/_array_api.py

virchan

I suspect the CI failure (about the Test Library) is a false positive, as I couldn't reproduce the same error after running the tests multiple times on my local machine. Instead, I encountered a different set of errors related to CUDA. I believe the issue might be linked to how we convert sample_weight into a NumPy array.

sklearn/metrics/_classification.py

OmarManzoor

Thank you for the updates @StefanieSenger

doc/modules/array_api.rst

sklearn/metrics/_classification.py

StefanieSenger

Thanks for your reviews, @OmarManzoor and @lesteve!
I have applied your suggestions and commented regarding the documentation.

Now, this PR looks very straightforward. :)

doc/modules/array_api.rst

sklearn/metrics/_classification.py

lesteve

LGTM, thanks!

ogrisel

LGTM besides the point below.

I am not sure how to best document which scikit-learn functions and classes intentionally rely on internal array API namespace conversions (https://github.com/scikit-learn/scikit-learn/pull/30562/files#r2207423623). I think it's interesting to do it but I agree we can discuss that in a follow-up issue or PR.

sklearn/metrics/_classification.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

OmarManzoor

LGTM. Thanks @StefanieSenger

OmarManzoor · 2025-08-01T09:58:29Z

There are some tests failing which may need to be checked

lesteve · 2025-08-01T11:08:04Z

Hmm probably some weird interaction with #31701?

lesteve · 2025-08-01T13:06:07Z

In d1b3439 we fixed the latest failures in the most straightforward (but maybe a bit hacky) way. We also added a test that would have failed in #31701 and would have surfaced the issue.

The tension comes from:

- we want to be able to call confusion_matrix with some empty inputs (BUG fix behaviour in confusion_matrix with with empty array-like as input #16442)
- MNT Add _check_sample_weights to classification metrics #31701 added more stringent checks for sample_weight; among other things, it calls check_array(sample_weight), which fails if sample_weight is empty.

A possibly cleaner approach would be to add an ensure_min_samples argument to _check_targets that gets passed to _check_sample_weight, and from there as ensure_min_samples=ensure_min_samples to the check_array(sample_weight) call. This seems a bit too much, though, since confusion_matrix appears to be somewhat special in being able to handle empty inputs?
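
For illustration, the threading of ensure_min_samples described above could look roughly like the sketch below. All three functions are toy stand-ins with hypothetical names (the `_sketch` suffix marks them as such); they only mimic the shape of the validation chain and are not the scikit-learn helpers or their signatures.

```python
def check_array_sketch(values, ensure_min_samples=1):
    """Toy stand-in for an array validator; rejects too-short inputs."""
    values = list(values)
    if len(values) < ensure_min_samples:
        raise ValueError(
            f"Found array with {len(values)} sample(s) while a minimum of "
            f"{ensure_min_samples} is required."
        )
    return values


def check_sample_weight_sketch(sample_weight, n_samples, ensure_min_samples=1):
    """Toy stand-in: default to unit weights, otherwise validate."""
    if sample_weight is None:
        return [1.0] * n_samples
    # Forward the caller's minimum instead of hard-coding it.
    return check_array_sketch(sample_weight, ensure_min_samples=ensure_min_samples)


def check_targets_sketch(y_true, y_pred, sample_weight=None, ensure_min_samples=1):
    """Toy stand-in: validate targets and weights with one shared minimum."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length.")
    sample_weight = check_sample_weight_sketch(
        sample_weight, len(y_true), ensure_min_samples=ensure_min_samples
    )
    return y_true, y_pred, sample_weight


# A caller that tolerates empty inputs opts in explicitly.
print(check_targets_sketch([], [], [], ensure_min_samples=0))
```

With the default ensure_min_samples=1, the same call with empty inputs raises ValueError, so only callers that opt in would accept empty arrays.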

OmarManzoor · 2025-08-02T06:59:47Z

@lesteve Thanks for the fixes. I think the current changes look fine. However, the approach you mentioned of adding ensure_min_samples also sounds good. Wouldn't that approach be better, considering that check_array already provides such a parameter?

ogrisel · 2025-08-04T08:20:26Z

I am also fine with the current workaround implemented in this PR, as it is well explained by the inline comment.

+1 for exploring the refactoring suggested by @lesteve in a follow-up PR.

OmarManzoor · 2025-08-04T08:22:02Z

Let's merge this then!

Co-authored-by: Virgil Chan <virchan.math@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>

StefanieSenger and others added 14 commits December 7, 2024 23:51

ENH Array API for confusion_matrix

78d2a65

fix dtype checking

770e638

prepare for PR

af440ca

change log

b45646e

use our _isin

3db7054

changes after review

abab5ea

forgot to push that before

abc3981

add test

09cec5d

fix sclar dtype

fdb25f6

fix typos

49f75b7

convert_to_numpy and coo_matrix instead of python loop

914bb63

Merge branch 'main' into array_api_confusion_matrix

a939c80

experiment with convert_to_numpy

6da1d06

np.intersect1d can stay as it is

1f23f63

github-actions bot added module:metrics module:utils labels Dec 30, 2024

StefanieSenger added 2 commits December 30, 2024 10:55

return cm as numpy array

6a43bc3

move attach unique to after conversion to numpy

2000a00

StefanieSenger commented Dec 30, 2024

sklearn/utils/_array_api.py Outdated

adjust test

5963e0f

StefanieSenger commented Dec 30, 2024

document return array type

ef84e04

StefanieSenger requested a review from ogrisel December 30, 2024 11:50

StefanieSenger added 2 commits January 2, 2025 09:36

use get_namespace

1cf525e

fix issue with nullable dtypes with pandas==1.1.5

f50f3ea

StefanieSenger commented Jan 2, 2025

sklearn/utils/_array_api.py Outdated

private function

84038e4

virchan reviewed Jan 3, 2025

sklearn/metrics/_classification.py Outdated

convert back to original namespace and keep cpu device

2edcf8e

StefanieSenger changed the title ~~ENH Array API support for confusion_matrix converting to numpy array~~ ENH Array API support for confusion_matrix Jul 15, 2025

OmarManzoor reviewed Jul 15, 2025

doc/modules/array_api.rst

sklearn/metrics/_classification.py Outdated

sklearn/metrics/_classification.py Outdated

changes after review

a5bf169

StefanieSenger added the CUDA CI label Jul 16, 2025

github-actions bot removed the CUDA CI label Jul 16, 2025

StefanieSenger commented Jul 16, 2025

doc/modules/array_api.rst

Merge branch 'main' into array_api_confusion_matrix_numpy

725cb24

lesteve reviewed Jul 24, 2025

sklearn/metrics/_classification.py

lesteve approved these changes Jul 24, 2025

ogrisel approved these changes Jul 31, 2025

sklearn/metrics/_classification.py

StefanieSenger and others added 3 commits August 1, 2025 11:25

Update sklearn/metrics/_classification.py

0740d3d

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

Merge branch 'main' into array_api_confusion_matrix_numpy

cc3fd4d

fix linting

cda414c

OmarManzoor approved these changes Aug 1, 2025

lesteve and others added 4 commits August 1, 2025 13:15

Adapt after _check_targets change [azure parallel]

ea480bd

empty commit to re-trigger CI

a84671c

empty commit to re-trigger CI [azure parallel]

82bf221

Tackle special case of empty inputs with _check_targets recent change…

d1b3439

…s [azure parallel]

Merge branch 'main' into array_api_confusion_matrix_numpy

5b103e7

OmarManzoor enabled auto-merge (squash) August 4, 2025 08:22

OmarManzoor merged commit 1ff785e into scikit-learn:main Aug 4, 2025
34 checks passed

StefanieSenger deleted the array_api_confusion_matrix_numpy branch August 4, 2025 11:15
