
ENH Array API support for confusion_matrix #30440


Open
wants to merge 14 commits into base: main

Conversation

@StefanieSenger (Contributor) commented Dec 9, 2024

Reference Issues/PRs

towards #26024

Edit: This PR is now superseded by #30562.

What does this implement/fix? Explain your changes.

This PR aims to add Array API support to confusion_matrix(). I have run the CUDA tests on Colab and they pass as well.

@OmarManzoor @ogrisel @lesteve: do you want to have a look?


github-actions bot commented Dec 9, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 78b9612. Link to the linter CI: here

Comment on lines 377 to 384
if need_index_conversion:
label_to_ind = {y: x for x, y in enumerate(labels)}
y_pred = np.array([label_to_ind.get(x, n_labels + 1) for x in y_pred])
y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
y_pred = xp.asarray(
[label_to_ind.get(x, n_labels + 1) for x in y_pred], device=device_
)
y_true = xp.asarray(
[label_to_ind.get(x, n_labels + 1) for x in y_true], device=device_
)
@StefanieSenger (Contributor, Author) commented Dec 9, 2024

This code block within the if need_index_conversion condition is only exercised with ndarrays, because of the way our tests are written. It should work for the other array libraries that we currently integrate, but I feel we should in fact test this part of the code?

@StefanieSenger (Contributor, Author) commented Dec 10, 2024

It fails in label_to_ind = {y: x for x, y in enumerate(labels)} for array_api_strict, because these array elements are not hashable.

I will try to fix this (possibly by refactoring, no spoilers) and add a test.

@StefanieSenger (Contributor, Author) commented Dec 12, 2024

I have asked here and they explained to me how to deal with it. It is fixed and I have added a test.

(I have also tried to refactor, but without success so far: what I tried is much slower than the status quo, so at least in this PR I won't try further, since we have a working solution.)

Member

Not necessarily for this PR, but maybe we could use unique_inverse.

Something like this might work:

_, y_pred = xp.unique_inverse(y_pred)

Maybe a clearer example with array-api-strict:

In [1]: import array_api_strict
   ...: arr = array_api_strict.asarray([1, 3, 4, 1, 3, 5])
   ...: unique_values, labels = array_api_strict.unique_inverse(arr)
   ...: labels
   ...: 
Out[1]: Array([0, 1, 2, 0, 1, 3], dtype=array_api_strict.int64)

Member

Actually, looking a bit further, it even looks like we already have sklearn.utils._encode._unique(return_inverse=True), which seems to have array API support ...

Contributor Author

Oh, that looks pretty good. I will try it tomorrow. :)

@StefanieSenger (Contributor, Author) commented Dec 13, 2024

Not for this PR, but for information, @lesteve:

I have tried to make it work with unique_inverse and a few other approaches, but I find it pretty complex to handle the lookup with only array operations, especially with strings allowed as dtypes. Maybe someone else wants to give it a try. I won't try any further, but I can share my experience if anyone is interested.

In this PR, we're fine with the status quo of using the mapping.
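
(For illustration of the kind of approach discussed above, here is a rough sketch that is not part of the PR. It only works under fairly restrictive assumptions: labels is sorted, has no duplicates, covers every value observed in y_true and y_pred, and the dtype is numeric; string dtypes, which confusion_matrix also accepts, would not fit this approach.)

import array_api_strict as xp

labels = xp.asarray([1, 3, 4, 5])        # assumed sorted and duplicate-free
y_true = xp.asarray([1, 3, 4, 1, 3, 5])
y_pred = xp.asarray([3, 3, 4, 1, 5, 5])

# unique_inverse over the concatenation maps every element to its position
# among the sorted unique values, which equals its index in labels under
# the assumptions above.
joined = xp.concat([labels, y_true, y_pred])
_, inverse = xp.unique_inverse(joined)
n = labels.shape[0]
y_true_idx = inverse[n : n + y_true.shape[0]]
y_pred_idx = inverse[n + y_true.shape[0] :]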

Comment on lines 408 to 409
for true, pred, weight in zip(y_true, y_pred, sample_weight):
cm[true, pred] += weight
@StefanieSenger (Contributor, Author) commented Dec 9, 2024

Is this performant enough? I think it could be, because we are mostly dealing with small matrices at this point. But finding another way might be better. I am not sure how to do this with the tools available in array_api_strict though.

Member

I have the feeling that this loop will kill any computational benefit of array API support. We might as well ensure that y_true and y_pred are numpy arrays using _convert_to_numpy and rely on the coo_matrix trick instead. This would keep the code simpler.

That being said, I think convenience array API support for classification metrics that rely on confusion matrix internally is useful as discussed in #30439 (comment).
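
For context, a minimal sketch of that suggestion, with toy inputs standing in for the label-index arrays of the surrounding code (_convert_to_numpy is the private helper in sklearn.utils._array_api):

import numpy as np
import torch
from scipy.sparse import coo_matrix

from sklearn.utils._array_api import _convert_to_numpy

xp = torch

# Toy inputs already mapped to label indices, as in the surrounding code.
y_true = xp.asarray([0, 1, 1, 2, 0])
y_pred = xp.asarray([0, 1, 2, 2, 1])
sample_weight = xp.asarray([1, 1, 1, 1, 1])
n_labels = 3

y_true_np = _convert_to_numpy(y_true, xp)
y_pred_np = _convert_to_numpy(y_pred, xp)
sample_weight_np = _convert_to_numpy(sample_weight, xp)

# The "coo_matrix trick": duplicate (true, pred) coordinates are summed,
# accumulating the per-sample weights into the confusion matrix without a
# Python loop.
cm = coo_matrix(
    (sample_weight_np, (y_true_np, y_pred_np)),
    shape=(n_labels, n_labels),
    dtype=np.int64,
).toarray()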

Contributor

I agree that python loops do not seem to go well with GPUs. However, there doesn't seem to be an alternative within the array API, because it doesn't support any sort of advanced indexing.

So we either have to use the loop, if we insist on following the array API, or we can simply use the original code by utilizing _convert_to_numpy as @ogrisel suggested.

@StefanieSenger (Contributor, Author) commented Dec 12, 2024

I ran this to check both options:

import timeit

code_to_test = """
import torch
from sklearn.metrics import confusion_matrix
from sklearn.base import config_context

xp=torch

y_true = xp.randint(0, 2, (2**23,), dtype=xp.int32)
y_pred = xp.randint(0, 2, (2**23,), dtype=xp.int32)
with config_context(array_api_dispatch=True):
    confusion_matrix(y_true,y_pred)
"""

execution_time = timeit.timeit(code_to_test, number=1)
print(execution_time)

The results are pretty clear; I am really surprised by the difference:

with python loop:
115.27976658102125

with _convert_to_numpy:
2.0403239529696293

The code for the python loop is the current state of this PR.
The code for the _convert_to_numpy version is in an extra branch: here

So, I think this speaks for the _convert_to_numpy option?

Edit: This was tested on cpu and I will next test it in Colab for data staying on cpu vs. gpu-cpu-gpu conversion.

@StefanieSenger (Contributor, Author) commented Dec 16, 2024

I have executed this in Colab, where I had to massively reduce the size of the arrays, because otherwise Colab would interrupt itself and not finish executing. This was now tested on size 2**10:

with python loop and device='cpu': 2.3 s
with python loop and device='cuda': 7.9 s
with _convert_to_numpy and device='cpu': 2.9 s
with _convert_to_numpy and device='cuda': 2.7 s

I believe these results are not interpretable, because:

  1. executing everything just once makes the results really dependent on chance (on the way we happen to create the random arrays), but it seems I cannot increase the number of repetitions in Colab because of its limitations (should I run 100 repetitions on smaller arrays then?)
  2. the large difference between both cpu solutions is not there anymore (though it seems improbable that such a large difference happened by chance, and I wonder what else could be the reason)
  3. it is so much slower compared to the 2**23 version on my own laptop, and I cannot see how this can be

I can see that I must be doing something wrong, but it seems I need some feedback to improve this test.

Contributor Author

I just repeated the experiment using a Scaleway VM (type RENDER-S). I was able to execute the 2**23 array size versions for the python loop and got this:

with python loop:
execution_time for device cpu: 175.60608643199998
execution_time for device cuda: 468.15440756399994

with convert_to_numpy
execution_time for device cpu: 2.8455443269999705
execution_time for device cuda: 0.3635161219999645

So it seems the python loop is really having a bad effect, especially on cuda.

But there is still that problem with the number=1 repetitions, which makes our results subject to chance. I will re-work the test and report the results.

Contributor Author

I have now re-defined the test code. This is the test I ran using a Scaleway VM (type RENDER-S):

import timeit

code_to_test = """
import torch
from sklearn.metrics import confusion_matrix
from sklearn.base import config_context

xp=torch

y_true = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cpu")
y_pred = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cpu")
with config_context(array_api_dispatch=True):
    confusion_matrix(y_true,y_pred)
"""

execution_time = timeit.timeit(code_to_test, number=50)
print(f'execution_time for device cpu: {execution_time}')

code_to_test = """
import torch
from sklearn.metrics import confusion_matrix
from sklearn.base import config_context

xp=torch

y_true = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cuda")
y_pred = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cuda")
with config_context(array_api_dispatch=True):
    confusion_matrix(y_true,y_pred)
"""

execution_time = timeit.timeit(code_to_test, number=50)
print(f'execution_time for device cuda: {execution_time}')

These are the results:

with python loop (current branch)

execution_time for device cpu: 69.83575903099972
execution_time for device cuda: 276.34000336899953

with convert_to_numpy (this branch)

execution_time for device cpu: 1.9843785450002542
execution_time for device cuda: 0.4149802029996863

This further confirms that we should use convert_to_numpy and then the COO sparse matrix instead of the python loop.

I am happy with this test now. Did I misunderstand anything or did I forget to consider anything?

Sorry for the verbosity of my posts; I am just discovering both of these topics (GPU arrays and performance testing), so my thoughts were not always straightforward.

Contributor

Yes, I think the experiments you performed confirm our initial hypothesis that using python loops will have a negative impact on the performance of any array API related code.

Also, it makes sense that CUDA is more affected, because using loops involves synchronization and communication between the GPU device and the CPU code.

Thank you for running the experiments.

Contributor Author

Thanks for your confirmation, @OmarManzoor. 😃
I will then get rid of the python loop and use convert_to_numpy in combination with the coo_matrix.

@StefanieSenger (Contributor, Author) commented Dec 18, 2024

And for completeness, since @lesteve hinted that I should split off the setup from the actually tested code (which is only the call to confusion_matrix), I also ran this:

import timeit

setup = """
import torch
from sklearn.metrics import confusion_matrix
from sklearn.base import config_context

xp=torch

y_true = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cpu")
y_pred = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cpu")
"""

code_to_test = """
with config_context(array_api_dispatch=True):
    confusion_matrix(y_true,y_pred)
"""

execution_time = timeit.timeit(stmt=code_to_test, setup=setup, number=50)
print(f'execution_time for device cpu: {execution_time}')

setup = """
import torch
from sklearn.metrics import confusion_matrix
from sklearn.base import config_context

xp=torch

y_true = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cuda")
y_pred = xp.randint(low=0, high=10, size=(int(1e5),), dtype=xp.int32, device="cuda")
"""

code_to_test = """
with config_context(array_api_dispatch=True):
    confusion_matrix(y_true,y_pred)
"""

execution_time = timeit.timeit(stmt=code_to_test, setup=setup, number=50)
print(f'execution_time for device cuda: {execution_time}')

And again the results:

with python loop:

execution_time for device cpu: 69.24619939000013
execution_time for device cuda: 276.69077064399994

with convert_to_numpy:

execution_time for device cpu: 0.28062641000002486
execution_time for device cuda: 0.3014569509999774

These results, again, confirm that the loop would have been a pretty bad choice. Goodbye, loop.

Edit: this test now also complies with this comment from another PR about not including data creation in the timed code.

@ogrisel added the CUDA CI label Dec 9, 2024
@github-actions bot removed the CUDA CI label Dec 9, 2024
@OmarManzoor (Contributor) left a comment

Thanks for the PR @StefanieSenger

@StefanieSenger (Contributor, Author) left a comment

Thank you for reviewing, @OmarManzoor.
I have implemented your suggestions. Would you mind having another look?

(Currently still working on it: I found some problem, so no rush.) Edit: it's resolved.

@lesteve (Member) commented Dec 19, 2024

My main comment is: why don't we convert to numpy at the top of the function and then use the same numpy code as before? We end up converting to a numpy array to create the sparse matrix anyway. I would expect that manipulating 1d arrays on GPU is not going to give a huge speed improvement compared to using numpy, but maybe that's a naive assumption.

It seems like this kind of stuff has been discussed before, for example #28626 (comment), #30439 (comment), and #30439 (comment). Unfortunately, I have to admit that I don't have a good overview of it and whether there was an agreed general direction ...

My intuition tells me that it would be easier in metrics to convert to numpy first and then do the computation in numpy. I am fine with that not being "real Array API support". As an end-user, I want to do a cross_val_score with the heavy computation using the GPU to get a speed improvement (I guess .fit and maybe .predict) and not have any error. If GPU 1d arrays are converted to numpy arrays to compute metrics and that is not the bottleneck, I don't care that much.

@OmarManzoor (Contributor)

My main comment is: why don't we convert to numpy at the top of the function and then use the same numpy code as before? We end up converting to a numpy array to create the sparse matrix anyway. I would expect that manipulating 1d arrays on GPU is not going to give a huge speed improvement compared to using numpy, but maybe that's a naive assumption.

I think that if the speed benefits gained from using the array API are not significant within the computation, then we can probably just use numpy from start to end. However, if there is a benefit to using the array API in places where it can be used, then I think it would be useful to use the array API and convert to numpy only where required.

@StefanieSenger (Contributor, Author) commented Dec 23, 2024

@lesteve and @OmarManzoor:

In a separate branch (not sure if you can see the diff), I have checked how it would be if we converted all the input into numpy during the param validation at the top of the function and then returned the confusion matrix in whatever namespace the input came from. These are the results:

execution_time for device cpu: 0.6608354229999804
execution_time for device cuda: 0.8810625369999343

And here, for comparison again the results from the current branch:

execution_time for device cpu: 2.0540889530000186
execution_time for device cuda: 0.6095863729999564

I ran the same test as here, but with larger arrays (size=(int(1e6),)).

It seems that the numpy version is faster for numpy inputs and there is not a big difference for cuda, so I would tend to prefer the new branch version. Would you agree or is there something else to consider?

@OmarManzoor (Contributor) commented Dec 24, 2024

It seems that the numpy version is faster for numpy inputs and there is not a big difference for cuda, so I would tend to prefer the new branch version. Would you agree or is there something else to consider?

Looking at the results I think we might just convert to numpy and use the original code instead of complicating the logic inside this function with the array api.

I also ran some benchmarks on a Kaggle notebook using arrays of size 1e8, and the results show a similar trend. Even though the CUDA version seems to be consistently somewhat better when using the array API, I don't think the performance gain is too dramatic.

RAM 29GB
GPU P100 16GB
array size 1e8

simple numpy by converting to numpy at the start

avg execution_time for device cpu: 7.236887979507446
avg execution_time for device cuda: 5.599829816818238

current code with array api

avg execution_time for device cpu: 11.78741683959961
avg execution_time for device cuda: 3.5200899124145506

Here is the short Python script that I used:

from time import time

import torch
from tqdm import tqdm

from sklearn.base import config_context
from sklearn.metrics import confusion_matrix

xp = torch

execution_times = []
for _ in tqdm(range(10), desc="CPU"):
    y_true = xp.randint(low=0, high=10, size=(int(1e8),), dtype=xp.int64, device="cpu")
    y_pred = xp.randint(low=0, high=10, size=(int(1e8),), dtype=xp.int64, device="cpu")
    start = time()
    with config_context(array_api_dispatch=True):
        confusion_matrix(y_true, y_pred)
    execution_times.append(time() - start)
print(f"avg execution_time for device cpu: {sum(execution_times) / 10}")

execution_times = []
for _ in tqdm(range(10), desc="CUDA"):
    y_true = xp.randint(low=0, high=10, size=(int(1e8),), dtype=xp.int64, device="cuda")
    y_pred = xp.randint(low=0, high=10, size=(int(1e8),), dtype=xp.int64, device="cuda")
    start = time()
    with config_context(array_api_dispatch=True):
        confusion_matrix(y_true, y_pred)
    execution_times.append(time() - start)
print(f"execution_time for device cuda: {sum(execution_times) / 10}")

@ogrisel (Member) left a comment

I had started a review of this branch before considering the fact that the new branch might be better, which would make some of the points moot.

The comments related to testing and documenting the output array namespace and device of the function are still valid for the other branch, though.

y_pred = xp.asarray([4, 5, 6], device=device)

with config_context(array_api_dispatch=True):
    confusion_matrix(y_true, y_pred)
Member

Maybe add a check about the result to assert that the namespace of the result array is array_namespace.

The device is not necessarily the same, though (because we do not move back to the input device). I think it's fine to keep the result array on a CPU device even if the input arrays are GPU-allocated.

I don't think there is a library-agnostic way to check that a device object is a CPU device:

https://data-apis.org/array-api/latest/design_topics/device_support.html#semantics

Maybe we could check the output as follows:

Suggested change
confusion_matrix(y_true, y_pred)
result = confusion_matrix(y_true, y_pred)
xp_result, device_result = get_namespace_and_device(result)
assert xp_result is xp
# Since the final computation always happens with NumPy / SciPy on
# the CPU, this function is expected to return an array allocated
# on the default device even when it does not match the input array's
# device.
default_device = device(xp.zeros(0))
assert device_result == default_device

If the last assertion does not work for any reason, I think it's fine not to test the result array device.

Member

The fact that we don't move the data back to the input arrays' device can be a bit surprising. Maybe we should document that somewhere, but I am not sure how.

@lucyleeow (Member) commented Feb 7, 2025

What about a table here-ish? We could also include info on whether support is 'surface' only and we actually do all the computation in numpy, like what we've decided for confusion_matrix. Though I appreciate this isn't binary: in some functions part of the compute will be array API compliant, or we convert to numpy only for compiled functions, etc.

Member

@lucyleeow you may know this already but my current understanding is that the confusion_matrix PR that is more likely to be merged is #30562.

I am a bit unsure in confusion_matrix about moving back to the original array namespace. I am slightly leaning towards doing it for consistency's sake, even if I am not entirely convinced it is that useful in practice.

Member

Thanks, don't worry, I saw that; I'm just continuing the discussion here since it was started here.

Comment on lines +1121 to +1126
if xp.isdtype(array.dtype, "real floating"):
array[xp.isinf(array) & (array > 0)] = xp.finfo(array.dtype).max
array[xp.isinf(array) & (array < 0)] = xp.finfo(array.dtype).min
else: # xp.isdtype(array.dtype, "integral")
array[xp.isinf(array) & (array > 0)] = xp.iinfo(array.dtype).max
array[xp.isinf(array) & (array < 0)] = xp.iinfo(array.dtype).min
Member

Since xp.isinf(array) is always called twice in a row, let's reuse the result of the first call.

Suggested change
if xp.isdtype(array.dtype, "real floating"):
array[xp.isinf(array) & (array > 0)] = xp.finfo(array.dtype).max
array[xp.isinf(array) & (array < 0)] = xp.finfo(array.dtype).min
else: # xp.isdtype(array.dtype, "integral")
array[xp.isinf(array) & (array > 0)] = xp.iinfo(array.dtype).max
array[xp.isinf(array) & (array < 0)] = xp.iinfo(array.dtype).min
isinf_mask = xp.isinf(array)
if xp.isdtype(array.dtype, "real floating"):
array[isinf_mask & (array > 0)] = xp.finfo(array.dtype).max
array[isinf_mask & (array < 0)] = xp.finfo(array.dtype).min
else: # xp.isdtype(array.dtype, "integral")
array[isinf_mask & (array > 0)] = xp.iinfo(array.dtype).max
array[isinf_mask & (array < 0)] = xp.iinfo(array.dtype).min

minimum numbers available for the dtype respectively; like np.nan_to_num."""
xp, _ = get_namespace(array, xp=xp)
try:
    array = xp.nan_to_num(array)
Member

Suggested change
array = xp.nan_to_num(array)
# nan_to_num is not part of the array API spec at this time but is generally
# consistently well adopted, so we anticipate a future inclusion in the spec:
# https://github.com/data-apis/array-api/issues/878
array = xp.nan_to_num(array)

shape=(n_labels, n_labels),
dtype=dtype,
).toarray()
cm = xp.asarray(cm)
Member

We could move this conversion to the last line (just before the return statement) and this would side-step the problems related to np.errstate-style error control not being part of the array API spec.

Since we do not move the cm back to the input device, using the array namespace for the normalization step below is not expected to yield any computational advantages.

This would also avoid maintaining a fallback implementation of nan_to_num, since it is not (yet?) part of the spec.
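
A rough sketch of that reordering, written as a hypothetical helper (the normalize handling below is a simplified stand-in, not a copy of the actual implementation):

import numpy as np

def _normalize_then_convert(cm, normalize, xp):
    # cm is the NumPy array built via the coo_matrix trick; the whole
    # normalization stays in NumPy so np.errstate and np.nan_to_num can be
    # used as before.
    with np.errstate(all="ignore"):
        if normalize == "true":
            cm = cm / cm.sum(axis=1, keepdims=True)
        elif normalize == "pred":
            cm = cm / cm.sum(axis=0, keepdims=True)
        elif normalize == "all":
            cm = cm / cm.sum()
        cm = np.nan_to_num(cm)
    # Single conversion to the input namespace just before returning.
    return xp.asarray(cm)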

@ogrisel (Member) commented Dec 26, 2024

It seems that the numpy version is faster for numpy inputs and there is not a big difference for cuda, so I would tend to prefer the new branch version. Would you agree or is there something else to consider?

I agree, but I would not have expected the NumPy or torch CPU performance to be so degraded in this branch compared to the other branch. This is visible both in your int(1e6) data benchmark and in @OmarManzoor's larger data benchmark above (#30440 (comment)).

It might be interesting to profile both branches with a large enough dataset to understand where this discrepancy comes from.

@ogrisel (Member) commented Dec 26, 2024

Note that in the other branch, you move the results back to the input arrays' device:

https://github.com/StefanieSenger/scikit-learn/blob/1f23f634d55a9f9b7182584ddb6e6afd37548d98/sklearn/metrics/_classification.py#L424

I am not sure if this is needed / useful:

  • on the plus side, it is less surprising to the caller;
  • on the minus side, it might cause unnecessary data transfer / synchronization overhead between the host and the device: in general confusion matrices are tiny matrices (n_classes is seldom larger than 1000) so I don't expect downstream computation to ever benefit from GPU computation, even when the inputs y_true and y_pred are huge arrays.

Shall we even keep the result as a NumPy array?

@ogrisel (Member) commented Dec 26, 2024

I did some profiling using py-spy and here is what I found:

  • when using a very large number of data points (int(1e8)), computing unique input values in y_true and y_pred dominates (more than 40% of the total time), even for numpy inputs that benefit from the unique value cache (sklearn.utils._unique.attach_unique/cached_unique);
  • the numpy-dtype based unique cache system does not work when using pytorch tensors (as expected...) and then computing unique values takes 80% or more of the total time because it happens several times in _check_targets and then in unique_labels;
  • the measured times vary a lot from run to run, and I am not sure if there is a significant difference between the two branches for numpy inputs.

However, converting early to numpy should allow benefiting from the caching of unique values if we convert to numpy before calling attach_unique.

@ogrisel (Member) commented Dec 26, 2024

@StefanieSenger could you please open a PR from your other branch (and cross-link to here) and then move the calls to attach_unique after the numpy conversion at the beginning of:

https://github.com/StefanieSenger/scikit-learn/blob/1f23f634d55a9f9b7182584ddb6e6afd37548d98/sklearn/metrics/_classification.py#L340-L367

@OmarManzoor (Contributor)

  • on the plus side, it is less surprising to the caller;
  • on the minus side, it might cause unnecessary data transfer / synchronization overhead between the host and the device: in general confusion matrices are tiny matrices (n_classes is seldom larger than 1000) so I don't expect downstream computation to ever benefit from GPU computation, even when the inputs y_true and y_pred are huge arrays.

Shall we even keep the result as a NumPy array?

I think we can keep the result as a NumPy array as this returns a metric which I doubt would be utilized in further computations that require a GPU specifically.

@lucascolley (Contributor)

Shall we even keep the result as a NumPy array?

I think we can keep the result as a NumPy array as this returns a metric which I doubt would be utilized in further computations that require a GPU specifically.

This is interesting and quite surprising to me; have you included this pattern anywhere else in scikit-learn? In SciPy, we have been quite careful to ensure that everything is namespace_in === namespace_out. I think there is a concern that telling users "mostly namespace_in === namespace_out, except ..." will be quite confusing. What if a user wants to do this?

# arrays of some namespace xp
x1 = confusion_matrix(y_true,y_pred)
z = xp.concat((x1, x2))

Since xp.concat may only accept xp arrays and not numpy arrays, you'd have to make sure users know to call xp.asarray on the confusion matrix. I think it would be pretty confusing if you have to do this for some sklearn functions and not others.

@lucascolley (Contributor)

I think that adding an option to short-circuit and return the result as a NumPy array sounds like a great idea, but I don't think it should be the default behaviour.

@StefanieSenger (Contributor, Author) commented Jan 3, 2025

I think there is a concern that telling users "mostly namespace_in === namespace_out, except ..." will be quite confusing.

I agree. I also think that converting back to the namespace that was passed wouldn't be very costly here.

@ogrisel (Member) commented Jan 5, 2025

I think that adding an option to short-circuit and return the result as a NumPy array sounds like a great idea, but I don't think it should be the default behaviour.

So far, we have not had to change the public API of scikit-learn to add array API support (beyond enabling/disabling array API dispatch via sklearn.config_context or sklearn.set_config). Since confusion_matrix is a public function, adding such an option would set a precedent. I would therefore only add such an argument to a private _confusion_matrix function meant to be called either by the public confusion_matrix function or directly by public metric functions that know whether it's useful to keep the result as a numpy array.
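
Purely as an illustration of that split (the helper name and the return_numpy flag are hypothetical, not an existing scikit-learn API; for self-containment this sketch wraps the existing public function rather than containing the real computation):

import numpy as np

from sklearn.metrics import confusion_matrix as _public_confusion_matrix
from sklearn.utils._array_api import _convert_to_numpy, get_namespace


def _confusion_matrix(y_true, y_pred, *, sample_weight=None, normalize=None,
                      return_numpy=False):
    # Always compute with the existing NumPy/SciPy code ...
    xp, _ = get_namespace(y_true, y_pred)
    cm = _public_confusion_matrix(
        _convert_to_numpy(y_true, xp),
        _convert_to_numpy(y_pred, xp),
        sample_weight=sample_weight,
        normalize=normalize,
    )
    # ... and let private callers (other metrics) opt out of converting back.
    return cm if return_numpy else xp.asarray(cm)

The public confusion_matrix would then call this with return_numpy=False, while metrics that build on the confusion matrix internally could pass return_numpy=True and keep working in NumPy.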

@OmarManzoor (Contributor) commented Jan 6, 2025

Since confusion_matrix is a public function, adding such an option would set a precedent. I would therefore only add such an argument to a private _confusion_matrix function meant to be called either by the public confusion_matrix function or directly by public metric functions that know whether it's useful to keep the result as a numpy array.

In that case, wouldn't it just be better to cast to the intended namespace and device just before returning?

@lucyleeow added the Superseded label (PR has been replaced by a newer PR) Feb 6, 2025
Labels: Array API, module:metrics, Superseded (PR has been replaced by a newer PR)