
ENH Array API support for confusion_matrix #30562


Open: wants to merge 42 commits into main

Conversation

StefanieSenger
Contributor

@StefanieSenger StefanieSenger commented Dec 30, 2024

Reference Issues/PRs

towards #26024
closes #30440 (supersedes it)

This PR is an alternative approach, discussed in #30440. It accepts array inputs from all namespaces, converts them to NumPy arrays right away to do the calculations in NumPy (which is necessary for the coo_matrix at least), and returns the confusion_matrix in the same namespace as the input, on a CPU device.

That's what we had discussed. For more details see the discussions on both PRs.
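The round trip described above can be sketched as follows. This is illustrative only: the namespace handling is collapsed to plain NumPy, `confusion_matrix_sketch` is a made-up name, and the real implementation goes through scikit-learn's `get_namespace` / `_convert_to_numpy` utilities rather than the stand-ins below.

```python
import numpy as np


def confusion_matrix_sketch(y_true, y_pred, xp=np):
    # 1) Convert inputs to NumPy right away (the PR uses scikit-learn's
    #    _convert_to_numpy here; np.asarray stands in for it).
    y_true_np = np.asarray(y_true)
    y_pred_np = np.asarray(y_pred)

    # 2) Do the whole computation in NumPy.
    n_labels = int(max(y_true_np.max(), y_pred_np.max())) + 1
    cm = np.bincount(
        n_labels * y_true_np + y_pred_np, minlength=n_labels**2
    ).reshape(n_labels, n_labels)

    # 3) Convert the result back to the caller's namespace before returning.
    return xp.asarray(cm)


print(confusion_matrix_sketch([0, 1, 1], [0, 1, 0]))
# [[1 0]
#  [1 1]]
```

With a GPU namespace passed as `xp`, step 3 is where the host-to-device copy (and hence the device-transfer debate below) would happen.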


github-actions bot commented Dec 30, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 725cb24.

Comment on lines 3120 to 3127
result = confusion_matrix(y_true, y_pred)
xp_result, _ = get_namespace(result)
assert _is_numpy_namespace(xp_result)

# Since the computation always happens with NumPy / SciPy on the CPU, this
# function is expected to return an array allocated on the CPU even when it does
# not match the input array's device.
assert result.device == "cpu"
Contributor Author

I have adjusted this test to your suggestions from this comment, @ogrisel. But here, the test is narrower because we made confusion_matrix always return a NumPy array on CPU.

@StefanieSenger
Contributor Author

Regarding the question of how to document the return type of confusion_matrix() as a numpy array, I think that keeping
C : ndarray of shape (n_classes, n_classes) in the docstring should be enough, assuming that all the other functions and methods where we have added array API support document the return value type correctly, which is currently not the case.

Member

@virchan virchan left a comment

I suspect the CI failure (about the Test Library) is a false positive, as I couldn't reproduce the same error after running the tests multiple times on my local machine. Instead, I encountered a different set of errors related to CUDA. I believe the issue might be linked to how we convert sample_weight into a NumPy array.

Contributor

@OmarManzoor OmarManzoor left a comment

I think this looks good overall. Thanks @StefanieSenger
We just need to finalize whether we want to keep the current flow where we return a numpy array on the cpu or should we convert to the initial namespace and device.
CC: @ogrisel

@lesteve
Member

lesteve commented Feb 11, 2025

We just need to finalize whether we want to keep the current flow where we return a numpy array on the cpu or should we convert to the initial namespace and device.

About namespace, I would say for now the least controversial thing to do (from the point of view of array API support) would be to convert to the initial namespace, for example see #30440 (comment).

About device, I am not quite sure ... the Scipy array API support blog post compiled code section (from October 2023) says:

For now, we attempt to convert to NumPy using np.asarray before the compiled code, then convert back to our array's namespace using xp.asarray after it. This will raise exceptions for arrays on a different device to the CPU, as justified in the RFC (tldr: we want to avoid silent device transfers).

Maybe @lucascolley has some opinion on what to do with device based on his scipy experience?

@lucascolley
Contributor

thanks for the ping @lesteve, I left my initial thoughts at #30440 (comment) then discussed this with Olivier and Tim in a call. The summary is that I think you'll want to think carefully before setting the precedent for different defaults across functions as to whether they return the input namespace or NumPy. That seems like it could be confusing for users and difficult to document.

In this case, if (some of) the computation has to be done with NumPy, then I would suggest attempting to convert back to the input namespace. In SciPy we have chosen to let this error for now and skip the tests for device transfers, but you could choose to let the device transfers happen (as I think you do in your _convert_to_numpy, right?).

@OmarManzoor
Contributor

OmarManzoor commented Feb 16, 2025

I ran some benchmarks on a Kaggle kernel with CUDA using this branch: in one case the current flow is used, in which a NumPy array is returned, while in the other we convert back to the input namespace and device by returning xp.asarray(cm, device=device_).

| Array Size (CUDA) | Return NumPy Timing | Return Namespace and Device Timing |
| --- | --- | --- |
| 1e3 | 0.002571168 | 0.002800589 |
| 1e4 | 0.005218287 | 0.005717518 |
| 1e5 | 0.010124249 | 0.011052353 |
| 1e6 | 0.041873491 | 0.044407668 |
| 1e7 | 0.571530447 | 0.586297677 |

By looking at this, I think we might as well return the confusion matrix in the original namespace and device.
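For reference, a CPU-only sketch of how such a timing comparison can be set up. The numbers above came from a CUDA Kaggle kernel, which is not reproduced here; the helper names below are made up, and with NumPy as the namespace the extra `xp.asarray` round trip is essentially a no-op, so this only shows the shape of the measurement.

```python
import time

import numpy as np
from scipy.sparse import coo_matrix


def cm_return_numpy(y_true, y_pred, n_labels):
    # Current flow: compute with SciPy/NumPy and return the NumPy array as-is.
    weights = np.ones_like(y_true, dtype=np.float64)
    return coo_matrix(
        (weights, (y_true, y_pred)), shape=(n_labels, n_labels)
    ).toarray()


def cm_return_namespace(y_true, y_pred, n_labels, xp=np):
    # Alternative flow: one extra conversion back to the input namespace
    # (a no-op for NumPy; on CUDA it would add a host-to-device copy).
    return xp.asarray(cm_return_numpy(y_true, y_pred, n_labels))


rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=100_000)
y_pred = rng.integers(0, 10, size=100_000)

for fn in (cm_return_numpy, cm_return_namespace):
    start = time.perf_counter()
    result = fn(y_true, y_pred, 10)
    print(fn.__name__, f"{time.perf_counter() - start:.6f}s", result.shape)
```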

@StefanieSenger
Contributor Author

Thanks, everyone, for your thoughts on the return type!

To summarize:

- The original decision (based on @ogrisel's suggestion) was to default to returning NumPy arrays.
- @OmarManzoor's benchmarks show that converting back to the original namespace and device is slightly less performant than keeping the arrays as NumPy arrays, but the difference is small.
- @lesteve and @lucascolley brought up user experience and consistency across functions, so that users get the return types they expect.

So, should we finalise the PR by converting back to the input namespace and device, despite the minor performance difference? Or should we keep the numpy default?

I believe the advantages of converting back to the input namespace and device outweigh the costs. And in case we later judge differently for some reason, array API support is experimental, so we could still change the behaviour without annoying users too much.

@ogrisel, since your suggestion was to convert to numpy, do you have any thoughts on whether we should stick with that or return types from the input namespace based on the discussion?

@lucascolley
Contributor

lucascolley commented Feb 16, 2025

So, should we finalise the PR by converting back to the input namespace and device, despite the minor performance difference?

+1 from me!

For clarity, what is the NumPy code (i.e. which lines) that we have been unable to write in a performant way with the standard (which caused the initial performance discussion)?

@OmarManzoor
Contributor

For clarity, what is the NumPy code (i.e. which lines) that we have been unable to write in a performant way with the standard (which caused the initial performance discussion)?

Basically where we create the confusion matrix using a coo_matrix:

cm = coo_matrix(
    (sample_weight, (y_true, y_pred)),
    shape=(n_labels, n_labels),
    dtype=dtype,
).toarray()
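For context, the same dense matrix can also be built with a weighted scatter-add in plain NumPy; neither coo_matrix nor np.add.at has an equivalent in the array API standard, which is why this step stays on NumPy/SciPy. A small self-contained illustration (input values are made up, not taken from the PR):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Illustrative inputs.
y_true = np.array([0, 1, 1, 2])
y_pred = np.array([0, 1, 0, 2])
sample_weight = np.array([1.0, 1.0, 2.0, 0.5])
n_labels = 3

# The coo_matrix construction from the snippet above: duplicate (row, col)
# pairs are summed when the sparse matrix is densified.
cm_coo = coo_matrix(
    (sample_weight, (y_true, y_pred)), shape=(n_labels, n_labels)
).toarray()

# Equivalent weighted scatter-add; np.add.at is likewise outside the
# array API standard.
cm_scatter = np.zeros((n_labels, n_labels))
np.add.at(cm_scatter, (y_true, y_pred), sample_weight)

assert np.array_equal(cm_coo, cm_scatter)
print(cm_coo)
```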

@lesteve
Member

lesteve commented Feb 18, 2025

So, should we finalise the PR by converting back to the input namespace and device, despite the minor performance difference?

This seems the least surprising option to me for now, and we can revisit it later if the need arises. I would think metrics computation is unlikely to be the bottleneck in most real-life use cases.

@OmarManzoor
Contributor

@ogrisel Could you kindly provide your feedback so that we can finalize this PR?

@ogrisel
Member

ogrisel commented Jun 19, 2025

Based on the discussion in #31286 (comment), the decision is to convert such output arrays back to the original namespace.

@StefanieSenger StefanieSenger changed the title ENH Array API support for confusion_matrix converting to numpy array ENH Array API support for confusion_matrix Jul 15, 2025
Contributor

@OmarManzoor OmarManzoor left a comment

Thank you for the updates @StefanieSenger

@@ -141,6 +141,7 @@ Metrics
-------

- :func:`sklearn.metrics.accuracy_score`
- :func:`sklearn.metrics.confusion_matrix`
Contributor

I think we should have another section where we specify metrics that don't support the array API in their internal computations and use NumPy instead, because the array API does not support the optimizations needed. Using NumPy is therefore more appropriate internally, and we convert the output back to the desired array type.
This should be documented because we don't guarantee any speedups for such metrics.

Contributor Author

I understand, @OmarManzoor. But metrics computation is not the performance bottleneck, and it is hard to know where to draw the line, because confusion_matrix is used internally by so many other functions and methods. So I am not sure we can decide on a good way to document that in this PR. Should we maybe discuss it somewhere else?

Contributor

I think it's fine to not add the additional documentation in this PR. But don't you think that, since other documented metrics use the array API internally while confusion_matrix (and the metrics built on it) does not, it would be better to clarify that point? I think we can ask for opinions from others as well.

Contributor Author

Thanks for understanding and your support on this PR, @OmarManzoor!
I don't mind either way and am happy to help with the docs whichever way we go.

If you ask for my opinion (though it's really not deciding), I would tend to say we don't need to justify the internal details at the moment. I would consider confusion_matrix, and any other function that doesn't break when we use other array libraries with it, to now be array API compatible: you can feed it arrays of any type and it returns arrays of the same type, so it can be used as part of a data science pipeline.

For metrics, the speedup from actually staying on GPU is not so high. And there will be many other places where we cannot support staying on GPU (yet) for different reasons, and that might change over time as the array API evolves or we find better solutions. I think it is very ambitious to try to document internal details at such an early, experimental stage of the array API implementation.

Contributor

By the way the main reason for getting other opinions is that this was suggested previously #29881 (comment)

Contributor Author

@StefanieSenger StefanieSenger left a comment

Thanks for your reviews, @OmarManzoor and @lesteve!
I have applied your suggestions and commented regarding the documentation.

Now, this PR looks very straightforward. :)

