[WIP] Add array-api support to metrics.confusion_matrix #28867

charlesjhill · 2024-04-21T19:56:13Z

Reference Issues/PRs

See #26024 for the array-api meta-issue tracking the "tools" in sklearn.

What does this implement/fix? Explain your changes.

This PR adds array-api compatibility to the sklearn.metrics.confusion_matrix method, aiming to support all of its current API surface. Many other classification metrics are or can be computed based on a confusion matrix so it seems fairly high value to port.

TODO:

Complete porting confusion_matrix test suite to check other array namespaces.

Any other comments?

None for now :)

github-actions · 2024-04-21T19:57:24Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: 2ea2722. Link to the linter CI: here}

…atrix-array-api

ogrisel

Quick feedback.

ogrisel · 2024-04-25T08:44:57Z

sklearn/utils/_array_api.py

+    # import array_api_strict as xp
+    X = xp.asarray(X, copy=copy)
+    dtype = X.dtype
+    isscaler = X.ndim == 0


Typo: scaler => scalar.

ogrisel · 2024-05-07T13:52:30Z

sklearn/utils/_array_api.py

+        msg = (
+            "Cannot return indices with the torch backend yet. See" " array_api_compat."
+        )
+        raise NotImplementedError(msg)


I have opened data-apis/array-api-compat#135. If array_api_compat maintainers accept this suggestion, then it might be worth contributing such a temporary workaround to array_api_compat. If not, we can implement our own temporary workaround for torch in scikit-learn.

ogrisel · 2024-05-07T13:55:38Z

sklearn/utils/_array_api.py

+            yield namespace, "cuda"
+            yield namespace, "mps"
+        else:
+            yield namespace, None


+1 for this. There once this is in, we should do a follow-up PR for occurrences of yield_namespace_device_dtype_combinations that discard the dtype value to avoid redundant test cases.

ogrisel · 2024-05-07T14:08:27Z

Please also add a changelog entry in v1.6.rst as soon as #27381 is merged.

betatim · 2024-05-16T11:57:00Z

sklearn/metrics/_classification.py

-                    f"Got y_true={xp.unique(y_true)} and "
-                    f"y_pred={xp.unique(y_pred)}. Make sure that the "
+                    f"Got y_true={xp.unique_values(y_true)} and "
+                    f"y_pred={xp.unique_values(y_pred)}. Make sure that the "
                    "predictions provided by the classifier coincides with "


Suggested change

"predictions provided by the classifier coincides with "

"predictions provided by the classifier coincide with "

unrelated grammar fix (I think)

betatim · 2024-05-16T11:58:09Z

sklearn/metrics/_classification.py

    if y_type not in ("binary", "multiclass"):
        raise ValueError("%s is not supported" % y_type)

    if labels is None:
        labels = unique_labels(y_true, y_pred)
    else:
-        labels = np.asarray(labels)
-        n_labels = labels.size
+        n_labels = size(labels)
        if n_labels == 0:
            raise ValueError("'labels' should contains at least one label.")


Suggested change

raise ValueError("'labels' should contains at least one label.")

raise ValueError("'labels' should contain at least one label.")

unrelated typo fix

betatim · 2024-05-16T12:19:05Z

sklearn/utils/_array_api.py

@@ -542,14 +567,19 @@ def get_namespace(*arrays, remove_none=True, remove_types=(str,), xp=None):
    # message in case it is missing.
    import array_api_compat

-    namespace, is_array_api_compliant = array_api_compat.get_namespace(*arrays), True
+    # Convert lists and tuple to numpy arrays.


Can we make the comment so it explains why it is a good idea to do this? I think that would be helpful to have here, at least for me it is not 100% clear that we should do this

betatim · 2024-05-16T12:25:39Z

sklearn/metrics/_classification.py

+        if size(y_true) == 0:
+            return xp.zeros((n_labels, n_labels), dtype=xp.int32, device=device)
+
+        if size(_intersect1d(y_true, labels, xp=xp)) == 0:


What happens here if labels is a Python list of strings and y_true is, say, a torch array?

As a user I'd probably provide the labels as a Python list/tuple, mostly because it is convenient and not performance critical.

Is there a downside to being helpful to the callers and allowing list/tuple here?

charlesjhill · 2024-05-16T17:20:32Z

Thanks for the comments @ogrisel, @betatim. I've been putting most of my dev time into #27113, but I will see if I can find some time to get back to this one in the next few days ~

lesteve · 2025-01-23T06:48:52Z

I am going to close this one since there has been some work in the meantime on confusion_matrix see #30562 for example.

Hopefully you don't mind too much @charlesjhill 🙏.

StefanieSenger · 2025-01-28T12:18:21Z

Sorry @charlesjhill, that was an oversight. I wasn't aware you had opened a WIP PR before.

charlesjhill added 4 commits April 21, 2024 14:05

First pass at confusion matrix support with array api

9166240

Fix issue with type casting with inplace division.

85163a5

Fix array-api implementation of _nan_to_num.

ae9d207

Begin porting confusion_matrix tests.

5cfb85f

github-actions bot added module:metrics module:utils labels Apr 21, 2024

ogrisel added the Array API label Apr 24, 2024

charlesjhill added 5 commits April 29, 2024 23:17

Merge remote-tracking branch 'upstream/main' into feature/confusion-m…

e258ba1

…atrix-array-api

Update array_api helpers to use upstream methods.

864ffea

Remove extraneous array api function.

aa58127

Merge remote-tracking branch 'upstream/main' into feature/confusion-m…

f5565f3

…atrix-array-api

Fixup the remaining TODOs.

2ea2722

ogrisel reviewed May 7, 2024

View reviewed changes

betatim reviewed May 16, 2024

View reviewed changes

lesteve closed this Jan 23, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Add array-api support to metrics.confusion_matrix #28867

[WIP] Add array-api support to metrics.confusion_matrix #28867

charlesjhill commented Apr 21, 2024 •

edited

Loading

github-actions bot commented Apr 21, 2024 •

edited

Loading

ogrisel left a comment

ogrisel Apr 25, 2024

ogrisel May 7, 2024

ogrisel May 7, 2024

ogrisel commented May 7, 2024

betatim May 16, 2024

betatim May 16, 2024

betatim May 16, 2024

betatim May 16, 2024

charlesjhill commented May 16, 2024

lesteve commented Jan 23, 2025

StefanieSenger commented Jan 28, 2025

	"predictions provided by the classifier coincides with "
	"predictions provided by the classifier coincide with "

	raise ValueError("'labels' should contains at least one label.")
	raise ValueError("'labels' should contain at least one label.")

[WIP] Add array-api support to metrics.confusion_matrix #28867

[WIP] Add array-api support to metrics.confusion_matrix #28867

Conversation

charlesjhill commented Apr 21, 2024 • edited Loading

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

github-actions bot commented Apr 21, 2024 • edited Loading

✔️ Linting Passed

ogrisel left a comment

Choose a reason for hiding this comment

ogrisel Apr 25, 2024

Choose a reason for hiding this comment

ogrisel May 7, 2024

Choose a reason for hiding this comment

ogrisel May 7, 2024

Choose a reason for hiding this comment

ogrisel commented May 7, 2024

betatim May 16, 2024

Choose a reason for hiding this comment

betatim May 16, 2024

Choose a reason for hiding this comment

betatim May 16, 2024

Choose a reason for hiding this comment

betatim May 16, 2024

Choose a reason for hiding this comment

charlesjhill commented May 16, 2024

lesteve commented Jan 23, 2025

StefanieSenger commented Jan 28, 2025

charlesjhill commented Apr 21, 2024 •

edited

Loading

github-actions bot commented Apr 21, 2024 •

edited

Loading