ENH Array API for contingency_matrix #29251


Open
wants to merge 27 commits into base: main

Conversation

Tialo
Contributor

@Tialo Tialo commented Jun 13, 2024

Reference Issues/PRs

Towards #26024

What does this implement/fix? Explain your changes.

Added a common test similar to the ones in metrics/tests/test_common.py. Since supervised clustering metrics operate on labels, IMO it is wrong to use dtype_name from yield_namespace_device_dtype_combinations, because those are float types.

Any other comments?

Should the common test check whether the metric array is on the same device?
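A minimal sketch of the common-test pattern being described (hypothetical helper name and signature; the actual helper is quoted from the diff later in this thread) could look like:

```python
import numpy as np

# Hypothetical sketch of a common check for supervised clustering
# metrics: labels are small non-negative integers, so the default
# integer dtype is used instead of a floating point dtype_name.
def check_supervised_metric(metric, asarray):
    labels_true = np.array([0, 0, 1, 1, 2, 2])
    labels_pred = np.array([1, 0, 2, 1, 0, 2])
    reference = metric(labels_true, labels_pred)
    # Re-run with the arrays converted into the namespace under test.
    result = metric(asarray(labels_true), asarray(labels_pred))
    assert float(result) == float(reference)
    return float(reference)
```

With NumPy itself standing in for the array namespace, a toy agreement metric such as `lambda a, b: np.mean(a == b)` passes this check.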

@Tialo Tialo changed the title ENH Array API for contingency matrix ENH Array API for contingency_matrix Jun 13, 2024

github-actions bot commented Jun 13, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7d7d58f. Link to the linter CI: here

@EmilyXinyi
Contributor

Since supervised clustering metrics operate on labels, IMO it is wrong to use dtype_name from yield_namespace_device_dtype_combinations, because those are float types.

Out of curiosity, and because I am not familiar with this, do you mind elaborating on why you think this? Thanks!

@Tialo
Contributor Author

Tialo commented Jun 18, 2024

Since supervised clustering metrics operate on labels, IMO it is wrong to use dtype_name from yield_namespace_device_dtype_combinations, because those are float types.

Out of curiosity, and because I am not familiar with this, do you mind elaborating on why you think this? Thanks!

dtype_name is either float32 or float64. It would be more reasonable to use integer types for labels, as they are usually numbers from 0 to n.

But IIRC some metrics, e.g. entropy, can use float labels like 4.0, 2.0; I am not sure about 4.2, 1.7, etc.
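As a quick plain-NumPy illustration of this point (not part of the PR): label-based metrics only care about category identity, so integral-valued float labels behave exactly like integers, while non-integral floats simply form distinct categories.

```python
import numpy as np

# Float labels 4.0/2.0 map to the same category codes as 4/2 ...
_, int_inverse = np.unique(np.array([4, 2, 4, 2]), return_inverse=True)
_, float_inverse = np.unique(np.array([4.0, 2.0, 4.0, 2.0]), return_inverse=True)
assert np.array_equal(int_inverse, float_inverse)

# ... while 4.2/1.7 are just treated as two other distinct categories.
_, frac_inverse = np.unique(np.array([4.2, 1.7, 4.2, 1.7]), return_inverse=True)
assert np.array_equal(frac_inverse, float_inverse)
```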

@ogrisel
Member

ogrisel commented Jul 8, 2024

I agree with the comment above. To avoid the confusion, the value returned by yield_namespace_device_dtype_combinations could be renamed to floating_dtype, and the docstring updated accordingly to explain that GPU devices are typically optimized for float32 (or even lower precision), but some also offer float64 support.
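The suggested rename might be sketched like this (illustrative only; the real fixture in sklearn.utils._array_api yields more namespace/device combinations than shown here):

```python
# Illustrative sketch: the yielded dtype entry is called
# floating_dtype to make explicit that only floating point dtypes
# are generated by this fixture.
def yield_namespace_device_dtype_combinations():
    for array_namespace, device in [("numpy", None), ("torch", "cuda")]:
        # GPU devices are typically optimized for float32 (or even
        # lower precision), but some also offer float64 support.
        for floating_dtype in ("float32", "float64"):
            yield array_namespace, device, floating_dtype
```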

Member

@ogrisel ogrisel left a comment


Thanks for the PR. Here is some feedback:


def check_array_api_metric_supervised(metric, array_namespace, device, dtype_name):
labels_true = np.array([0, 0, 1, 1, 2, 2], dtype=dtype_name)
labels_pred = np.array([1, 0, 2, 1, 0, 2], dtype=dtype_name)
Member

@ogrisel ogrisel Jul 8, 2024


Please (also or only) test with the default integer dtype (without casting to dtype_name). The dtype_name values generated by our testing infrastructure are floating point dtypes, which do not really make sense for cluster labels.

Related: #29251 (comment)
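In plain NumPy terms, the distinction this comment draws is (illustrative only):

```python
import numpy as np

# Cluster labels created without a cast keep the platform's default
# integer dtype; casting to a generated dtype_name makes them floats.
labels_default = np.array([0, 0, 1, 1, 2, 2])
labels_cast = labels_default.astype("float32")  # what dtype=dtype_name does
assert np.issubdtype(labels_default.dtype, np.integer)
assert np.issubdtype(labels_cast.dtype, np.floating)
```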

Contributor Author


Do you mean to add a test where dtype_name is, for example, "float32"?

Member


The phrasing of my comment was grammatically incorrect and confusing. I rewrote it.

Contributor Author


I see the confusion here. Actually, as defined in test_array_api_compliance, dtype_name is either int32 or int64, so float types are not checked anyway.

)

if sparse and is_array_api_compliant:
raise ValueError("Cannot use sparse=True while using array api dispatch")
Member


Coming back to this, maybe raising this exception is unnecessarily annoying. We could instead just return

Accepting array API inputs would not necessarily mean outputting a data structure of the same type when the output is explicitly requested to use another container type.

Contributor Author


We could instead just return

Did you mean to write something after return?

Member


Yes, I meant to return a sparse SciPy data structure (backed by NumPy arrays).
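A sketch of that alternative (hypothetical helper name; `to_numpy` stands in for sklearn's internal `_convert_to_numpy`): instead of raising, convert the index arrays to NumPy and always hand back a SciPy sparse matrix.

```python
import numpy as np
import scipy.sparse as sp

def build_sparse_contingency(class_idx, cluster_idx, n_classes, n_clusters,
                             to_numpy=np.asarray):
    # Convert out of the array API namespace first; SciPy sparse
    # containers are always backed by NumPy arrays.
    class_idx = to_numpy(class_idx)
    cluster_idx = to_numpy(cluster_idx)
    return sp.coo_matrix(
        (np.ones(class_idx.shape[0]), (class_idx, cluster_idx)),
        shape=(n_classes, n_clusters),
        dtype=np.int64,
    )
```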

Member

@lucyleeow lucyleeow Jul 4, 2025


Does #31286 mean that it should return the same type as the input..? Edit: I missed the sparse parameter.


classes, class_idx = xp.unique_inverse(labels_true)
clusters, cluster_idx = xp.unique_inverse(labels_pred)
class_idx = _convert_to_numpy(class_idx, xp)
Contributor Author

@Tialo Tialo Oct 17, 2024


Otherwise CuPy would raise here:

contingency = sp.coo_matrix(
    (np.ones(class_idx.shape[0]), (class_idx, cluster_idx)),
    shape=(n_classes, n_clusters),
    dtype=dtype,
)

Member


@ogrisel do you think it is worthwhile to do some benchmarking, to compare conversion to NumPy vs creating the table via other functions, or should we just leave it as is?
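For reference, one alternative worth benchmarking (a sketch, not the PR's code) is a flattened bincount, which needs only operations most array API namespaces provide; on NumPy inputs it matches the coo_matrix construction exactly:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
n_classes, n_clusters = 5, 7
class_idx = rng.integers(0, n_classes, size=10_000)
cluster_idx = rng.integers(0, n_clusters, size=10_000)

# Current approach: duplicate coordinates are summed on densification.
via_coo = sp.coo_matrix(
    (np.ones(class_idx.shape[0]), (class_idx, cluster_idx)),
    shape=(n_classes, n_clusters),
).toarray()

# Candidate alternative: count flattened (class, cluster) pair codes.
via_bincount = np.bincount(
    class_idx * n_clusters + cluster_idx, minlength=n_classes * n_clusters
).reshape(n_classes, n_clusters)

assert np.array_equal(via_coo, via_bincount)
```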

@Tialo Tialo force-pushed the array-api/contingency_matrix branch from f97c960 to 02d4e6d Compare October 17, 2024 15:41
@Tialo
Contributor Author

Tialo commented Dec 2, 2024

I am not sure why my test fails with this error:
FAILED metrics/cluster/tests/test_supervised.py::test_contingency_matrix_array_api_sparse - ImportError: array_api_compat is required to dispatch arrays using the API

And if the module is not installed, I don't know why the other array API tests do not fail.

@lucyleeow
Member

Checking in on this, as it seems that it would unblock many other metrics (see #26024 (comment)).

Thanks so much for your patience @Tialo , just checking, are you still interested in working on this?

@Tialo
Contributor Author

Tialo commented Jul 4, 2025

Hi, yes I am

Member

@lucyleeow lucyleeow left a comment


Thanks for your patience @Tialo, this looks like it's in good shape. There is just a question around whether it is worth exploring alternate table computation; it would be nice to get other maintainers' thoughts on that.

We now vendor array-api-compat and array-api-extra (#30340), so maybe this will fix your error: #29251 (comment)

Could you please merge main and fix the merge conflicts? Thank you

np.array([1, 0]),
sparse=True,
)
assert isinstance(res, sp.csr_matrix)
Member


Should this check be used with yield_namespace_device_dtype_combinations ?

Contributor Author


Not sure; I think checking that using array_api_dispatch=True together with sparse=True returns a sparse matrix is sufficient.

@@ -232,3 +242,60 @@ def test_returned_value_consistency(name):

assert isinstance(score, float)
assert not isinstance(score, (np.float64, np.float32))


def check_array_api_metric(
Member


Do you think we could re-use check_array_api_metric from sklearn/metrics/tests/test_common.py ?

Contributor Author


There is a processing of specific kwargs inside this function e.g.

if metric_kwargs.get("sample_weight") is not None:
    metric_kwargs["sample_weight"] = xp.asarray(
        metric_kwargs["sample_weight"], device=device
    )

Would it be OK to add similar kwargs processing to check_array_api_metric in sklearn/metrics/tests/test_common.py if it is needed in the future?
For example, for the function mutual_info_score and its contingency parameter.
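The kind of kwargs handling being discussed could be factored into a small helper, roughly like the sketch below (hypothetical names; `contingency` is the mutual_info_score parameter mentioned above, and `xp_asarray` stands in for `partial(xp.asarray, device=device)`):

```python
import numpy as np

# Hypothetical list of keyword arguments that hold arrays and thus
# need conversion into the namespace/device under test.
ARRAY_KWARGS = ("sample_weight", "contingency")

def convert_array_kwargs(metric_kwargs, xp_asarray):
    # Convert array-valued keywords, leaving other keywords untouched.
    converted = dict(metric_kwargs)
    for key in ARRAY_KWARGS:
        if converted.get(key) is not None:
            converted[key] = xp_asarray(converted[key])
    return converted
```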

@lucyleeow
Member

@Tialo are you still interested in working on this PR? Thanks

@Tialo
Contributor Author

Tialo commented Jul 21, 2025

I won't be able to get back to this PR for a month, so if this is a blocker, you can take it over

4 participants