ENH Add Array API compatibility for entropy #29141
Conversation
Thanks for the PR. By the look of it there is still significant work to do, see:
```python
labels1 = xp.asarray([0, 0, 42.0])
labels2 = xp.asarray([])
labels3 = xp.asarray([1, 1, 1, 1])
```
Let's be explicit about dtypes and let's actually test on the device returned by the fixture:
```diff
-labels1 = xp.asarray([0, 0, 42.0])
-labels2 = xp.asarray([])
-labels3 = xp.asarray([1, 1, 1, 1])
+float_labels = xp.asarray([0, 0, 42.0], device=device, dtype=dtype_name)
+empty_int32_labels = xp.asarray([], dtype="int32", device=device)
+int_labels = xp.asarray([1, 1, 1, 1], device=device)
```
the rest of the test will need to be updated accordingly (along with the code, I think).
You can use https://gist.github.com/EdAbati/ff3bdc06bafeb92452b3740686cc8d7c to launch this test on a machine with pytorch running on a non-CPU device.
What do you think about adding integer dtype_names in `yield_namespace_device_dtype_combinations`? Right now there are many asserts that do not depend on `dtype_name`, so they won't check anything new. E.g. the assert for `empty_int32_labels` runs identically with `xp=torch, device=cpu, dtype_name=float64` and with `xp=torch, device=cpu, dtype_name=float32`.
Adding integer dtypes could help remove such repetition.
For example, this test could be rewritten as
@pytest.mark.parametrize(
"array_namespace, device, dtype_name",
yield_namespace_device_dtype_combinations(yield_integers=True),
)
def test_entropy_array_api(array_namespace, device, dtype_name):
xp = _array_api_for_tests(array_namespace, device)
labels = xp.asarray(np.asarray([0, 0, 42.0], dtype=dtype_name), device=device)
empty_labels = xp.asarray(np.asarray([], dtype=dtype_name), device=device)
constant_labels = xp.asarray(np.asarray([1, 1, 1, 1], dtype=dtype_name), device=device)
with config_context(array_api_dispatch=True):
assert_almost_equal(entropy(labels), 0.6365141, 5)
assert entropy(empty_labels) == 1
assert entropy(constant_labels) == 0
If it makes sense, I can open a separate issue for discussion.
I don't want to have a combinatorial explosion of test cases where most cases are not likely to yield interesting things to test.

For tests that require integer dtypes, we can just use either `xp.int32` or `xp.int64`: that should be enough for most scikit-learn use cases, and both are supported by all known platforms.

For floating-point values, it's interesting to test with both `xp.float32` and `xp.float64` (when possible), because most GPUs work best in `xp.float32` and sometimes do not support `xp.float64`, which is the default dtype of numpy. Hence the need to test both when possible, conditionally on the choice of namespace and device.
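One way to make that conditional float64 testing concrete (a hypothetical sketch, not the helper scikit-learn actually uses; `float_dtypes_for` is an invented name) is to probe whether the namespace/device pair accepts a float64 allocation and only include it in the parametrization when it succeeds:

```python
import numpy as np


def float_dtypes_for(xp, device=None):
    """Hypothetical helper: list the float dtypes worth testing for a
    given namespace/device pair (float64 only when actually supported)."""
    dtypes = [xp.float32]
    try:
        # Some accelerator devices reject float64 allocations entirely,
        # so attempt one and fall back to float32-only on failure.
        kwargs = {} if device is None else {"device": device}
        xp.asarray([0.0], dtype=xp.float64, **kwargs)
        dtypes.append(xp.float64)
    except Exception:
        pass
    return dtypes


# With numpy as the namespace, both float dtypes are available.
print(float_dtypes_for(np))
```

The same probe-and-skip idea is what the conditional parametrization in the test suite has to express, however it is spelled internally.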
Thanks for the update, almost LGTM! See below:
```diff
-pi = np.bincount(label_idx).astype(np.float64)
-pi = pi[pi > 0]
+pi = xp.astype(xp.unique_counts(labels)[1], xp.float64)
```
Nice code simplification by the way :)
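To see why the two formulations agree (a standalone numpy sketch, not the exact scikit-learn source): unique counts only include labels that actually occur, so the `pi[pi > 0]` filtering step that `bincount` required becomes unnecessary:

```python
import numpy as np

labels = np.asarray([0, 0, 42.0])

# Counts from np.unique contain no zeros, unlike np.bincount, which
# emits a zero for every unused non-negative integer bin.
pi = np.unique(labels, return_counts=True)[1].astype(np.float64)

# Shannon entropy of the empirical label distribution (natural log);
# matches the 0.6365141 expected by the test above to 5 decimals.
pi = pi / pi.sum()
print(-np.sum(pi * np.log(pi)))
```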
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Once the above two comments are dealt with and assuming tests still pass, LGTM. Thanks for the PR.
LGTM, thanks for the PR!
LGTM. Thanks @Tialo
Reference Issues/PRs

Towards #26024

What does this implement/fix? Explain your changes.

Any other comments?