Use Array API in `r2_score` #27102

elindgren · 2023-08-18T12:25:28Z

Reference Issues/PRs

One of the items outlined in #26024.

What does this implement/fix? Explain your changes.

Migrates r2_score to use the Array API as outlined in #26024.

This PR also introduces the function _average that mimics the functionality of np.average for weighted averages, as that is not in the Array API spec. _average can be found under utils/_array_api.py.
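As an illustration of what such a helper has to do, here is a minimal sketch of a weighted average built only from functions the Array API spec defines (`asarray`, `mean`, `sum`, `reshape`, `broadcast_to`). The name `average_sketch` and its signature are hypothetical and are not the `_average` added in this PR; NumPy stands in for the namespace `xp` so the example is runnable.

```python
import numpy as np


def average_sketch(a, axis=None, weights=None, xp=np):
    """Weighted average using only functions the Array API spec defines.

    Hypothetical illustration, not scikit-learn's `_average`.
    """
    a = xp.asarray(a)
    if weights is None:
        return xp.mean(a, axis=axis)

    weights = xp.asarray(weights, dtype=a.dtype)
    if weights.shape != a.shape:
        # Like np.average, accept 1-D weights along `axis` and broadcast them.
        if axis is None or weights.ndim != 1 or weights.shape[0] != a.shape[axis]:
            raise ValueError("weights must match `a` or be 1-D along `axis`")
        shape = [1] * a.ndim
        shape[axis] = a.shape[axis]
        weights = xp.broadcast_to(xp.reshape(weights, tuple(shape)), a.shape)

    return xp.sum(a * weights, axis=axis) / xp.sum(weights, axis=axis)


X = np.asarray([[1.0, 2.0], [3.0, 4.0]])
w = np.asarray([1.0, 3.0])
result = average_sketch(X, axis=0, weights=w)  # compare with np.average(X, axis=0, weights=w)
```

The sketch does not guard against weights that sum to zero, which `np.average` reports as an error.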

Any other comments?

None

github-actions · 2023-08-18T12:27:15Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: 1bf557d._

ogrisel · 2023-08-18T14:29:49Z

sklearn/metrics/_regression.py

-    denominator = (
-        weight * (y_true - np.average(y_true, axis=0, weights=sample_weight)) ** 2
-    ).sum(axis=0, dtype=np.float64)
+    numerator = xp.sum(weight * (y_true - y_pred) ** 2, axis=0, dtype=xp.float64)


Since the end goal is to compute a scalar and we want float64 precision, but some namespace / device combinations might not support this (e.g. PyTorch with MPS only supports float32 operations), I think we need to move to CPU before computing the float64 sum.

See the following for a similar case:

https://github.com/scikit-learn/scikit-learn/pull/27098/files#diff-6264adab84df6fafdb2d950141783b52b718da1e1be773daae7b5921b2556f94R446-R448

Note that this will make the computation slightly less efficient for devices that do support float64 arithmetic, but maybe this is a pragmatic fix.
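A minimal sketch of the "move to CPU, then accumulate in float64" idea, for 1-D inputs only. The function name `r2_float64_on_cpu` is hypothetical, and the sketch assumes the inputs can be converted with `np.asarray`; arrays that live on a GPU would first need a library-specific transfer to host memory.

```python
import numpy as np


def r2_float64_on_cpu(y_true, y_pred, sample_weight=None):
    """Hypothetical 1-D R^2 that always accumulates in float64 on the host."""
    # Assumption: np.asarray can convert the inputs. Device arrays would need
    # an explicit, library-specific copy to host memory before this point.
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    if sample_weight is None:
        weight = np.ones_like(y_true)
    else:
        weight = np.asarray(sample_weight, dtype=np.float64)

    y_mean = np.sum(weight * y_true) / np.sum(weight)
    numerator = np.sum(weight * (y_true - y_pred) ** 2)
    denominator = np.sum(weight * (y_true - y_mean) ** 2)
    # A constant y_true (denominator == 0) is not handled in this sketch.
    return float(1.0 - numerator / denominator)


# float32 inputs are upcast before any reduction takes place.
y_true = np.asarray([3.0, -0.5, 2.0, 7.0], dtype=np.float32)
y_pred = np.asarray([2.5, 0.0, 2.0, 8.0], dtype=np.float32)
score = r2_float64_on_cpu(y_true, y_pred)
```

The cost is an extra host copy even on devices that could have done the float64 arithmetic in place.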

Alternatively, we could attempt to write a utility in _array_api.py that dynamically inspects whether a given namespace / device combo supports float64 operations (and caches the result), so that the copy to device="cpu" is triggered only when it's actually needed.
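One way such a cached probe could look, as a sketch only: the helper names `supports_float64` and `max_precision_float_dtype` are hypothetical, and the probe simply tries to create a float64 array and treats any exception as "unsupported". `Float32OnlyNamespace` is a toy stand-in for a namespace / device combination without float64, not a real library.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def supports_float64(xp, device=None):
    """Hypothetical probe: can this namespace/device create float64 arrays?

    Both arguments must be hashable for the cache to work.
    """
    try:
        kwargs = {} if device is None else {"device": device}
        xp.asarray([0.0], dtype=xp.float64, **kwargs)
    except Exception:
        return False
    return True


def max_precision_float_dtype(xp, device=None):
    """Return float64 when the probe succeeds, otherwise fall back to float32."""
    return xp.float64 if supports_float64(xp, device) else xp.float32


class Float32OnlyNamespace:
    """Toy stand-in for a namespace/device combination without float64."""

    float32 = "float32"
    float64 = "float64"

    @staticmethod
    def asarray(values, dtype=None, device=None):
        if dtype == "float64":
            raise TypeError("float64 is not supported on this device")
        return list(values)


dtype_numpy = max_precision_float_dtype(np)
dtype_toy = max_precision_float_dtype(Float32OnlyNamespace)
```

With a probe like this, the copy to CPU would only be needed when the fallback dtype is float32.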

elindgren · 2023-09-08

I implemented this, but I ran into issues with CuPy. Apparently, device="cpu" does not work; see the error message below. I made a hacky workaround for this that keeps data on the GPU only for CuPy, but I'm not happy with it. Any ideas?

The hacky workaround in question:

https://github.com/elindgren/scikit-learn/blob/a4dd5944c8c88f2b321ee31c3ca835dc39ca99c6/sklearn/metrics/_regression.py#L1024-L1026

elindgren · 2023-10-05

I cherry-picked @betatim's solution from #27232 to solve this issue. Now the result is automatically cast to the most accurate float that the platform supports.

ogrisel · 2023-08-19T06:59:12Z

We probably need a new entry in doc/whats_new/v1.4.rst to document the enhancement to r2_score.

ogrisel · 2023-09-07T14:06:01Z

@elindgren I merged #27137 and synced your branch with main. I think there are things that can be consolidated, with respect to testing and documenting the changes in the changelog for instance.

elindgren · 2023-09-08T15:20:50Z

> @elindgren I merged #27137 and synced your branch with main. I think there are things that can be consolidated, with respect to testing and documenting the changes in the changelog for instance.

Thanks! Sorry for the radio silence, picking this back up.

fcharras · 2023-12-05T15:27:53Z

I've opened a follow-up PR at #27904

(of course it keeps your commits and includes your name in the changelog)

It is hard to contribute when the guidelines and signaling are unclear.

Yet you've done well and your work has been very useful. I can also see now how awful of a maze it is when beginning to tackle dtype and device management. Can't say for sure the solutions I'm proposing in #27904 are more likeable.

ogrisel · 2024-01-10T10:27:21Z

Closing this PR in favor of #27904 which includes the original commits of this PR and credits the work of @elindgren in the changelog. I think we are converging on a consensus.

update r2 score to use the array API, and write initial tests

e0429db

github-actions bot added module:metrics module:utils labels Aug 18, 2023

elindgren changed the title ~~update r2 score to use the array API, and write initial tests~~ Update r2 score to use the array API Aug 18, 2023

elindgren changed the title ~~Update r2 score to use the array API~~ Use Array API in r2_score Aug 18, 2023

Merge remote-tracking branch 'upstream/main'

b9c1720

elindgren mentioned this pull request Aug 18, 2023

Make more of the "tools" of scikit-learn Array API compatible #26024

Open

ogrisel added the Array API label Aug 19, 2023

ogrisel reviewed Aug 19, 2023

Merge branch 'main' into ENH/r2_score_array_api

5666ce5

betatim reviewed Aug 22, 2023

sklearn/utils/_array_api.py Outdated

betatim reviewed Aug 22, 2023

sklearn/metrics/_regression.py Outdated

betatim reviewed Aug 22, 2023

sklearn/metrics/tests/test_regression.py Outdated

betatim reviewed Aug 22, 2023

sklearn/metrics/tests/test_regression.py Outdated

Merge branch 'main' into ENH/r2_score_array_api

4580d1c

Fix some review comments and move stuff to CPU

a4dd594

ogrisel reviewed Sep 11, 2023

sklearn/metrics/tests/test_regression.py Outdated

ogrisel reviewed Sep 11, 2023

sklearn/metrics/tests/test_regression.py Outdated

Add regression tests to the test_common framework

adc7680

betatim reviewed Sep 29, 2023

sklearn/metrics/tests/test_regression.py Outdated

sklearn/metrics/tests/test_regression.py Outdated

sklearn/metrics/_regression.py Outdated

elindgren and others added 6 commits October 5, 2023 10:21

Update sklearn/metrics/tests/test_regression.py

85469a9

Co-authored-by: Tim Head <betatim@gmail.com>

Update sklearn/metrics/tests/test_regression.py

b7efaa5

Co-authored-by: Tim Head <betatim@gmail.com>

Remove hardcoded device choice in _weighted_sum

ac533c2

Some Array API compatible libraries do not have a device called 'cpu'. Instead we try and detect the lib+device combination that does not support float64.

Factor out max float precision determination

35be22e

Use convenience function to find highest accuracy float in r2_score

7c53e19

add tests for _average for Array API

230ae46

WIP: solving dtype and device maze

93257ba

fcharras mentioned this pull request Dec 5, 2023

ENH Use Array API in r2_score #27904

Merged

fcharras and others added 25 commits December 5, 2023 16:29

Fix changelog conflict

45bbe4e

Tests fixups

2145a6b

Tests fixups

bd4b224

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into ENH/r2_score_array_api

34aceb1

Fix dtype parameterization in common metric tests

56d5308

Tests fixups

75cb3f3

Tests fixups

d9fff24

Adds lru_cache on device inspection function + user _convert_to_numpy rather than from_dlpack

d72137c

Adequatly define hash of _ArrayAPIWrapper to avoid wrong equality

16ab95f

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into ENH/r2_score_array_api

9862a85

Remove _weighted_sum and only use _average

143ce54

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into ENH/r2_score_array_api

4e9401b

Linting on unrelated diff, pre-commit broken ? + fixes

2b095c4

Merge branch 'main' into ENH/r2_score_array_api

42f5d8d

re add faster, simpler code branch for _weighted_sum in _classification.py

ff0b860

re add faster, simpler code branch for _weighted_sum in _classification.py

efe36f3

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into ENH/r2_score_array_api

abb9ee9

fix

08f5433

fix tests with torch+cuda

38f56af

fix tests with torch+cuda

c09a84b

Merge branch 'main' into ENH/r2_score_array_api

13d9bd6

FIX: always pass xp to _convert_to_numpy calls

c32fa92

FIX also update device_ in case of numpy fallback

1555f8d

FIX pass xp to _convert_to_numpy instead of copy=True

fc1b9f1

Rename _weighted_sum to _weighted_sum_1d to make it explicit that those fast code paths do not generalized to nd inputs

1bf557d

ogrisel closed this Jan 10, 2024
