ENH Use Array API in `r2_score` #27904

fcharras · 2023-12-05T15:23:00Z

Reference Issues/PRs

The PR builds on preliminary explorations done by @elindgren in #27102

It tackles one of the items outlined in #26024.

Any other comments?

This PR proposes falling back to CPU + NumPy at the very beginning of the r2_score function whenever the array namespace and device cannot handle float64 precision, because explicit casts to float64 are unavoidable and are used in many steps.
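The fallback described above can be sketched as follows. This is an illustration only. `r2_score_sketch` and its `float64_supported` flag are hypothetical names, not scikit-learn's API. The sketch also assumes `np.asarray` can convert the inputs, which holds for NumPy arrays and CPU torch tensors but not for tensors that live on an accelerator.

```python
import numpy as np


def r2_score_sketch(y_true, y_pred, xp, float64_supported):
    """Hypothetical sketch: R^2 = 1 - SS_res / SS_tot, computed in float64.

    If the namespace/device pair cannot handle float64, everything is moved
    to CPU + NumPy once, up front, instead of at every float64 cast.
    """
    if not float64_supported:
        # Assumption: np.asarray can convert the inputs to host arrays.
        xp = np
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    y_true = xp.asarray(y_true, dtype=xp.float64)
    y_pred = xp.asarray(y_pred, dtype=xp.float64)
    ss_res = xp.sum((y_true - y_pred) ** 2)
    ss_tot = xp.sum((y_true - xp.mean(y_true)) ** 2)
    # Note: a constant y_true (ss_tot == 0) is not handled in this sketch.
    return float(1.0 - ss_res / ss_tot)


print(r2_score_sketch([3, -0.5, 2, 7], [2.5, 0.0, 2, 8], np, False))  # ≈ 0.9486
```

Doing the switch once at the top keeps every later step free of per-operation device and dtype checks. That is the trade-off the PR description argues for.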

It also proposes improved ways to detect device support for dtypes, uses them to act accordingly in r2_score and _average, and updates the weighted_sum function as well.
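One way to detect dtype support on a device is to probe it directly rather than match on device names. Some Array API libraries have no device called "cpu", and PyTorch's MPS backend, for instance, does not support float64. The helper below is a hypothetical sketch of that idea and is not the function added in this PR.

```python
import numpy as np


def supports_float64(xp, device=None):
    """Hypothetical helper: can namespace `xp` allocate float64 on `device`?

    Probes by allocating a tiny array instead of relying on device names.
    """
    kwargs = {} if device is None else {"device": device}
    try:
        xp.zeros(1, dtype=xp.float64, **kwargs)
    except Exception:
        # e.g. AttributeError if xp has no float64 attribute, or an error
        # raised by the library when the device cannot hold float64 data.
        return False
    return True


print(supports_float64(np))  # True
```

The result of such a probe could then drive the CPU + NumPy fallback decision at the top of r2_score.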


github-actions · 2023-12-05T15:24:16Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: 457531e. Link to the linter CI: here_


sklearn/utils/tests/test_array_api.py

ogrisel · 2024-03-08T08:32:06Z

@betatim @fcharras @adrinjalali I think this is ready for review.

betatim

Some comments. Haven't looked at the tests yet but the rest looks nice.

sklearn/metrics/_classification.py

sklearn/utils/_array_api.py

sklearn/metrics/_regression.py

sklearn/utils/_array_api.py

betatim · 2024-03-08T16:52:20Z

sklearn/utils/_array_api.py

+    a = xp.astype(a, output_dtype)
+
+    if weights is None:
+        return (xp.mean if normalize else xp.sum)(a, axis=axis)


😮

Kinda cool that we can do this in Python, but also strong stuff :D
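The construct being praised works because functions are first-class objects in Python. A conditional expression can pick `xp.mean` or `xp.sum`, and the chosen function is called immediately. Below is a small standalone sketch of the same pattern, using NumPy in place of `xp`. The name `reduce_sketch` and the weighted branch are hypothetical illustrations, not the merged `_average` code.

```python
import numpy as np


def reduce_sketch(a, weights=None, normalize=True, axis=None):
    """Hypothetical sketch of the reviewed pattern, using NumPy as `xp`."""
    a = np.asarray(a, dtype=np.float64)
    if weights is None:
        # Choose the function with a conditional expression, then call it:
        # mean when normalizing, plain sum otherwise.
        return (np.mean if normalize else np.sum)(a, axis=axis)
    # Assumption for this sketch: 1-D `a` and `weights` of equal length.
    weights = np.asarray(weights, dtype=np.float64)
    total = np.sum(a * weights, axis=axis)
    return total / np.sum(weights) if normalize else total


print(reduce_sketch([1, 2, 3]))                     # 2.0
print(reduce_sketch([1, 2, 3], normalize=False))    # 6.0
print(reduce_sketch([1, 2, 3], weights=[1, 1, 2]))  # 2.25
```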

sklearn/utils/_array_api.py


adrinjalali

Nice. Other than the nits, this looks quite good to me now.

sklearn/metrics/_classification.py

sklearn/metrics/_regression.py

adrinjalali

Thanks @ogrisel

ogrisel · 2024-03-11T12:58:11Z

Thanks for the reviews @adrinjalali and @betatim. This PR has been much simplified as a result of those reviews.

doc/modules/array_api.rst

betatim

LGTM


betatim · 2024-03-11T16:04:38Z

What a mission this PR was! Thanks everyone who helped, I think it was worth the effort and wait :D

fcharras · 2024-03-14T11:28:25Z

Thanks everyone for continuing this PR. I have now caught up with the latest diff and I'm also a happy bunny.

I want to mention two differences I think I've spotted between the state in which I left the branch and what was merged:

In NumPy, an operation on array inputs of mixed int and float dtypes typically results in float64 (for example, int64 combined with float32 gives float64). I think the rationale is that float64 is less likely to cause issues with large ints. With the Array API dispatch merged here, the output instead gets the default float dtype of the namespace (e.g. float32 with torch), rather than being forced to float64 or keeping the int dtype. This is most likely fine for scikit-learn use cases and simpler overall.

The error messages in _average were mimicking those of np.average. The merged _average has improved error messages but now diverges slightly from np.average in this regard. That's OK, but I wanted to mention why the error messages originally had that shape.
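The first difference can be illustrated with NumPy alone. `np.result_type` shows NumPy's own promotion. The helper `to_default_float` is a hypothetical stand-in for "cast to the namespace's default float dtype" and is not the merged implementation. Its `default_float` argument plays the role of torch's default float32.

```python
import numpy as np

# NumPy's own promotion: mixing int64 with float32 yields float64.
print(np.result_type(np.int64, np.float32))  # float64


def to_default_float(a, default_float):
    """Hypothetical helper: cast non-floating inputs to a default float dtype,
    instead of letting NumPy promote them to float64."""
    a = np.asarray(a)
    if np.issubdtype(a.dtype, np.floating):
        return a  # keep the existing float precision
    return a.astype(default_float)


x = np.array([1, 2, 3], dtype=np.int64)
print(to_default_float(x, np.float32).dtype)  # float32
```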

I'm happy I got to learn about the existence of xp.result_type 👍.

We made very conservative choices early in this PR, and in the end that was the source of several iterations. I'll try to get better at thinking about what is actually needed in scikit-learn and at weighing the cost in complexity.

As has already been pointed out, I suspect some of the tools that we introduced and then dropped during the PR (like _support_dtype, which is still somewhere in the commit history) will end up being needed in other PRs.

One last word: we had started documenting the policy for Array API dispatch without float64 support in #28034. I'm now a bit unsure whether #27904 (comment) had everyone aligned or whether this PR moved to something slightly different, so I'll try to summarize it again and update that PR.

elindgren and others added 20 commits August 18, 2023 14:16

update r2 score to use the array API, and write initial tests

e0429db

Merge remote-tracking branch 'upstream/main'

b9c1720

Merge branch 'main' into ENH/r2_score_array_api

5666ce5

Merge branch 'main' into ENH/r2_score_array_api

4580d1c

Fix some review comments and move stuff to CPU

a4dd594

Add regression tests to the test_common framework

adc7680

Update sklearn/metrics/tests/test_regression.py

85469a9

Co-authored-by: Tim Head <betatim@gmail.com>

Update sklearn/metrics/tests/test_regression.py

b7efaa5

Co-authored-by: Tim Head <betatim@gmail.com>

Remove hardcoded device choice in _weighted_sum

ac533c2

Some Array API compatible libraries do not have a device called 'cpu'. Instead we try and detect the lib+device combination that does not support float64.

Factor out max float precision determination

35be22e

Use convenience function to find highest accuracy float in r2_score

7c53e19

add tests for _average for Array API

230ae46

MNT Ignore ruff errors (scikit-learn#27094)

e4672d1

DOC fix docstring for sklearn.datasets.get_data_home (scikit-learn#27073)

8ba9485

TST Extend tests for scipy.sparse.*array in `sklearn/cluster/tests/test_affinity_propagation` (scikit-learn#27095)

490e0b4

Signed-off-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

MNT Remove DeprecationWarning for scipy.sparse.linalg.cg tol vs rtol argument (scikit-learn#26814)

a8a820c

Merge branch 'main' into ENH/r2_score_array_api

552e421

Merge remote-tracking branch 'upstream/main' into ENH/r2_score_array_api

ff52710

remove temporary file

fe9cc1c

WIP: solving dtype and device maze

93257ba

github-actions bot added module:metrics module:utils labels Dec 5, 2023

fcharras mentioned this pull request Dec 5, 2023

Use Array API in r2_score #27102

Closed

fcharras added 4 commits December 5, 2023 16:29

Fix changelog conflict

45bbe4e

Tests fixups

2145a6b

Tests fixups

bd4b224

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn into ENH/r2_score_array_api

34aceb1

fcharras marked this pull request as ready for review December 6, 2023 11:28

fcharras mentioned this pull request Dec 6, 2023

ENH Remove hardcoded device choice in _weighted_sum #27232

Closed

ogrisel reviewed Mar 8, 2024

sklearn/utils/tests/test_array_api.py Outdated

Small fixes in comments and remove duplicated lines.

ec84e44

ogrisel added 2 commits March 8, 2024 14:42

One more get_namespace simplification

08405a5

Remove useless import added by vs code...

a09866d

betatim reviewed Mar 8, 2024


ogrisel and others added 4 commits March 10, 2024 22:15

Apply suggestions from code review

b59a7be

Co-authored-by: Tim Head <betatim@gmail.com>

Rename _skip_non_arrays to _remove_non_arrays & co

ef1631b

Remove custom __hash__ method that is no longer needed

388d670

Remove redundant calls to xp.astype

8042795

adrinjalali reviewed Mar 11, 2024

sklearn/metrics/_classification.py Outdated

sklearn/metrics/_classification.py Outdated

sklearn/metrics/_regression.py

ogrisel added 7 commits March 11, 2024 10:08

Factorize the if xp is None: xp, _ = get_namespace(inputs) pattern

92af1a8

Fix handling of xp is not None in get_namespace

47fed64

get_namespace in _weighted_sum_1d

3699353

Merge _weighted_sum_1d into _average

c2b4b11

One final 'if xp is None' occurrence

9c2d9ac

DOC be explicit about return types

90076d3

Merge branch 'main' into ENH/r2_score_array_api

3cc74e5

adrinjalali approved these changes Mar 11, 2024

betatim reviewed Mar 11, 2024

doc/modules/array_api.rst Outdated

betatim approved these changes Mar 11, 2024


Update phrasing in the doc to avoid confusing array container type with array dtype

457531e

ogrisel enabled auto-merge (squash) March 11, 2024 15:19

ogrisel merged commit 612d93d into scikit-learn:main Mar 11, 2024

fcharras added a commit to fcharras/scikit-learn that referenced this pull request Mar 15, 2024

wip: merge main after merge of scikit-learn#27904

046cf00

ogrisel mentioned this pull request Jun 7, 2024

Make more of the "tools" of scikit-learn Array API compatible #26024

Open

grovduck mentioned this pull request Jul 10, 2024

Fix r2_score port test lemma-osu/sknnr#70

Merged
