FIX Fix array API `train_test_split` #28407

betatim · 2024-02-12T14:46:53Z

Reference Issues/PRs

Follow up to #26855

What does this implement/fix? Explain your changes.

This fixes the array API implementation of train_test_split. A few parts of train_test_split appeared to work but did not actually work.

Any other comments?

This includes all of #27904. Once it is merged this PR needs rebasing to remove those changes. The relevant changes are in the final commit of this PR.

github-actions · 2024-02-12T14:48:14Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 1419a66.

adrinjalali

I quite like this. Thanks @betatim, but I'm a bit worried about how complicated things become in methods that used to be quite short and easy to read.

sklearn/metrics/_regression.py

ogrisel

I tried to look at the train_test_split specific part of the PR but I did not find any non-regression test for this part.

Also, I assume that we might have a similar problem in cross_val_score, cross_validate and in the *SearchCV meta estimators.

sklearn/model_selection/_split.py

sklearn/utils/__init__.py

ogrisel

LGTM once the above comments are addressed.

betatim · 2024-03-13T12:54:00Z

Implemented ensure_common_namespace_device. For some reason I can't comment on your original review comment :-/
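To illustrate the idea behind a helper like this, here is a minimal, dependency-free sketch. It is not the code from this PR: `ToyArray` and `move_to_reference` are hypothetical names, and the "conversion" only re-tags the arrays, whereas a real helper would presumably convert them with the reference array's namespace and place them on its device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyArray:
    """Hypothetical stand-in for an array: values plus namespace and device tags."""

    values: tuple
    namespace: str
    device: str


def move_to_reference(reference, *arrays):
    """Return copies of `arrays` tagged with the namespace and device of `reference`.

    Assumption: a real helper would do an actual conversion (and device transfer);
    this sketch only rewrites the tags so that it stays runnable without any
    array library installed.
    """
    return [
        ToyArray(tuple(a.values), reference.namespace, reference.device)
        for a in arrays
    ]


# Data lives in one namespace/device, index arrays were created elsewhere.
X = ToyArray((1.0, 2.0, 3.0, 4.0), namespace="torch", device="cuda")
train_idx = ToyArray((0, 2), namespace="numpy", device="cpu")
test_idx = ToyArray((1, 3), namespace="numpy", device="cpu")

# After this call the index arrays share X's namespace and device.
train_idx, test_idx = move_to_reference(X, train_idx, test_idx)
```

The point of the sketch is only that index arrays are aligned with the data before they are used for indexing, so the split outputs stay in the caller's container type and on the caller's device.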

betatim · 2024-03-13T13:40:54Z

> I tried to look at the train_test_split specific part of the PR but I did not find any non-regression test for this part.

I didn't add one because test_array_api_train_test_split already fails on main because of this. So I wasn't sure if we needed another test, or if what we really need is to run the existing tests :-/

ogrisel · 2024-03-13T15:07:34Z

> I didn't add one because test_array_api_train_test_split already fails on main because of this.

... when running the tests with cupy installed, which is not yet the case on our current CI (see: #24491).

sklearn/utils/__init__.py

ogrisel

I pushed a new test to explicitly cover the missing lines in _determine_key_type, but I think we should also add a test for train_test_split itself that checks that the returned arrays are actually of the expected container type and device.

EDIT: those tests already exist but I don't understand why codecov was complaining then... I checked that they are not always skipped on my local laptop...

sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-torch-cpu-float64] PASSED                                                                                     [  3%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-torch-mps-float32] PASSED                                                                                     [  7%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy.array_api-None-None] SKIPPED (cupy.array_api is not installed: not checking array_api input)        [ 11%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-torch-mps-float32] PASSED                                                                                    [ 14%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which is not available)                         [ 18%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-torch-cpu-float64] PASSED                                                                                    [ 22%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-cupy.array_api-None-None] SKIPPED (cupy.array_api is not installed: not checking array_api input)            [ 25%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which is not available)                              [ 29%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-torch-cpu-float32] PASSED                                                                                    [ 33%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-torch-mps-float32] PASSED                                                                                [ 37%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-torch-cpu-float32] PASSED                                                                                     [ 40%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which is not available)                              [ 44%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-numpy-None-None] PASSED                                                                                       [ 48%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which is not available)                         [ 51%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-torch-cpu-float64] PASSED                                                                                [ 55%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which is not available)                             [ 59%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-numpy.array_api-None-None] PASSED                                                                             [ 62%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which is not available)                             [ 66%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-numpy.array_api-None-None] PASSED                                                                            [ 70%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-torch-cpu-float32] PASSED                                                                                [ 74%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-numpy-None-None] PASSED                                                                                      [ 77%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-numpy.array_api-None-None] PASSED                                                                        [ 81%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-cupy.array_api-None-None] SKIPPED (cupy.array_api is not installed: not checking array_api input)             [ 85%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-cupy-None-None] SKIPPED (cupy is not installed: not checking array_api input)                                 [ 88%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-cupy-None-None] SKIPPED (cupy is not installed: not checking array_api input)                                [ 92%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-numpy-None-None] PASSED                                                                                  [ 96%]
sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy-None-None] SKIPPED (cupy is not installed: not checking array_api input)                            [100%]
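
The SKIPPED entries in the log come from optional libraries or devices that are not available locally. As a rough illustration of that pattern, here is a sketch using only the standard library. It is an assumption-laden stand-in, not scikit-learn's test code (which uses pytest and its own helpers); `library_available` and `TestOptionalBackend` are hypothetical names.

```python
import importlib.util
import unittest


def library_available(name):
    """Return True if the top-level module `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


class TestOptionalBackend(unittest.TestCase):
    @unittest.skipUnless(library_available("cupy"), "cupy is not installed")
    def test_with_cupy(self):
        # Only runs where cupy is installed; reported as skipped elsewhere.
        import cupy

        self.assertEqual(int(cupy.asarray([1, 2, 3]).sum()), 6)

    def test_always_runs(self):
        # Does not depend on any optional library.
        self.assertEqual(sum([1, 2, 3]), 6)
```

Lines that are only reached inside a skipped test are reported as uncovered on machines where the skip triggers, which is consistent with coverage differing between a laptop and a CI runner.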

ogrisel · 2024-03-13T17:13:51Z

We also need a changelog entry along the lines of "fix train_test_split on CuPy arrays".

ogrisel

@betatim I pushed an extra test case to fix the coverage. I'll let you fix the conflicts.

ogrisel · 2024-03-14T09:42:29Z

I pushed my test case again after the test function was moved to the new file introduced in main.

ogrisel · 2024-03-14T11:26:46Z

The line not covered by our tests as reported by codecov should be executed when running Array API tests with libraries that do not support complex dtypes such as cupy.

ogrisel · 2024-03-14T11:27:25Z

@adrinjalali I think this PR is ready for a second review.

ogrisel · 2024-03-15T10:44:38Z

For information, I re-ran the current state of this PR with cupy and torch/cuda and all tests are green.

adrinjalali

Otherwise LGTM.

adrinjalali · 2024-03-15T14:09:50Z

sklearn/utils/tests/test_indexing.py

+            complex_array_key = xp.asarray([1 + 1j, 2 + 2j, 3 + 3j])
+        except TypeError:
+            # Complex numbers are not supported by all Array API libraries.
+            complex_array_key = None


seems like a legit codecov complaint.

This line is covered when running the tests with cupy; however, this requires the CUDA CI being designed in #24491.

Even then, we won't be able to get coverage data from a weekly run.

So shall we ignore the codecov comment for now or do we need to add a comment to mark it as ignored or something?
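
For reference, one way to mark such a branch as ignored is an exclusion comment. This is a sketch that assumes coverage is collected with coverage.py, whose default exclusion marker is `# pragma: no cover`; `build_complex_key` and `real_only_asarray` are hypothetical names used only for illustration and are not part of this PR.

```python
def build_complex_key(asarray):
    """Build a complex-valued key with `asarray`, or return None if unsupported."""
    try:
        return asarray([1 + 1j, 2 + 2j, 3 + 3j])
    except TypeError:  # pragma: no cover
        # Complex numbers are not supported by all array libraries. The pragma
        # on the `except` line excludes this whole clause from the report.
        return None


def real_only_asarray(values):
    """Hypothetical converter that rejects complex inputs."""
    if any(isinstance(v, complex) for v in values):
        raise TypeError("complex dtypes are not supported")
    return list(values)


supported = build_complex_key(list)  # the built-in list accepts complex values
unsupported = build_complex_key(real_only_asarray)  # falls back to None
```

The trade-off is that an excluded branch stays excluded even if it later stops being exercised anywhere, so the pragma hides real gaps as well as environment-specific ones.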

github-actions bot added module:metrics module:model_selection module:utils labels Feb 12, 2024

betatim force-pushed the fix-array-api-train_test_split branch from 5d062ee to 126c434 Compare February 12, 2024 15:05

adrinjalali reviewed Feb 13, 2024

sklearn/metrics/_regression.py (outdated, resolved)

sklearn/metrics/_regression.py (outdated, resolved)

ogrisel reviewed Feb 29, 2024

sklearn/model_selection/_split.py (outdated, resolved)

ogrisel mentioned this pull request Mar 3, 2024

MAINT: convert numpy.array_api to array-api-strict #28555

Merged

betatim force-pushed the fix-array-api-train_test_split branch from 126c434 to a796f33 Compare March 12, 2024 09:34

adrinjalali reviewed Mar 12, 2024

sklearn/model_selection/_split.py (outdated, resolved)

sklearn/utils/__init__.py (outdated, resolved)

sklearn/utils/__init__.py (outdated, resolved)

ogrisel reviewed Mar 12, 2024

sklearn/utils/__init__.py (outdated, resolved)

ogrisel approved these changes Mar 12, 2024

ogrisel reviewed Mar 13, 2024

sklearn/utils/__init__.py (outdated, resolved)

ogrisel requested changes Mar 13, 2024

betatim and others added 6 commits March 14, 2024 09:24

Fix train_test_split array API implementation

665cab0

Introduce ensure_common_namespace_device() to refactor

8ad944c

Remove 'and not _is_numpy_namespace(xp)' clause

5f94e73

TST test_determine_key_type_array_api

b166202

What's new

9014048

Clean up post rebase

e99be1a

ogrisel approved these changes Mar 14, 2024

betatim force-pushed the fix-array-api-train_test_split branch from c71fddd to e99be1a Compare March 14, 2024 09:37

TST: cover ValueError on complex keys for _determine_key_type

1419a66

adrinjalali reviewed Mar 15, 2024

adrinjalali merged commit e5a7c3e into scikit-learn:main Mar 18, 2024

betatim deleted the fix-array-api-train_test_split branch March 18, 2024 14:52

This was referenced Mar 21, 2024

Array API cupy fixes #27672

Closed

Array API support for cross_validation and friends #28677

Closed
