array API support for mean_absolute_percentage_error #29300
Conversation
CUDA: https://colab.research.google.com/drive/1SKzB8XaT2j_j4j7S-w4W6Du3DCo1I2B2?usp=sharing
Besides the following, LGTM.
LGTM once the merge conflicts are resolved.
Force-pushed from a65b388 to b5bc817
I left a comment regarding `mps`. I also noticed that some other tests seem to fail on `main` as well:
```
FAILED sklearn/metrics/cluster/tests/test_supervised.py::test_entropy_array_api[torch-mps-float32] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[accuracy_score-check_array_api_multilabel_classification_metric-torch-mps-float32] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[zero_one_loss-check_array_api_multilabel_classification_metric-torch-mps-float32] - ValueError: unrecognized csr_matrix constructor usage
```
Is anyone having a look at these? (Or I can try to help.)
`sklearn/metrics/_regression.py` (outdated)
```python
epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.float64)
y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.float64)
mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.float64) / xp.where(
    epsilon < y_true_abs, y_true_abs, epsilon
)
```
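For context, the epsilon-clamping idea in the snippet above can be sketched standalone with plain NumPy. This is an illustrative sketch, not scikit-learn's actual implementation; `mape` is a hypothetical helper name:

```python
import numpy as np

def mape(y_true, y_pred):
    # Clamp the denominator to machine epsilon so zero (or near-zero)
    # targets do not cause division by zero.
    eps = np.finfo(np.float64).eps
    y_true_abs = np.abs(y_true)
    denom = np.where(eps < y_true_abs, y_true_abs, eps)
    return np.mean(np.abs(y_pred - y_true) / denom)

# Each prediction is off by 10% of its target, so MAPE is ~0.1.
result = mape(np.array([1.0, 2.0, 4.0]), np.array([1.1, 1.8, 4.4]))
print(result)
```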
`float64` does not work with Torch on `mps`, unfortunately. One way to get the maximum supported float precision could be `xp.asarray(0.0).dtype`, as this function does. Can you think of a better way?
I might have missed something: why do we need to cast to `float64` each time we run `abs`?
Good catch. The original code did not upcast to `float64`, so we should not either.
> I might have missed something: why do we need to cast to `float64` each time we run `abs`?

btw the casting is for `xp.asarray`, not `abs`.
@EdAbati feel free to have a look and open a PR to fix those if you have time.
@EmilyXinyi I merged
`sklearn/metrics/_regression.py` (outdated)
```python
epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.asarray(0.0).dtype)
y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.asarray(0.0).dtype)
mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.asarray(0.0).dtype) / xp.where(
```
I would rather use the floating point dtype used in the regressor's predictions than a device dependent dtype.
```diff
-epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.asarray(0.0).dtype)
-y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.asarray(0.0).dtype)
-mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.asarray(0.0).dtype) / xp.where(
+epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=y_pred.dtype)
+y_true_abs = xp.asarray(xp.abs(y_true), dtype=y_pred.dtype)
+mape = xp.asarray(xp.abs(y_pred - y_true), dtype=y_pred.dtype) / xp.where(
```
Using `y_pred.dtype` can't get past the test cases, and I think this is because `y_pred.dtype` could be `array_api_strict.int64`, but only floating-point dtypes are allowed in `__truediv__` (which means division, I believe). As well, MAPE behaves like a symmetric function instead of an asymmetric one if we use `y_pred.dtype`, which I suspect is due to the reduced accuracy in cases where the dtype is `int64`.

I am not sure if there is a better way to approach this... Any suggestions would be greatly appreciated, thank you!! :)
We have a function `_find_matching_floating_dtype` defined in the array API utils. Maybe you could use that to get the float dtype once and then use it in all three places?
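The idea behind such a helper can be sketched as follows. This is a hypothetical NumPy-only reimplementation for illustration, not the actual `_find_matching_floating_dtype` from `sklearn.utils._array_api`:

```python
import numpy as np

def find_matching_floating_dtype(*arrays):
    # Hypothetical sketch: pick one floating dtype suitable for all inputs.
    # If any input already has a floating dtype, promote among those;
    # otherwise (e.g. all-integer inputs, where __truediv__ would be
    # disallowed in array_api_strict) fall back to the namespace's
    # default floating dtype.
    float_dtypes = [
        a.dtype for a in arrays if np.issubdtype(a.dtype, np.floating)
    ]
    if float_dtypes:
        return np.result_type(*float_dtypes)
    return np.asarray(0.0).dtype

y_true = np.array([1, 2, 3])                        # int64
y_pred = np.array([1.0, 2.0, 3.0], dtype=np.float32)
dtype = find_matching_floating_dtype(y_true, y_pred)
print(dtype)
```

Computing the dtype once up front means the same cast can be reused for `epsilon`, `y_true_abs`, and the numerator, instead of repeating `xp.asarray(0.0).dtype` in three places.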
Thank you @OmarManzoor, I have made the changes accordingly.
Force-pushed from b9c8c7c to ddebb21
Otherwise LGTM. Thanks @EmilyXinyi
FYI I tested the CUDA CI label after #29456 was merged and it looks like this is working fine.
@lesteve The CUDA CI just failed. Isn't array-api-strict updated to the latest version?
I think you can ignore the GPU failures for now; I was mostly testing to make sure setting the label did trigger the GPU CI. The versions are all in the lock-file, and indeed array-api-strict is currently still 1.1.1 (see `build_tools/github/pylatest_conda_forge_cuda_array-api_linux-64_conda.lock`, line 249, at 409d187).
FYI, trying to update the lock-file to the latest version, there seems to be some issue, but I guess someone needs to look into it at one point: #29373 (or wait until Monday morning European time so that the lock-file PR is updated, and cross our fingers).
So should we update and use `maximum`, or should we keep the current code? In my opinion we can work against the latest version and fix the CI whenever possible. What do you think?
Yes, you should definitely ignore the GPU CI failures in this PR, and someone should look at the CI issues in #29373 at one point.
So I am not going to pretend I understand what is going on, and there is no huge rush on fixing this at all, but it seems like this PR broke the doc-min-dependencies build (i.e. we run the doc build with our minimal dependencies to check that all the examples run)... See the build log. Here is the error; it does look related to
Yes, I saw this too. How come this did not error out in the PR? It seems to be a case in this example:
The doc build in a PR only runs examples that have changes in the PR; if you want a full doc build you can push a commit with `[doc build]` in the commit message. To reproduce, I guess the best bet (if you are on Linux) is to use the doc-min-dependencies lock-file, see the quick doc I recently added.
I am on a Mac with M2 chip.
OK, you can still use the associated environment file (see the doc mentioned above) and see whether you can reproduce. A quick guess would be that we use an old numpy/pandas/something-else version and that the logic needs to be adapted...
I am also trying to reproduce the problem. I am on a Mac with Intel i5 cores.
I was not able to configure the environment on my Mac using the instructions that @lesteve mentioned, but maybe you can create a conda environment using the lock file or the environment file.
Reference Issues/PRs
Towards #26024
What does this implement/fix? Explain your changes.
Adds array API support for mean_absolute_percentage_error
Any other comments?
Keep this as draft until I add PR number and CUDA is green
Failing CI: I ran the command that triggers the failing test cases locally (`pytest --durations=20 --junitxml=test-data.xml --pyargs sklearn`) but they all pass. I am not sure what contributes to the difference in behaviour between our pipeline and my local tests...