
array API support for mean_absolute_percentage_error #29300


Conversation

Contributor

@EmilyXinyi commented Jun 19, 2024

Reference Issues/PRs

Towards #26024

What does this implement/fix? Explain your changes.

Adds array API support for mean_absolute_percentage_error

Any other comments?

Keep this as a draft until I add the PR number and CUDA is green.
Failing CI: I ran the command that triggers the failing test cases locally (pytest --durations=20 --junitxml=test-data.xml --pyargs sklearn), but they all pass. I am not sure what causes the difference in behaviour between our pipeline and my local tests...


github-actions bot commented Jun 19, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 74224cb.

Contributor Author

@EmilyXinyi commented Jun 19, 2024

CUDA: https://colab.research.google.com/drive/1SKzB8XaT2j_j4j7S-w4W6Du3DCo1I2B2?usp=sharing
EDIT: launching CUDA after commit 6c21075
CUDA failed on the following 8 test cases:
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[accuracy_score-check_array_api_multilabel_classification_metric-cupy-None-None] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[accuracy_score-check_array_api_multilabel_classification_metric-cupy.array_api-None-None] - TypeError: bool is only allowed on arrays with 0 dimensions
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[accuracy_score-check_array_api_multilabel_classification_metric-torch-cuda-float64] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[accuracy_score-check_array_api_multilabel_classification_metric-torch-cuda-float32] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[zero_one_loss-check_array_api_multilabel_classification_metric-cupy-None-None] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[zero_one_loss-check_array_api_multilabel_classification_metric-cupy.array_api-None-None] - TypeError: bool is only allowed on arrays with 0 dimensions
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[zero_one_loss-check_array_api_multilabel_classification_metric-torch-cuda-float64] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[zero_one_loss-check_array_api_multilabel_classification_metric-torch-cuda-float32] - ValueError: unrecognized csr_matrix constructor usage

@EmilyXinyi marked this pull request as ready for review June 19, 2024 15:38
Member

@ogrisel left a comment

Besides the following, LGTM.

Member

@ogrisel left a comment

LGTM once the merge conflicts are resolved.

@EmilyXinyi force-pushed the array_API_mean_absolute_percentage_error branch 2 times, most recently from a65b388 to b5bc817, June 20, 2024 16:02
Contributor

@EdAbati left a comment

I left a comment regarding MPS.

I also noticed that other tests seem to fail in main as well:

FAILED sklearn/metrics/cluster/tests/test_supervised.py::test_entropy_array_api[torch-mps-float32] - TypeError: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[accuracy_score-check_array_api_multilabel_classification_metric-torch-mps-float32] - ValueError: unrecognized csr_matrix constructor usage
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[zero_one_loss-check_array_api_multilabel_classification_metric-torch-mps-float32] - ValueError: unrecognized csr_matrix constructor usage

Is anyone having a look at these? (Or I can try to help.)

Comment on lines 405 to 410
epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.float64)
y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.float64)
mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.float64) / xp.where(
epsilon < y_true_abs, y_true_abs, epsilon
)
Contributor

float64 does not work with Torch on MPS, unfortunately. One way to get the max supported float precision could be xp.asarray(0.0).dtype, as this function does. Can you think of a better way?

I might have missed something, but why do we need to cast to float64 each time we run abs?
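
For illustration, a minimal sketch of the xp.asarray(0.0).dtype idea, assuming array_api_strict is installed and using it purely as a stand-in namespace (a backend whose default float is float32, such as torch, would report that instead):

import array_api_strict as xp

# The namespace's default floating dtype doubles as the widest float
# precision we can safely assume on the device.
max_float = xp.asarray(0.0).dtype
epsilon = xp.asarray(xp.finfo(max_float).eps, dtype=max_float)
print(max_float, epsilon)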

Member

Good catch. The original code did not upcast to float64 so we should not either.

Contributor Author

Thank you! I also noticed the CUDA failures and could not fathom why they happen, apart from #29269. I will make the suggested code changes, and hopefully everything passes after #29321 is merged. (PS: everything has always passed locally.)

Contributor Author

I might have missed something, why do we need cast to float64 each time we run abs?

By the way, the cast is for xp.asarray, not abs.

Member

@ogrisel commented Jun 20, 2024

is anyone having a look at these? (or I can try to help)

@EdAbati feel free to have a look and open a PR to fix those if you have time.

Member

@ogrisel commented Jun 28, 2024

@EmilyXinyi I merged main into this PR to check how many CI failures would remain after that.

Comment on lines 405 to 407
epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.asarray(0.0).dtype)
y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.asarray(0.0).dtype)
mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.asarray(0.0).dtype) / xp.where(
Member

I would rather use the floating point dtype used in the regressor's predictions than a device dependent dtype.

Suggested change:
- epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=xp.asarray(0.0).dtype)
- y_true_abs = xp.asarray(xp.abs(y_true), dtype=xp.asarray(0.0).dtype)
- mape = xp.asarray(xp.abs(y_pred - y_true), dtype=xp.asarray(0.0).dtype) / xp.where(
+ epsilon = xp.asarray(xp.finfo(xp.float64).eps, dtype=y_pred.dtype)
+ y_true_abs = xp.asarray(xp.abs(y_true), dtype=y_pred.dtype)
+ mape = xp.asarray(xp.abs(y_pred - y_true), dtype=y_pred.dtype) / xp.where(

Contributor Author

Using y_pred.dtype can't get past the test cases, and I think this is because y_pred.dtype could be array_api_strict.int64, but only floating-point dtypes are allowed in __truediv__ (i.e. division). Also, MAPE behaves like a symmetric function instead of an asymmetric one if we use y_pred.dtype, which I suspect is due to the reduced accuracy when the dtype is int64.

I am not sure if there is a better way to approach this... Any suggestions would be greatly appreciated, thank you!! :)
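
For illustration, a small sketch of the failure mode described above, assuming array_api_strict is installed:

import array_api_strict as xp

a = xp.asarray([1, 2, 3])  # integer input defaults to int64
b = xp.asarray([2, 4, 6])
try:
    a / b  # the standard restricts __truediv__ to floating-point dtypes
except TypeError as exc:
    print(exc)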

Contributor

We have a function, _find_matching_floating_dtype, defined in the array API utils. Maybe you could use that to get the float dtype once and then use it in all three places?
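
For concreteness, a sketch of what that could look like inside the metric; the helper and its xp= keyword are visible in a traceback later in this thread, but the exact wiring here is illustrative rather than the merged diff:

from sklearn.utils._array_api import _find_matching_floating_dtype, get_namespace

xp, _ = get_namespace(y_true, y_pred, sample_weight)
# Compute the common floating dtype once and reuse it in all three places.
dtype = _find_matching_floating_dtype(y_true, y_pred, sample_weight, xp=xp)

epsilon = xp.asarray(xp.finfo(dtype).eps, dtype=dtype)
y_true_abs = xp.asarray(xp.abs(y_true), dtype=dtype)
mape = xp.asarray(xp.abs(y_pred - y_true), dtype=dtype) / xp.where(
    epsilon < y_true_abs, y_true_abs, epsilon
)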

Contributor Author

Thank you @OmarManzoor, I have made the changes accordingly.

@EmilyXinyi force-pushed the array_API_mean_absolute_percentage_error branch from b9c8c7c to ddebb21, July 8, 2024 12:03
Contributor

@OmarManzoor left a comment

Otherwise LGTM. Thanks @EmilyXinyi

Member

@lesteve commented Jul 12, 2024

FYI, I tested the CUDA CI label after #29456 was merged, and it looks like this is working fine.

Contributor

@OmarManzoor commented Jul 12, 2024

@lesteve The CUDA CI just failed. Shouldn't array-api-strict be updated to the latest version?
I checked, and it seems that the old version is pinned in the lock file. Can we update it to the latest version so that we don't run into these errors?

Member

@lesteve commented Jul 12, 2024

I think you can ignore the GPU failures for now, I was mostly testing to make sure setting the label did trigger the GPU CI.

The versions are all in the lock-file, and indeed array-api-strict is currently still 1.1.1:

https://conda.anaconda.org/conda-forge/noarch/array-api-strict-1.1.1-pyhd8ed1ab_0.conda#941bbcd64d1a7b44aeb497f468fc85b4

FYI, trying to update the lock-file to the latest version, there seems to be some issue, so I guess someone needs to look into it at some point: #29373 (or wait until Monday morning European time so that the lock-file PR is updated, and cross our fingers).

@OmarManzoor
Contributor

FYI, trying to update the lock-file to the latest version, there seems to be some issue, so I guess someone needs to look into it at some point: #29373 (or wait until Monday morning European time so that the lock-file PR is updated, and cross our fingers).

So should we update and use maximum, or keep the current code? In my opinion, we can work against the latest version and fix the CI whenever possible. What do you think?

@lesteve
Member

lesteve commented Jul 12, 2024

In my opinion, we can work against the latest version and fix the CI whenever possible. What do you think?

Yes, you should definitely ignore the GPU CI failures in this PR, and someone should look at the CI issues in #29373 at some point.

@OmarManzoor enabled auto-merge (squash) July 12, 2024 12:53
@OmarManzoor disabled auto-merge July 12, 2024 12:54
@OmarManzoor enabled auto-merge (squash) July 12, 2024 13:28
@OmarManzoor merged commit dc6c01c into scikit-learn:main Jul 12, 2024
28 checks passed
Member

@lesteve commented Jul 12, 2024

So I am not going to pretend I understand what is going on, and there is no huge rush on fixing this at all, but it seems like this PR broke the doc-min-dependencies build (i.e. we run the doc build with our minimal dependencies to check that all the examples run)...

See the build log.

Here is the error, it does look related to mean_absolute_percentage_error + array API:

Unexpected failing examples:

    ../examples/applications/plot_time_series_lagged_features.py failed leaving traceback:

    Traceback (most recent call last):
      File "/home/circleci/project/examples/applications/plot_time_series_lagged_features.py", line 132, in <module>
        mean_absolute_percentage_error(y_test, y_pred)
      File "/home/circleci/project/sklearn/utils/_param_validation.py", line 213, in wrapper
        return func(*args, **kwargs)
      File "/home/circleci/project/sklearn/metrics/_regression.py", line 400, in mean_absolute_percentage_error
        dtype = _find_matching_floating_dtype(y_true, y_pred, sample_weight, xp=xp)
      File "/home/circleci/project/sklearn/utils/_array_api.py", line 681, in _find_matching_floating_dtype
        floating_dtypes = [
      File "/home/circleci/project/sklearn/utils/_array_api.py", line 682, in <listcomp>
        a.dtype for a in dtyped_arrays if xp.isdtype(a.dtype, "real floating")
      File "/home/circleci/project/sklearn/utils/_array_api.py", line 442, in isdtype
        return isdtype(dtype, kind, xp=self)
      File "/home/circleci/project/sklearn/utils/_array_api.py", line 198, in isdtype
        return _isdtype_single(dtype, kind, xp=xp)
      File "/home/circleci/project/sklearn/utils/_array_api.py", line 215, in _isdtype_single
        return dtype in supported_float_dtypes(xp)
    TypeError: Cannot interpret 'Int64' as a data type

Contributor

@OmarManzoor commented Jul 12, 2024

Here is the error, it does look related to mean_absolute_percentage_error + array API:

Yes, I saw this too. How come this did not error out in the PR? It seems to be a case in this example: plot_time_series_lagged_features.py. Also, I was not able to reproduce this on my local system.
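
If it helps, here is a purely hypothetical minimal reproduction, guessing from the traceback that the example feeds a nullable-integer extension dtype into the metric (pandas "Int64", capital I, is not a NumPy dtype):

import pandas as pd
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical repro: the extension dtype survives until the dtype check in
# _find_matching_floating_dtype, which may not know how to interpret it.
y_true = pd.Series([3, 5, 7], dtype="Int64")
y_pred = pd.Series([2, 5, 8], dtype="Int64")
mean_absolute_percentage_error(y_true, y_pred)  # may raise the TypeError above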

Member

@lesteve commented Jul 12, 2024

How come this did not error out in the PR?

The doc build in a PR only runs the examples that were changed in the PR; if you want a full doc build, you can push a commit with [doc build] in the commit message. So sometimes (but rarely) we realise from the doc build on main that an example is broken and was somehow not covered by the tests.

To reproduce, I guess the best bet (if you are on Linux) is to use the doc-min-dependencies lock-file, see the quick doc I recently added.

@OmarManzoor
Contributor

To reproduce, I guess the best bet (if you are on Linux) is to use the doc-min-dependencies lock-file

I am on a Mac with M2 chip.

@lesteve
Member

lesteve commented Jul 15, 2024

I am on a Mac with M2 chip.

OK, you can still use the associated environment file (see the doc mentioned above) and see whether you can reproduce. A quick guess would be that we use an old numpy/pandas/something-else version and that the logic needs to be adapted...

@EmilyXinyi
Contributor Author

I am also trying to reproduce the problem. I am on a Mac with an Intel i5.

@OmarManzoor
Contributor

I am also trying to reproduce the problem. I am on a Mac with an Intel i5.

I was not able to configure the environment on my Mac using the instructions that @lesteve mentioned, but maybe you can create a conda environment using the lock file or the environment file.
