Skip to content

array API support for mean_poisson_deviance #29227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

EmilyXinyi
Copy link
Contributor

@EmilyXinyi EmilyXinyi commented Jun 10, 2024

Reference Issues/PRs

towards #26024

What does this implement/fix? Explain your changes.

add array API support for mean_poisson_deviance

Any other comments?

there is a dependency on scipy.special.xlogy (in sklearn.metrics._regression._mean_tweedie_deviance, used by this function) which does not have an equivalent array API compatible version. I am unsure what is the best way to approach this: create a wrapper, implement our own function that accomplishes the same task, open an issue in the array-api repo about this function (maybe we can help contributing to this function too), or something else?

Copy link

github-actions bot commented Jun 10, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4371fb7. Link to the linter CI: here

@ogrisel
Copy link
Member

ogrisel commented Jun 12, 2024

The upcomping scipy 1.14.0 release will ship a array API support for this function via a generic fallback.

https://github.com/scipy/scipy/blob/3aa715425291727cfd63d3929802d242681318ba/scipy/special/_support_alternative_backends.py#L76-L81

Since it is not released yet, we have two option:

  • wait for scipy 1.14.0 to be released we should happen soon-ish and skip the tests for mean_poisson_deviance for sp_version < parse_version("1.14.0"): once scipy is released and our CI dependencies are updated, this test will start to not be skipped anymore.
  • backport a similar generic fallback for xlogy under our own sklearn.utils._array_api module and use that instead of scipy.special.xlogy. Note that when xp is a NumPy-based namespace, we should still use scipy.special.xlogy (which is faster for NumPy because written in Cython) instead of the generic implementation.

@EmilyXinyi
Copy link
Contributor Author

The upcomping scipy 1.14.0 release will ship a array API support for this function via a generic fallback.

https://github.com/scipy/scipy/blob/3aa715425291727cfd63d3929802d242681318ba/scipy/special/_support_alternative_backends.py#L76-L81

Since it is not released yet, we have two option:

  • wait for scipy 1.14.0 to be released we should happen soon-ish and skip the tests for mean_poisson_deviance for sp_version < parse_version("1.14.0"): once scipy is released and our CI dependencies are updated, this test will start to not be skipped anymore.
  • backport a similar generic fallback for xlogy under our own sklearn.utils._array_api module and use that instead of scipy.special.xlogy. Note that when xp is a NumPy-based namespace, we should still use scipy.special.xlogy (which is faster for NumPy because written in Cython) instead of the generic implementation.

Thank you @ogrisel. I have skipped the tests for now. I chose to skip the tests because we are not changing anything in the function itself but only skipping tests that we know are not failing, so I think it's the safer option.

@EmilyXinyi EmilyXinyi marked this pull request as ready for review June 13, 2024 09:38
@EmilyXinyi EmilyXinyi changed the title WIP array API support for mean_poisson_deviance array API support for mean_poisson_deviance Jun 13, 2024
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would also need to set the SCIPY_ARRAY_API environment variable to "1" in our CI to enable scipy support for array API (see https://docs.scipy.org/doc/scipy/dev/api-dev/array_api.html#using-array-api-standard-support).

To do that, you can edit the pylatest_conda_forge_mkl config in azure-pipelines.yml to set this variable, similarly to what we do for other environment variables such as SKLEARN_TESTS_GLOBAL_RANDOM_SEED for instance.

We would also need to do something similar in .github/workflows/cuda-gpu-ci.yml. In this case you can just edit the last step to replace:

pytest -k 'array_api'

by:

SCIPY_ARRAY_API=1 pytest -k 'array_api'

@ogrisel
Copy link
Member

ogrisel commented Jun 13, 2024

Could you also please launch the tests yourself to check that they pass with the nightly build of scipy?

You can install the nightly dev builds of scipy (and numpy, just in case) with the following command:

pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy scipy

I would recommend not doing that in your usual dev env but rather create a dedicated conda environment for those tests.

Alternatively you can use a through away google colab env setup with the help of this notebook (to customize to install the scipy dev version):

https://gist.github.com/EdAbati/ff3bdc06bafeb92452b3740686cc8d7c

Doing this on colab would also make it possible to test with cuda devices.

@EmilyXinyi
Copy link
Contributor Author

EmilyXinyi commented Jun 13, 2024

I have made the suggested changes to azure-pipelines.yml and .github/workflows/cuda-gpu-ci.yml, and here is the colab link for cuda and nightly build of scipy:
https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing
Seems like the tests are passing with CUDA but doc tests are failing on CircleCI...
I am also not sure why it shows that I changed 300+ files... I am assuming it's because of the yml file changes

@betatim
Copy link
Member

betatim commented Jun 13, 2024

Since it is not released yet, we have two option:

* wait for scipy 1.14.0 to be released we should happen soon-ish and skip the tests for `mean_poisson_deviance` for `sp_version < parse_version("1.14.0")`: once scipy is released and our CI dependencies are updated, this test will start to not be skipped anymore.

* backport a similar generic fallback for `xlogy` under our own `sklearn.utils._array_api` module and use that instead of `scipy.special.xlogy`. Note that when `xp` is a NumPy-based namespace, we should still use `scipy.special.xlogy` (which is faster for NumPy because written in Cython) instead of the generic implementation.

If we only do the "wait for scipy 1.14.0" then we will introduce a minimum requirement on the scipy version for array API support to work right?

I'm not sure if that is a bad thing or not. However if we do we need some way to express that (version check when array API is enabled via set_config()?) to the user. As well as some documentation about it. In addition it means we need to figure out what to do about the environment variable that needs setting. Should scikit-learn set this for the user? Probably not. Should we instead check when enabling array API support and print a warning to the user?

In general I think minimising the number of functions implemented in sklearn.utils._array_api is a good thing. So this would imply we should wait.

@ogrisel
Copy link
Member

ogrisel commented Jun 13, 2024

Should we instead check when enabling array API support and print a warning to the user?

That sounds like a good idea.

Should scikit-learn set this for the user?

If scipy has already been imported, it's too late (I tried in a shell). The environment variable should be set before scipy is initialized apparently.

@ogrisel
Copy link
Member

ogrisel commented Jun 13, 2024

@EmilyXinyi the main has evolved concurrently and there is now a bunch of conflicts. Would you mind updating the PR to resolve those. Please let me know if you need help with that.

@ogrisel
Copy link
Member

ogrisel commented Jun 13, 2024

BTW, to set the environment variable in the colab notebook, you would need to do:

import os

os.environ["SCIPY_ARRAY_API"] = "1"

in one of the first cells, before importing sklearn or scipy.

EDIT: @EdAbati you might want to update your colab notebook accordingly, in case this PR or other similar PRs that need array API support for scipy functions are merged into main.

@EmilyXinyi
Copy link
Contributor Author

EmilyXinyi commented Jun 14, 2024

@EmilyXinyi the main has evolved concurrently and there is now a bunch of conflicts. Would you mind updating the PR to resolve those. Please let me know if you need help with that.

I think I have resolved the conflicts, but strangely enough the changes that happened on main ended up in my "resolve merge conflict" commit... If there is a way to avoid that please let me know, thank you!

@EmilyXinyi
Copy link
Contributor Author

Should we instead check when enabling array API support and print a warning to the user?

Should this change be made this PR? (If yes, could anyone help me with it? Thanks!)

@ogrisel
Copy link
Member

ogrisel commented Jun 20, 2024

I have made the suggested changes to azure-pipelines.yml and .github/workflows/cuda-gpu-ci.yml, and here is the colab link for cuda and nightly build of scipy:
https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing
Seems like the tests are passing with CUDA

Actually the mean_poisson_deviance tests are skipped. I suppose there is a problem with the handling scipy version number:

klearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-numpy-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy.array_api-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float64] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float32] SKIPPED [ 24%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float64] SKIPPED [ 24%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float32] SKIPPED [ 24%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-mps-float32] SKIPPED [ 24%]

EDIT: looking at the notebook, it's not installing the scipy dev version:

! cd scikit-learn && pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy scipy
Looking in indexes: https://pypi.org/simple, https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (1.25.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (1.11.4)

We might need to add a --upgrade or --force-reinstall flag to that command. Or both flags.

Then check the output of the cell with sklearn.show_versions() to make sure that it's using the dev version of scipy.

@ogrisel
Copy link
Member

ogrisel commented Jun 20, 2024

@EmilyXinyi I merged the main branch into this PR to hide redundant changes related to the chi2_kernel thingy. Be sure to pull this merge commit to your local branch before adding any new commit to it.

@ogrisel
Copy link
Member

ogrisel commented Jun 20, 2024

Should we instead check when enabling array API support and print a warning to the user?
Should this change be made this PR? (If yes, could anyone help me with it? Thanks!)

I think so. I think the best place to do this is in _check_array_api_dispatch which is called by sklearn.set_config(array_api_dispatch=True/False).

We would need to raise a warning when os.environ.get("SCIPY_ARRAY_API") != "1" to tell the user that some scikit-learn array API features might rely on enabling scipy's own support for array API to work properly.

@EmilyXinyi EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch 2 times, most recently from 4e1ed08 to 4618be3 Compare June 20, 2024 15:23
@EmilyXinyi
Copy link
Contributor Author

Just a note: please ignore all the recent commits and force pushes. I think I did a merge instead of rebase and a bunch of files I didn't touch changed.

@EmilyXinyi
Copy link
Contributor Author

EmilyXinyi commented Jun 20, 2024

We might need to add a --upgrade or --force-reinstall flag to that command. Or both flags.

Then check the output of the cell with sklearn.show_versions() to make sure that it's using the dev version of scipy.

Using this as thread to see how the CUDA situation goes. (Here is the link again: https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing)

NumPy's nightly build (basically version 2.0.0 and above) has dependency issue, but Scipy's dev version can be installed using pip install --pre --upgrade scipy .
This installs scipy 1.14.0 and the CUDA tests run now. After merging with main, the failures regarding accuracy_score and zero_one_loss are no longer there.

sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-numpy-None-None] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] SKIPPED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy-None-None] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy.array_api-None-None] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float64] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float32] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float64] PASSED [ 19%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float32] PASSED [ 19%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-mps-float32] SKIPPED [ 19%]

@EmilyXinyi EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch from 3750613 to 21f658c Compare July 3, 2024 09:48
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR diff looks good to me. But I would wait for #29436 to be merged first and then merge main into this PR and re-trigger the CUDA CI to make sure that all tests pass.

Here are a few more minor suggestions.

@@ -260,6 +261,7 @@ jobs:
DISTRIB: 'conda'
LOCK_FILE: './build_tools/azure/pylatest_conda_forge_mkl_osx-64_conda.lock'
SKLEARN_TESTS_GLOBAL_RANDOM_SEED: '5' # non-default seed
SCIPY_ARRAY_API=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If #29436 is merged first, a similar change will be needed for pylatest_pip_openblas_pandas.

@ogrisel
Copy link
Member

ogrisel commented Jul 9, 2024

#29436 was merged. Please feel free to merge main into this PR and address #29227 (comment).

@EmilyXinyi EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch 2 times, most recently from 5d16602 to c32d6e1 Compare July 10, 2024 13:12
@EmilyXinyi EmilyXinyi closed this Jul 22, 2024
@EmilyXinyi EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch from 8579f6f to 211a7f1 Compare July 22, 2024 10:28
@EmilyXinyi
Copy link
Contributor Author

I closed this by accident while fixing merge conflicts ;-; if a maintainer could reopen this PR it would be great, thank you!

@EmilyXinyi
Copy link
Contributor Author

I added one more change and git is allowing me to reopen it...

@EmilyXinyi EmilyXinyi reopened this Jul 22, 2024
@betatim betatim added CUDA CI and removed CUDA CI labels Jul 22, 2024
@github-actions github-actions bot removed the CUDA CI label Jul 22, 2024
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @EmilyXinyi

@OmarManzoor OmarManzoor merged commit 68b71b5 into scikit-learn:main Jul 23, 2024
29 of 30 checks passed
@lesteve
Copy link
Member

lesteve commented Jul 23, 2024

I tried running the tests on my machine and I get the following weird failure with array-api-strict, any suggestion (see details below)?

The command I used:

pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance'

I have scipy 1.14, array-api-strict 2.0.1. Not sure what else could explain the discrepancy with the CI ...

❯ pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance'

=========================================================================================================== test session starts ===========================================================================================================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0 -- /home/lesteve/micromamba/envs/scikit-learn-dev/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/home/lesteve/dev/scikit-learn/.hypothesis/examples'))
rootdir: /home/lesteve/dev/scikit-learn
configfile: setup.cfg
plugins: anyio-4.2.0, hypothesis-6.97.3, xdist-3.5.0, repeat-0.9.3, cov-5.0.0
collected 1648 items / 1639 deselected / 9 selected                                                                                                                                                                                       

sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-numpy-None-None] PASSED                                                                                     [ 11%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] FAILED                                                                          [ 22%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy-None-None] SKIPPED (cupy is not installed: not checking array_api input)                               [ 33%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy.array_api-None-None] SKIPPED (cupy.array_api is not installed: not checking array_api input)           [ 44%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float64] PASSED                                                                                   [ 55%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float32] PASSED                                                                                   [ 66%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which is not available)                            [ 77%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which is not available)                            [ 88%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-mps-float32] SKIPPED (Skipping MPS device test because PYTORCH_ENABLE_MPS_FALLBACK is not set.)       [100%]

================================================================================================================ FAILURES =================================================================================================================
______________________________________________________________ test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] ______________________________________________________________

metric = <function mean_poisson_deviance at 0x7360c7d9c680>, array_namespace = 'array_api_strict', device = None, dtype_name = None, check_func = <function check_array_api_regression_metric at 0x7360c7880900>

    @pytest.mark.parametrize(
        "array_namespace, device, dtype_name", yield_namespace_device_dtype_combinations()
    )
    @pytest.mark.parametrize("metric, check_func", yield_metric_checker_combinations())
    def test_array_api_compliance(metric, array_namespace, device, dtype_name, check_func):
>       check_func(metric, array_namespace, device, dtype_name)

array_namespace = 'array_api_strict'
check_func = <function check_array_api_regression_metric at 0x7360c7880900>
device     = None
dtype_name = None
metric     = <function mean_poisson_deviance at 0x7360c7d9c680>

sklearn/metrics/tests/test_common.py:2048: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/metrics/tests/test_common.py:1900: in check_array_api_regression_metric
    check_array_api_metric(
        array_namespace = 'array_api_strict'
        device     = None
        dtype_name = None
        func_name  = 'mean_poisson_deviance'
        metric     = <function mean_poisson_deviance at 0x7360c7d9c680>
        metric_kwargs = {'sample_weight': array([0.1, 2. , 1.5, 0.5])}
        metric_params = mappingproxy(OrderedDict({'y_true': <Parameter "y_true">, 'y_pred': <Parameter "y_pred">, 'sample_weight': <Parameter "sample_weight=None">}))
        y_pred_np  = array([0.5, 0.5, 2. , 2. ])
        y_true_np  = array([2. , 0.1, 1. , 4. ])
sklearn/metrics/tests/test_common.py:1773: in check_array_api_metric
    metric_xp = metric(a_xp, b_xp, **metric_kwargs)
        a_np       = array([2. , 0.1, 1. , 4. ])
        a_xp       = Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64)
        array_namespace = 'array_api_strict'
        b_np       = array([0.5, 0.5, 2. , 2. ])
        b_xp       = Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64)
        device     = None
        dtype_name = None
        metric     = <function mean_poisson_deviance at 0x7360c7d9c680>
        metric_kwargs = {'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)}
        metric_np  = 0.7082657951303076
        multioutput = None
        xp         = <module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>
sklearn/utils/_param_validation.py:216: in wrapper
    return func(*args, **kwargs)
        args       = (Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64), Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64))
        func       = <function mean_poisson_deviance at 0x7360c7d9c5e0>
        func_sig   = <Signature (y_true, y_pred, *, sample_weight=None)>
        global_skip_validation = False
        kwargs     = {'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)}
        parameter_constraints = {'sample_weight': ['array-like', None], 'y_pred': ['array-like'], 'y_true': ['array-like']}
        params     = {'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64), 'y_pred': Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64), 'y_true': Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64)}
        prefer_skip_nested_validation = True
        to_ignore  = ['self', 'cls']
sklearn/metrics/_regression.py:1460: in mean_poisson_deviance
    return mean_tweedie_deviance(y_true, y_pred, sample_weight=sample_weight, power=1)
        sample_weight = Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)
        y_pred     = Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64)
        y_true     = Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64)
sklearn/utils/_param_validation.py:189: in wrapper
    return func(*args, **kwargs)
        args       = (Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64), Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64))
        func       = <function mean_tweedie_deviance at 0x7360c7d9c4a0>
        global_skip_validation = True
        kwargs     = {'power': 1, 'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)}
        parameter_constraints = {'power': [<sklearn.utils._param_validation.Interval object at 0x7360c7d0d100>, <sklearn.utils._param_validation.Interval object at 0x7360c7d0d130>], 'sample_weight': ['array-like', None], 'y_pred': ['array-like'], 'y_true': ['array-like']}
        prefer_skip_nested_validation = True
sklearn/metrics/_regression.py:1415: in mean_tweedie_deviance
    return _mean_tweedie_deviance(
        _          = None
        message    = 'Mean Tweedie deviance error with power=1 can only be used on '
        power      = 1
        sample_weight = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
        xp         = <module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>
        y_pred     = Array([[0.5],
       [0.5],
       [2. ],
       [2. ]], dtype=array_api_strict.float64)
        y_true     = Array([[2. ],
       [0.1],
       [1. ],
       [4. ]], dtype=array_api_strict.float64)
        y_type     = 'continuous'
sklearn/metrics/_regression.py:1320: in _mean_tweedie_deviance
    return float(_average(dev, weights=sample_weight))
        _          = True
        dev        = array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]])
        p          = 1
        power      = 1
        sample_weight = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
        xp         = <module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>
        y_pred     = Array([[0.5],
       [0.5],
       [2. ],
       [2. ]], dtype=array_api_strict.float64)
        y_true     = Array([[2. ],
       [0.1],
       [1. ],
       [4. ]], dtype=array_api_strict.float64)
        zero       = Array(0., dtype=array_api_strict.float64)
sklearn/utils/_array_api.py:725: in _average
    xp, _, device_ = get_namespace_and_device(a, weights)
        a          = array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]])
        axis       = None
        normalize  = True
        weights    = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
        xp         = None
sklearn/utils/_array_api.py:630: in get_namespace_and_device
    xp, is_array_api = get_namespace(*array_list, **skip_remove_kwargs)
        array_list = [array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)]
        remove_none = True
        remove_types = (<class 'str'>,)
        skip_remove_kwargs = {'remove_none': False, 'remove_types': []}
sklearn/utils/_array_api.py:586: in get_namespace
    namespace, is_array_api_compliant = array_api_compat.get_namespace(*arrays), True
        array_api_compat = <module 'array_api_compat' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_compat/__init__.py'>
        array_api_dispatch = True
        arrays     = [array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)]
        remove_none = False
        remove_types = []
        xp         = None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

api_version = None, use_compat = None, xs = (array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64))
_use_compat = True

    def array_namespace(*xs, api_version=None, use_compat=None):
        """
        Get the array API compatible namespace for the arrays `xs`.
    
        Parameters
        ----------
        xs: arrays
            one or more arrays.
    
        api_version: str
            The newest version of the spec that you need support for (currently
            the compat library wrapped APIs support v2022.12).
    
        use_compat: bool or None
            If None (the default), the native namespace will be returned if it is
            already array API compatible, otherwise a compat wrapper is used. If
            True, the compat library wrapped library will be returned. If False,
            the native library namespace is returned.
    
        Returns
        -------
    
        out: namespace
            The array API compatible namespace corresponding to the arrays in `xs`.
    
        Raises
        ------
        TypeError
            If `xs` contains arrays from different array libraries or contains a
            non-array.
    
    
        Typical usage is to pass the arguments of a function to
        `array_namespace()` at the top of a function to get the corresponding
        array API namespace:
    
        .. code:: python
    
           def your_function(x, y):
               xp = array_api_compat.array_namespace(x, y)
               # Now use xp as the array library namespace
               return xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
    
    
        Wrapped array namespaces can also be imported directly. For example,
        `array_namespace(np.array(...))` will return `array_api_compat.numpy`.
        This function will also work for any array library not wrapped by
        array-api-compat if it explicitly defines `__array_namespace__
        <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__array_namespace__.html>`__
        (the wrapped namespace is always preferred if it exists).
    
        See Also
        --------
    
        is_array_api_obj
        is_numpy_array
        is_cupy_array
        is_torch_array
        is_dask_array
        is_jax_array
        is_pydata_sparse_array
    
        """
        if use_compat not in [None, True, False]:
            raise ValueError("use_compat must be None, True, or False")
    
        _use_compat = use_compat in [None, True]
    
        namespaces = set()
        for x in xs:
            if is_numpy_array(x):
                from .. import numpy as numpy_namespace
                import numpy as np
                if use_compat is True:
                    _check_api_version(api_version)
                    namespaces.add(numpy_namespace)
                elif use_compat is False:
                    namespaces.add(np)
                else:
                    # numpy 2.0 has __array_namespace__ and is fully array API
                    # compatible.
                    if hasattr(x, '__array_namespace__'):
                        namespaces.add(x.__array_namespace__(api_version=api_version))
                    else:
                        namespaces.add(numpy_namespace)
            elif is_cupy_array(x):
                if _use_compat:
                    _check_api_version(api_version)
                    from .. import cupy as cupy_namespace
                    namespaces.add(cupy_namespace)
                else:
                    import cupy as cp
                    namespaces.add(cp)
            elif is_torch_array(x):
                if _use_compat:
                    _check_api_version(api_version)
                    from .. import torch as torch_namespace
                    namespaces.add(torch_namespace)
                else:
                    import torch
                    namespaces.add(torch)
            elif is_dask_array(x):
                if _use_compat:
                    _check_api_version(api_version)
                    from ..dask import array as dask_namespace
                    namespaces.add(dask_namespace)
                else:
                    import dask.array as da
                    namespaces.add(da)
            elif is_jax_array(x):
                if use_compat is True:
                    _check_api_version(api_version)
                    raise ValueError("JAX does not have an array-api-compat wrapper")
                elif use_compat is False:
                    import jax.numpy as jnp
                else:
                    # jax.experimental.array_api is already an array namespace. We do
                    # not have a wrapper submodule for it.
                    import jax.experimental.array_api as jnp
                namespaces.add(jnp)
            elif is_pydata_sparse_array(x):
                if use_compat is True:
                    _check_api_version(api_version)
                    raise ValueError("`sparse` does not have an array-api-compat wrapper")
                else:
                    import sparse
                # `sparse` is already an array namespace. We do not have a wrapper
                # submodule for it.
                namespaces.add(sparse)
            elif hasattr(x, '__array_namespace__'):
                if use_compat is True:
                    raise ValueError("The given array does not have an array-api-compat wrapper")
                namespaces.add(x.__array_namespace__(api_version=api_version))
            else:
                # TODO: Support Python scalars?
                raise TypeError(f"{type(x).__name__} is not a supported array type")
    
        if not namespaces:
            raise TypeError("Unrecognized array input")
    
        if len(namespaces) != 1:
>           raise TypeError(f"Multiple namespaces for array inputs: {namespaces}")
E           TypeError: Multiple namespaces for array inputs: {<module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>, <module 'numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/numpy/__init__.py'>}

_use_compat = True
api_version = None
namespaces = {<module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_ap... <module 'numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/numpy/__init__.py'>}
np         = <module 'numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/numpy/__init__.py'>
numpy_namespace = <module 'array_api_compat.numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_compat/numpy/__init__.py'>
use_compat = None
x          = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
xs         = (array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64))

../../micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_compat/common/_helpers.py:372: TypeError
========================================================================================================= short test summary info =========================================================================================================
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] - TypeError: Multiple namespaces for array inputs: {<module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>, <module 'numpy' from '/home/lesteve/mic...
================================================================================== 1 failed, 3 passed, 5 skipped, 1639 deselected, 40 warnings in 0.97s ===================================================================================

@OmarManzoor
Copy link
Contributor

OmarManzoor commented Jul 23, 2024

I got this error too. Maybe this is related to an older version of numpy? I am updating numpy to check.

Edit: This doesn't seem to be the case.

@EmilyXinyi
Copy link
Contributor Author

EmilyXinyi commented Jul 23, 2024

All tests pass on my end with pytest -vl sklearn/metrics/tests/test_common.py -k. Let me check what is my configuration
Numpy = 1.26.4
SciPy = 1.13.1

@OmarManzoor
Copy link
Contributor

@lesteve Add this environment variable
SCIPY_ARRAY_API=1

@lesteve
Copy link
Member

lesteve commented Jul 23, 2024

OK I confirm that SCIPY_ARRAY_API=1 pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance' passes.

Honestly I think this is fair to say that as a newcomer to the array API work, my feeling is that it can break way too easily in very mysterious ways. There are probably a few unrelated reasons for this but I guess since at the last bi-weekly meetings we agreed that we should try to pay attention to this kind of breakages and improve the situation, here are a few comments on this PR:

  • I think we should have some tests with scipy>=1.14 and SCIPY_ARRAY_API unset. This seems something that could easily happen in practice and we should have tests that make sure that nothing breaks. It's not clear to me why only array-api-strict breaks and not pytorch example, maybe as the name implies array-api-strict is even stricter than other namespaces.
  • I don't think this is acceptable to skip the test for scipy<1.14 since you know a reasonable user may use array_api_dispatch=True with scipy<1.14. To be fair it is not clear to me what happens in this case. I guess scipy will transform the array into a numpy array, and maybe the code afterwards has issues because some arrays are not in the same namespace? Anyway, we should make sure that the user code does not break. If this is really too hard to achieve the alternative is to raise an exception saying something like "to enable array API dispatch with this particular function you need scipy 1.14 and SCIPY_ARRAY_API=1".
  • the warning about SCIPY_ARRAY_API seems too broad (as soon as set_config(array_api_dispatch=True) is called?). Could we not have warnings only in a few (at least for now) places where we know we call scipy functions?

@EmilyXinyi
Copy link
Contributor Author

  • I don't think this is acceptable to skip the test for scipy<1.14 since you know a reasonable user may use array_api_dispatch=True with scipy<1.14. To be fair it is not clear to me what happens in this case. I guess scipy will transform the array into a numpy array, and maybe the code afterwards has issues because some arrays are not in the same namespace? Anyway, we should make sure that the user code does not break. If this is really too hard to achieve the alternative is to raise an exception saying something like "to enable array API dispatch with this particular function you need scipy 1.14 and SCIPY_ARRAY_API=1".
  • the warning about SCIPY_ARRAY_API seems too broad (as soon as set_config(array_api_dispatch=True) is called?). Could we not have warnings only in a few (at least for now) places where we know we call scipy functions?

Should we open an issue to discuss these things?

@OmarManzoor
Copy link
Contributor

I agree that it is a bit inconvenient if this breaks like this and a more informative error might be better. I am not sure how feasible it is to handle this because scipy is involved and if the env variable is not set I think scipy returns numpy arrays instead of the actual api's arrays which can cause conflicts.

@lesteve
Copy link
Member

lesteve commented Jul 23, 2024

Should we open an issue to discuss these things?

Yep I opened #29549 with my original comment

MarcBresson pushed a commit to MarcBresson/scikit-learn that referenced this pull request Sep 2, 2024
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants