array API support for mean_poisson_deviance #29227

EmilyXinyi · 2024-06-10T14:16:04Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

add array API support for mean_poisson_deviance

Any other comments?

there is a dependency on scipy.special.xlogy (in sklearn.metrics._regression._mean_tweedie_deviance, used by this function) which does not have an equivalent array API compatible version. I am unsure what is the best way to approach this: create a wrapper, implement our own function that accomplishes the same task, open an issue in the array-api repo about this function (maybe we can help contributing to this function too), or something else?

github-actions · 2024-06-10T14:17:22Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_{Generated for commit: 4371fb7. Link to the linter CI: here}

ogrisel · 2024-06-12T08:01:43Z

The upcomping scipy 1.14.0 release will ship a array API support for this function via a generic fallback.

https://github.com/scipy/scipy/blob/3aa715425291727cfd63d3929802d242681318ba/scipy/special/_support_alternative_backends.py#L76-L81

Since it is not released yet, we have two option:

wait for scipy 1.14.0 to be released we should happen soon-ish and skip the tests for mean_poisson_deviance for sp_version < parse_version("1.14.0"): once scipy is released and our CI dependencies are updated, this test will start to not be skipped anymore.
backport a similar generic fallback for xlogy under our own sklearn.utils._array_api module and use that instead of scipy.special.xlogy. Note that when xp is a NumPy-based namespace, we should still use scipy.special.xlogy (which is faster for NumPy because written in Cython) instead of the generic implementation.

EmilyXinyi · 2024-06-13T09:38:36Z

The upcomping scipy 1.14.0 release will ship a array API support for this function via a generic fallback.

https://github.com/scipy/scipy/blob/3aa715425291727cfd63d3929802d242681318ba/scipy/special/_support_alternative_backends.py#L76-L81

Since it is not released yet, we have two option:

wait for scipy 1.14.0 to be released we should happen soon-ish and skip the tests for mean_poisson_deviance for sp_version < parse_version("1.14.0"): once scipy is released and our CI dependencies are updated, this test will start to not be skipped anymore.

backport a similar generic fallback for xlogy under our own sklearn.utils._array_api module and use that instead of scipy.special.xlogy. Note that when xp is a NumPy-based namespace, we should still use scipy.special.xlogy (which is faster for NumPy because written in Cython) instead of the generic implementation.

Thank you @ogrisel. I have skipped the tests for now. I chose to skip the tests because we are not changing anything in the function itself but only skipping tests that we know are not failing, so I think it's the safer option.

ogrisel

We would also need to set the SCIPY_ARRAY_API environment variable to "1" in our CI to enable scipy support for array API (see https://docs.scipy.org/doc/scipy/dev/api-dev/array_api.html#using-array-api-standard-support).

To do that, you can edit the pylatest_conda_forge_mkl config in azure-pipelines.yml to set this variable, similarly to what we do for other environment variables such as SKLEARN_TESTS_GLOBAL_RANDOM_SEED for instance.

We would also need to do something similar in .github/workflows/cuda-gpu-ci.yml. In this case you can just edit the last step to replace:

pytest -k 'array_api'

by:

SCIPY_ARRAY_API=1 pytest -k 'array_api'

doc/modules/array_api.rst

ogrisel · 2024-06-13T12:49:02Z

Could you also please launch the tests yourself to check that they pass with the nightly build of scipy?

You can install the nightly dev builds of scipy (and numpy, just in case) with the following command:

pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy scipy

I would recommend not doing that in your usual dev env but rather create a dedicated conda environment for those tests.

Alternatively you can use a through away google colab env setup with the help of this notebook (to customize to install the scipy dev version):

https://gist.github.com/EdAbati/ff3bdc06bafeb92452b3740686cc8d7c

Doing this on colab would also make it possible to test with cuda devices.

EmilyXinyi · 2024-06-13T14:26:46Z

I have made the suggested changes to azure-pipelines.yml and .github/workflows/cuda-gpu-ci.yml, and here is the colab link for cuda and nightly build of scipy:
https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing
Seems like the tests are passing with CUDA but doc tests are failing on CircleCI...
I am also not sure why it shows that I changed 300+ files... I am assuming it's because of the yml file changes

betatim · 2024-06-13T14:43:39Z

Since it is not released yet, we have two option:

* wait for scipy 1.14.0 to be released we should happen soon-ish and skip the tests for `mean_poisson_deviance` for `sp_version < parse_version("1.14.0")`: once scipy is released and our CI dependencies are updated, this test will start to not be skipped anymore.

* backport a similar generic fallback for `xlogy` under our own `sklearn.utils._array_api` module and use that instead of `scipy.special.xlogy`. Note that when `xp` is a NumPy-based namespace, we should still use `scipy.special.xlogy` (which is faster for NumPy because written in Cython) instead of the generic implementation.

If we only do the "wait for scipy 1.14.0" then we will introduce a minimum requirement on the scipy version for array API support to work right?

I'm not sure if that is a bad thing or not. However if we do we need some way to express that (version check when array API is enabled via set_config()?) to the user. As well as some documentation about it. In addition it means we need to figure out what to do about the environment variable that needs setting. Should scikit-learn set this for the user? Probably not. Should we instead check when enabling array API support and print a warning to the user?

In general I think minimising the number of functions implemented in sklearn.utils._array_api is a good thing. So this would imply we should wait.

ogrisel · 2024-06-13T15:55:47Z

Should we instead check when enabling array API support and print a warning to the user?

That sounds like a good idea.

Should scikit-learn set this for the user?

If scipy has already been imported, it's too late (I tried in a shell). The environment variable should be set before scipy is initialized apparently.

ogrisel · 2024-06-13T16:00:18Z

@EmilyXinyi the main has evolved concurrently and there is now a bunch of conflicts. Would you mind updating the PR to resolve those. Please let me know if you need help with that.

ogrisel · 2024-06-13T16:03:58Z

BTW, to set the environment variable in the colab notebook, you would need to do:

import os

os.environ["SCIPY_ARRAY_API"] = "1"

in one of the first cells, before importing sklearn or scipy.

EDIT: @EdAbati you might want to update your colab notebook accordingly, in case this PR or other similar PRs that need array API support for scipy functions are merged into main.

EmilyXinyi · 2024-06-14T08:15:35Z

@EmilyXinyi the main has evolved concurrently and there is now a bunch of conflicts. Would you mind updating the PR to resolve those. Please let me know if you need help with that.

I think I have resolved the conflicts, but strangely enough the changes that happened on main ended up in my "resolve merge conflict" commit... If there is a way to avoid that please let me know, thank you!

EmilyXinyi · 2024-06-14T08:26:01Z

Should we instead check when enabling array API support and print a warning to the user?

Should this change be made this PR? (If yes, could anyone help me with it? Thanks!)

sklearn/metrics/pairwise.py

ogrisel · 2024-06-20T09:18:48Z

I have made the suggested changes to azure-pipelines.yml and .github/workflows/cuda-gpu-ci.yml, and here is the colab link for cuda and nightly build of scipy:
https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing
Seems like the tests are passing with CUDA

Actually the mean_poisson_deviance tests are skipped. I suppose there is a problem with the handling scipy version number:

klearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-numpy-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy.array_api-None-None] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float64] SKIPPED [ 23%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float32] SKIPPED [ 24%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float64] SKIPPED [ 24%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float32] SKIPPED [ 24%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-mps-float32] SKIPPED [ 24%]

EDIT: looking at the notebook, it's not installing the scipy dev version:

! cd scikit-learn && pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy scipy

Looking in indexes: https://pypi.org/simple, https://pypi.anaconda.org/scientific-python-nightly-wheels/simple
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (1.25.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (1.11.4)

We might need to add a --upgrade or --force-reinstall flag to that command. Or both flags.

Then check the output of the cell with sklearn.show_versions() to make sure that it's using the dev version of scipy.

ogrisel · 2024-06-20T09:25:06Z

@EmilyXinyi I merged the main branch into this PR to hide redundant changes related to the chi2_kernel thingy. Be sure to pull this merge commit to your local branch before adding any new commit to it.

ogrisel · 2024-06-20T09:29:16Z

Should we instead check when enabling array API support and print a warning to the user?
Should this change be made this PR? (If yes, could anyone help me with it? Thanks!)

I think so. I think the best place to do this is in _check_array_api_dispatch which is called by sklearn.set_config(array_api_dispatch=True/False).

We would need to raise a warning when os.environ.get("SCIPY_ARRAY_API") != "1" to tell the user that some scikit-learn array API features might rely on enabling scipy's own support for array API to work properly.

EmilyXinyi · 2024-06-20T15:28:06Z

Just a note: please ignore all the recent commits and force pushes. I think I did a merge instead of rebase and a bunch of files I didn't touch changed.

sklearn/utils/_array_api.py

EmilyXinyi · 2024-06-20T15:49:14Z

We might need to add a --upgrade or --force-reinstall flag to that command. Or both flags.

Then check the output of the cell with sklearn.show_versions() to make sure that it's using the dev version of scipy.

Using this as thread to see how the CUDA situation goes. (Here is the link again: https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing)

NumPy's nightly build (basically version 2.0.0 and above) has dependency issue, but Scipy's dev version can be installed using pip install --pre --upgrade scipy .
This installs scipy 1.14.0 and the CUDA tests run now. After merging with main, the failures regarding accuracy_score and zero_one_loss are no longer there.

sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-numpy-None-None] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] SKIPPED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy-None-None] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy.array_api-None-None] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float64] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float32] PASSED [ 18%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float64] PASSED [ 19%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float32] PASSED [ 19%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-mps-float32] SKIPPED [ 19%]

ogrisel

The PR diff looks good to me. But I would wait for #29436 to be merged first and then merge main into this PR and re-trigger the CUDA CI to make sure that all tests pass.

Here are a few more minor suggestions.

sklearn/utils/_array_api.py

ogrisel · 2024-07-09T09:10:26Z

azure-pipelines.yml

@@ -260,6 +261,7 @@ jobs:
        DISTRIB: 'conda'
        LOCK_FILE: './build_tools/azure/pylatest_conda_forge_mkl_osx-64_conda.lock'
        SKLEARN_TESTS_GLOBAL_RANDOM_SEED: '5'  # non-default seed
+        SCIPY_ARRAY_API=1


If #29436 is merged first, a similar change will be needed for pylatest_pip_openblas_pandas.

ogrisel · 2024-07-09T16:14:25Z

#29436 was merged. Please feel free to merge main into this PR and address #29227 (comment).

EmilyXinyi · 2024-07-22T10:33:11Z

I closed this by accident while fixing merge conflicts ;-; if a maintainer could reopen this PR it would be great, thank you!

EmilyXinyi · 2024-07-22T10:38:55Z

I added one more change and git is allowing me to reopen it...

sklearn/utils/tests/test_array_api.py

doc/whats_new/v1.6.rst

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>

sklearn/utils/tests/test_array_api.py

OmarManzoor

LGTM. Thanks @EmilyXinyi

lesteve · 2024-07-23T09:37:53Z

I tried running the tests on my machine and I get the following weird failure with array-api-strict, any suggestion (see details below)?

The command I used:

pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance'

I have scipy 1.14, array-api-strict 2.0.1. Not sure what else could explain the discrepancy with the CI ...

❯ pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance'

=========================================================================================================== test session starts ===========================================================================================================
platform linux -- Python 3.12.4, pytest-8.2.2, pluggy-1.5.0 -- /home/lesteve/micromamba/envs/scikit-learn-dev/bin/python
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/home/lesteve/dev/scikit-learn/.hypothesis/examples'))
rootdir: /home/lesteve/dev/scikit-learn
configfile: setup.cfg
plugins: anyio-4.2.0, hypothesis-6.97.3, xdist-3.5.0, repeat-0.9.3, cov-5.0.0
collected 1648 items / 1639 deselected / 9 selected                                                                                                                                                                                       

sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-numpy-None-None] PASSED                                                                                     [ 11%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] FAILED                                                                          [ 22%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy-None-None] SKIPPED (cupy is not installed: not checking array_api input)                               [ 33%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-cupy.array_api-None-None] SKIPPED (cupy.array_api is not installed: not checking array_api input)           [ 44%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float64] PASSED                                                                                   [ 55%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cpu-float32] PASSED                                                                                   [ 66%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float64] SKIPPED (PyTorch test requires cuda, which is not available)                            [ 77%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-cuda-float32] SKIPPED (PyTorch test requires cuda, which is not available)                            [ 88%]
sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-torch-mps-float32] SKIPPED (Skipping MPS device test because PYTORCH_ENABLE_MPS_FALLBACK is not set.)       [100%]

================================================================================================================ FAILURES =================================================================================================================
______________________________________________________________ test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] ______________________________________________________________

metric = <function mean_poisson_deviance at 0x7360c7d9c680>, array_namespace = 'array_api_strict', device = None, dtype_name = None, check_func = <function check_array_api_regression_metric at 0x7360c7880900>

    @pytest.mark.parametrize(
        "array_namespace, device, dtype_name", yield_namespace_device_dtype_combinations()
    )
    @pytest.mark.parametrize("metric, check_func", yield_metric_checker_combinations())
    def test_array_api_compliance(metric, array_namespace, device, dtype_name, check_func):
>       check_func(metric, array_namespace, device, dtype_name)

array_namespace = 'array_api_strict'
check_func = <function check_array_api_regression_metric at 0x7360c7880900>
device     = None
dtype_name = None
metric     = <function mean_poisson_deviance at 0x7360c7d9c680>

sklearn/metrics/tests/test_common.py:2048: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/metrics/tests/test_common.py:1900: in check_array_api_regression_metric
    check_array_api_metric(
        array_namespace = 'array_api_strict'
        device     = None
        dtype_name = None
        func_name  = 'mean_poisson_deviance'
        metric     = <function mean_poisson_deviance at 0x7360c7d9c680>
        metric_kwargs = {'sample_weight': array([0.1, 2. , 1.5, 0.5])}
        metric_params = mappingproxy(OrderedDict({'y_true': <Parameter "y_true">, 'y_pred': <Parameter "y_pred">, 'sample_weight': <Parameter "sample_weight=None">}))
        y_pred_np  = array([0.5, 0.5, 2. , 2. ])
        y_true_np  = array([2. , 0.1, 1. , 4. ])
sklearn/metrics/tests/test_common.py:1773: in check_array_api_metric
    metric_xp = metric(a_xp, b_xp, **metric_kwargs)
        a_np       = array([2. , 0.1, 1. , 4. ])
        a_xp       = Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64)
        array_namespace = 'array_api_strict'
        b_np       = array([0.5, 0.5, 2. , 2. ])
        b_xp       = Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64)
        device     = None
        dtype_name = None
        metric     = <function mean_poisson_deviance at 0x7360c7d9c680>
        metric_kwargs = {'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)}
        metric_np  = 0.7082657951303076
        multioutput = None
        xp         = <module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>
sklearn/utils/_param_validation.py:216: in wrapper
    return func(*args, **kwargs)
        args       = (Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64), Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64))
        func       = <function mean_poisson_deviance at 0x7360c7d9c5e0>
        func_sig   = <Signature (y_true, y_pred, *, sample_weight=None)>
        global_skip_validation = False
        kwargs     = {'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)}
        parameter_constraints = {'sample_weight': ['array-like', None], 'y_pred': ['array-like'], 'y_true': ['array-like']}
        params     = {'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64), 'y_pred': Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64), 'y_true': Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64)}
        prefer_skip_nested_validation = True
        to_ignore  = ['self', 'cls']
sklearn/metrics/_regression.py:1460: in mean_poisson_deviance
    return mean_tweedie_deviance(y_true, y_pred, sample_weight=sample_weight, power=1)
        sample_weight = Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)
        y_pred     = Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64)
        y_true     = Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64)
sklearn/utils/_param_validation.py:189: in wrapper
    return func(*args, **kwargs)
        args       = (Array([2. , 0.1, 1. , 4. ], dtype=array_api_strict.float64), Array([0.5, 0.5, 2. , 2. ], dtype=array_api_strict.float64))
        func       = <function mean_tweedie_deviance at 0x7360c7d9c4a0>
        global_skip_validation = True
        kwargs     = {'power': 1, 'sample_weight': Array([0.1, 2. , 1.5, 0.5], dtype=array_api_strict.float64)}
        parameter_constraints = {'power': [<sklearn.utils._param_validation.Interval object at 0x7360c7d0d100>, <sklearn.utils._param_validation.Interval object at 0x7360c7d0d130>], 'sample_weight': ['array-like', None], 'y_pred': ['array-like'], 'y_true': ['array-like']}
        prefer_skip_nested_validation = True
sklearn/metrics/_regression.py:1415: in mean_tweedie_deviance
    return _mean_tweedie_deviance(
        _          = None
        message    = 'Mean Tweedie deviance error with power=1 can only be used on '
        power      = 1
        sample_weight = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
        xp         = <module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>
        y_pred     = Array([[0.5],
       [0.5],
       [2. ],
       [2. ]], dtype=array_api_strict.float64)
        y_true     = Array([[2. ],
       [0.1],
       [1. ],
       [4. ]], dtype=array_api_strict.float64)
        y_type     = 'continuous'
sklearn/metrics/_regression.py:1320: in _mean_tweedie_deviance
    return float(_average(dev, weights=sample_weight))
        _          = True
        dev        = array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]])
        p          = 1
        power      = 1
        sample_weight = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
        xp         = <module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>
        y_pred     = Array([[0.5],
       [0.5],
       [2. ],
       [2. ]], dtype=array_api_strict.float64)
        y_true     = Array([[2. ],
       [0.1],
       [1. ],
       [4. ]], dtype=array_api_strict.float64)
        zero       = Array(0., dtype=array_api_strict.float64)
sklearn/utils/_array_api.py:725: in _average
    xp, _, device_ = get_namespace_and_device(a, weights)
        a          = array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]])
        axis       = None
        normalize  = True
        weights    = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
        xp         = None
sklearn/utils/_array_api.py:630: in get_namespace_and_device
    xp, is_array_api = get_namespace(*array_list, **skip_remove_kwargs)
        array_list = [array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)]
        remove_none = True
        remove_types = (<class 'str'>,)
        skip_remove_kwargs = {'remove_none': False, 'remove_types': []}
sklearn/utils/_array_api.py:586: in get_namespace
    namespace, is_array_api_compliant = array_api_compat.get_namespace(*arrays), True
        array_api_compat = <module 'array_api_compat' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_compat/__init__.py'>
        array_api_dispatch = True
        arrays     = [array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)]
        remove_none = False
        remove_types = []
        xp         = None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

api_version = None, use_compat = None, xs = (array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64))
_use_compat = True

    def array_namespace(*xs, api_version=None, use_compat=None):
        """
        Get the array API compatible namespace for the arrays `xs`.
    
        Parameters
        ----------
        xs: arrays
            one or more arrays.
    
        api_version: str
            The newest version of the spec that you need support for (currently
            the compat library wrapped APIs support v2022.12).
    
        use_compat: bool or None
            If None (the default), the native namespace will be returned if it is
            already array API compatible, otherwise a compat wrapper is used. If
            True, the compat library wrapped library will be returned. If False,
            the native library namespace is returned.
    
        Returns
        -------
    
        out: namespace
            The array API compatible namespace corresponding to the arrays in `xs`.
    
        Raises
        ------
        TypeError
            If `xs` contains arrays from different array libraries or contains a
            non-array.
    
    
        Typical usage is to pass the arguments of a function to
        `array_namespace()` at the top of a function to get the corresponding
        array API namespace:
    
        .. code:: python
    
           def your_function(x, y):
               xp = array_api_compat.array_namespace(x, y)
               # Now use xp as the array library namespace
               return xp.mean(x, axis=0) + 2*xp.std(y, axis=0)
    
    
        Wrapped array namespaces can also be imported directly. For example,
        `array_namespace(np.array(...))` will return `array_api_compat.numpy`.
        This function will also work for any array library not wrapped by
        array-api-compat if it explicitly defines `__array_namespace__
        <https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__array_namespace__.html>`__
        (the wrapped namespace is always preferred if it exists).
    
        See Also
        --------
    
        is_array_api_obj
        is_numpy_array
        is_cupy_array
        is_torch_array
        is_dask_array
        is_jax_array
        is_pydata_sparse_array
    
        """
        if use_compat not in [None, True, False]:
            raise ValueError("use_compat must be None, True, or False")
    
        _use_compat = use_compat in [None, True]
    
        namespaces = set()
        for x in xs:
            if is_numpy_array(x):
                from .. import numpy as numpy_namespace
                import numpy as np
                if use_compat is True:
                    _check_api_version(api_version)
                    namespaces.add(numpy_namespace)
                elif use_compat is False:
                    namespaces.add(np)
                else:
                    # numpy 2.0 has __array_namespace__ and is fully array API
                    # compatible.
                    if hasattr(x, '__array_namespace__'):
                        namespaces.add(x.__array_namespace__(api_version=api_version))
                    else:
                        namespaces.add(numpy_namespace)
            elif is_cupy_array(x):
                if _use_compat:
                    _check_api_version(api_version)
                    from .. import cupy as cupy_namespace
                    namespaces.add(cupy_namespace)
                else:
                    import cupy as cp
                    namespaces.add(cp)
            elif is_torch_array(x):
                if _use_compat:
                    _check_api_version(api_version)
                    from .. import torch as torch_namespace
                    namespaces.add(torch_namespace)
                else:
                    import torch
                    namespaces.add(torch)
            elif is_dask_array(x):
                if _use_compat:
                    _check_api_version(api_version)
                    from ..dask import array as dask_namespace
                    namespaces.add(dask_namespace)
                else:
                    import dask.array as da
                    namespaces.add(da)
            elif is_jax_array(x):
                if use_compat is True:
                    _check_api_version(api_version)
                    raise ValueError("JAX does not have an array-api-compat wrapper")
                elif use_compat is False:
                    import jax.numpy as jnp
                else:
                    # jax.experimental.array_api is already an array namespace. We do
                    # not have a wrapper submodule for it.
                    import jax.experimental.array_api as jnp
                namespaces.add(jnp)
            elif is_pydata_sparse_array(x):
                if use_compat is True:
                    _check_api_version(api_version)
                    raise ValueError("`sparse` does not have an array-api-compat wrapper")
                else:
                    import sparse
                # `sparse` is already an array namespace. We do not have a wrapper
                # submodule for it.
                namespaces.add(sparse)
            elif hasattr(x, '__array_namespace__'):
                if use_compat is True:
                    raise ValueError("The given array does not have an array-api-compat wrapper")
                namespaces.add(x.__array_namespace__(api_version=api_version))
            else:
                # TODO: Support Python scalars?
                raise TypeError(f"{type(x).__name__} is not a supported array type")
    
        if not namespaces:
            raise TypeError("Unrecognized array input")
    
        if len(namespaces) != 1:
>           raise TypeError(f"Multiple namespaces for array inputs: {namespaces}")
E           TypeError: Multiple namespaces for array inputs: {<module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>, <module 'numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/numpy/__init__.py'>}

_use_compat = True
api_version = None
namespaces = {<module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_ap... <module 'numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/numpy/__init__.py'>}
np         = <module 'numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/numpy/__init__.py'>
numpy_namespace = <module 'array_api_compat.numpy' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_compat/numpy/__init__.py'>
use_compat = None
x          = Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64)
xs         = (array([[2.54517744],
       [0.47811242],
       [0.61370564],
       [1.54517744]]), Array([[0.1],
       [2. ],
       [1.5],
       [0.5]], dtype=array_api_strict.float64))

../../micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_compat/common/_helpers.py:372: TypeError
========================================================================================================= short test summary info =========================================================================================================
FAILED sklearn/metrics/tests/test_common.py::test_array_api_compliance[mean_poisson_deviance-check_array_api_regression_metric-array_api_strict-None-None] - TypeError: Multiple namespaces for array inputs: {<module 'array_api_strict' from '/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.12/site-packages/array_api_strict/__init__.py'>, <module 'numpy' from '/home/lesteve/mic...
================================================================================== 1 failed, 3 passed, 5 skipped, 1639 deselected, 40 warnings in 0.97s ===================================================================================

OmarManzoor · 2024-07-23T09:42:29Z

I got this error too. Maybe this is related to an older version of numpy? I am updating numpy to check.

Edit: This doesn't seem to be the case.

EmilyXinyi · 2024-07-23T09:44:24Z

All tests pass on my end with pytest -vl sklearn/metrics/tests/test_common.py -k. Let me check what is my configuration
Numpy = 1.26.4
SciPy = 1.13.1

OmarManzoor · 2024-07-23T09:50:05Z

@lesteve Add this environment variable
SCIPY_ARRAY_API=1

lesteve · 2024-07-23T10:12:19Z

OK I confirm that SCIPY_ARRAY_API=1 pytest -vl sklearn/metrics/tests/test_common.py -k 'api_regression_metric and mean_poisson_deviance' passes.

Honestly I think this is fair to say that as a newcomer to the array API work, my feeling is that it can break way too easily in very mysterious ways. There are probably a few unrelated reasons for this but I guess since at the last bi-weekly meetings we agreed that we should try to pay attention to this kind of breakages and improve the situation, here are a few comments on this PR:

I think we should have some tests with scipy>=1.14 and SCIPY_ARRAY_API unset. This seems something that could easily happen in practice and we should have tests that make sure that nothing breaks. It's not clear to me why only array-api-strict breaks and not pytorch example, maybe as the name implies array-api-strict is even stricter than other namespaces.
I don't think this is acceptable to skip the test for scipy<1.14 since you know a reasonable user may use array_api_dispatch=True with scipy<1.14. To be fair it is not clear to me what happens in this case. I guess scipy will transform the array into a numpy array, and maybe the code afterwards has issues because some arrays are not in the same namespace? Anyway, we should make sure that the user code does not break. If this is really too hard to achieve the alternative is to raise an exception saying something like "to enable array API dispatch with this particular function you need scipy 1.14 and SCIPY_ARRAY_API=1".
the warning about SCIPY_ARRAY_API seems too broad (as soon as set_config(array_api_dispatch=True) is called?). Could we not have warnings only in a few (at least for now) places where we know we call scipy functions?

EmilyXinyi · 2024-07-23T10:19:21Z

I don't think this is acceptable to skip the test for scipy<1.14 since you know a reasonable user may use array_api_dispatch=True with scipy<1.14. To be fair it is not clear to me what happens in this case. I guess scipy will transform the array into a numpy array, and maybe the code afterwards has issues because some arrays are not in the same namespace? Anyway, we should make sure that the user code does not break. If this is really too hard to achieve the alternative is to raise an exception saying something like "to enable array API dispatch with this particular function you need scipy 1.14 and SCIPY_ARRAY_API=1".

the warning about SCIPY_ARRAY_API seems too broad (as soon as set_config(array_api_dispatch=True) is called?). Could we not have warnings only in a few (at least for now) places where we know we call scipy functions?

Should we open an issue to discuss these things?

OmarManzoor · 2024-07-23T10:23:24Z

I agree that it is a bit inconvenient if this breaks like this and a more informative error might be better. I am not sure how feasible it is to handle this because scipy is involved and if the env variable is not set I think scipy returns numpy arrays instead of the actual api's arrays which can cause conflicts.

lesteve · 2024-07-23T14:50:46Z

Should we open an issue to discuss these things?

Yep I opened #29549 with my original comment

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com>

github-actions bot added the module:metrics label Jun 10, 2024

ogrisel added the Array API label Jun 12, 2024

EmilyXinyi marked this pull request as ready for review June 13, 2024 09:38

EmilyXinyi changed the title ~~WIP array API support for mean_poisson_deviance~~ array API support for mean_poisson_deviance Jun 13, 2024

ogrisel reviewed Jun 13, 2024

View reviewed changes

doc/modules/array_api.rst Outdated Show resolved Hide resolved

EdAbati reviewed Jun 15, 2024

View reviewed changes

sklearn/metrics/pairwise.py Outdated Show resolved Hide resolved

EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch 2 times, most recently from 4e1ed08 to 4618be3 Compare June 20, 2024 15:23

EmilyXinyi commented Jun 20, 2024

View reviewed changes

sklearn/utils/_array_api.py Show resolved Hide resolved

EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch from 3750613 to 21f658c Compare July 3, 2024 09:48

ogrisel approved these changes Jul 9, 2024

View reviewed changes

EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch 2 times, most recently from 5d16602 to c32d6e1 Compare July 10, 2024 13:12

EmilyXinyi closed this Jul 22, 2024

EmilyXinyi force-pushed the array_API_mean_poisson_deviance branch from 8579f6f to 211a7f1 Compare July 22, 2024 10:28

add monkey patch test

9c7df88

EmilyXinyi reopened this Jul 22, 2024

betatim added CUDA CI and removed CUDA CI labels Jul 22, 2024

github-actions bot removed the CUDA CI label Jul 22, 2024

OmarManzoor reviewed Jul 23, 2024

View reviewed changes

sklearn/utils/tests/test_array_api.py Show resolved Hide resolved

OmarManzoor and others added 2 commits July 23, 2024 12:33

Update sklearn/utils/tests/test_array_api.py

68e051a

Merge branch 'main' into array_API_mean_poisson_deviance

9d67092

lesteve reviewed Jul 23, 2024

View reviewed changes

doc/whats_new/v1.6.rst Outdated Show resolved Hide resolved

Update doc/whats_new/v1.6.rst

9bed7ee

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>

lesteve reviewed Jul 23, 2024

View reviewed changes

sklearn/utils/tests/test_array_api.py Outdated Show resolved Hide resolved

OmarManzoor reviewed Jul 23, 2024

View reviewed changes

sklearn/utils/tests/test_array_api.py Outdated Show resolved Hide resolved

Update sklearn/utils/tests/test_array_api.py

4371fb7

OmarManzoor approved these changes Jul 23, 2024

View reviewed changes

OmarManzoor merged commit 68b71b5 into scikit-learn:main Jul 23, 2024
29 of 30 checks passed

lesteve mentioned this pull request Jul 23, 2024

Follow-up after mean_poisson_deviance array API PR #29549

Closed

OmarManzoor mentioned this pull request Sep 18, 2024

Make more of the "tools" of scikit-learn Array API compatible #26024

Open

Uh oh!

array API support for mean_poisson_deviance #29227

array API support for mean_poisson_deviance #29227

Uh oh!

Conversation

EmilyXinyi commented Jun 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

github-actions bot commented Jun 10, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✔️ Linting Passed

Uh oh!

ogrisel commented Jun 12, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EmilyXinyi commented Jun 13, 2024

Uh oh!

ogrisel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ogrisel commented Jun 13, 2024

Uh oh!

EmilyXinyi commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

betatim commented Jun 13, 2024

Uh oh!

ogrisel commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Jun 13, 2024

Uh oh!

ogrisel commented Jun 13, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EmilyXinyi commented Jun 14, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

EmilyXinyi commented Jun 14, 2024

Uh oh!

Uh oh!

ogrisel commented Jun 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Jun 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel commented Jun 20, 2024

Uh oh!

EmilyXinyi commented Jun 20, 2024

Uh oh!

Uh oh!

EmilyXinyi commented Jun 20, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ogrisel left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ogrisel Jul 9, 2024

Choose a reason for hiding this comment

Uh oh!

ogrisel commented Jul 9, 2024

Uh oh!

EmilyXinyi commented Jul 22, 2024

Uh oh!

EmilyXinyi commented Jul 22, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

OmarManzoor left a comment

Choose a reason for hiding this comment

EmilyXinyi commented Jun 10, 2024 •

edited

Loading

github-actions bot commented Jun 10, 2024 •

edited

Loading

ogrisel commented Jun 12, 2024 •

edited

Loading

EmilyXinyi commented Jun 13, 2024 •

edited

Loading

ogrisel commented Jun 13, 2024 •

edited

Loading

ogrisel commented Jun 13, 2024 •

edited

Loading

EmilyXinyi commented Jun 14, 2024 •

edited

Loading

ogrisel commented Jun 20, 2024 •

edited

Loading

ogrisel commented Jun 20, 2024 •

edited

Loading

EmilyXinyi commented Jun 20, 2024 •

edited

Loading

ogrisel left a comment •

edited

Loading

lesteve commented Jul 23, 2024 •

edited

Loading

OmarManzoor commented Jul 23, 2024 •

edited

Loading

EmilyXinyi commented Jul 23, 2024 •

edited

Loading

lesteve commented Jul 23, 2024 •

edited

Loading