array API support for mean_poisson_deviance #29227
Conversation
The upcoming scipy 1.14.0 release will ship array API support for this function via a generic fallback. Since it is not released yet, we have two options:
Thank you @ogrisel. I have skipped the tests for now. I chose to skip the tests because we are not changing anything in the function itself but only skipping tests that we know are failing, so I think it's the safer option.
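For context, a version-gated skip can be written with a small helper like the sketch below (the helper name and threshold handling are illustrative, not the actual test code in this PR):

```python
def scipy_has_array_api_support(version: str) -> bool:
    """Return True when the scipy version string is at least 1.14, the
    first release whose generic fallbacks make functions such as
    scipy.special.xlogy usable with array API inputs."""
    # Only the major/minor components matter; dev suffixes like
    # "1.15.0.dev0" parse fine because we only read the first two parts.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 14)
```

A test module could then gate its array API tests with `pytest.mark.skipif(not scipy_has_array_api_support(scipy.__version__), reason="scipy >= 1.14 required")`.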
We would also need to set the `SCIPY_ARRAY_API` environment variable to `"1"` in our CI to enable scipy support for array API (see https://docs.scipy.org/doc/scipy/dev/api-dev/array_api.html#using-array-api-standard-support).

To do that, you can edit the `pylatest_conda_forge_mkl` config in `azure-pipelines.yml` to set this variable, similarly to what we do for other environment variables such as `SKLEARN_TESTS_GLOBAL_RANDOM_SEED` for instance.
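For illustration, the change could look roughly like this in the `pylatest_conda_forge_mkl` job (the surrounding keys are taken from the existing config; exact placement may differ):

```yaml
    variables:
      DISTRIB: 'conda'
      SKLEARN_TESTS_GLOBAL_RANDOM_SEED: '5'  # non-default seed
      SCIPY_ARRAY_API: '1'  # enable scipy's array API standard support
```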
We would also need to do something similar in `.github/workflows/cuda-gpu-ci.yml`. In this case you can just edit the last step to replace:

```
pytest -k 'array_api'
```

with:

```
SCIPY_ARRAY_API=1 pytest -k 'array_api'
```
Could you also please run the tests yourself to check that they pass with the nightly build of scipy? You can install the nightly dev builds of scipy (and numpy, just in case) with the following command:

I would recommend not doing that in your usual dev env but rather creating a dedicated conda environment for those tests. Alternatively you can use a throwaway Google Colab env set up with the help of this notebook (to be customized to install the scipy dev version): https://gist.github.com/EdAbati/ff3bdc06bafeb92452b3740686cc8d7c Doing this on Colab would also make it possible to test with CUDA devices.
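A typical way to get those nightly builds is the scientific-python nightly wheels index; this sketch assumes that index (check the scipy dev docs for the currently recommended command):

```
pip install --pre --upgrade \
    --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \
    scipy numpy
```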
I have made the suggested changes.
If we only do the "wait for scipy 1.14.0" option, then we will introduce a minimum requirement on the scipy version for array API support to work, right? I'm not sure if that is a bad thing or not. However, if we do, we need some way to express that (a version check when array API is enabled via …).

In general I think minimising the number of functions implemented in …
That sounds like a good idea.
If scipy has already been imported, it's too late (I tried in a shell). The environment variable apparently needs to be set before scipy is initialized.
@EmilyXinyi the …
BTW, to set the environment variable in the colab notebook, you would need to do:

```python
import os

os.environ["SCIPY_ARRAY_API"] = "1"
```

in one of the first cells, before importing scipy.

EDIT: @EdAbati you might want to update your colab notebook accordingly, in case this PR or other similar PRs that need array API support for scipy functions are merged into `main`.
I think I have resolved the conflicts, but strangely enough the changes that happened on main ended up in my "resolve merge conflict" commit... If there is a way to avoid that, please let me know, thank you!
Should this change be made in this PR? (If yes, could anyone help me with it? Thanks!)
Actually the …

EDIT: looking at the notebook, it's not installing the scipy dev version:

We might need to add a … Then check the output of the cell with …
@EmilyXinyi I merged the …
I think so. I think the best place to do this is in … We would need to raise a warning when …
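Such a scipy version check with a warning could be sketched as follows (the function name, message wording, and placement are illustrative, not scikit-learn's actual code):

```python
import warnings


def warn_if_scipy_too_old(scipy_version: str, array_api_dispatch: bool) -> None:
    """Warn when array API dispatch is enabled but the installed scipy
    predates 1.14, in which case scipy.special functions silently fall
    back to plain NumPy arrays."""
    if not array_api_dispatch:
        return
    major, minor = (int(part) for part in scipy_version.split(".")[:2])
    if (major, minor) < (1, 14):
        warnings.warn(
            f"Array API dispatch is enabled but scipy {scipy_version} "
            "predates 1.14; scipy special functions will fall back to "
            "NumPy arrays.",
            UserWarning,
        )
```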
Just a note: please ignore all the recent commits and force pushes. I think I did a merge instead of a rebase and a bunch of files I didn't touch changed.
Using this as a thread to see how the CUDA situation goes. (Here is the link again: https://colab.research.google.com/drive/1Dw6ABHw-2wyjlzUQKb4VDJw_51mGZMLn?usp=sharing) NumPy's nightly build (basically version 2.0.0 and above) has a dependency issue, but scipy's dev version can be installed using …
The PR diff looks good to me. But I would wait for #29436 to be merged first and then merge main into this PR and re-trigger the CUDA CI to make sure that all tests pass.
Here are a few more minor suggestions.
azure-pipelines.yml (outdated)

```diff
@@ -260,6 +261,7 @@ jobs:
     DISTRIB: 'conda'
     LOCK_FILE: './build_tools/azure/pylatest_conda_forge_mkl_osx-64_conda.lock'
     SKLEARN_TESTS_GLOBAL_RANDOM_SEED: '5'  # non-default seed
+    SCIPY_ARRAY_API: '1'
```
If #29436 is merged first, a similar change will be needed for `pylatest_pip_openblas_pandas`.
#29436 was merged. Please feel free to merge …
I closed this by accident while fixing merge conflicts ;-; if a maintainer could reopen this PR it would be great, thank you!
I added one more change and GitHub is now allowing me to reopen it...
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
LGTM. Thanks @EmilyXinyi
I tried running the tests on my machine and I get the following weird failure with array-api-strict (see details below); any suggestions? The command I used:
I have scipy 1.14 and array-api-strict 2.0.1. Not sure what else could explain the discrepancy with the CI...
I got this error too. Maybe it is related to an older version of numpy? I am updating numpy to check.

EDIT: This doesn't seem to be the case.
All tests pass on my end with …
@lesteve Add this environment variable: …
OK, I confirm that …

Honestly, I think it is fair to say that, as a newcomer to the array API work, my feeling is that it can break way too easily in very mysterious ways. There are probably a few unrelated reasons for this, but since at the last bi-weekly meeting we agreed that we should try to pay attention to these kinds of breakages and improve the situation, here are a few comments on this PR:
Should we open an issue to discuss these things?
I agree that it is a bit inconvenient when it breaks like this, and a more informative error might be better. I am not sure how feasible that is to handle, because scipy is involved: if the env variable is not set, I think scipy returns numpy arrays instead of arrays from the input's namespace, which can cause conflicts.
Yep, I opened #29549 with my original comment.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Reference Issues/PRs
towards #26024
What does this implement/fix? Explain your changes.
add array API support for mean_poisson_deviance
Any other comments?
There is a dependency on `scipy.special.xlogy` (in `sklearn.metrics._regression._mean_tweedie_deviance`, used by this function) which does not have an equivalent array API compatible version. I am unsure what the best way to approach this is: create a wrapper, implement our own function that accomplishes the same task, open an issue in the array-api repo about this function (maybe we can help contribute to it too), or something else?
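Of the options listed, the "implement our own function" route could look like the following sketch: `xlogy(x, y)` computes `x * log(y)` with the convention that the result is 0 wherever `x == 0`, even when `y == 0`. The `xp` parameter and function body are illustrative (numpy stands in for an array API namespace), not scikit-learn's actual implementation:

```python
import numpy as np


def xlogy(x, y, xp=np):
    """Compute x * log(y) elementwise, returning 0 where x == 0 (even if
    y == 0), matching the convention of scipy.special.xlogy.

    `xp` is any array API style namespace; numpy is used here only for
    illustration, and scipy's implementation handles more edge cases."""
    x = xp.asarray(x)
    y = xp.asarray(y)
    zero = x == 0
    # Mask y where the result is forced to 0 so that log(0) never
    # produces -inf/nan that would propagate through the multiply.
    safe_y = xp.where(zero, xp.ones_like(y), y)
    return xp.where(zero, xp.zeros_like(y), x * xp.log(safe_y))
```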