-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
WIP Enabling different array types (CuPy) in PCA with NEP 37 #16574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP Enabling different array types (CuPy) in PCA with NEP 37 #16574
Conversation
In some discussions this week (maybe with @rgommers ?) the concern was raised that it will be awkward if each library using this interface will add a config flag, and that it might be nicer if the opt-in was on the level of numpy. But I'm not entirely sure if this is possible. |
Yes we did discuss this sklearn context manager idea, also with one or both of @jakirkham or @jni I believe. This PR is an interesting experiment. It would be useful to step back and figure out how this will look to the end user once more libraries start doing this. Options:
Perhaps there are more options? Dask and CuPy have already done (3) for |
Another thing to discuss is having a context manager in the first place. E.g. @shoyer was very much against that when it was proposed with |
Since I see some activity, a while ago, we decided that we should explore both things, if easier in an (as private as possible) way in the master branch, with the understanding that it will by default be removed for the next release, unless we agree otherwise. EDIT: Sorry, the main point was this: I am not aware of anyone actively trying to do this/experimenting with it, so picking it up, is welcome. |
@seberg I wanted to give this PR a closer look and I found it needed a couple of fixes + a merge from master to take into account recent changes to get the CI up and running. I tested it locally with jax and found an issue with read-only intermediate data when
Can you give a bit more details? Another problem will be to deal with AFAIK |
@ogrisel have to go in details later, but basically. I think we have not reached the point where we are sure that
You could check here a bit: https://github.com/numpy/archive/blob/master/other_meetings/2020-04-21-array-protocols_discussion_and_notes.md although mostly what is under "Agenda" is what we discussed, after that is mainly thoughts by either me and Hameer compiled before the meeting... Basically, we had decided that we would be willing to very actively explore both options (if helpful even in the master branch). At the time, 1.19 branching was still off, we have the release now, so now would be the time to attack such thoughts! |
For X -= self.mean_ Also we would not be able to use the Edit: I disabled some instances to be nicer to the CI while there is a sprint going on. Edit2: I do not think array_function is supported in jax yet: jax-ml/jax#1565 . I do not know of a non-gpu array library that implements a good majority of the numpy api. |
The problem for the test failing with jax is that |
To answer @seberg's comment on NEP 37 vs @thomasjpfan have you already started such experiments on your side? If not I think it would be interesting to open a sister PR to try to experiment with that approach on the PCA class as well. |
The |
I remember experimenting with Feel free to open another PR to explore the |
Its great that you are exploring this. I will note again that previously we would be OK to experiment a bit within NumPy master (e.g. add There are a few issues with
|
I'll put together another PR to see how we can use |
I think there is a good chance this will be changed in JAX: |
If a function looks like this def func():
some_pre_processing() # Pure NumPy
algorithm_core() # Cython / Fortran / C / C++
some_post_processing() # Pure NumPy there could potentially be some value, if the cost of the pre and post processings ouweight the cost of CPU / GPU switch. Not sure if it's often the case in scikit-learn though. |
With the work around Array API, I think this PR is not needed anymore. |
Thank you Thomas for your work here! 🙏 It is nice to see how array support has evolved |
Reference Issues/PRs
After discussing with @seberg, it was concluded that a prototype of using CuPy with sklearn can help with seeing how libraries would use NEP 37.
enable_duck_array
as a config flag for nowWhat does this implement/fix? Explain your changes.
For reference, this is how this will look like for the user:
https://gist.github.com/thomasjpfan/da54f874f3d434eb4ae360b5e6fa4c12
With numpy
With cupy
Any other comments?
This is a quick prototype on how we can use NEP 37 in PCA. (There is still work for other solvers)
Note the
get_array_module
is a hack for now. It would be called with numpy if NEP 37 gets accepted.CC: @amueller @seberg