Skip to content

Use np.empty_like in PolynomialFeatures for NEP18 support #16196

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

maartenbreddels
Copy link

@maartenbreddels maartenbreddels commented Jan 24, 2020

Reference Issues/PRs

Followup of my quest to make vaex work with sklearn via NEP18 #14963 / vaexio/vaex#415

What does this implement/fix? Explain your changes.

PolynomialFeatures uses np.empty, if it uses np.empty_like, I can intercept it via NEP18, so that vaex can use this transformer.

Any other comments?

I could push more of these changes on this branch, but I first want to see if this is what people like to see or not.

@thomasjpfan
Copy link
Member

It would be nice to find a solution to array creation that can take advantage of NEP 18.

This is being looked at in NEP 35 numpy/numpy#14715

@maartenbreddels maartenbreddels force-pushed the np_empty_like_nep18 branch 2 times, most recently from 3b9cd3a to 216fcaa Compare January 24, 2020 21:11
@maartenbreddels
Copy link
Author

Thanks @thomasjpfan I'll take a look at that NEP

I think the failure is unrelated (SpectralCoclustering)

@glemaitre
Copy link
Member

I think the failure is unrelated (SpectralCoclustering)

It should be fixed in master.

@maartenbreddels
Copy link
Author

maartenbreddels commented Jan 26, 2020 via email

@maartenbreddels
Copy link
Author

shape for empty_like requires nump 1.17, is this an issue?
https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty_like.html

@thomasjpfan
Copy link
Member

That will be a problem since our minimum requirements for numpy are 1.11.0

With #15106 it will move to 1.13.3.

@maartenbreddels
Copy link
Author

Hmm, this is a blocker to move forward with nep13/18 support. Do you have any suggestions to be future compatible with numpy >= 1.17, for instance, try np.empty_like, but provide a fallback to np.empy for earlier versions. Is this generally accepted in sklearn?

Maybe @amueller has some ideas as well, I believe he has some interest in nep13/nep18 support.

@jnothman
Copy link
Member

jnothman commented Jan 27, 2020 via email

@maartenbreddels maartenbreddels force-pushed the np_empty_like_nep18 branch 3 times, most recently from de610d2 to 1183d81 Compare January 28, 2020 10:21
@maartenbreddels
Copy link
Author

I did an attempt to implement empty_like in fixes.py, it is less simple than I hoped. It works on numpy < 1.17 by forwarding to np.empy or np.empty_like depending on the shape parameter and the type of array. For >= 1.17, it simply forwards to np.empty_like.

The only loose end is order='K', but one could argue not to use that, and always specify 'C' or 'F' and maybe allow 'A'.

@maartenbreddels maartenbreddels changed the title Use np.empty_like in PolynomialFeatures Use np.empty_like in PolynomialFeatures for NEP18 support Jan 28, 2020
@amueller
Copy link
Member

Are there _like for everything we need? And do we want to adopt that or wait what happens with numpy/numpy#14715 and NEP 37?

We will need dispatching for zeros_like and ones_like. There is no arange_like, right? Are there others that are missing? For random arrays we have the separate fix of allowing more general random states, but that requires some extra effort by the user, right?

@jnothman
Copy link
Member

jnothman commented Jan 28, 2020 via email

@maartenbreddels
Copy link
Author

From https://docs.scipy.org/doc/numpy/reference/routines.array-creation.html I can see
4: (zeros|full|ones|empty)_like.

I thought linspace could be an issue, but grep -n linspace sklearn/**/*.py | grep -v "test" revealed:

  • QuantileTransformer: uses it for the quantiles, this shouldn't hurt too much.
  • _hist_gradient_boosting/binning.py:73 the percentiles, should also be small.

I guess np.arange would be most common.

Apart from whatever NEP will be the best way forward, do you think it makes sense to put array creation functions in a separate module (maybe not fixed.py)?

@amueller
Copy link
Member

If we adopt towards whatever NEP will succeed, maybe we should have a separate file just for helping implement the NEP, including array creation and other things (like scipy dispatch).

Though right now it looks like we mostly do a backport, which is somewhat different from writing helpers.

@maartenbreddels
Copy link
Author

maybe we should have a separate file just for helping implement the NEP, including array creation and other things

so, for instance, a stripped-down version of empty_like in this PR, plus some others, possible a range_like, etc, is that what you have in mind?

grep -n order sklearn/**/*.py | grep -v "test" | grep 'K seems to confirm we don't need to support order='K'.

like scipy dispatch

Never heard of this, does scipy have it's own dispatch mechanism?

@amueller
Copy link
Member

Never heard of this, does scipy have it's own dispatch mechanism?

Sorry for being cryptic. No, it does not. But we often use scipy.linalg and we need to figure out in what cases we can use the __array_function__ machinery instead and I assume we'll need some helpers around that.

@maartenbreddels
Copy link
Author

But we often use scipy.linalg and we need to figure out in what cases we can use the __array_function__ machinery instead and I assume we'll need some helpers around that.

Indeed, I rely on monkey patching now, but it would be nice to for instance to try to use the NEP18 protocol for this.

I gave it another refactor, a separate module for array creation function (implemented (empty|zeros|ones)_like) which has a numpy like API.
I made the KBinsDiscretizer _BaseEncoder and PolynomialFeatures use these.

The downside of this approach is that it introduces another layer of indirection, which does not make it easier to read the source code. Using an API with the same name and similar behavior should help digest it I think.

@pentschev
Copy link

Regarding the *_like functions, I'd just like to point out that I finally got numpy/numpy#16935 to implement NEP-35, which will add a like= argument to all NumPy array creation functions with similar behavior to that of empty_like (and derived functions), using __array_function__ as the dispatching mechanism. The PR only supports a few functions now, but hopefully it can be extended to most or all array creation functions in the coming weeks if there's no major setback in the implementation that would cause it to be rejected by NumPy maintainers and the community.

Base automatically changed from master to main January 22, 2021 10:51
@glemaitre
Copy link
Member

@ogrisel @betatim I wondering if this PR is still relevant or with the work of the array-API, we will use an alternative to use the proper namespace to create the array?

@adrinjalali
Copy link
Member

So closing as superseded by array API work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants