Skip to content

BUG? hstack/vstack/column_stack no longer accept generators #12263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
shoyer opened this issue Oct 26, 2018 · 9 comments
Closed

BUG? hstack/vstack/column_stack no longer accept generators #12263

shoyer opened this issue Oct 26, 2018 · 9 comments
Labels
Priority: high High priority, also add milestones for urgent issues

Comments

@shoyer
Copy link
Member

shoyer commented Oct 26, 2018

As reported by @ilayn over in #12262:

There are other backwards incompatible changes for hstack and column_stack which were explicitly accepting generator expressions but now they don't.

Xref : scipy/scipy#9405

@shoyer
Copy link
Member Author

shoyer commented Oct 26, 2018

This is definitely due to my __array_function__ refactor to add dispatchers to these functions (specifically #12115). The new dispatcher functions now exhaust the generators before they reach functions.

This is certainly not supported behavior -- the docstrings for these functions describes these parameters as accepting a "sequence of ndarrays" and generators are not sequences.

That said, if scipy relied on this behavior it would be better to handle this change more gracefully. My proposal:

  • For now, raise a FutureWarning when passed a generator, as determined by isinstance(arg, types.GeneratorType). Skip dispatch to __array_function__ implementations in this case.
  • In the future, raise TypeError.

This will result in a small performance penalty, but I don't see any good alternatives. In particular, I don't see any way we could correctly implement these functions with dispatching with support for generator arguments.

@amueller
Copy link

Scikit-learn relied on this also in many places. Even if it wasn't officially supported I would think a deprecation cycle / FutureWarning would be good. If scipy and scikit-learn rely on it, chances are lots of people were.

@tylerjereddy tylerjereddy added the Priority: high High priority, also add milestones for urgent issues label Oct 26, 2018
@charris
Copy link
Member

charris commented Oct 26, 2018

@shoyer That sounds like a good solution to me.

@tylerjereddy
Copy link
Contributor

Yeah, sounds sensible given the strong indication of downstream dependency from big projects.

@shoyer
Copy link
Member Author

shoyer commented Oct 26, 2018

I think we will need this fix for stack, column_stack, dstack, vstack and hstack. Anything else?

concatenate uses PySequence_Check internally, so it doesn't need any changes. But note that that this is a more relaxed check than isinstance(..., collections.abc.Sequence) (https://stackoverflow.com/questions/43566044/what-is-pythons-sequence-protocol). At the very least, NumPy arrays should be fine to use as "sequences" here.

@ilayn
Copy link
Contributor

ilayn commented Oct 26, 2018

To be honest I don't see the benefit of generators in this context since at some point all elements have to be ... generated and be present in the memory all at once. Thus I would be in favor of deprecating and eventually removing them if they make __array_function__ perform better which is way more important than this detail.

@shoyer
Copy link
Member Author

shoyer commented Oct 26, 2018

For scikit-learn, I think they used this just because it looked a little cleaner with two fewer characters, e.g.,

self.rows_ = np.vstack(self.row_labels_ == c
                       for c in range(self.n_clusters))

vs

self.rows_ = np.vstack([self.row_labels_ == c
                        for c in range(self.n_clusters)])

@amueller
Copy link

amueller commented Oct 26, 2018

@shoyer that seems a likely explanation and I don't think we actually care about adding two characters. Obviously we do care about scikit-learn breaking with the next numpy release, but I think we're all on the same page there, i.e. do a FutureWarning and eventually remove.

shoyer added a commit to shoyer/numpy that referenced this issue Oct 27, 2018
Fixes numpygh-12263

We can't support generators with dispatch for ``__array_function__``.
@shoyer shoyer changed the title BUG? hstack and column_stack no longer accept generators BUG? hstack/vstack/column_stack no longer accept generators Oct 28, 2018
@shoyer
Copy link
Member Author

shoyer commented Oct 28, 2018

This is also the cause of test failures for pandas: pandas-dev/pandas#23172

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: high High priority, also add milestones for urgent issues
Projects
None yet
Development

No branches or pull requests

5 participants