Feature request: signal broadcasting is OK over core dimension #8811

Open
mhvk opened this issue Mar 22, 2017 · 9 comments

Comments

@mhvk
Contributor

mhvk commented Mar 22, 2017

EDITS:

Rationale

Before numpy 1.10, there was automatic broadcasting over core dimensions in gufunc. While this is not necessarily good for many things (e.g., np.inner1d in the example in the documentation), it is for others, such as the all_equal implementation in #8528. It would be nice if the signature allowed one to make this distinction.
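To make the distinction concrete, here is a minimal pure-Python sketch of how a single core dimension could be resolved with and without broadcasting. The helper `resolve_core_dim` and its `allow_broadcast` flag are hypothetical names used only for illustration; they are not NumPy API, and the real logic lives in the C iterator.

```python
def resolve_core_dim(sizes, allow_broadcast):
    """Resolve one named core dimension (e.g. ``i``) from the size that
    each operand supplies for it.

    Hypothetical helper, not part of NumPy.
    allow_broadcast=False: all sizes must match exactly (strict matching).
    allow_broadcast=True:  a size of 1 stretches to the common size.
    """
    if not allow_broadcast:
        if len(set(sizes)) != 1:
            raise ValueError(f"core dimension mismatch: {sizes}")
        return sizes[0]

    # Sizes of 1 can be stretched; all remaining sizes must agree.
    non_unit = {size for size in sizes if size != 1}
    if len(non_unit) > 1:
        raise ValueError(f"core dimension mismatch: {sizes}")
    return non_unit.pop() if non_unit else 1
```

With `(i),(i)->()` and operands of core size 3 and 1, the strict mode raises, while the broadcasting mode resolves `i` to 3, which is the behaviour an all_equal-style function would want.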

Wishlist beyond broadcasting

  • Multiple related signatures, such as for matmul: (i,k),(k,j)->(i,j); (i,k),(k)->(i), i.e., (i,k),(k,1)->(i,1) with the added axis dropped; (k),(k,j)->(j), i.e., (1,k),(k,j)->(1,j) with the added axis dropped
  • Possibly having the output signature be calculated from the input dimensions (min(i,j))

Possible implementations

Noting that the signature has to be a single char * (a C string) to remain compatible with the API:

  • Adjust the interpretation of dimensions, e.g., (i|1),(i|1)->() or (i?),(i?)->() (see "Add an axis argument to generalized ufuncs?", #5197), which might convey what is needed for all_equal (easy to do by setting the relevant strides to 0)
  • Allow multiple signatures; e.g., (i),(i)->(); (i),()->(); (),(i)->(); (),()->(). This would likely make the most sense for things like matmul, but would seem to imply different functions for each signature.
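As a rough illustration of the first option, the sketch below parses a signature string that uses the proposed `name|1` marker and records which core dimensions may broadcast. The function `parse_signature` is a hypothetical pure-Python helper, not NumPy's actual C parser, and the `|1` syntax is only the proposal from this issue.

```python
import re


def parse_signature(signature):
    """Parse a gufunc-style signature into (inputs, outputs).

    Each operand becomes a list of (name, may_broadcast) pairs, where
    may_broadcast is True for dimensions written as ``name|1``.
    Hypothetical helper for illustration only.
    """
    def parse_side(side):
        operands = []
        # Each parenthesised group describes one operand's core dimensions.
        for group in re.findall(r"\(([^()]*)\)", side):
            dims = []
            for token in group.split(","):
                token = token.strip()
                if not token:
                    continue  # "()" means no core dimensions
                name, _, flag = token.partition("|")
                if flag not in ("", "1"):
                    raise ValueError(f"unsupported marker in {token!r}")
                dims.append((name.strip(), flag == "1"))
            operands.append(dims)
        return operands

    inputs, arrow, outputs = signature.partition("->")
    if not arrow:
        raise ValueError("signature must contain '->'")
    return parse_side(inputs), parse_side(outputs)
```

For example, `parse_signature("(i|1),(i|1)->()")` marks `i` as broadcastable in both inputs and yields a single output with no core dimensions.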
@eric-wieser
Member

eric-wieser commented Mar 22, 2017

Rather than producing an increasingly complex syntax for signatures (#5015, and I've wanted (i),(j)->(min(i,j)) in the past), perhaps we should instead provide a hook that takes the input dimensions and returns an appropriate set of output dimensions, or NULL if no such set exists (which @jaimefrio suggested in the linked issue).
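A minimal sketch of what such a hook might look like, written in Python for readability (the real hook would be a C callback, and `None` stands in for NULL). The name `min_dim_hook` and its calling convention are assumptions made for illustration.

```python
def min_dim_hook(input_core_shapes):
    """Hypothetical shape hook for ``(i),(j)->(min(i,j))``.

    Takes the core shapes of the inputs and returns a list with the core
    shape of each output, or None if the inputs are not acceptable.
    """
    if len(input_core_shapes) != 2:
        return None
    first, second = input_core_shapes
    # Both inputs must have exactly one core dimension.
    if len(first) != 1 or len(second) != 1:
        return None
    (i,), (j,) = first, second
    return [(min(i, j),)]
```

Here `min_dim_hook([(3,), (5,)])` gives `[(3,)]`, and an input with the wrong number of core dimensions gives `None`, so the caller can report a shape error.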

@mhvk
Contributor Author

mhvk commented Mar 22, 2017

@eric-wieser - again a good suggestion; I added it above. I think the hook is separate (and good for more complicated cases).

@shoyer
Member

shoyer commented Mar 22, 2017

I would be much happier with a multiple dispatch system for gufuncs, where a given function has a list of signatures that are tried in order, which we need for matmul anyways.
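One way to picture "a list of signatures that are tried in order" is the pure-Python sketch below. It matches each signature against the trailing (core) dimensions of the operand shapes and returns the first one that fits. The names `match`, `dispatch`, and `MATMUL_SIGNATURES` are hypothetical, the list of signatures is an assumed spelling of the matmul cases, and the tiny parser expects signatures without spaces between operands.

```python
def _parse(side):
    # "(i,k),(k,j)" -> [("i", "k"), ("k", "j")]; "()" -> [()]
    groups = side.strip()[1:-1].split("),(")
    return [tuple(n.strip() for n in g.split(",") if n.strip()) for g in groups]


def match(signature, shapes):
    """Return the output core shapes if `shapes` fit `signature`, else None."""
    inputs, outputs = (_parse(side) for side in signature.split("->"))
    if len(inputs) != len(shapes):
        return None
    sizes = {}
    for names, shape in zip(inputs, shapes):
        if len(shape) < len(names):
            return None  # operand has too few dimensions for this signature
        core = shape[len(shape) - len(names):]  # trailing dims are the core
        for name, size in zip(names, core):
            # Bind the name on first use; later uses must agree.
            if sizes.setdefault(name, size) != size:
                return None
    return [tuple(sizes[n] for n in names) for names in outputs]


# Assumed ordering: most specific (most core dimensions) first.
MATMUL_SIGNATURES = [
    "(i,k),(k,j)->(i,j)",
    "(i,k),(k)->(i)",
    "(k),(k,j)->(j)",
    "(k),(k)->()",
]


def dispatch(signatures, shapes):
    """Try each signature in order and return (signature, output core shapes)."""
    for signature in signatures:
        out = match(signature, shapes)
        if out is not None:
            return signature, out
    raise TypeError(f"no signature matches shapes {shapes}")
```

The order matters: shapes `(3, 4)` and `(4, 4)` would also fit the matrix-vector signature on their trailing dimensions, so the matrix-matrix case has to be tried first. In a real implementation each signature would presumably map to its own inner loop.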

@mhvk
Contributor Author

mhvk commented Mar 22, 2017

@shoyer - I now listed this as an option. I think this is hard to do without those different signatures corresponding to different C functions, while all that is required for my original suggestion is to re-allow broadcasting by the iterator.

@njsmith
Member

njsmith commented Mar 22, 2017 via email

@mhvk
Contributor Author

mhvk commented Mar 22, 2017

@njsmith - fair enough; added a "wishlist" section. I think this does not necessarily influence axis -- in its simplest, most direct form, that remains a list of tuples with whatever core dimensions are required by a given interpretation of the signature; it would seem up to the caller to ensure this makes sense.

@eric-wieser
Member

> there's Eric's min case (can you give more details?),

This is desirable in linalg.svd, where right now we have a different ufunc depending on which of m and n is bigger.

@mattharrigan
Contributor

mattharrigan commented Mar 25, 2017

I think the ufuncs should be universal and general enough that (almost) all numpy functions COULD be performed by a ufunc. Note I'm not saying they SHOULD all be ufuncs. Here are a few examples not from linalg, each with different challenges:

  • linspace - it would be nice if it broadcast over start and stop, plus a ufunc could almost certainly be more optimized. For example, linspace([1,2], 0, 3) = [[1, .5, 0], [2, 1, 0]].
  • average - if weights is a scalar 1, then it should broadcast, otherwise specific weights for each value can be passed
  • correlate - more complex example of determining output shape
  • diff - for n>1, doing it in a single pass would be much more efficient
  • cross - it has its own unique syntax for defining the axes
  • nonzero - this is even more complicated because the size of the output isn't known until after processing all elements of the array. An alternative could be a signature (i)->(i),(): return an array of the same size as the input, plus a scalar with the actual count.
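The nonzero idea with signature (i)->(i),() can be sketched in plain Python as follows. The function `nonzero_padded` and the `fill` value used for the unused slots are hypothetical choices for illustration, not an existing NumPy function.

```python
def nonzero_padded(values, fill=-1):
    """Sketch of a ``(i)->(i),()`` nonzero.

    Returns the indices of the non-zero entries, padded with `fill` to the
    length of the input, together with the actual number of non-zero entries.
    """
    indices = [k for k, value in enumerate(values) if value != 0]
    count = len(indices)
    # Pad so the first output always has the same size as the input.
    padded = indices + [fill] * (len(values) - count)
    return padded, count
```

A caller would then slice the first output with the count, e.g. `padded[:count]`, to recover the usual variable-length result.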

There are probably other examples that could be pulled from existing numpy functions.

@shoyer would multiple dispatch require an excessive number of possible signatures? As an example, would all_equal need (i),(i)->(), (i),()->(), (),(i)->(), and (),()->()? As the number of inputs increases, the number of functions could grow exponentially.
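To illustrate the growth, this small sketch enumerates every signature needed when each input may independently be a vector or a scalar. The helper `enumerate_signatures` is hypothetical and only counts the combinations; it says nothing about how NumPy would store them.

```python
from itertools import product


def enumerate_signatures(n_inputs, dim="i"):
    """List all reducing signatures where each input either has the core
    dimension `dim` or is a scalar. There are 2**n_inputs of them."""
    signatures = []
    for has_dim in product((True, False), repeat=n_inputs):
        inputs = ",".join(f"({dim})" if present else "()" for present in has_dim)
        signatures.append(f"{inputs}->()")
    return signatures
```

For two inputs this reproduces the four all_equal signatures listed above; for three inputs it already yields eight.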

@jaimefrio
Member

I'm really late to the party, but disallowing broadcasting over core dimensions was a conscious decision that took a long time and involved lots of disagreement; see #5077. I very rarely have strong opinions on this kind of thing, but since we are just past July 4th: "Prudence, indeed, will dictate that Governments long established should not be changed for light and transient causes."

In any case, most of the fancy changes being proposed require adding additional stuff to the ufunc struct, which means breaking ABI compatibility. At some point in the past we were planning on breaking it once to hide all internal details, similarly to what was done with PyArrayObject and PyArrayObject_fields in 1.7.

Unless I'm missing something, without that prior work none of these other plans can really fly, so it would be a good thing if we could figure out a plan to get it done...

8 participants