ENH: add ufuncs additional kwargs like out, dtype etc.. for np.where (out is needed most) #18516

Open
victor-zou opened this issue Mar 1, 2021 · 8 comments

Comments

@victor-zou

Feature

np.where can be regarded as a ternary ufunc computing cond ? x : y, so it is natural
for np.where to accept kwargs such as out and dtype. Supporting these kwargs would save
the effort of allocating memory, casting types, etc. In addition, the numexpr library already supports
this kind of syntax.
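
For context, a minimal sketch of the gap described above: ufuncs such as np.add accept out and dtype, while np.where(cond, x, y) returns a newly allocated array. The array names are illustrative only.

```python
import numpy as np

cond = np.array([True, False, True, False])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])

# A ufunc can write into a preallocated buffer ...
buf = np.empty_like(x)
returned = np.add(x, y, out=buf)   # returned is the very same object as buf

# ... and can be told which dtype to compute in.
low = np.add(x, y, dtype=np.float32)

# np.where has neither kwarg: it allocates its result, so reusing a buffer
# today still goes through a temporary array.
selected = np.where(cond, x, y)
buf[...] = selected
```

The final assignment reuses buf, but the temporary created by np.where is exactly the allocation the request wants to avoid.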

@eric-wieser
Member

Largely duplicates #8994 I think

@victor-zou
Author

Largely duplicates #8994 I think

Thank you for the reference. I see that the feature is still not implemented after four years. It looks like the obstacle lies in the immature initial design, which merges two unrelated things into one function.

So, can we have a second-best option: if it is not easy to make where a ufunc, simply add an out kwarg. If a user calls the nonzero form and passes in out, raise a ValueError. Similarly, I think a dtype kwarg would not be hard to implement either, and it would work
well with the nonzero form (for example, the user could choose to output int32, uint32, int64, uint64, size_t, ptrdiff_t, ssize_t ...).
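
The two uses of np.where mentioned here can be seen side by side in a short sketch (variable names are illustrative). The one-argument form behaves like np.nonzero, and its index type can currently only be changed by casting afterwards.

```python
import numpy as np

cond = np.array([False, True, True, False])

# One-argument form: returns a tuple of index arrays, like np.nonzero.
idx_where = np.where(cond)
idx_nonzero = np.nonzero(cond)

# Three-argument form: elementwise selection between x and y.
x = np.arange(4)
y = -np.arange(4)
chosen = np.where(cond, x, y)

# The index arrays come back as np.intp; any other integer type needs an
# explicit cast, which is what a dtype kwarg would make unnecessary.
idx32 = idx_where[0].astype(np.int32)
```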

@eric-wieser
Member

It looks like the obstacle lies in the immature initial design, which merges two unrelated things into one function.

I don't think this is really the obstacle - a bigger issue is that where works on arbitrary dtypes, but ufuncs only work on simple dtypes (i.e. not string, unicode, void, etc.)
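
As a small illustration of that point (array contents are made up), np.where selects between unicode string arrays without any trouble:

```python
import numpy as np

cond = np.array([True, False, True])
a = np.array(["apple", "banana", "cherry"])
b = np.array(["x", "y", "z"])

# The result is a unicode array wide enough to hold values from both inputs.
picked = np.where(cond, a, b)
```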

@seberg
Member

seberg commented Mar 3, 2021

Well, I am working on that part (next big item on my agenda). Hopefully we have better support for "flexible" or "parametric" dtypes fairly soon. We actually do have some support, but it's weird enough that either nobody knew we have it or nobody felt like using it.

On the other hand, where is a bit special, in that the actual inner loop could probably be written without any dtype-specific code, but rather "using" the existing copy code. numexpr probably would require a proper ufunc. Adding out is probably acceptable right now; dtype might be too, but I am not sure how much churn it would be to include casting logic in the current code.

@victor-zou
Author

Well, I am working on that part (next big item on my agenda). Hopefully we have better support for "flexible" or "parametric" dtypes fairly soon. We actually do have some support, but it's weird enough that either nobody knew we have it or nobody felt like using it.

On the other hand, where is a bit special, in that the actual inner loop could probably be written without any dtype-specific code, but rather "using" the existing copy code. numexpr probably would require a proper ufunc. Adding out is probably acceptable right now; dtype might be too, but I am not sure how much churn it would be to include casting logic in the current code.

Thanks for the reply. Only today did I read the source code and learn that np.where is implemented “simply” via if and copy rather than the npyv_select* macros (namely, the _mm*_blend** SIMD instructions). My personal suggestion is to completely refactor the function and use the _mm*_blend** SIMD instructions for numeric types. The point of adding an out kwarg is performance, but the current implementation is slow enough that the little time saved on memory allocation becomes meaningless.

@victor-zou
Author

Well, I am working on that part (next big item on my agenda). Hopefully we have better support for "flexible" or "parametric" dtypes fairly soon. We actually do have some support, but it's weird enough that either nobody knew we have it or nobody felt like using it.

On the other hand, where is a bit special, in that the actual inner loop could probably be written without any dtype-specific code, but rather "using" the existing copy code. numexpr probably would require a proper ufunc. Adding out is probably acceptable right now; dtype might be too, but I am not sure how much churn it would be to include casting logic in the current code.

I found that there is already a ternary ufunc named clip, which is de facto composed of two where operations. So it should not be hard to add another ufunc. If staying compatible with the old np.where is hard, would it be OK to add another ufunc named blend (as "where" and "select" are both taken, and it matches the instruction name), along with another function np.blend?
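
The observation about clip can be checked with a short sketch. It uses integer data and scalar bounds; NaN handling for floating-point inputs may differ between the two spellings and is not covered here.

```python
import numpy as np

x = np.array([-5, 0, 3, 8, 12])
lo, hi = 0, 10

# Public clip API.
clipped = np.clip(x, lo, hi)

# The same result written as two nested selections.
via_where = np.where(x < lo, lo, np.where(x > hi, hi, x))
```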

@seberg
Member

seberg commented Mar 4, 2021

If you don't mind overwriting one of the inputs, np.copyto should actually be a pretty decent solution. Of course it also doesn't do anything particularly fancy (but then, aside from a few ufuncs, not a whole lot of things in NumPy do). Although, I wonder if a dedicated meld is actually much faster for most use cases (e.g. if the True/False values are blocked, you may not even have to read both arrays for a chunk larger than a cache line).

I think adding a ternary ufunc for this "meld" operation is fine (assuming it's not insanely much churn). Adding it to the main namespace, I am not sure... There is a clip ufunc, but that is just used inside np.clip and not exposed to end users.
I also profoundly dislike the dual use of np.where, but the np.nonzero alternative is probably more "clear". Although, if there is a "canonical" name most packages use, we could try to add it and nudge users towards it very slowly, but despite my personal dislike of np.where due to its dual use, the name is very pleasing and I think it is used a lot in everyday code.
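
A minimal sketch of the np.copyto suggestion, with illustrative arrays: y is overwritten in place, so no result array is allocated.

```python
import numpy as np

cond = np.array([True, False, True, False])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])

# Reference result, computed before y is modified.
expected = np.where(cond, x, y)

# Overwrite y in place: take x where cond is True, keep y elsewhere.
np.copyto(y, x, where=cond)
```

After the call, y holds the same values np.where(cond, x, y) would have returned, at the cost of losing the original y.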

@victor-zou
Author

The purpose of out is to avoid memory allocation, not to write to a specific buffer.

When cond, x, and y are contiguous and the dtype is float64, a SIMD version (written with Eigen and wrapped with pybind11, tested on an old Intel CPU that only supports AVX2) gives a nearly 50% performance boost on a large array (I carefully excluded the time for memory allocation, deallocation, etc.), let alone on servers with AVX-512 instructions.

The name where is good. I do not care what it is called, only how fast it is. I want np.where to be as optimized as the other parts of NumPy, instead of the status quo, in which it 1) does not use SIMD instructions and 2) cannot avoid memory allocation.
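
Until an out kwarg exists, the allocation half of this can be approximated with existing calls. The helper below, where_into, is a hypothetical name and not part of NumPy. It is a sketch that makes two passes over memory, so it addresses only the allocation and does nothing for the SIMD concern.

```python
import numpy as np

def where_into(cond, x, y, out):
    """Hypothetical helper: store where(cond, x, y) in `out` without a temporary.

    Assumes `out` does not share memory with `x`, because the first copy
    would overwrite `x` before it is read.
    """
    np.copyto(out, y)              # start from y everywhere
    np.copyto(out, x, where=cond)  # overwrite with x where cond is True
    return out

cond = np.array([True, False, True, False])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])

out = np.empty(4)                  # buffer that can be reused across calls
res = where_into(cond, x, y, out)
```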
