Skip to content

PERF: Fast check on equivalent arrays in PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK #21464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented May 6, 2022

The PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK has to go through all the checks for identical arrays (a common case for inplace operations). This PR adds a fast check to see whether the two arrays to be compared are identical.

Benchmark

import numpy as np
import time

x=np.random.rand(10,20)   
niter=20_0000

t0=time.perf_counter()
for ii in range(niter):
    x=np.cos(x, out=x)
dt=time.perf_counter()-t0
print(dt)

Results in:

main: 0.51
PR: 0.39

Addresses one of the items in #21455

@eendebakpt eendebakpt marked this pull request as draft May 6, 2022 11:56
@eendebakpt
Copy link
Contributor Author

The stride1 !=0 part of the condition is to catch the case

    import numpy as np
    x = np.array(2)
    np.add(x, [1], x)

@eendebakpt eendebakpt marked this pull request as ready for review May 7, 2022 17:34
@seberg
Copy link
Member

seberg commented May 7, 2022

I am not wondering if it may also influence the the case of (I admit, I hope nobody does this!):

arr = np.array([5])
arr = np.broadcast_to(arr, 10)
arr.flags.writeable = True

np.add(arr, 0, out=arr)

But probably only interesting for a potential test...

I think I would remove the assert here again, it will lead to hard crashes anyway ;). Also wondering if just repeating that "macro" call may not read easier, but thats just styling now.

@eendebakpt
Copy link
Contributor Author

I am not wondering if it may also influence the the case of (I admit, I hope nobody does this!):

arr = np.array([5])
arr = np.broadcast_to(arr, 10)
arr.flags.writeable = True

np.add(arr, 0, out=arr)

But probably only interesting for a potential test...

Output of your example is the same for main as for this PR: an array [5 5 5 5 5 5 5 5 5 5].


if (arr1 == arr2 && stride1 !=0) {
// case common for inplace operations
return 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyArray_TRIVIAL_PAIR_ITERATION_STRIDE returns 0 for 1-sized arrays (i.e. 0-D). So we miss those cases here. OTOH, 0-D arrays are (currently) pretty rare.

Maybe we should just add that logic and move it into the return? When (arr1 == arr2), we can return size <= 1 || stride != 0; (solve_may_share_memory really does nothing in the whole arr1 == arr2 branch.)

Another thought, is to use PyArray_DATA(arr1) == PyArray_DATA(arr2), we need the data anyway and it will allow catching views (may also help subclasses for example).

I agree with the change! I am just still a bit curious if it would not make more sense to push it into solve_may_share_memory so that it is used everywhere.

Another point is, that th stride1 != 0 check here doesn't really seem right unless we care about my weird example, the check here really protects us from incorrect shape checking earlier...

Anyway, if nothing moves soon, I may just add a brief comment and merge then. But if you have an idea or think moving it is a good idea please do. Otherwise, the only thing would be style nitpicks (blank line and space after !=).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends on the use cases of interest (how often they occur and whether they are time critical). E.g. PyArray_DATA(arr1) == PyArray_DATA(arr2) will catch more cases, but perhaps it does require additional checks (e.g. on the stride or other properties?).

Maybe open an issue with label 'good first issue' to further investigate ideas like moving the check into solve_may_share_memory or using return size <= 1 || stride != 0?

@eendebakpt eendebakpt force-pushed the fast_check_PyArray_EQUIVALENTLY_ITERABLE_OVERLAP_OK branch from dc95b3b to 70e63d0 Compare May 11, 2022 08:15
@seberg
Copy link
Member

seberg commented May 11, 2022

The code clutter is minimal now, so I am OK with this. But to confirm that you think it is still worthwhile?

@seberg
Copy link
Member

seberg commented May 11, 2022

Wait, I posted this on the wrong PR... I meant to ask for the int change one.

…ction

That is a bit of a weird dynamic here, and it would probably nice to clean
up.  However, it is also not a reason to not add the fast-path right now.
@seberg seberg merged commit cfbbde8 into numpy:main May 14, 2022
@seberg
Copy link
Member

seberg commented May 14, 2022

@eendebakpt thanks! Lets put this in. I added (a longish) comment on the weird subtlety that this the stride check seems to implicitly reject output broadcasting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants