Skip to content

ENH: Add SIMD operation copysign #19770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

howjmay
Copy link
Contributor

@howjmay howjmay commented Aug 28, 2021

The benchmark run in the CI gave the following response.

    before           after         delta
    [79a8986b]       [bdd62b3]
    <master>         <simd>

-   546±0μs          490±0μs        10.26%  bench_core.PackBits.time_copysign

@howjmay howjmay force-pushed the simd-copysign branch 15 times, most recently from c466922 to 5bc0d7a Compare August 29, 2021 06:50
@howjmay howjmay marked this pull request as ready for review August 29, 2021 08:04
@howjmay howjmay changed the title ENH: Add SIM copysign ENH: Add SIMD operation copysign Aug 29, 2021
@seberg seberg added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Sep 8, 2021
@rgommers
Copy link
Member

Thanks @howjmay! Would you be able to add the results from an asv benchmark to the PR description? For an example, see gh-17102. benchmarks/bench_ufunc.py seems to have a copysign benchmark; if it doesn't cover the case you are optimizing then maybe you can add a new one?

@howjmay howjmay force-pushed the simd-copysign branch 4 times, most recently from ae737db to bde468f Compare September 12, 2021 10:39
@howjmay
Copy link
Contributor Author

howjmay commented Oct 4, 2021

Thanks @howjmay! Would you be able to add the results from an asv benchmark to the PR description? For an example, see gh-17102. benchmarks/bench_ufunc.py seems to have a copysign benchmark; if it doesn't cover the case you are optimizing then maybe you can add a new one?

Hi @rgommers I have added copysign to the benchmark!

@rgommers
Copy link
Member

rgommers commented Oct 4, 2021

Thanks @howjmay! It would be good to add the results to the PR description, in the before after format like in gh-17102. That way it's immediately clear that this PR improves performance and to what extent.

@howjmay
Copy link
Contributor Author

howjmay commented Oct 5, 2021

Hi @rgommers, I have appended the result of benchmark ran in CI. Please take a look. Thanks!

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @howjmay. The speedup is only ~10% so that seems a bit marginal, but on the other the code complexity here isn't very high. So it may be a decent tradeoff - it's not quite clear to me. Any opinions @seiko2plus, @Qiyu8, @mattip?

@@ -81,6 +81,8 @@ def setup(self):
self.i = np.ones(150000, dtype=np.int32)
self.f = np.zeros(150000, dtype=np.float32)
self.d = np.zeros(75000, dtype=np.float64)

self.tf = np.ones(150000, dtype=np.int32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as self.i, so you can reuse that instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to point it out, but you do not want to use an integer array in the test. This will trigger casting. You need to use a homogeneous signature (i.e. both inputs are float32 or float64).
The speedup may still not be huge (this is probably memory-bound anyway), but adding a cast might hide most of it even if it is huge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rgommers, @seberg do you guys know how can I run two benchmark for comparing the original implementation and the SIMD implementation? I have changed the benchmark test for using float64, but the elapsed time increased. I think it is quite weird, and the 10% speedup is weird for my side too. I thought it will accelerate much more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For running the benchmark, you could reorder your commits so the benchmark one comes first, and put a new branch name pointing to that commit. Then you can compare that new branch with the simd-copysign one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reordering commits should not even be necessary (although maybe nice) if you use --bench-compare from asv through runtests: https://numpy.org/doc/stable/benchmarking.html

Not sure if the docs needs updating, I feel I always have to run it from the benchmarks folder with ../runtests.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a missing understanding here, this patch only adds universal intrinsics for copysign but without actually using them in the inner loop of ufunc so there's no need for a benchmark.

@seiko2plus
Copy link
Member

@howjmay, First of all, I would like to thank you for your interest in improving the performance of NumPy, we really appreciate your efforts but again, there's no need to add new intrinsics for copysign for the same reasons I mentioned in #19780 (review).

@howjmay
Copy link
Contributor Author

howjmay commented Nov 16, 2021

Thank you for informing!

@howjmay howjmay closed this Nov 16, 2021
@howjmay howjmay reopened this Nov 16, 2021
@InessaPawson
Copy link
Member

@howjmay Did you mean to reopen this PR? If not, please close it once again when you get a chance.

@charris
Copy link
Member

charris commented Feb 20, 2023

@seiko2plus Should this be closes (again)?

@charris charris added the triage review Issue/PR to be discussed at the next triage meeting label Feb 20, 2023
@seiko2plus seiko2plus added the 57 - Close? Issues which may be closable unless discussion continued label Feb 21, 2023
@seiko2plus
Copy link
Member

@charris, yes, unless if @howjmay decided to re-implemented similar to #19780

@charris
Copy link
Member

charris commented Feb 21, 2023

Closing. @howjmay If you wish to pursue this, please make another PR.

@charris charris closed this Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 57 - Close? Issues which may be closable unless discussion continued component: SIMD Issues in SIMD (fast instruction sets) code or machinery triage review Issue/PR to be discussed at the next triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants