WIP::ENH:SIMD Improve the performance of comparison operators #16960
@@ -18,6 +18,29 @@
#define BOOL_fmax BOOL_maximum
#define BOOL_fmin BOOL_minimum

/*
*****************************************************************************
** Compersion & Logical **
comparison
Force-pushed from 112b5ac to 2e1682c
Force-pushed from 2e1682c to 95c485a
Force-pushed from 9628906 to 8d4ae79
Force-pushed from c09eecc to cedc863
@@ -0,0 +1,192 @@
#!/usr/bin/env python3
I suspect this would make sense in a standalone PR, along with tests and documentation.
Yes, indeed. It's under experiments right now. Sure, I will move it into a separate PR later, along with docs and unit tests.
It would be good to look around and see if there are existing template solutions that can be reused. tempita, I think, is used in some places in numpy, and jinja may be an option too.
I tried almost everything until I figured out that the most flexible template engine is one that doesn't bring new language syntax or philosophies, and that is what "pyas" ("Python as a template language") does: it simply treats Python like PHP and f-strings as templates. It also provides a simple translation mechanism.
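To illustrate the general idea of f-strings as templates, here is a minimal sketch. It is not the "pyas" script from this PR: the helper names (`render_loop`, `render_all`), the type and operator tables, and the shape of the emitted C function are all assumptions made up for the example.

```python
# Hypothetical sketch: plain Python f-strings used as a C template.
# None of these names come from the PR's "pyas" script.

TYPES = {"int32": "npy_int32", "float": "npy_float"}   # assumed subset of types
OPS = {"less": "<", "greater": ">", "equal": "=="}     # assumed subset of operators


def render_loop(tname: str, ctype: str, opname: str, op: str) -> str:
    """Return the C source of one scalar comparison loop."""
    func = f"{tname.upper()}_{opname}"
    # Doubled braces emit literal C braces inside an f-string.
    return (
        f"static void {func}(const {ctype} *a, const {ctype} *b,\n"
        f"                   npy_bool *out, npy_intp n)\n"
        f"{{\n"
        f"    for (npy_intp i = 0; i < n; ++i) {{\n"
        f"        out[i] = a[i] {op} b[i];\n"
        f"    }}\n"
        f"}}\n"
    )


def render_all() -> str:
    """Render one loop per (type, operator) pair using ordinary Python loops."""
    return "\n".join(
        render_loop(tname, ctype, opname, op)
        for tname, ctype in TYPES.items()
        for opname, op in OPS.items()
    )


if __name__ == "__main__":
    print(render_all())
```

The looping and naming logic stays in ordinary Python, so no separate template syntax has to be learned; the only template-specific detail is doubling the braces that should reach the C output.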
And the reason I dropped the repeat template is that the generated source size almost hit 9 MB before the rest of the work was even finished. It also can't be used for generating C macros.
Force-pushed from 7602864 to 2c4415b
I wonder if we could refactor the dispatch mechanism to be much more limited: only have two loops: a baseline and an advanced loop, written in C. Then only use these loops via the current ufunc reassign-c-function-loops-at-import rather than the macro-based runtime mechanism via
If the generated code is so large, maybe we need to rethink what we are trying to do here.
xref, mattip#46 (comment)
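As a rough illustration of the two-loop suggestion above, the sketch below picks one of two loops once, up front, instead of deciding on every call. It is written in Python only to show the control flow; the real loops would be C functions, and every name here (`less_baseline`, `less_advanced`, `LOOPS`, `assign_loops`, the `"AVX2"` feature string) is hypothetical rather than NumPy API.

```python
from typing import Callable, Dict, List, Sequence, Set

Loop = Callable[[Sequence[float], Sequence[float]], List[bool]]


def less_baseline(a: Sequence[float], b: Sequence[float]) -> List[bool]:
    """Portable loop; always available."""
    return [x < y for x, y in zip(a, b)]


def less_advanced(a: Sequence[float], b: Sequence[float]) -> List[bool]:
    """Stand-in for a SIMD loop; here it just computes the same result."""
    return [x < y for x, y in zip(a, b)]


# Table of active loops, filled once (conceptually at import time).
LOOPS: Dict[str, Loop] = {}


def assign_loops(cpu_features: Set[str]) -> None:
    """Pick one loop per operation up front instead of on every call."""
    # "AVX2" is only an example feature name for this sketch.
    LOOPS["less"] = less_advanced if "AVX2" in cpu_features else less_baseline


assign_loops(set())  # no features detected, so the baseline loop is used
```

With only two variants per operation, the amount of generated code grows with the number of operations rather than with the number of supported instruction sets, which is the trade-off the comment is pointing at.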
The issue with conv_template (the template repeater) is that we had to rely on the C preprocessor for everything, even for internal looping!
$ python numpy/numpy/distutils/conv_template.py einsum_sumprod.c.src
$ du einsum_sumprod.c
4576 einsum_sumprod.c
This PR covers many more kernels than einsum, so the best thing we can do is to cut the problem off at the root from the beginning.
Force-pushed from 2c4415b to 727a2e7
Closed in favor of #21483. It doesn't contain all the improvements this PR has, but we could add them later during the move to C++.
Don't merge! Work in progress
A summary of the changes, performance achievements, and a TODO list will be written later.
NOTE: Feel free to leave comments or reviews while I'm working on it.