-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Added bitwise_count
UFuncs
#21429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! One small comment for simplifying a bit, but the code looks great on first sight!
My two main questions are:
- Did we decide what the output dtype is? The default integer, the same integer?
- While I am in favor of adding the
bit_count
ufunc (which still may need more discussion unfortunately), I would still need a bit convincing on adding a method, it doesn't seem as clear to me. There could be an argument of where to add the ufunc, but hopefully we don't lose track of this because of such questions.
numpy/core/src/umath/funcs.inc.src
Outdated
|
||
/* Try to use inbuilt popcount if available */ | ||
static PyObject *builtin_popcount_func = NULL; | ||
builtin_popcount_func = PyObject_GetAttrString(obj, "bit_count"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, most ufuncs already try to call a method of the same name when it exists? There should be a shorthand to do this in generate_umath
without the need to define this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh yeah, let me see if we can auto-gen this, will make things easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh also makes sense now, why not all umath functions are here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @seberg , I dug a bit deeper here and it seems all the o
supported dtypes have a function in funcs.inc.srs
and it does not seem automatic. Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there is the "P"
character code that means: "Object, but just call the method".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice, I missed it. Thanks! Added via b86a322
Thanks for the review @seberg
For the scalar version we decided to always output an 8 bit (
This brings a very good point on if we can create a |
015a1a2
to
abcfc59
Compare
This PR seems so close. Anything I can do to help get this over the finish line @ganesh-k13? For certain similarity functions (like hamming distance) the bit count function is the bottleneck, and this could really unlock pure python solutions without needing to resort to something in a lower level language. |
Hey @softwaredoug , we support Well to answer your question on moving it forward, implementation wise, perhaps the So in conclusion, if you feel this has a genuine use case for many in the community (the UFunc of course, the scalar version is already in), you can start a thread in the mailing list. Now I can prioritize this to completion if everyone is ok with adding this UFunc and if you want to contribute, I can help you in any way with terms of navigating this bit. |
I will give my 👍 for the ufunc, but not for the method (on the array object). Feel free to announce to the mailing list that pending no substantial objects we will move forward with adding it. The return dtype has to be resolved, but I guess it was defined as uint8 for the scalars? In which case we have prior art. |
Awesome - thanks @ganesh-k13 and @seberg Maybe this is my numpy ignorance, but is there a way to efficiently apply the scalar function to a whole array? Without something like |
No, there isn't an efficient way (maybe a JIT'er like |
abcfc59
to
050127a
Compare
Hey @BvB93 , basic question: If we fix the return type of UFunc to one dtype, where do we make the typing changes? Now I do get |
6738683
to
bcd7a6f
Compare
bcd7a6f
to
1731966
Compare
Updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Lots of nitpicks of the optional kind. I have to think about the naming a bit more.
Overall, I lean against adding any methods, which is the only meaningful change besides possible name bike-shedding.
numpy/core/src/multiarray/methods.c
Outdated
{"__dlpack_device__", | ||
(PyCFunction)array_dlpack_device, | ||
METH_NOARGS, NULL}, | ||
{"bitwise_count", | ||
(PyCFunction)array_bitwise_count, | ||
METH_VARARGS | METH_KEYWORDS, NULL}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as for bit_count
, I would just drop the method. I could see a point to bit_count()
for supporting the nested array case, but overall I am leaning against both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be handy? If we want to support it on other sequences? I'm guessing there will be some other way though without this addition? happy to do both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed it via d7f364f. I'm a bit confused now, I thought this adds bitwise_count
to np namespace?
@@ -23,6 +23,7 @@ | |||
UNARY_UFUNCS = [obj for obj in np.core.umath.__dict__.values() | |||
if isinstance(obj, np.ufunc)] | |||
UNARY_OBJECT_UFUNCS = [uf for uf in UNARY_UFUNCS if "O->O" in uf.types] | |||
UNARY_OBJECT_UFUNCS.remove(getattr(np, 'bitwise_count')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just in case someone wonders: it seems the path assumes float64 loops to exist, which is not the case for bitwise.
1731966
to
0ff8230
Compare
bit_count
UFuncsbitwise_count
UFuncs
0ff8230
to
47ad3bd
Compare
47ad3bd
to
e985c36
Compare
Resolved merge conflicts (rebased with latest main).
|
Long overdue look at the Array API standard. Two callouts:
|
e985c36
to
5928575
Compare
Are any more changes needed here @seberg? |
I have made few changes to the code to allow auto-vectorization, and here the result on x86 with avx2 enabled: ❯ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
❯ gcc --version
gcc (GCC) 13.2.1 20230801
❯ spin bench --compare enh_16325_popcount_ufunc_before -t bitwise_count
| Change | Before [03fc9291] <enh_16325_popcount_ufunc_before> | After [fc6b5911] <enh_16325_popcount_ufunc> | Ratio | Benchmark (Parameter) |
|----------|-------------------------------------------------------|-----------------------------------------------|---------|-----------------------------------------------------------------------------------|
| - | 253±0.1μs | 170±0.1μs | 0.67 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'H') |
| - | 180±0.4μs | 58.9±0.6μs | 0.33 | bench_ufunc.UFunc.time_ufunc_types('bitwise_count') |
| - | 353±0.8μs | 117±1μs | 0.33 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'h') |
| - | 564±0.5μs | 121±0.2μs | 0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'l') |
| - | 566±4μs | 123±0.4μs | 0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'q') |
| - | 621±3μs | 108±1μs | 0.17 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'i') |
| - | 562±0.5μs | 79.6±0.2μs | 0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'L') |
| - | 562±0.4μs | 80.5±0.2μs | 0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'Q') |
| - | 679±3μs | 75.1±0.06μs | 0.11 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'I') |
| - | 269±1μs | 11.9±0.08μs | 0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'B') |
| - | 309±0.2μs | 12.8±0.05μs | 0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'b') | |
Wow! Thanks Sayed! |
Not that good, the GCC compiler was expected to apply something similar to Harley-Seal popcount algorithm for optimized machine code generation. However, it defaulted to using the standard popcnt instruction without additional optimizations. To potentially enhance performance, we could improve the existing "fast macros" to provides better hints for the compiler maybe by manually unrolling specialized loops counting on C++ meta or/and implementing the algorithm with handwritten generic SIMD intrinsics. |
@ganesh-k13 seems like nobody but me has much squirms about diverging from Python naming. Could you rebase/merge to fix the |
03e77b7
to
a7cb8a9
Compare
assert_type(np.bitwise_count.signature, None) | ||
assert_type(np.bitwise_count.identity, None) | ||
assert_type(np.bitwise_count(i8), Any) | ||
assert_type(np.bitwise_count(AR_i8), Any) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is failing now. I'll take a look tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think npt.NDArray[Any]
is correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I'm somewhat surprised that it even produced an Any
here at one point. Though it could have been accepted due to the somewhat broad substring matching used previously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. That makes sense why it was passing before.
a7cb8a9
to
9e052cd
Compare
@seberg I've resolved the conflicts and UT failures, let me know if anything else needed here. |
@seberg any other changes needed here? |
Sorry, do you have time to resolve the conflict, I think that was really all that was left here, just I was in and out a bit much. |
ENH, DOC: Added countbits (popcount) ENH: Popcount implementation ENH: Add popcount to umath ENH: Added countbits (popcount) to umath `__all__` ENH: Refined popcount logic DOC: Added `bit_count` Co-authored-by: Eric Wieser <wieser.eric@gmail.com> MAINT: Renamed `countbits` to `bit_count` MAINT: Fixed 4 1s magic number DOC: Added `popcount` to docstring ENH: Added bit_count annotations ENH: Added GNU/CLANG popcount DOC: Added `popcount` language example ENH, BUG: Moved `bitcount` to npy_math.h as `popcount` | Fixed final right shift ENH: Enable `popcount` for signed TST: Tests for `bit_count` BUG, DOC: (BUG) Added missing typecast causing an unwanted upcast (DOC) Added more details on `popcount` implementation MAINT, BUG: (MAINT) Refined `popcount` TC to use typecode (BUG) Fixed ufunc.ntypes to include signed ints ENH: Added windows builtin support ENH: Added `popcount` implementation for big python ints natively [1/2] `popcount` object loop changes ENH: Object loop for `bit_count` [2/2] `popcount` object loop changes TST: Refined `bit_count` tests and added object type ENH: Added `bit_count` to `np.int*` DOC: Added `np.bit_count` (numpy#19355) MAINT: Various linting and minor fixes: 1. Fixed passing all args to _internals umath bitcount. Note: We use kwargs here that might hinder performance 2. Fixed linting errors. 3. Improved verbosity of logs 4. Made a generic TO_BITS_LEN macro to accomdate more length based functions in future BENCH: Added bit_count (popcount) MAINT: Style nits | Added signed case DOC, MAINT: Improved example ENH: Added annotations for bit_count TST: Added annotations tests for bit_count MAINT: Fixed linting errors MAINT: Moved Magic constants to npy_math_internal MAINT: Remove python implementation | Added 3.10 check to tests DOC: Added abs value usage to doc MAINT: Resolved merge conflicts
WIP: Fix the typing
* Changed popcount -> bitwise_count * Added comment on why we remove certain function in float64 object loops * Comment on when we skip bitwise_count
* Added return type for integer inputs * Refined release note * Refined external references for popcount
9e052cd
to
e8c8824
Compare
@seberg it seems to pass now 👍 , good to go from my end. |
Thanks, let's get it in before it diverges again. |
This is probably the longest I've worked on a PR :). Now time to get hex/fromhex in 😉 |
This is great! One observation – the implementation does not appear to support general output dtypes: >>> import numpy as np
>>> np.bitwise_count(2, dtype='int32')
# ---------------------------------------------------------------------------
# TypeError Traceback (most recent call last)
# <ipython-input-6-f7427f7dde09> in <cell line: 2>()
# 1 import numpy as np
# ----> 2 np.bitwise_count(2, dtype='int32')
# TypeError: No loop matching the specified signature and casting was found for ufunc bitwise_count It would be nice if we could support the |
Probably could do something about it, but the problem is that So to make it work, you would have to explicitly teach the ufunc that |
bitwise_count
UFuncsExpose
bit_count
as an UFunc.This is a raw port of #19355, specifically from 600947e to 4ee8a40
Side Note:
Anyone new can use this a template to add new UFuncs (I guess). Related to #19494
TODO:
generate_umath
Continues: #19355
Resolves: #16325