Skip to content

ENH: Added bitwise_count UFuncs #21429

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 28, 2023
Merged

Conversation

ganesh-k13
Copy link
Member

@ganesh-k13 ganesh-k13 commented May 3, 2022

bitwise_count UFuncs

Expose bit_count as an UFunc.

>>> a = (2**np.arange(20) - 1)
>>> a
array([     0,      1,      3,      7,     15,     31,     63,    127,
          255,    511,   1023,   2047,   4095,   8191,  16383,  32767,
        65535, 131071, 262143, 524287])
>>> np.bitwise_count(a)
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

This is a raw port of #19355, specifically from 600947e to 4ee8a40

Side Note:

Anyone new can use this a template to add new UFuncs (I guess). Related to #19494

TODO:

  • Figure out if it's up to date with latest main
  • Add bencmarks (at least now :) )
  • Commits are a mess, fix it
  • Figure out the return dtype
  • Clean generate_umath

Continues: #19355
Resolves: #16325

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! One small comment for simplifying a bit, but the code looks great on first sight!

My two main questions are:

  • Did we decide what the output dtype is? The default integer, the same integer?
  • While I am in favor of adding the bit_count ufunc (which still may need more discussion unfortunately), I would still need a bit convincing on adding a method, it doesn't seem as clear to me. There could be an argument of where to add the ufunc, but hopefully we don't lose track of this because of such questions.


/* Try to use inbuilt popcount if available */
static PyObject *builtin_popcount_func = NULL;
builtin_popcount_func = PyObject_GetAttrString(obj, "bit_count");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, most ufuncs already try to call a method of the same name when it exists? There should be a shorthand to do this in generate_umath without the need to define this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh yeah, let me see if we can auto-gen this, will make things easier.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh also makes sense now, why not all umath functions are here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @seberg , I dug a bit deeper here and it seems all the o supported dtypes have a function in funcs.inc.srs and it does not seem automatic. Am I missing something here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, there is the "P" character code that means: "Object, but just call the method".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, I missed it. Thanks! Added via b86a322

@ganesh-k13
Copy link
Member Author

ganesh-k13 commented May 5, 2022

Thanks for the review @seberg

Did we decide what the output dtype is? The default integer, the same integer?

For the scalar version we decided to always output an 8 bit (uint8_t). Although here it might be the case, will need to change it.

There could be an argument of where

This brings a very good point on if we can create a np.bits namespace of sorts and club the bits stuff there, what do you think?

@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 015a1a2 to abcfc59 Compare September 18, 2022 03:44
@softwaredoug
Copy link

This PR seems so close. Anything I can do to help get this over the finish line @ganesh-k13? For certain similarity functions (like hamming distance) the bit count function is the bottleneck, and this could really unlock pure python solutions without needing to resort to something in a lower level language.

@ganesh-k13
Copy link
Member Author

Hey @softwaredoug , we support bit_count today and was added via #19355. This PR is mainly to get a bit_count UFunc so we can operate it on arrays, etc.

Well to answer your question on moving it forward, implementation wise, perhaps the generate_umath can be made simpler. But the bigger hurdle would be convince the mailing list that this function is needed and doesn't add more to the already bloated namespace.

So in conclusion, if you feel this has a genuine use case for many in the community (the UFunc of course, the scalar version is already in), you can start a thread in the mailing list. Now I can prioritize this to completion if everyone is ok with adding this UFunc and if you want to contribute, I can help you in any way with terms of navigating this bit.

@seberg
Copy link
Member

seberg commented Nov 24, 2022

I will give my 👍 for the ufunc, but not for the method (on the array object). Feel free to announce to the mailing list that pending no substantial objects we will move forward with adding it.

The return dtype has to be resolved, but I guess it was defined as uint8 for the scalars? In which case we have prior art.

@seberg seberg added the 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified. label Nov 24, 2022
@softwaredoug
Copy link

softwaredoug commented Nov 24, 2022

Awesome - thanks @ganesh-k13 and @seberg

Maybe this is my numpy ignorance, but is there a way to efficiently apply the scalar function to a whole array? Without something like apply_along_axis which in my mental model wouldn't be particularly fast?

@seberg
Copy link
Member

seberg commented Nov 24, 2022

No, there isn't an efficient way (maybe a JIT'er like numba can do it, not sure). You could write your own ufunc, its not actually that terribly hard, but the learning curve is just quite steep, especially if you are not used to C (ufunclab is my goto example for external ufuncs currently).

@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from abcfc59 to 050127a Compare November 26, 2022 13:16
@ganesh-k13
Copy link
Member Author

Hey @BvB93 , basic question: If we fix the return type of UFunc to one dtype, where do we make the typing changes?

Now I do get ValueError: Unexpected reveal line format: 23: error: Name "ctypes._Pointer" is not defined in numpy/typing/tests/test_typing.py, but I don't know how to interpret this. I guess we need to make some change in numpy/typing/tests/data/reveal/ufuncs.pyi?

@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch 3 times, most recently from 6738683 to bcd7a6f Compare December 2, 2022 13:31
@ganesh-k13 ganesh-k13 marked this pull request as ready for review December 3, 2022 12:22
@ganesh-k13 ganesh-k13 requested a review from seberg December 3, 2022 12:22
@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from bcd7a6f to 1731966 Compare December 5, 2022 11:13
@ganesh-k13
Copy link
Member Author

Updated bit_count (just the UFunc) to bitwise_count. Reference to discussion regarding the same: discussions

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Lots of nitpicks of the optional kind. I have to think about the naming a bit more.
Overall, I lean against adding any methods, which is the only meaningful change besides possible name bike-shedding.

{"__dlpack_device__",
(PyCFunction)array_dlpack_device,
METH_NOARGS, NULL},
{"bitwise_count",
(PyCFunction)array_bitwise_count,
METH_VARARGS | METH_KEYWORDS, NULL},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for bit_count, I would just drop the method. I could see a point to bit_count() for supporting the nested array case, but overall I am leaning against both.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be handy? If we want to support it on other sequences? I'm guessing there will be some other way though without this addition? happy to do both

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed it via d7f364f. I'm a bit confused now, I thought this adds bitwise_count to np namespace?

@@ -23,6 +23,7 @@
UNARY_UFUNCS = [obj for obj in np.core.umath.__dict__.values()
if isinstance(obj, np.ufunc)]
UNARY_OBJECT_UFUNCS = [uf for uf in UNARY_UFUNCS if "O->O" in uf.types]
UNARY_OBJECT_UFUNCS.remove(getattr(np, 'bitwise_count'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just in case someone wonders: it seems the path assumes float64 loops to exist, which is not the case for bitwise.

@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 1731966 to 0ff8230 Compare December 11, 2022 06:47
@ganesh-k13 ganesh-k13 changed the title ENH: Added bit_count UFuncs ENH: Added bitwise_count UFuncs Dec 12, 2022
@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 0ff8230 to 47ad3bd Compare January 25, 2023 16:30
@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 47ad3bd to e985c36 Compare March 18, 2023 06:10
@ganesh-k13
Copy link
Member Author

ganesh-k13 commented Mar 18, 2023

Resolved merge conflicts (rebased with latest main).

[EDIT] Should have built locally before pushing, checking... Bad merge, fixed now

@ganesh-k13
Copy link
Member Author

Long overdue look at the Array API standard. Two callouts:

  1. elementwise_functions section does not include bitwise_count, but a note: All other functions that operates on bits use bitwise_*
  2. bitwise_operators specification only talks of: Bitwise operators should be defined for arrays having integer and boolean data types., which we do follow in this change list.

@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from e985c36 to 5928575 Compare March 18, 2023 06:27
@ganesh-k13 ganesh-k13 requested a review from seberg March 28, 2023 03:49
@ganesh-k13
Copy link
Member Author

Are any more changes needed here @seberg?

@seiko2plus
Copy link
Member

seiko2plus commented Aug 31, 2023

I have made few changes to the code to allow auto-vectorization, and here the result on x86 with avx2 enabled:

❯ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  8
  On-line CPU(s) list:   0-7
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

❯ gcc --version
gcc (GCC) 13.2.1 20230801

❯ spin bench --compare enh_16325_popcount_ufunc_before -t bitwise_count
| Change   | Before [03fc9291] <enh_16325_popcount_ufunc_before>   | After [fc6b5911] <enh_16325_popcount_ufunc>   |   Ratio | Benchmark (Parameter)                                                             |
|----------|-------------------------------------------------------|-----------------------------------------------|---------|-----------------------------------------------------------------------------------|
| -        | 253±0.1μs                                             | 170±0.1μs                                     |    0.67 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'H') |
| -        | 180±0.4μs                                             | 58.9±0.6μs                                    |    0.33 | bench_ufunc.UFunc.time_ufunc_types('bitwise_count')                               |
| -        | 353±0.8μs                                             | 117±1μs                                       |    0.33 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'h') |
| -        | 564±0.5μs                                             | 121±0.2μs                                     |    0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'l') |
| -        | 566±4μs                                               | 123±0.4μs                                     |    0.22 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'q') |
| -        | 621±3μs                                               | 108±1μs                                       |    0.17 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'i') |
| -        | 562±0.5μs                                             | 79.6±0.2μs                                    |    0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'L') |
| -        | 562±0.4μs                                             | 80.5±0.2μs                                    |    0.14 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'Q') |
| -        | 679±3μs                                               | 75.1±0.06μs                                   |    0.11 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'I') |
| -        | 269±1μs                                               | 11.9±0.08μs                                   |    0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'B') |
| -        | 309±0.2μs                                             | 12.8±0.05μs                                   |    0.04 | bench_ufunc_strides.UnaryIntContig.time_unary(<ufunc 'bitwise_count'>, 1, 1, 'b') |

@ganesh-k13
Copy link
Member Author

Wow! Thanks Sayed!

@seiko2plus
Copy link
Member

seiko2plus commented Aug 31, 2023

Wow! Thanks Sayed!

Not that good, the GCC compiler was expected to apply something similar to Harley-Seal popcount algorithm for optimized machine code generation. However, it defaulted to using the standard popcnt instruction without additional optimizations. To potentially enhance performance, we could improve the existing "fast macros" to provides better hints for the compiler maybe by manually unrolling specialized loops counting on C++ meta or/and implementing the algorithm with handwritten generic SIMD intrinsics.

@seberg
Copy link
Member

seberg commented Sep 6, 2023

@ganesh-k13 seems like nobody but me has much squirms about diverging from Python naming. Could you rebase/merge to fix the ufuncs.pyi conflict so that we can get this in?

@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch 2 times, most recently from 03e77b7 to a7cb8a9 Compare September 6, 2023 18:02
assert_type(np.bitwise_count.signature, None)
assert_type(np.bitwise_count.identity, None)
assert_type(np.bitwise_count(i8), Any)
assert_type(np.bitwise_count(AR_i8), Any)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is failing now. I'll take a look tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think npt.NDArray[Any] is correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, I'm somewhat surprised that it even produced an Any here at one point. Though it could have been accepted due to the somewhat broad substring matching used previously.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. That makes sense why it was passing before.

@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from a7cb8a9 to 9e052cd Compare September 8, 2023 14:22
@ganesh-k13
Copy link
Member Author

@seberg I've resolved the conflicts and UT failures, let me know if anything else needed here.

@ganesh-k13
Copy link
Member Author

@seberg any other changes needed here?

@seberg
Copy link
Member

seberg commented Sep 27, 2023

Sorry, do you have time to resolve the conflict, I think that was really all that was left here, just I was in and out a bit much.

ganesh-k13 and others added 10 commits September 27, 2023 23:28
ENH, DOC: Added countbits (popcount)

ENH: Popcount implementation

ENH: Add popcount to umath

ENH: Added countbits (popcount) to umath `__all__`

ENH: Refined popcount logic

DOC: Added `bit_count`

Co-authored-by: Eric Wieser <wieser.eric@gmail.com>

MAINT: Renamed `countbits` to `bit_count`

MAINT: Fixed 4 1s magic number

DOC: Added `popcount` to docstring

ENH: Added bit_count annotations

ENH: Added GNU/CLANG popcount

DOC: Added `popcount` language example

ENH, BUG: Moved `bitcount` to npy_math.h as `popcount` | Fixed final right shift

ENH: Enable `popcount` for signed

TST: Tests for `bit_count`

BUG, DOC: (BUG) Added missing typecast causing an unwanted upcast
          (DOC) Added more details on `popcount` implementation

MAINT, BUG: (MAINT) Refined `popcount` TC to use typecode
            (BUG) Fixed ufunc.ntypes to include signed ints

ENH: Added windows builtin support

ENH: Added `popcount` implementation for big python ints natively
[1/2] `popcount` object loop changes

ENH: Object loop for `bit_count`
[2/2] `popcount` object loop changes

TST: Refined `bit_count` tests and added object type

ENH: Added `bit_count` to `np.int*`

DOC: Added `np.bit_count` (numpy#19355)

MAINT: Various linting and minor fixes:
1. Fixed passing all args to _internals umath bitcount.
   Note: We use kwargs here that might hinder performance
2. Fixed linting errors.
3. Improved verbosity of logs
4. Made a generic TO_BITS_LEN macro to accomdate more length based
   functions in future

BENCH: Added bit_count (popcount)

MAINT: Style nits | Added signed case

DOC, MAINT: Improved example

ENH: Added annotations for bit_count

TST: Added annotations tests for bit_count

MAINT: Fixed linting errors

MAINT: Moved Magic constants to npy_math_internal

MAINT: Remove python implementation | Added 3.10 check to tests

DOC: Added abs value usage to doc

MAINT: Resolved merge conflicts
* Changed popcount -> bitwise_count
* Added comment on why we remove certain function in float64 object
  loops
* Comment on when we skip bitwise_count
* Added return type for integer inputs
* Refined release note
* Refined external references for popcount
@ganesh-k13 ganesh-k13 force-pushed the enh_16325_popcount_ufunc branch from 9e052cd to e8c8824 Compare September 27, 2023 23:33
@ganesh-k13
Copy link
Member Author

@seberg it seems to pass now 👍 , good to go from my end.

@seberg
Copy link
Member

seberg commented Sep 28, 2023

Thanks, let's get it in before it diverges again.

@seberg seberg merged commit 5d4d05a into numpy:main Sep 28, 2023
@ganesh-k13
Copy link
Member Author

This is probably the longest I've worked on a PR :). Now time to get hex/fromhex in 😉

@jakevdp
Copy link
Contributor

jakevdp commented Oct 2, 2023

This is great! One observation – the implementation does not appear to support general output dtypes:

>>> import numpy as np
>>> np.bitwise_count(2, dtype='int32')
# ---------------------------------------------------------------------------
# TypeError                                 Traceback (most recent call last)
# <ipython-input-6-f7427f7dde09> in <cell line: 2>()
#       1 import numpy as np
# ----> 2 np.bitwise_count(2, dtype='int32')

# TypeError: No loop matching the specified signature and casting was found for ufunc bitwise_count

It would be nice if we could support the dtype argument.

@seberg
Copy link
Member

seberg commented Oct 4, 2023

Probably could do something about it, but the problem is that dtype= doesn't really mean output_dtype= which can only be achieved by out=.... dtype= means more something like "operation dtype".

So to make it work, you would have to explicitly teach the ufunc that dtype=int32 should just use the int8 loop (and cast). Which is actually possible by adding a promoter.
(Or it is be possible via some slight extensions/changes, by saying that it is really an to integer loop that happens to use (u)int8.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 62 - Python API Changes or additions to the Python API. Mailing list should usually be notified.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Expose bit_count ufunc equivalent to the scalar methods
8 participants