ENH: np.unique: support hash based unique for float and complex dtype #29537
Very cool! When you finish this up for review, it'd be nice to see some before/after comparisons on some benchmarks.
It's now ready for review.
I added the triage review label, so hopefully someone at the next triage meeting on Wednesday will volunteer to look closer at this. Unfortunately I can't take that on this month - too many things going on!
Pull Request Overview
This PR extends NumPy's np.unique function to support hash-based unique value extraction for float and complex dtypes, providing significant performance improvements over the existing sort-based approach. The implementation leverages hash tables to efficiently identify unique values without requiring sorting.
Key changes include:
- Implementation of hash-based unique extraction for float and complex numeric types
- Addition of specialized hash and equality functions that handle NaN values appropriately
- Updates to test cases to accommodate unsorted output from hash-based approach
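As a rough illustration of the key changes listed above, here is a minimal sketch of hash-based unique extraction with NaN-aware hash and equality functors. It assumes the standard library's std::unordered_set rather than whatever hash table the PR uses, and the names NanAwareHash, NanAwareEqual, and hash_unique are hypothetical, not taken from unique.cpp. It also assumes all NaNs should collapse into a single entry.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical functor: every NaN hashes to the same value so that
// NanAwareEqual gets a chance to compare them.
struct NanAwareHash {
    std::size_t operator()(double v) const {
        if (std::isnan(v)) return 0;
        // -0.0 == 0.0 compares true, so both must hash identically.
        if (v == 0.0) return std::hash<double>{}(0.0);
        return std::hash<double>{}(v);
    }
};

// Hypothetical functor: two NaNs are treated as equal (assumption),
// everything else uses ordinary floating-point equality.
struct NanAwareEqual {
    bool operator()(double a, double b) const {
        if (std::isnan(a) && std::isnan(b)) return true;
        return a == b;
    }
};

// Returns the unique values in first-occurrence order; no sorting is done.
std::vector<double> hash_unique(const std::vector<double>& in) {
    std::unordered_set<double, NanAwareHash, NanAwareEqual> seen;
    std::vector<double> out;
    for (double v : in) {
        if (seen.insert(v).second) out.push_back(v);
    }
    return out;
}
```

Because nothing is sorted, callers that relied on ordered output would need to sort the result themselves, which is the behavior change the updated tests account for.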
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
numpy/_core/src/multiarray/unique.cpp | Core implementation adding hash-based unique support for float/complex types with NaN handling |
numpy/lib/tests/test_arraysetops.py | Updated test assertions to handle unsorted output from hash-based unique operations |
numpy/_core/meson.build | Added npymath include directory for math function access |
doc/release/upcoming_changes/29537.performance.rst | Documentation of performance improvements |
doc/release/upcoming_changes/29537.change.rst | Documentation of behavior change regarding sorted output |
int lhs_isnan = npy_isnan(lhs_real) || npy_isnan(lhs_imag);
S rhs_real = real(*rhs);
S rhs_imag = imag(*rhs);
int rhs_isnan = npy_isnan(rhs_real) || npy_isnan(rhs_imag);
Inconsistent use of npy_isnan vs npy_isnan_wrapper. This should use npy_isnan_wrapper for consistency with the template pattern used elsewhere.
- int rhs_isnan = npy_isnan(rhs_real) || npy_isnan(rhs_imag);
+ int lhs_isnan = npy_isnan_wrapper<S>(lhs_real) || npy_isnan_wrapper<S>(lhs_imag);
+ S rhs_real = real(*rhs);
+ S rhs_imag = imag(*rhs);
+ int rhs_isnan = npy_isnan_wrapper<S>(rhs_real) || npy_isnan_wrapper<S>(rhs_imag);
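For readers following the snippet under review, the comparison it appears to be building can be sketched in standalone C++ as below. This is only an approximation under stated assumptions: it uses std::complex and std::isnan in place of NumPy's complex types and npy_isnan / npy_isnan_wrapper, the function names complex_has_nan and complex_equal_nan are hypothetical, and it assumes a complex value counts as NaN when either component is NaN.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper: a complex value is treated as NaN if either
// its real or imaginary part is NaN (assumption based on the snippet).
template <typename S>
bool complex_has_nan(const std::complex<S>& z) {
    return std::isnan(z.real()) || std::isnan(z.imag());
}

// Hypothetical equality: any two NaN-containing values compare equal,
// a NaN value never equals a non-NaN value, and otherwise the usual
// component-wise comparison applies.
template <typename S>
bool complex_equal_nan(const std::complex<S>& lhs, const std::complex<S>& rhs) {
    const bool lhs_isnan = complex_has_nan(lhs);
    const bool rhs_isnan = complex_has_nan(rhs);
    if (lhs_isnan || rhs_isnan) {
        return lhs_isnan && rhs_isnan;
    }
    return lhs == rhs;
}
```

Templating on the scalar type S is what lets a single definition cover the float, double, and long double complex variants, which is presumably why the review asks for the templated wrapper to be used consistently.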
I'm not sure what the state of the It's probably worth benchmarking several sizes of inputs, not just really huge arrays.
@charris is curious how you're handling NaN. What happens if there's more than one distinct NaN value in the array?
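To make the "more than one distinct NaN value" question concrete: IEEE 754 doubles admit many NaN bit patterns, so a hash computed over the raw bytes would place them in different buckets unless NaNs are normalized first. The sketch below only demonstrates that two NaNs can differ bitwise; the helper names bits_of and from_bits and the two payload constants are illustrative choices, not anything from the PR.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its 64-bit pattern (memcpy avoids aliasing issues).
std::uint64_t bits_of(double v) {
    std::uint64_t b;
    std::memcpy(&b, &v, sizeof b);
    return b;
}

// Build a double from a 64-bit pattern.
double from_bits(std::uint64_t b) {
    double v;
    std::memcpy(&v, &b, sizeof v);
    return v;
}

// Two quiet NaNs that differ only in their payload bits.
const std::uint64_t kNanA = 0x7ff8000000000000ULL;
const std::uint64_t kNanB = 0x7ff8000000000001ULL;
```

Both patterns satisfy std::isnan, yet they are different 64-bit values, so the hash and equality functions have to decide explicitly whether they count as one unique element or two.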
Description
This PR introduces hash-based uniqueness extraction support for float and complex types in NumPy's np.unique function.
Benchmark Results
The following benchmark demonstrates significant performance improvement from the new implementation.
float
complex
Result
float
complex
close #28363