ENH: np.unique: support hash based unique for float and complex dtype #29537

Open
wants to merge 18 commits into base: main

math-hiyoko
Contributor

@math-hiyoko math-hiyoko commented Aug 11, 2025

Description

This PR adds a hash-based path to np.unique for extracting the unique values of float and complex arrays.
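To illustrate what "hash-based" means for floating-point values, here is a minimal pure-Python sketch. It is not the code in this PR; the function name `hash_unique` is hypothetical, and the handling shown (NaN never equals itself, `0.0` and `-0.0` compare equal) is only an assumption about the corner cases any such implementation has to decide on.

```python
def hash_unique(values, equal_nan=True):
    """Return the distinct values in first-seen order using a hash set.

    Illustrative only: hypothetical helper, not NumPy's implementation.
    """
    seen = set()
    out = []
    nan_emitted = False
    for v in values:
        if v != v:
            # NaN (for complex: a NaN in either part) never equals itself,
            # so it is handled outside the hash set.
            if not equal_nan:
                out.append(v)          # every NaN counts as distinct
            elif not nan_emitted:
                out.append(v)          # collapse all NaNs into one
                nan_emitted = True
            continue
        # Python hashes 0.0 and -0.0 identically and they compare equal,
        # so signed zeros collapse into a single entry.
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out


print(hash_unique([1.5, 2.5, 1.5, 0.0, -0.0]))  # [1.5, 2.5, 0.0]
```

Each element is visited once and looked up in a set, so the expected cost is linear in the array length, whereas a sort-based approach pays for a full sort even when only a few distinct values exist.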

Benchmark Results

The following benchmarks compare np.unique on this branch (hash based) against numpy main, using shuffled arrays that contain 1,000 distinct values.

float

import random
import time

import numpy as np

arr = np.array(
    [
        random.random() for _ in range(1_000)
    ] * 5_000_000,
    dtype=np.float64,
)
np.random.shuffle(arr)

# Time a single call to np.unique on the shuffled array.
time_start = time.perf_counter()
print("unique count (hash based): ", len(np.unique(arr, sorted=False, equal_nan=False)))
time_elapsed = time.perf_counter() - time_start
print(f"{time_elapsed:5.3f} secs")

complex

import random
import time

import numpy as np

arr = np.array(
    [
        complex(random.random(), random.random()) for _ in range(1_000)
    ] * 1_000_000,
    dtype=np.complex128,
)
np.random.shuffle(arr)

# Time a single call to np.unique on the shuffled array.
time_start = time.perf_counter()
print("unique count (hash based): ", len(np.unique(arr, sorted=False, equal_nan=False)))
time_elapsed = time.perf_counter() - time_start
print(f"{time_elapsed:5.3f} secs")

Result

float

unique count (hash based):  1000
406.947 secs
unique count (numpy main):  1000
441.833 secs

complex

unique count (hash based):  1000
11.433 secs
unique count (numpy main):  1000
47.408 secs
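
The size of the improvement can be read off the timings above. The short sketch below computes the ratio of the main-branch time to the hash-based time, using only the four numbers reported in this description; the dictionary layout and the `speedup` helper are illustrative names, not part of the PR.

```python
# Timings in seconds, copied from the results above.
timings = {
    "float": {"hash_based": 406.947, "numpy_main": 441.833},
    "complex": {"hash_based": 11.433, "numpy_main": 47.408},
}


def speedup(main_secs, new_secs):
    """How many times faster the new run is compared with main."""
    return main_secs / new_secs


for dtype, t in timings.items():
    ratio = speedup(t["numpy_main"], t["hash_based"])
    print(f"{dtype}: {ratio:.2f}x")
```

This gives roughly 1.09x for float and 4.15x for complex, so on these single runs the gain is modest for float64 and about fourfold for complex128.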

Closes #28363

@math-hiyoko math-hiyoko marked this pull request as draft August 11, 2025 04:25
@math-hiyoko math-hiyoko changed the title from "Feature/#28363" to "ENH: np.unique: support hash based unique for float and complex dtype" Aug 11, 2025
@ngoldbaum
Member

Very cool! When you finish this up for review it'd be nice to see some before/after comparisons on some benchmarks.

@math-hiyoko math-hiyoko marked this pull request as ready for review August 15, 2025 13:43
Development

Successfully merging this pull request may close these issues.

np.unique: support float dtypes
2 participants