ENH: Extend coverage for benchmark of np.unique #29621

math-hiyoko · 2025-08-25T04:37:47Z

Following the discussion in #29537 (comment), this PR expands our np.unique benchmarks beyond small float arrays to better reflect real-world usage.
In particular, it adds complex and string inputs, tests a wider range of sizes, and varies the proportion of distinct values.
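For readers unfamiliar with how asv parameterizes a benchmark, the shape of such a grid can be sketched as follows. This is only an illustration, not the code in this PR: the class name `UniqueSketch`, the `distinct_ratio` values, and the way the arrays are generated are all assumptions made for the example. Only the two array sizes are taken from the diff discussed later in this thread.

```python
import numpy as np


class UniqueSketch:
    """Hypothetical asv-style benchmark; not the class added by this PR."""

    # asv builds one benchmark case per combination of these lists.
    params = [
        [200, int(2e5)],                        # array sizes
        [0.1, 0.9],                             # fraction of distinct values (assumed)
        [np.float64, np.complex128, np.str_],   # input dtypes
    ]
    param_names = ["array_size", "distinct_ratio", "dtype"]

    def setup(self, array_size, distinct_ratio, dtype):
        # Fixed seed so every run benchmarks the same data.
        rng = np.random.default_rng(0)
        n_distinct = max(1, int(array_size * distinct_ratio))
        # Sample with replacement from a pool of n_distinct integers,
        # so the array holds at most n_distinct different values.
        values = rng.choice(np.arange(n_distinct), size=array_size)
        if dtype is np.complex128:
            self.arr = values + 1j * values
        else:
            self.arr = values.astype(dtype)

    def time_unique(self, array_size, distinct_ratio, dtype):
        # Only this call is timed; setup() runs outside the measurement.
        np.unique(self.arr)
```

Every extra entry in any of the `params` lists multiplies the number of cases, which is why the runtime discussion below focuses on trimming the grid.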

ngoldbaum · 2025-08-25T21:24:43Z

I tried running the benchmarks locally using this PR but couldn't actually finish running them. Can you make sure that these can complete in a reasonable amount of time? For reference, the current Unique benchmarks on main complete in about 10 seconds on my development machine. I was waiting at least two minutes before I killed the benchmark run on this PR.

You can run a "quick" version of the benchmarks you modified locally with this command:

spin bench --quick -t Unique

math-hiyoko · 2025-08-26T14:50:57Z

I trimmed the parameter grid (fewer NaN ratios) in order to reduce the number of repetitions per case. With these changes the Unique benchmarks complete in about 6-7 minutes on my machine.

$ time spin bench --quick -t Unique
...
real    6m39.247s
user    6m16.166s
sys     0m10.816s
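As a rough illustration of why trimming a single parameter axis helps, the number of benchmark cases is the product of the lengths of the parameter lists. The grids below are hypothetical and are not the ones in this PR; they only show the arithmetic.

```python
import math


def count_cases(params):
    """Number of benchmark cases generated by a list of parameter axes."""
    return math.prod(len(axis) for axis in params)


# Hypothetical grids: sizes, NaN ratios, dtype names.
full_grid = [[200, int(2e5)], [0.0, 0.1, 0.5, 0.9], ["float", "complex", "str"]]
trimmed_grid = [[200, int(2e5)], [0.0, 0.5], ["float", "complex", "str"]]

print(count_cases(full_grid))     # 24
print(count_cases(trimmed_grid))  # 12
```

Halving one axis halves the case count, but the total runtime is still dominated by whichever cases are slowest per trial.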

math-hiyoko · 2025-08-27T04:37:06Z

For large StringDType arrays, a single trial can take several seconds, so keeping the entire benchmark run to ~10 seconds while still collecting meaningful measurements is difficult.

mattip · 2025-08-27T07:28:50Z

This is run for every PR in CI, so keeping the time down is critical. The whole benchmark CI run now takes 20 minutes in this PR, where previously it took about 5-6 minutes.

benchmarks/benchmarks/bench_lib.py

ngoldbaum · 2025-08-29T16:26:04Z

@math-hiyoko I know you're more concerned with getting the performance improvement merged than this. I'll try to set aside some time to suggest how to trim down on the parameterization or data size to get a more reasonable benchmark.

Unfortunately benchmarking is difficult and the numpy benchmarks are imperfect. We might have to err on the side of missing stuff just to have a benchmark suite that developers can run and iterate on in a reasonable amount of time.

math-hiyoko · 2025-08-30T16:23:17Z

> If we want the ability to run more thorough benchmarks on some occasions and not others, one possible solution is to follow the model used by scikit-learn, or something similar to it. For example, they have an SKLBENCH_PROFILE environment variable that can be swapped between regular, fast, and large_scale.

I agree this would be a good direction, but I won’t implement it in this PR.
I’ll open a separate issue to track this idea.
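A minimal sketch of what such profile switching could look like is shown below. The environment variable name `NUMPY_BENCH_PROFILE`, the helper `array_sizes`, and the sizes assigned to each profile are assumptions made for illustration; only the three profile names come from the scikit-learn example quoted above, and nothing here reflects an existing NumPy feature.

```python
import os

# Hypothetical mapping from profile name to the array sizes to benchmark.
_PROFILES = {
    "fast": [200],
    "regular": [200, int(2e5)],
    "large_scale": [200, int(2e5), int(1e6)],
}


def array_sizes(environ=os.environ):
    """Pick array sizes from a (hypothetical) NUMPY_BENCH_PROFILE variable."""
    profile = environ.get("NUMPY_BENCH_PROFILE", "regular")
    try:
        return _PROFILES[profile]
    except KeyError:
        raise ValueError(
            f"unknown benchmark profile {profile!r}; "
            f"expected one of {sorted(_PROFILES)}"
        ) from None
```

A benchmark class could then use `array_sizes()` as one of its `params` axes, so CI keeps the short default while a developer opts in to the larger grid locally.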

math-hiyoko · 2025-08-30T16:40:44Z

> We might have to err on the side of missing stuff just to have a benchmark suite that developers can run and iterate on in a reasonable amount of time.

I agree that prioritizing reasonable runtime is the right approach.

With the current state of the commits, the Unique benchmarks complete in under a minute on my local machine.
Do you all think this is still insufficient?

$ time spin bench --quick -t Unique
...

real    0m56.388s
user    0m53.458s
sys     0m2.464s

mattip · 2025-09-04T02:35:17Z

Benchmarking in CI now takes 5m29s; before this PR it took between 3m10s and 3m40s.

math-hiyoko · 2025-09-04T15:19:55Z

Now it completes in 4m27s.
https://github.com/numpy/numpy/actions/runs/17467174382/job/49606517073?pr=29621

mattip · 2025-09-05T06:05:22Z

benchmarks/benchmarks/bench_lib.py

    params = [
        # sizes of the 1D arrays
-        [200, int(2e5)],
+        [int(1e3), int(1e6)],


Can you revert this change?

I’ve reverted this change.
Larger dataset benchmarks would indeed be useful, and I believe they will be better addressed as part of #29644.

mattip · 2025-09-07T09:00:24Z

Thanks @math-hiyoko

math-hiyoko added 2 commits August 25, 2025 02:43

enh: extend coverage

d5692ba

enh: coverage

723e2c6

math-hiyoko marked this pull request as draft August 25, 2025 04:37

github-actions bot added the 01 - Enhancement label Aug 25, 2025

math-hiyoko added 8 commits August 25, 2025 13:40

Merge branch 'main' into enh/extend_unique_benchmark

0e35683

fix: lint

45c9779

fix: change array_size

eba1760

fix: change length of string

24459e7

fix: change length of string

c9d6050

fix: change parameter

bbb2442

fix: change parameter

3036cc7

fix: change parameter

4754c87

math-hiyoko marked this pull request as ready for review August 25, 2025 13:20

fix: change parameter

3cbaf3f

fix: change parameter

aa5360d

tylerjereddy reviewed Aug 27, 2025

benchmarks/benchmarks/bench_lib.py

math-hiyoko mentioned this pull request Aug 30, 2025

Add benchmark profile switching (fast / regular / large_scale) for asv benchmarks #29644

Open

fix: change parameter

2235ce1

mattip reviewed Sep 5, 2025

fix: change parameter

598f2df

mattip merged commit e099d05 into numpy:main Sep 7, 2025
76 checks passed

math-hiyoko mentioned this pull request Sep 7, 2025

ENH: np.unique: support hash based unique for float and complex dtype #29537

Open
