
Conversation

@math-hiyoko (Contributor) commented Aug 25, 2025

Following the discussion in #29537 (comment), this PR expands our np.unique benchmarks beyond small float arrays to better reflect real-world usage.
In particular, it adds complex and string inputs, tests a wider range of sizes, and varies the proportion of distinct values.
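As a rough illustration of the expanded parameter grid, a benchmark class along these lines could cover the new axes. This is a minimal sketch following asv conventions; the exact parameter names, sizes, and dtypes here are assumptions, not the PR's actual code:

```python
import numpy as np


class Unique:
    """Illustrative asv-style benchmark for np.unique.

    The class layout follows asv conventions; the parameter names and
    values below are assumptions, not the PR's actual code.
    """

    param_names = ["array_size", "percent_unique_values", "dtype"]
    params = [
        [1_000, 100_000],                  # array sizes
        [1, 50, 100],                      # proportion of distinct values (%)
        ["float64", "complex128", "U16"],  # float, complex, and string inputs
    ]

    def setup(self, array_size, percent_unique_values, dtype):
        rng = np.random.default_rng(42)
        # The size of the draw pool controls how many distinct values appear.
        pool = max(1, array_size * percent_unique_values // 100)
        self.arr = rng.integers(0, pool, size=array_size).astype(dtype)

    def time_unique(self, array_size, percent_unique_values, dtype):
        np.unique(self.arr)
```

asv then times `time_unique` once per combination in the cartesian product of `params`, which is exactly why each added axis multiplies total runtime.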

@math-hiyoko math-hiyoko marked this pull request as draft August 25, 2025 04:37
@math-hiyoko math-hiyoko marked this pull request as ready for review August 25, 2025 13:20
@ngoldbaum (Member) commented:

I tried running the benchmarks locally using this PR but couldn't actually finish running them. Can you make sure that these can complete in a reasonable amount of time? For reference, the current Unique benchmarks on main complete in about 10 seconds on my development machine. I waited at least two minutes before killing the benchmark run on this PR.

You can run a "quick" version of the benchmarks you modified locally with this command:

spin bench --quick -t Unique

@math-hiyoko (Contributor, Author) commented Aug 26, 2025

I trimmed the parameter grid (fewer NaN ratios) to reduce the number of repetitions per case. With these changes the Unique benchmarks complete in about 6-7 minutes on my machine.

$ time spin bench --quick -t Unique
...
real    6m39.247s
user    6m16.166s
sys     0m10.816s

@math-hiyoko (Contributor, Author) commented:

For large StringDType arrays, a single trial can take several seconds, so keeping the entire benchmark run to ~10 seconds while still collecting meaningful measurements is difficult.

@mattip (Member) commented Aug 27, 2025

This is run for every PR in CI, so keeping the time down is critical. The whole benchmark CI run now takes 20 minutes in this PR, where previously it took about 5-6 minutes.

@@ -119,37 +120,71 @@ def time_nanpercentile(self, array_size, percent_nans):
 class Unique(Benchmark):
     """Benchmark for np.unique with np.nan values."""
 
-    param_names = ["array_size", "percent_nans"]
+    param_names = ["array_size", "percent_nans", "percent_unique_values", "dtype"]
A Contributor commented:
This PR was discussed in the community call today because of the slowdown in the asv benchmarks, especially because of the impact it has on the CI run time/resource usage.

If we want the ability to run more thorough benchmarks on some occasions and not others, one possible solution is to follow the model used by scikit-learn, or something similar to it. For example, they have an SKLBENCH_PROFILE environment variable that can be switched between regular, fast, and large_scale.

https://github.com/scikit-learn/scikit-learn/blob/726ed184ed80b0191732baaaf5825b86b41db4d2/asv_benchmarks/benchmarks/config.json#L2-L10

It looks like the different asv profile settings then propagate through an abstract benchmark base class at: https://github.com/scikit-learn/scikit-learn/blob/main/asv_benchmarks/benchmarks/common.py#L98

Anyway, something like that may be an option, and I told Matti I'd share this earlier today.
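A minimal sketch of how such a profile switch could look on the NumPy side, modeled on scikit-learn's SKLBENCH_PROFILE. The NPYBENCH_PROFILE variable name and all of the sizes and dtypes below are hypothetical, for illustration only:

```python
import os

# Hypothetical environment variable, modeled on scikit-learn's
# SKLBENCH_PROFILE; NumPy does not currently define NPYBENCH_PROFILE.
PROFILE = os.environ.get("NPYBENCH_PROFILE", "regular")

# Per-profile parameter grids; all values here are illustrative.
PROFILES = {
    "fast": {
        "array_size": [1_000],
        "dtype": ["float64"],
    },
    "regular": {
        "array_size": [1_000, 100_000],
        "dtype": ["float64", "complex128"],
    },
    "large_scale": {
        "array_size": [1_000, 100_000, 10_000_000],
        "dtype": ["float64", "complex128", "U16"],
    },
}


def unique_params():
    """Return the asv `params` list for the active profile."""
    grid = PROFILES[PROFILE]
    return [grid["array_size"], grid["dtype"]]
```

CI would then run with the fast or regular profile, while a developer chasing a specific regression could export the large_scale profile locally.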

@ngoldbaum (Member) commented:

@math-hiyoko I know you're more concerned with getting the performance improvement merged than this. I'll try to set aside some time to suggest how to trim down on the parameterization or data size to get a more reasonable benchmark.

Unfortunately, benchmarking is difficult and the NumPy benchmarks are imperfect. We might have to err on the side of missing things just to have a benchmark suite that developers can run and iterate on in a reasonable amount of time.

@math-hiyoko (Contributor, Author) commented:

> If we want the ability to run more thorough benchmarks on some occasions and not others, one possible solution is to follow the model used by scikit-learn, or something similar to it. For example, they have an SKLBENCH_PROFILE environment variable that can be switched between regular, fast, and large_scale.

I agree this would be a good direction, but I won’t implement it in this PR.
I’ll open a separate issue to track this idea.

@math-hiyoko (Contributor, Author) commented Aug 30, 2025

> We might have to err on the side of missing stuff just to have a benchmark suite that developers can run and iterate on in a reasonable amount of time.

I agree that prioritizing reasonable runtime is the right approach.

With the current state of the commits, the Unique benchmarks complete in under a minute on my local machine.
Do you all think this is still insufficient?

$ time spin bench --quick -t Unique
...

real    0m56.388s
user    0m53.458s
sys     0m2.464s
