
BUG: performance regression of polynomial evaluation #26843

Closed
PeterWurmsdobler opened this issue Jul 3, 2024 · 12 comments

@PeterWurmsdobler

Describe the issue:

The evaluation of a 2nd order polynomial takes about 50% more time with numpy 2.0.0 than with numpy 1.26.4, as revealed by running asv.

Reproduce the code example:

import numpy as np
from numpy.polynomial import Polynomial

class PolynomialBenchmark:
    def setup(self) -> None:
        self.polynomial = Polynomial([10, 2, 6])
        self.p_points = np.linspace(0.0, 1.0, num=1000)

    def time_eval(self) -> None:
        for p in self.p_points:
            self.polynomial(p)

Error message:

| Change | numpy 1.26.4 | numpy 2.0.0 | Ratio | Benchmark (Parameter)         |
|--------|--------------|-------------|-------|-------------------------------|
| +      | 2.67±0.03ms  | 4.09±0.06ms | 1.53  | PolynomialBenchmark.time_eval |

Python and NumPy Versions:

NumPy: 2.0.0
Python: 3.11.8 (main, Apr 29 2024, 16:52:56) [GCC 11.4.0]

Runtime Environment:

Python 3.11.8 (main, Apr 29 2024, 16:52:56) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

import numpy; numpy.show_runtime()
[{'numpy_version': '2.0.0',
'python': '3.11.8 (main, Apr 29 2024, 16:52:56) [GCC 11.4.0]',
'uname': uname_result(system='Linux', node='DE-C-001VH', release='5.15.0-113-generic', version='#123-Ubuntu SMP Mon Jun 10 08:16:17 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_KNL',
'AVX512_KNM',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Haswell',
'filepath': '/home/peter/project/.env311/lib/python3.11/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]

Context for the issue:

The polynomial evaluation is part of a larger project that uses cubic splines as well as scipy numerical integration and root finding, in which the polynomial is evaluated repeatedly. The project's overall execution time increases by about 25%.

@charris
Member

charris commented Jul 3, 2024

Hmm. I suspect the dtype/casting changes here; the code itself hasn't changed. Does it make a difference if you use float coefficients instead of integers?
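
For example, a hypothetical variant of the benchmark setup with float coefficients:

from numpy.polynomial import Polynomial

# Same benchmark polynomial, but with explicitly float coefficients,
# to rule out int -> float casting overhead on every evaluation.
polynomial = Polynomial([10.0, 2.0, 6.0])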

@charris
Member

charris commented Jul 3, 2024

There is also a lot of asarray-type stuff, which may dominate for degree 2 polynomials. Another thing to check might be the time difference for higher degree polynomials.
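
For example, a hypothetical higher-degree variant for comparison:

import numpy as np
from numpy.polynomial import Polynomial

# If the regression is dominated by fixed per-call overhead, the
# relative slowdown should shrink as the degree grows.
rng = np.random.default_rng(0)
polynomial = Polynomial(rng.standard_normal(21))  # degree 20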

@PeterWurmsdobler
Author

Hmm. I suspect the dtype/casting changes here; the code itself hasn't changed. Does it make a difference if you use float coefficients instead of integers?

I tried with both int and float coefficients; it does not make a difference.

@PeterWurmsdobler
Author

There is also a lot of asarray-type stuff, which may dominate for degree 2 polynomials. Another thing to check might be the time difference for higher degree polynomials.

I am not quite sure I understand the first part of the sentence about the asarray issue. As for the degree, only 2nd and 3rd order polynomials are used; no higher order polynomials are needed. Perhaps it would be faster to use an explicit version of the 2nd order polynomial.
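
For reference, an explicit 2nd order evaluation might look like this (a minimal sketch using the benchmark's coefficients, written in Horner form):

def eval_quadratic(x: float) -> float:
    # p(x) = 10 + 2*x + 6*x**2, evaluated in Horner form
    return 10.0 + x * (2.0 + x * 6.0)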

@eendebakpt
Contributor

@charris Actually, the code has changed. Compare

https://github.com/numpy/numpy/blob/maintenance/1.26.x/numpy/polynomial/_polybase.py#L510

with

https://github.com/numpy/numpy/blob/main/numpy/polynomial/_polybase.py#L525

I will look into whether this is the cause of the regression.

@eendebakpt
Contributor

@PeterWurmsdobler First of all: numpy performs well on large arrays, but not so well on small ones. So if you can vectorize your code, e.g. call polynomial(p_points) instead of using a for loop over single data points, that would greatly improve performance.
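
For example, a vectorized variant of the benchmark loop (a minimal sketch using the names from the issue):

import numpy as np
from numpy.polynomial import Polynomial

polynomial = Polynomial([10, 2, 6])
p_points = np.linspace(0.0, 1.0, num=1000)

# One vectorized call over the whole array instead of 1000 scalar
# calls; the per-call Python overhead is paid only once.
values = polynomial(p_points)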

In your minimal example the call self.polynomial(p) takes about 1.5 microseconds on my system (numpy 1.26.x). With numpy 2.0 some overhead is added:

  • The call to pu.mapdomain is almost the same code, but the extra function call adds some overhead (~200 ns, which is typical for python function calls, but noticeable here)
  • Inside mapdomain there is an additional call to np.asarray, which adds ~150 ns
  • After the call to np.asarray the argument p (which was of type np.float64) becomes a numpy array of dtype=float64. The expression off + scl*x is much faster for scalars (used in numpy 1.26) than for arrays (used in numpy 2.0 due to the np.asarray), which again adds a couple of hundred ns; a sketch of the difference is shown below.

I think this explains the performance regression you are seeing.
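
A minimal sketch of the two arithmetic paths (the off and scl values here are arbitrary; compare the two expressions with %timeit):

import numpy as np

off, scl = np.float64(0.0), np.float64(1.0)  # arbitrary mapping parameters
p = np.float64(0.5)        # scalar, as passed in the benchmark loop
x = np.asarray(p)          # 0-d array, as produced inside mapdomain

y_scalar = off + scl * p   # scalar path (numpy 1.26)
y_array = off + scl * x    # 0-d array path (numpy 2.0)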

The changes that have been made are to resolve #17949. We could partly restore performance by replacing x = np.asanyarray(arg) with something like:

if isinstance(arg, np.generic):
    x = arg
else:
    x = np.asanyarray(arg)

(preferably implemented in C; I am not sure whether such a check already exists in numpy). I did not check whether such a change would pass the unit tests.

@charris Would adding special casing such as in the example above be acceptable?

@PeterWurmsdobler
Author

PeterWurmsdobler commented Jul 5, 2024

@PeterWurmsdobler First of all: numpy performs well on large arrays, but not so well on small ones. So if you can vectorize your code, e.g. call polynomial(p_points) instead of using a for loop over single data points, that would greatly improve performance.

Thanks for the suggestion; unfortunately, the polynomial evaluation is part of an optimisation loop whose arguments are not known a priori.

Also, thanks for your detailed analysis of the internal execution performance. As it happens, I tried a simple hand-written version of the 2nd order polynomial using additions and multiplications; its performance was on par with the Polynomial class.

@eendebakpt
Contributor

Adding the fast path for np.generic does improve performance; it might recover nearly all of the performance of numpy 1.26. I added the code and a benchmark to #26550.

Benchmark results on main:

· Discovering benchmarks
· Running 5 total benchmarks (1 commits * 1 environments * 5 benchmarks)
[  0.00%] ·· Benchmarking existing-pyC__projects_misc_numpy_.venv_Scripts_python.exe
[ 10.00%] ··· Running (bench_polynomial.Polynomial.time_polynomial_addition--).....
[ 60.00%] ··· bench_polynomial.Polynomial.time_polynomial_addition                  14.3±0.3μs
[ 70.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_array_1000     15.0±0.4μs
[ 80.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_array_3        5.01±0.2μs
[ 90.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_python_float   2.16±0.02μs
[100.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_scalar         2.20±0.01μs

On the PR:

· Discovering benchmarks
· Running 5 total benchmarks (1 commits * 1 environments * 5 benchmarks)
[  0.00%] ·· Benchmarking existing-pyC__projects_misc_numpy_.venv_Scripts_python.exe
[ 10.00%] ··· Running (bench_polynomial.Polynomial.time_polynomial_addition--).....
[ 60.00%] ··· bench_polynomial.Polynomial.time_polynomial_addition                  13.6±0.05μs
[ 70.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_array_1000     15.1±1μs
[ 80.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_array_3        5.21±0.03μs
[ 90.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_python_float   1.45±0.01μs
[100.00%] ··· bench_polynomial.Polynomial.time_polynomial_evaluation_scalar         1.51±0.04μs

@eendebakpt
Contributor

With #26550 merged I think most of the performance regression has been recovered.

@PeterWurmsdobler One way to improve performance even more might be to use polyval directly. This works if the polynomial does not change during the optimization:

import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.polynomial import polyval

polynomial = Polynomial([10, 2, 6])
p_points = np.linspace(0.0, 1.0, num=1000)

def g():
    # Evaluate via the Polynomial class, one scalar at a time.
    for p in p_points:
        polynomial(p)

# Map the polynomial onto the default domain/window so that its
# stored coefficients can be passed to polyval directly.
polynomial = polynomial.convert()

def h():
    # Evaluate via polyval, skipping the domain-mapping overhead
    # of Polynomial.__call__.
    for p in p_points:
        polyval(p, polynomial.coef)

%timeit g()
%timeit h()
# 1.66 ms ± 22.9 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# 824 µs ± 7.93 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

@charris
Member

charris commented Jul 6, 2024

If you actually use linspace, there is a linspace method in the Polynomial class that returns both the x and y values.
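
For example, a sketch based on the earlier benchmark setup (n and domain chosen to match):

from numpy.polynomial import Polynomial

polynomial = Polynomial([10, 2, 6])

# linspace returns the sample points and the evaluated values in a
# single vectorized call.
x, y = polynomial.linspace(n=1000, domain=[0.0, 1.0])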

@kveretennicov

Thank you very much, @eendebakpt and @charris! When do you think the next numpy release will be cut?

@seberg
Member

seberg commented Aug 15, 2024

Going to close this; there doesn't seem to be a very concrete thing to improve here.

When do you think the next numpy release will be cut?

Not sure exactly, but there will already be an rc in a few weeks.

seberg closed this as completed on Aug 15, 2024