Skip to content

gh-126703: add freelist for PyComplexObject's #135233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 13, 2025

Conversation

skirpichev
Copy link
Contributor

@skirpichev skirpichev commented Jun 7, 2025


Micro-benchmark:

Benchmark ref patch
a+b 223 ns 198 ns: 1.13x faster
a*b 236 ns 209 ns: 1.13x faster
Geometric mean (ref) 1.13x faster
# bench.py
import pyperf
from operator import add, mul
runner = pyperf.Runner()
a,b = 1+2j, -7+1.1j
runner.bench_func('a+b', add, a, b)
runner.bench_func('a*b', mul, a, b)

@corona10
Copy link
Member

corona10 commented Jun 7, 2025

@skirpichev Would you like to provide benchmark enhancement for this change?

@skirpichev
Copy link
Contributor Author

Oops, I just forgot to post one. Description updated.

@corona10
Copy link
Member

corona10 commented Jun 7, 2025

Please share the pyperformance benchmark results.
I'm not sure how often complex types are used in real-world workloads.
Microbenchmarks like this might suggest adding every built-in type to the freelist, but I don't think that's what we want.
Most of the types we've added so far are commonly used in loop-based operations.

cc @Fidget-Spinner @encukou @eendebakpt

@skirpichev skirpichev marked this pull request as draft June 7, 2025 13:51
@skirpichev
Copy link
Contributor Author

Please share the pyperformance benchmark results.

Well, it shouldn't be affected. This collection has no benchmarks for complex math at all:

$ git grep _E '[0-9.][jJ]|cmath|complex' pyperformance/data-files/benchmarks/||echo Oops
Oops

I'm not sure how often complex types are used in real-world workloads.

Probably less than floats, but not too much.

@skirpichev skirpichev marked this pull request as ready for review June 7, 2025 14:12
@encukou
Copy link
Member

encukou commented Jun 7, 2025

AFAICS, real-world workloads that don't use complex numbers will pay the price of:

  • two unused words of memory
  • the maintenance burden of a (I assume!) standard and unsurprising use of _Py_freelist

In applications that do use complex numbers, is it likely that they need to create/destroy them quickly?
IMO, an educated guess is enough to answer that, and if we've established that this is what freelists help with... I'd just trust Sergey here.

To review this I'll need to teach myself about Python's freelists; I can do that next week.

@corona10
Copy link
Member

corona10 commented Jun 7, 2025

IMO, an educated guess is enough to answer that, and if we've established that this is what freelists help with... I'd just trust Sergey here.

I believe that we use pyperformance benchmark as the real world proxy. So if there is no benchmark for it, I am unsure. At least someone needs to share the benchmark for this. If not, people will add new PR for other built-in types with a microbenchmark, which would not be worth adding. We need criteria for this.

@skirpichev
Copy link
Contributor Author

two unused words of memory

Yes, just a new empty _Py_freelists.

In applications that do use complex numbers, is it likely that they need to create/destroy them quickly?

That happens in every arithmetic operation.

Here is a benchmark with a simple sin() implementation:

Benchmark ref patch
mysin(1j) 14.0 us 12.8 us: 1.09x faster
mysin(1+1j) 15.7 us 14.4 us: 1.09x faster
Geometric mean (ref) 1.09x faster
# bench2.py
def mysin(z):
    i, lasts, s, fact, num, sign = 1, 0, z, 1, z, 1
    while s != lasts:
        lasts = s
        i += 2
        fact *= i*(i-1)
        num *= z*z
        sign *= -1
        s += num/fact*sign
    return s
import pyperf
runner = pyperf.Runner()
runner.bench_func('mysin(1j)', mysin, 1j)
runner.bench_func('mysin(1+1j)', mysin, 1+1j)

To review this I'll need to teach myself about Python's freelists

You are not alone, this is new for me as well :D IIUC (see help topic), this technique is not something, allowed for external C extensions.

If not, people will add new PR for other built-in types with a microbenchmark

I doubt we have too much built-in types. Being a built-in type is a sign from the real world too ;-)

@encukou
Copy link
Member

encukou commented Jun 9, 2025

I see this as a PR that speeds up complex numbers, not “X% of Python's real-world usage”. And that's fine.

We need criteria for this.

Why shouldn't the criteria be “any small builtin type where typical usage needs many instances”?

If not, people will add new PR for other built-in types with a microbenchmark, which would not be worth adding.

Why not? What would be the downside of all such built-in types using freelists?

@corona10
Copy link
Member

corona10 commented Jun 12, 2025

Why shouldn't the criteria be “any small builtin type where typical usage needs many instances”?
Why not? What would be the downside of all such built-in types using freelists?

Then why don’t we just register all built-in objects to the freelist? I think our primary focus is on keeping maintenance costs low. Once we add complex to the freelist, we’ll need to maintain it anyway.

We already have over 20 object types registered. If we’re going to add more, shouldn’t we also consider a more manageable way to maintain them?

We take a similar approach when deciding not to add a fast path, if the benefit isn’t significant enough, we skip it. From what I’ve observed, most recent optimizations are based on pyperformance results.

That said, if you still think we should add this, please go ahead. I just wanted to share my concern about adding it by default. Once an object is added to the freelist, it will usually see some benefit, but we should still weigh that against the maintenance burden.

@vstinner
Copy link
Member

We already have over 20 object types registered. If we’re going to add more, shouldn’t we also consider a more manageable way to maintain them?

We already have an API for that, the _Py_freelist API which is being used in this PR, no? What do you mean by "a more manageable way to maintain them"?

@corona10
Copy link
Member

We already have an API for that, the _Py_freelist API which is being used in this PR, no? What do you mean by "a more manageable way to maintain them"?

I thought of managing the list of types and generating the pycore_freelist_state.h with a Python script. but it looks like not simple, though.

@vstinner
Copy link
Member

I thought of managing the list of types and generating the pycore_freelist_state.h with a Python script. but it looks like not simple, though.

Oh ok, I see. I'm not sure that it's worth it. This list evolves rarely and IMO the maintenance cost is low.

@skirpichev skirpichev requested a review from vstinner June 13, 2025 03:14
@skirpichev
Copy link
Contributor Author

From what I’ve observed, most recent optimizations are based on pyperformance results.

But pyperformance can't measure any performance changes in the Python core type, complex. Is that a bug or a feature?

@corona10
Copy link
Member

corona10 commented Jun 13, 2025

But pyperformance can't measure any performance changes in the Python core type, complex. Is that a bug or a feature?

In the past, we observed many optimization enhancements for the binary operation, especially for the int or float types.
More worth optimization would be lowering the opcodes at the JIT optimzation what Ken Jin and Brandt is doing :)

@skirpichev
Copy link
Contributor Author

we observed many optimization enhancements for the binary operation, especially for the int or float types.

This is possible for complex type as well, but it's another story.

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The change is short and 1.09x faster on the mysin() micro-benchmark is worth it IMO.

@corona10: You consider that the change is not worth it?

Copy link
Member

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Okay, let's consider the possibility of usage in the scientific area.

@corona10 corona10 merged commit c646846 into python:main Jun 13, 2025
42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants