gh-134821: Automatically enable deferred reference counting on shared objects #134880

Closed
wants to merge 2 commits into from

Conversation

Member

@ZeroIntensity ZeroIntensity commented May 29, 2025

This is a very rough POC. I think it's important that we don't cause performance regressions here, so I've implemented this alongside the garbage collector.

Does this approach make sense to everyone?

Member

@corona10 corona10 left a comment

Could you run the pyperformance benchmark first?

@ZeroIntensity
Member Author

Sorry for the delay! The pyperformance benchmarks are showing quite a hit in performance:

| Benchmark | main.json | patch.json | Change | Significance |
| --- | --- | --- | --- | --- |
| 2to3 | 241 ms | 271 ms | 1.13x slower | Significant (t=-13.56) |
| async_generators | 299 ms | 360 ms | 1.20x slower | Significant (t=-20.38) |
| asyncio_tcp | 299 ms | 378 ms | 1.27x slower | Significant (t=-40.45) |
| asyncio_tcp_ssl | 907 ms | 1.13 sec | 1.25x slower | Significant (t=-40.04) |
| asyncio_websockets | 388 ms | 538 ms | 1.39x slower | Significant (t=-116.83) |
| bench_mp_pool | 25.3 ms | 36.5 ms | 1.44x slower | Significant (t=-54.78) |
| bench_thread_pool | 790 us | 1.58 ms | 2.00x slower | Significant (t=-54.14) |
| chaos | 48.6 ms | 64.7 ms | 1.33x slower | Significant (t=-32.61) |
| comprehensions | 12.8 us | 18.5 us | 1.45x slower | Significant (t=-68.65) |
| coroutines | 16.6 ms | 27.1 ms | 1.63x slower | Significant (t=-7.29) |
| coverage | 65.1 ms | 87.3 ms | 1.34x slower | Significant (t=-81.16) |
| create_gc_cycles | 784 us | 1.05 ms | 1.34x slower | Significant (t=-138.49) |
| crypto_pyaes | 61.7 ms | 81.0 ms | 1.31x slower | Significant (t=-52.46) |
| deepcopy | 221 us | 294 us | 1.33x slower | Significant (t=-89.80) |
| deepcopy_memo | 23.3 us | 31.0 us | 1.33x slower | Significant (t=-31.68) |
| deepcopy_reduce | 2.50 us | 3.29 us | 1.32x slower | Significant (t=-67.66) |
| deltablue | 2.65 ms | 3.54 ms | 1.33x slower | Significant (t=-35.96) |
| docutils | 1.84 sec | 2.55 sec | 1.39x slower | Significant (t=-26.80) |
| dulwich_log | 31.3 ms | 40.9 ms | 1.31x slower | Significant (t=-69.29) |
| fannkuch | 292 ms | 390 ms | 1.34x slower | Significant (t=-37.36) |
| float | 47.8 ms | 64.6 ms | 1.35x slower | Significant (t=-20.71) |
| gc_traversal | 1.66 ms | 2.10 ms | 1.26x slower | Significant (t=-17.49) |
| generators | 25.0 ms | 31.9 ms | 1.27x slower | Significant (t=-13.48) |
| genshi_text | 20.7 ms | 27.7 ms | 1.34x slower | Significant (t=-69.80) |
| genshi_xml | 43.6 ms | 56.4 ms | 1.29x slower | Significant (t=-58.43) |
| go | 91.6 ms | 123 ms | 1.34x slower | Significant (t=-80.14) |
| hexiom | 4.81 ms | 6.74 ms | 1.40x slower | Significant (t=-67.13) |
| html5lib | 39.4 ms | 52.5 ms | 1.33x slower | Significant (t=-28.43) |
| json_dumps | 8.00 ms | 10.7 ms | 1.33x slower | Significant (t=-71.96) |
| json_loads | 16.2 us | 22.9 us | 1.41x slower | Significant (t=-128.83) |
| logging_format | 5.53 us | 7.43 us | 1.34x slower | Significant (t=-72.04) |
| logging_silent | 480 ns | 629 ns | 1.31x slower | Significant (t=-55.59) |
| logging_simple | 5.09 us | 6.84 us | 1.34x slower | Significant (t=-72.23) |
| mako | 8.98 ms | 11.6 ms | 1.30x slower | Significant (t=-67.66) |
| mdp | 967 ms | 1.29 sec | 1.34x slower | Significant (t=-45.06) |
| meteor_contest | 72.4 ms | 98.3 ms | 1.36x slower | Significant (t=-107.75) |
| nbody | 97.1 ms | 128 ms | 1.32x slower | Significant (t=-17.66) |
| nqueens | 73.1 ms | 96.8 ms | 1.32x slower | Significant (t=-58.32) |
| pathlib | 13.0 ms | 17.4 ms | 1.34x slower | Significant (t=-140.58) |
| pickle | 8.12 us | 10.6 us | 1.31x slower | Significant (t=-130.07) |
| pickle_dict | 18.0 us | 23.7 us | 1.32x slower | Significant (t=-148.71) |
| pickle_list | 2.83 us | 3.78 us | 1.33x slower | Significant (t=-163.84) |
| pickle_pure_python | 268 us | 356 us | 1.33x slower | Significant (t=-84.57) |
| pidigits | 130 ms | 173 ms | 1.33x slower | Significant (t=-279.46) |
| pprint_pformat | 1.38 sec | 1.90 sec | 1.37x slower | Significant (t=-10.28) |
| pprint_safe_repr | 683 ms | 894 ms | 1.31x slower | Significant (t=-75.80) |
| pyflate | 325 ms | 447 ms | 1.37x slower | Significant (t=-91.12) |
| python_startup | 10.1 ms | 14.1 ms | 1.40x slower | Significant (t=-131.69) |
| python_startup_no_site | 7.17 ms | 12.4 ms | 1.73x slower | Significant (t=-69.40) |
| raytrace | 220 ms | 297 ms | 1.35x slower | Significant (t=-85.28) |
| regex_compile | 89.5 ms | 121 ms | 1.35x slower | Significant (t=-122.74) |
| regex_dna | 115 ms | 163 ms | 1.41x slower | Significant (t=-92.83) |
| regex_effbot | 1.92 ms | 2.63 ms | 1.37x slower | Significant (t=-105.13) |
| regex_v8 | 15.7 ms | 20.8 ms | 1.32x slower | Significant (t=-60.71) |
| richards | 36.4 ms | 48.2 ms | 1.32x slower | Significant (t=-55.65) |
| richards_super | 42.3 ms | 57.5 ms | 1.36x slower | Significant (t=-45.53) |
| scimark_fft | 264 ms | 339 ms | 1.28x slower | Significant (t=-41.68) |
| scimark_lu | 92.7 ms | 123 ms | 1.32x slower | Significant (t=-31.12) |
| scimark_monte_carlo | 57.8 ms | 73.7 ms | 1.28x slower | Significant (t=-36.04) |
| scimark_sor | 90.2 ms | 122 ms | 1.36x slower | Significant (t=-30.54) |
| scimark_sparse_mat_mult | 4.71 ms | 5.89 ms | 1.25x slower | Significant (t=-24.55) |
| spectral_norm | 82.9 ms | 103 ms | 1.25x slower | Significant (t=-22.27) |
| sqlglot_normalize | 228 ms | 118 ms | 1.93x faster | Significant (t=249.63) |
| sqlglot_optimize | 41.7 ms | 55.8 ms | 1.34x slower | Significant (t=-91.94) |
| sqlglot_parse | 1.04 ms | 1.38 ms | 1.32x slower | Significant (t=-39.12) |
| sqlglot_transpile | 1.27 ms | 1.73 ms | 1.36x slower | Significant (t=-43.31) |
| sqlite_synth | 1.42 us | 1.87 us | 1.32x slower | Significant (t=-86.85) |
| telco | 6.72 ms | 9.28 ms | 1.38x slower | Significant (t=-96.69) |
| tomli_loads | 1.73 sec | 2.18 sec | 1.26x slower | Significant (t=-37.94) |
| typing_runtime_protocols | 141 us | 190 us | 1.35x slower | Significant (t=-70.98) |
| unpack_sequence | 42.2 ns | 52.4 ns | 1.24x slower | Significant (t=-16.62) |
| unpickle | 10.7 us | 14.4 us | 1.34x slower | Significant (t=-111.09) |
| unpickle_list | 3.41 us | 4.39 us | 1.29x slower | Significant (t=-53.52) |
| unpickle_pure_python | 180 us | 238 us | 1.32x slower | Significant (t=-48.67) |
| xml_etree_generate | 67.8 ms | 91.5 ms | 1.35x slower | Significant (t=-127.05) |
| xml_etree_iterparse | 54.5 ms | 71.5 ms | 1.31x slower | Significant (t=-82.95) |
| xml_etree_parse | 83.6 ms | 113 ms | 1.35x slower | Significant (t=-75.07) |
| xml_etree_process | 48.3 ms | 66.8 ms | 1.38x slower | Significant (t=-59.14) |

Considering that this approach is pretty non-intrusive, I suspect automatic deferred reference counting (DRC) just isn't feasible right now: there will always be false-positive "shared" objects no matter how we implement it, and those false positives appear to cost roughly 30% in performance (most of the benchmarks above are between 1.25x and 1.45x slower). I think that individual patches, like #134762, will turn out to work pretty well.
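
For anyone who wants to sanity-check the size of the hit, here is a minimal sketch that recomputes the change ratio for a handful of rows copied from the benchmark table and takes their geometric mean. The row selection and the helper names (`SAMPLE_ROWS`, `ratio`, `describe`, `geometric_mean`) are illustrative assumptions, not part of pyperformance or of this patch, and a six-row sample is not a substitute for the full comparison.

```python
import math

# (benchmark, main, patch) timings copied from a few rows of the table.
# Both values in a row use the same unit, so only their ratio matters.
SAMPLE_ROWS = [
    ("2to3", 241.0, 271.0),               # ms
    ("bench_thread_pool", 0.790, 1.58),   # ms
    ("json_loads", 16.2, 22.9),           # us
    ("nbody", 97.1, 128.0),               # ms
    ("python_startup", 10.1, 14.1),       # ms
    ("sqlglot_normalize", 228.0, 118.0),  # ms (the one row that got faster)
]


def ratio(main, patch):
    """Return patch / main; a value above 1 means the patch is slower."""
    return patch / main


def describe(main, patch):
    """Format a ratio in the same style as the table's Change column."""
    r = ratio(main, patch)
    if r >= 1.0:
        return f"{r:.2f}x slower"
    return f"{1.0 / r:.2f}x faster"


def geometric_mean(values):
    """Geometric mean, the usual way to average multiplicative ratios."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    for name, main, patch in SAMPLE_ROWS:
        print(f"{name:20s} {describe(main, patch)}")
    mean = geometric_mean(ratio(m, p) for _, m, p in SAMPLE_ROWS)
    print(f"geometric mean of sampled ratios: {mean:.2f}")
```

Because the table shows rounded timings, a recomputed ratio can differ from the Change column in the last digit (the 2to3 row comes out as 1.12x here rather than 1.13x).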