gh-134821: Automatically enable deferred reference counting on shared objects #134880

ZeroIntensity · 2025-05-29T11:15:28Z

This is a very rough POC. I think it's important that we don't cause performance regressions here, so I've implemented this alongside the garbage collector.

Does this approach make sense to everyone?

Issue: Automatically defer heavily shared objects in free-threading. #134821

corona10

Could you run the pyperformance benchmark first?

ZeroIntensity · 2025-06-09T11:26:49Z

Sorry for the delay! The pyperformance benchmarks are showing quite a hit in performance:

Benchmark	main.json	patch.json	Change	Significance
2to3	241 ms	271 ms	1.13x slower	Significant (t=-13.56)
async_generators	299 ms	360 ms	1.20x slower	Significant (t=-20.38)
asyncio_tcp	299 ms	378 ms	1.27x slower	Significant (t=-40.45)
asyncio_tcp_ssl	907 ms	1.13 sec	1.25x slower	Significant (t=-40.04)
asyncio_websockets	388 ms	538 ms	1.39x slower	Significant (t=-116.83)
bench_mp_pool	25.3 ms	36.5 ms	1.44x slower	Significant (t=-54.78)
bench_thread_pool	790 us	1.58 ms	2.00x slower	Significant (t=-54.14)
chaos	48.6 ms	64.7 ms	1.33x slower	Significant (t=-32.61)
comprehensions	12.8 us	18.5 us	1.45x slower	Significant (t=-68.65)
coroutines	16.6 ms	27.1 ms	1.63x slower	Significant (t=-7.29)
coverage	65.1 ms	87.3 ms	1.34x slower	Significant (t=-81.16)
create_gc_cycles	784 us	1.05 ms	1.34x slower	Significant (t=-138.49)
crypto_pyaes	61.7 ms	81.0 ms	1.31x slower	Significant (t=-52.46)
deepcopy	221 us	294 us	1.33x slower	Significant (t=-89.80)
deepcopy_memo	23.3 us	31.0 us	1.33x slower	Significant (t=-31.68)
deepcopy_reduce	2.50 us	3.29 us	1.32x slower	Significant (t=-67.66)
deltablue	2.65 ms	3.54 ms	1.33x slower	Significant (t=-35.96)
docutils	1.84 sec	2.55 sec	1.39x slower	Significant (t=-26.80)
dulwich_log	31.3 ms	40.9 ms	1.31x slower	Significant (t=-69.29)
fannkuch	292 ms	390 ms	1.34x slower	Significant (t=-37.36)
float	47.8 ms	64.6 ms	1.35x slower	Significant (t=-20.71)
gc_traversal	1.66 ms	2.10 ms	1.26x slower	Significant (t=-17.49)
generators	25.0 ms	31.9 ms	1.27x slower	Significant (t=-13.48)
genshi_text	20.7 ms	27.7 ms	1.34x slower	Significant (t=-69.80)
genshi_xml	43.6 ms	56.4 ms	1.29x slower	Significant (t=-58.43)
go	91.6 ms	123 ms	1.34x slower	Significant (t=-80.14)
hexiom	4.81 ms	6.74 ms	1.40x slower	Significant (t=-67.13)
html5lib	39.4 ms	52.5 ms	1.33x slower	Significant (t=-28.43)
json_dumps	8.00 ms	10.7 ms	1.33x slower	Significant (t=-71.96)
json_loads	16.2 us	22.9 us	1.41x slower	Significant (t=-128.83)
logging_format	5.53 us	7.43 us	1.34x slower	Significant (t=-72.04)
logging_silent	480 ns	629 ns	1.31x slower	Significant (t=-55.59)
logging_simple	5.09 us	6.84 us	1.34x slower	Significant (t=-72.23)
mako	8.98 ms	11.6 ms	1.30x slower	Significant (t=-67.66)
mdp	967 ms	1.29 sec	1.34x slower	Significant (t=-45.06)
meteor_contest	72.4 ms	98.3 ms	1.36x slower	Significant (t=-107.75)
nbody	97.1 ms	128 ms	1.32x slower	Significant (t=-17.66)
nqueens	73.1 ms	96.8 ms	1.32x slower	Significant (t=-58.32)
pathlib	13.0 ms	17.4 ms	1.34x slower	Significant (t=-140.58)
pickle	8.12 us	10.6 us	1.31x slower	Significant (t=-130.07)
pickle_dict	18.0 us	23.7 us	1.32x slower	Significant (t=-148.71)
pickle_list	2.83 us	3.78 us	1.33x slower	Significant (t=-163.84)
pickle_pure_python	268 us	356 us	1.33x slower	Significant (t=-84.57)
pidigits	130 ms	173 ms	1.33x slower	Significant (t=-279.46)
pprint_pformat	1.38 sec	1.90 sec	1.37x slower	Significant (t=-10.28)
pprint_safe_repr	683 ms	894 ms	1.31x slower	Significant (t=-75.80)
pyflate	325 ms	447 ms	1.37x slower	Significant (t=-91.12)
python_startup	10.1 ms	14.1 ms	1.40x slower	Significant (t=-131.69)
python_startup_no_site	7.17 ms	12.4 ms	1.73x slower	Significant (t=-69.40)
raytrace	220 ms	297 ms	1.35x slower	Significant (t=-85.28)
regex_compile	89.5 ms	121 ms	1.35x slower	Significant (t=-122.74)
regex_dna	115 ms	163 ms	1.41x slower	Significant (t=-92.83)
regex_effbot	1.92 ms	2.63 ms	1.37x slower	Significant (t=-105.13)
regex_v8	15.7 ms	20.8 ms	1.32x slower	Significant (t=-60.71)
richards	36.4 ms	48.2 ms	1.32x slower	Significant (t=-55.65)
richards_super	42.3 ms	57.5 ms	1.36x slower	Significant (t=-45.53)
scimark_fft	264 ms	339 ms	1.28x slower	Significant (t=-41.68)
scimark_lu	92.7 ms	123 ms	1.32x slower	Significant (t=-31.12)
scimark_monte_carlo	57.8 ms	73.7 ms	1.28x slower	Significant (t=-36.04)
scimark_sor	90.2 ms	122 ms	1.36x slower	Significant (t=-30.54)
scimark_sparse_mat_mult	4.71 ms	5.89 ms	1.25x slower	Significant (t=-24.55)
spectral_norm	82.9 ms	103 ms	1.25x slower	Significant (t=-22.27)
sqlglot_normalize	228 ms	118 ms	1.93x faster	Significant (t=249.63)
sqlglot_optimize	41.7 ms	55.8 ms	1.34x slower	Significant (t=-91.94)
sqlglot_parse	1.04 ms	1.38 ms	1.32x slower	Significant (t=-39.12)
sqlglot_transpile	1.27 ms	1.73 ms	1.36x slower	Significant (t=-43.31)
sqlite_synth	1.42 us	1.87 us	1.32x slower	Significant (t=-86.85)
telco	6.72 ms	9.28 ms	1.38x slower	Significant (t=-96.69)
tomli_loads	1.73 sec	2.18 sec	1.26x slower	Significant (t=-37.94)
typing_runtime_protocols	141 us	190 us	1.35x slower	Significant (t=-70.98)
unpack_sequence	42.2 ns	52.4 ns	1.24x slower	Significant (t=-16.62)
unpickle	10.7 us	14.4 us	1.34x slower	Significant (t=-111.09)
unpickle_list	3.41 us	4.39 us	1.29x slower	Significant (t=-53.52)
unpickle_pure_python	180 us	238 us	1.32x slower	Significant (t=-48.67)
xml_etree_generate	67.8 ms	91.5 ms	1.35x slower	Significant (t=-127.05)
xml_etree_iterparse	54.5 ms	71.5 ms	1.31x slower	Significant (t=-82.95)
xml_etree_parse	83.6 ms	113 ms	1.35x slower	Significant (t=-75.07)
xml_etree_process	48.3 ms	66.8 ms	1.38x slower	Significant (t=-59.14)

Considering this approach is pretty non-intrusive, I suspect automatic deferred reference counting (DRC) just isn't feasible right now. No matter how we implement it, some objects will always be falsely flagged as "shared", and those false positives take what looks like a roughly 30% hit in performance. I think that individual patches, like #134762, will turn out to work pretty well.
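For anyone who wants to sanity-check the "roughly 30%" figure, here is a minimal sketch that recomputes slowdown ratios from a handful of rows copied out of the table above. The helper names (`to_seconds`, `slowdown`, `geometric_mean`) are made up for illustration and are not part of pyperformance. The inputs are the rounded timings as printed, so a ratio can differ from the "Change" column in the last digit.

```python
import math

# A few rows copied from the table above: (benchmark, main, patch).
# Extend this list with the remaining rows to summarize the full run.
ROWS = [
    ("2to3", "241 ms", "271 ms"),
    ("bench_thread_pool", "790 us", "1.58 ms"),
    ("python_startup_no_site", "7.17 ms", "12.4 ms"),
    ("sqlglot_normalize", "228 ms", "118 ms"),
    ("unpack_sequence", "42.2 ns", "52.4 ns"),
]

# Unit suffixes as they appear in the table, converted to seconds.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "sec": 1.0}


def to_seconds(text):
    """Convert a timing such as '1.58 ms' to seconds."""
    value, unit = text.split()
    return float(value) * UNIT_SECONDS[unit]


def slowdown(main, patch):
    """Return patch time divided by main time (> 1 means the patch is slower)."""
    return to_seconds(patch) / to_seconds(main)


def geometric_mean(ratios):
    """Geometric mean, the usual way to average ratios."""
    ratios = list(ratios)
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


ratios = {name: slowdown(main, patch) for name, main, patch in ROWS}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
print(f"geometric mean of sampled rows: {geometric_mean(ratios.values()):.2f}x")
```

A ratio below 1 means the patch was faster on that benchmark, which in this table only happens for `sqlglot_normalize`.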

ZeroIntensity added 2 commits May 28, 2025 13:31

Thin POC.

ff23f21

Only check the shared field.

091429e

ZeroIntensity requested review from corona10 and Fidget-Spinner May 29, 2025 11:15

bedevere-app bot mentioned this pull request May 29, 2025

Automatically defer heavily shared objects in free-threading. #134821

Open

corona10 reviewed Jun 1, 2025

ZeroIntensity closed this Jun 9, 2025

github-project-automation bot moved this to Done in lavitaconnect@MOSTAFAAMMER Jun 9, 2025
