-
-
Notifications
You must be signed in to change notification settings - Fork 182
Statically link libpython into interpreter (but keep building libpython3.x.so) #592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d8838a7
to
2af08c0
Compare
No obvious performance impact in any direction, really. In the following:
|
... which is a little surprising, come to think of it, given that the primary motivation for this was performance. Just to confirm, the build change is actually doing what we expect:
I wonder if the problem is my AWS VM has too fast of a filesystem! If I drop caches before each benchmark run, then we see a more defensible couple-percent speedup from this change:
|
We'd also expect to see some speedup on thread-local storage. I can confirm that we are getting the faster TLS model but I can't actually measure a difference. i've tried doing moar threading with this silly benchmark, but only got a 1-2% performance difference. import pystone
import concurrent.futures
t = concurrent.futures.ThreadPoolExecutor()
for i in range(500):
t.submit(pystone.pystones, 1000)
t.shutdown() However, on the free-threaded variants (
So I think the change is defensible on those grounds. (And the non-BOLT libpython isn't slower, and is actually apparently faster, which is maybe concerning....) Details of thread-local storage code generation differences:From 1ac233: e8 98 be ff ff call 1a80d0 <PyUnicode_DecodeUTF8@plt>
1ac234: R_X86_64_PLT32 PyUnicode_DecodeUTF8-0x4
1ac238: 48 89 44 24 08 mov %rax,0x8(%rsp)
1ac23d: 48 85 c0 test %rax,%rax
1ac240: 0f 84 79 01 00 00 je 1ac3bf <_PyPegen_new_identifier+0x1af>
1ac246: f6 40 20 40 testb $0x40,0x20(%rax)
1ac24a: 74 4c je 1ac298 <_PyPegen_new_identifier+0x88>
1ac24c: 48 8d 3d 6d 71 4c 01 lea 0x14c716d(%rip),%rdi # 16733c0 <.got>
1ac24f: R_X86_64_TLSLD _Py_tss_tstate-0x4
1ac253: e8 08 eb ff ff call 1aad60 <__tls_get_addr@plt>
1ac254: R_X86_64_PLT32 __tls_get_addr@GLIBC_2.3-0x4
1ac258: 48 8b 80 18 00 00 00 mov 0x18(%rax),%rax
1ac25b: R_X86_64_DTPOFF32 _Py_tss_tstate
1ac25f: 48 8b 78 10 mov 0x10(%rax),%rdi
1ac263: 4c 8d 74 24 08 lea 0x8(%rsp),%r14
1ac268: 4c 89 f6 mov %r14,%rsi
1ac26b: e8 d0 d2 ff ff call 1a9540 <_PyUnicode_InternImmortal@plt>
1ac26c: R_X86_64_PLT32 _PyUnicode_InternImmortal-0x4
1ac270: 48 8b 7b 20 mov 0x20(%rbx),%rdi
1ac274: 4d 8b 36 mov (%r14),%r14
1ac277: 4c 89 f6 mov %r14,%rsi
1ac27a: e8 31 bb ff ff call 1a7db0 <_PyArena_AddPyObject@plt>
1ac27b: R_X86_64_PLT32 _PyArena_AddPyObject-0x4 From 1c7bdb3: e8 9e 39 dc ff call 1a3f756 <PyUnicode_DecodeUTF8>
1c7bdb8: 48 89 44 24 08 mov %rax,0x8(%rsp)
1c7bdbd: 48 85 c0 test %rax,%rax
1c7bdc0: 0f 84 d6 d0 2c 00 je 1f48e9c <_PyPegen_new_identifier.cold+0x108>
1c7bdc6: f6 40 20 40 testb $0x40,0x20(%rax)
1c7bdca: 0f 84 c4 cf 2c 00 je 1f48d94 <_PyPegen_new_identifier.cold>
1c7bdd0: 64 48 8b 04 25 f8 ff mov %fs:0xfffffffffffffff8,%rax
1c7bdd7: ff ff
1c7bdd9: 48 8b 78 10 mov 0x10(%rax),%rdi
1c7bddd: 4c 8d 74 24 08 lea 0x8(%rsp),%r14
1c7bde2: 4c 89 f6 mov %r14,%rsi
1c7bde5: e8 e0 32 dc ff call 1a3f0ca <_PyUnicode_InternImmortal>
1c7bdea: 48 8b 7b 20 mov 0x20(%rbx),%rdi
1c7bdee: 4d 8b 36 mov (%r14),%r14
1c7bdf1: 4c 89 f6 mov %r14,%rsi
1c7bdf4: e8 e3 1a dc ff call 1a3d8dc <_PyArena_AddPyObject> Note the function call to |
4e73aa3
to
f073cef
Compare
The riscv failures look like
|
cpython-unix/build-cpython.sh
Outdated
# and some of the defaults are wrong or unwanted, so we have to fix | ||
# those manually. | ||
# TODO(geofft): This cannot be the only thing. | ||
patch -p1 -i "${ROOT}/patch-python-configure-cross-assume-pthread.patch" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pass an option to configure
that set this to yes instead of patching?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Should this be limited to cross-compiles?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Should this be limited to cross-compiles?)
The patch as written is limited to cross-compiles, in that it changes the default behavior when ./configure is running in cross-compile mode and therefore (thinks that) it cannot run test programs. It leaves the configure test in place with the current behavior for the non-cross-compiling case. See #599.
Can we pass an option to
configure
that set this to yes instead of patching?
Looks like yes, with the test ./configure script mentioned in #599:
$ ./configure cross_compiling=yes ac_cv_pthread=yes
[...]
checking whether gcc accepts -pthread... (cached) yes
My vague intuition is that a patch is the right thing here because this patch is actually suitable for upstream—it is a better guess in 2025 that a compiler supports -pthread
than that it doesn't (just as it's a better guess that a compiler supports C99 strftime
, etc.). But I don't have a strong opinion. If you prefer, yes, we can do
if [ -n "${CROSS_COMPILING}" ]; then
EXTRA_CONFIGURE_FLAGS="${EXTRA_CONFIGURE_FLAGS} ac_cv_pthread=yes"
fi
(As mentioned in #599 we can also pass cross_compiling=yes
on the command line and drop the existing patch-force-cross-compile.patch.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think
if [ -n "${CROSS_COMPILING}" ]; then
EXTRA_CONFIGURE_FLAGS="${EXTRA_CONFIGURE_FLAGS} ac_cv_pthread=yes"
fi
is generally better than a patch, since it removes a layer of indirection and is a common ./configure
pattern. It also makes it much more obvious that it's a cross-compile only change. I'm also supportive of you trying to contribute the patch upstream. I don't feel strongly here, but I was surprised / confused.
Yeah, I think this needs a small tweak to the validation script, let me do that. I'm going to tag this PR as riscv only to avoid rerunning CI on the successful arches, for future reference here's the current Actions run with those results. |
Also, switch to using cross_compiling=yes instead of patching ./configure in place, which allows us to move rerunning autoconf to right before running ./configure, avoiding the risk of patching ./configure.ac too late. See astral-sh#599.
f073cef
to
317d430
Compare
Nice! Those last few fixes look good to me. |
Cherry-pick python/cpython#133313 and apply it. The general motivation is performance; see the patch commit message for more details.