-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
MNT: build numpy with link-time optimization in benchmarks #26616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a good idea if you have observed that it helps.
It also raised the question for me about whether we need to use LTO anywhere else. IIRC building CPython from source doesn't use LTO, but all the installers on python.org do use it. It's not clear to me why that setup was chosen though - maybe backwards compat, portability, or to not make the choice for distros?
@@ -18,7 +18,7 @@ | |||
"branches": ["HEAD"], | |||
|
|||
"build_command": [ | |||
"python -m build --wheel -o {build_cache_dir} {build_dir}" | |||
"python -m build --wheel -C'setup-args=-Db_lto=true' -C'setup-args=-Dbuildtype=release' -o {build_cache_dir} {build_dir}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"python -m build --wheel -C'setup-args=-Db_lto=true' -C'setup-args=-Dbuildtype=release' -o {build_cache_dir} {build_dir}" | |
"python -m build --wheel -Csetup-args=-Db_lto=true -o {build_cache_dir} {build_dir}" |
Trying to write this in a slightly more idiomatic way. Release buildtype is the default when building a wheel, and the quotes weren't necessary
Hi, @vstinner here :-) I listed my articles there: https://vstinner.readthedocs.io/benchmark.html Maybe you're thinking at this article which is about PGO: https://vstinner.github.io/journey-to-stable-benchmark-deadcode.html |
I wonder if it has to to with reproducible builds. |
Python installers are built with LTO+PGO, as do Linux distributions such as Ubuntu and Fedora. It's not done by default by |
@ngoldbaum on what platform(s) did you test this? It doesn't seem to work for me with conda compilers on Linux x86-64:
Compiler version:
The problem here seems to be that [binaries]
ar = '/home/rgommers/mambaforge/envs/numpy-dev/bin/gcc-ar' and passing that to
That's not automatic though (I haven't checked why that is the case, probably lots of history in issue trackers), so this PR is going to break things. |
Also, a few
|
Darn, in that case I think I'd like to close this PR. If anyone would like to figure out how to get this working nicely under GCC please feel free to take this up again. I don't have time right now to figure out how to get this working under GCC. I was using apple's clang on an ARM Mac and didn't try under gcc before making the PR. I'm going to keep this in my back pocket as a way to reduce noise in the benchmarks.
I think work on enabling LTO and PGO in our wheel builds will also lead to improved performance in the binaries we distribute. Definitely not urgent though. |
I've been trying to benchmark a big refactor of numpy in #26607. I remembered a blog post from @vstinner years ago on python performance work, where he found turning on LTO improved reproducibility. See e.g. this LWN summary. Can't seem to find the original blog post...
Now that we have meson, it's easy to use LTO. It also doesn't take that much more time to build numpy with LTO, at least on my ARM Macbook.
This makes benchmarks where I was seeing substantial changes in speed comparing with main have no difference at all. I haven't tried the full benchmark suite but I suspect the remaining changes will be "real" differences and not just due to code shuffling in the binary or changing size slightly.