
Build libgomp (gcc-13) from src on AArch64 #152361


Open
wants to merge 6 commits into base: gh/fadara01/1/base

Conversation

fadara01
Collaborator

@fadara01 fadara01 commented Apr 28, 2025

Stack from ghstack (oldest at bottom):

Context: see #155795

To understand the effect of this libgomp version update, I benchmarked all the following combinations on Arm Neoverse-V1 CPUs:

Below are sample plots showing the percentage speedup (i.e. 100% means a 2x speedup) with the new libgomp (built from source) vs. the current state.
We can see that updating libgomp:

  • either improves performance or keeps it the same,
  • does not cause any meaningful regressions,
  • yields speedups that peak at small input lengths and high thread counts.

[Plot: old_vs_new_libgomp_speedup_perc - percentage speedup of the new libgomp vs. the current state]

benchmark script:

# SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliate <open-source-office@arm.com>
# SPDX-License-Identifier: BSD-3-Clause
import torch
from transformers import AutoModel, AutoConfig
import time
import numpy as np
from argparse import ArgumentParser
from contextlib import nullcontext


class ModelArgumentParser(ArgumentParser):
    def __init__(self) -> None:
        super().__init__(description="benchmark args")
        self.add_argument("--context_length", help="context length - number of input tokens", type=int, default=64)
        self.add_argument("--model", help="model checkpoint - i.e. 'bert-base-uncased'", type=str, default='bert-base-uncased')
        self.add_argument("--iters", help="benchmark iterations", type=int, default=500)
        self.add_argument('--dtype', choices=['bfloat16', 'float32', 'int8'], default='float32', help="datatype")
        self.add_argument('--mode', choices=['compile', 'eager'], default='eager', help="execution mode")

if __name__ == "__main__":
    parser = ModelArgumentParser()
    args = parser.parse_args()
    assert not (args.mode == "compile" and args.dtype == "int8"), "int8 is not supported with compile mode in this benchmark"
    
    model_name = args.model
    config = AutoConfig.from_pretrained(model_name)
    batch_size = 1
    model = AutoModel.from_pretrained(model_name)

    maybe_autocast_ctx = torch.autocast(device_type="cpu", dtype=torch.bfloat16) if args.dtype == "bfloat16" else nullcontext()
    maybe_disable_torch_function = torch._C.DisableTorchFunction() if args.dtype == "int8" else nullcontext()

    with torch.no_grad(), maybe_autocast_ctx, maybe_disable_torch_function:
        if args.dtype == "int8":
            model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
            
        model.eval()
        inputs = torch.randint(config.vocab_size, (batch_size, args.context_length), dtype=torch.long, device="cpu")

        times = []

        if args.mode == "compile":
            model = torch.compile(model, fullgraph=True)

        # warmup
        for _ in range(10):
            model(inputs)
        # benchmark
        for _ in range(args.iters):
            s = time.time_ns()
            model(inputs)
            times.append((time.time_ns() - s) / 1e6)

        avg_time = np.mean(times)
        print("Time: " + f"{avg_time:.2f} ms")

cc @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
@fadara01 fadara01 requested a review from jeffdaily as a code owner April 28, 2025 20:22

pytorch-bot bot commented Apr 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152361

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit c4c280e with merge base d9426a8:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fadara01 added a commit that referenced this pull request Apr 28, 2025
@pytorch-bot pytorch-bot bot added the `topic: not user facing` label Apr 28, 2025
@fadara01
Collaborator Author

@pytorchbot label "module: arm"

@pytorch-bot pytorch-bot bot added the `module: arm` label (Related to ARM architectures builds of PyTorch. Includes Apple M1) Apr 28, 2025
@fadara01
Collaborator Author

@pytorchbot label "ciflow/linux-aarch64"

@pytorch-bot pytorch-bot bot added the `ciflow/linux-aarch64` label (linux aarch64 CI workflow) Apr 28, 2025
@nikhil-arm nikhil-arm marked this pull request as draft April 28, 2025 20:27
@jondea
Contributor

jondea commented Apr 29, 2025

It generally looks good to me; I just have a few questions. What's the motivation for this? Also, it would be great to motivate the build/link flags with some comments. Did you get them from somewhere else? If so, it would be worth adding a link so that we can keep them updated.

[ghstack-poisoned]
fadara01 added a commit that referenced this pull request Jun 12, 2025
ghstack-source-id: 316c834
Pull-Request-resolved: #152361
@fadara01 fadara01 changed the title [Will This Work?] Build libgomp (gcc-11) from src on AArch64 Build libgomp (gcc-11) from src on AArch64 Jun 12, 2025
Contributor

@jondea jondea left a comment

A couple of minor comments, but generally: this looks great, thank you for the detailed example and graphs!

# rpm --eval '%{optflags}'
# rpm --eval '%{build_ldflags}'
#
# I had to remove the following flags because they didn't compile for this version of libgomp:
Contributor

Do we know what impact this might have?

@fadara01 fadara01 marked this pull request as ready for review June 12, 2025 12:18
[ghstack-poisoned]
@fadara01
Collaborator Author

@pytorchbot rebase -i


pytorch-bot bot commented Jun 12, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: -i

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

@fadara01
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Successfully rebased gh/fadara01/1/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/152361)

pytorchmergebot pushed a commit that referenced this pull request Jun 12, 2025
ghstack-source-id: 7d53332
Pull-Request-resolved: #152361
@fadara01
Collaborator Author

Hi @malfet - it would be great to get your feedback/insights for this change.
#155795 contains context about the problem it aims to solve.

Contributor

@malfet malfet left a comment

I'm somewhat conflicted about this change. The perf benefits are clear, so we should update, but on the other hand:

  • We should not be in the business of building basic OS components, like the compiler, OpenMP runtime, etc., but rather rely on system vendors to provide them
  • I.e. it would be good to make a similar change upstream against https://github.com/pypa/manylinux (unless it's already there for later versions)
  • Something tells me there's probably already a binary copy available inside /opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib64
  • Also see my comment about systems with 64kb pagesize (RHEL I guess) (@atalman do we run any AlmaLinux tests on aarch64?)
  • If you are building from source to update to a later version of the library, why not update to the latest currently available?
  • To have better confidence in the update, it would be good to make sure that the same library is also used for all CI tests
  • And not to mention that it would be good to avoid version discrepancies across arches (i.e. arm, powerpc and x86 should all link against the same version of the OpenMP library)


cd /usr/local/src
# fetch source for gcc 11
curl -LO https://ftp.gnu.org/gnu/gcc/gcc-11.4.0/gcc-11.4.0.tar.xz
Contributor

ftp.gnu.org is known to be unstable and somewhat antagonistic towards CI systems (i.e. it often rejects download requests with HTTP 503, but this never happens when a user downloads the file manually)

Collaborator Author

Fair, it's very slow but it hasn't been failing in our downstream CI.
I now clone from https://gcc.gnu.org/git/gcc.git, which is much faster and more secure.
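
For illustration, a shallow clone of the relevant release branch from that mirror might look like this (a sketch; the exact branch and checkout location used in the PR may differ):

# Sketch: fetch the gcc sources from the upstream git mirror instead of ftp.gnu.org.
# releases/gcc-13 is the gcc 13 release branch; adjust to the version being built.
git clone --depth 1 --branch releases/gcc-13 https://gcc.gnu.org/git/gcc.git /usr/local/src/gcc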

@fadara01
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Successfully rebased gh/fadara01/1/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/152361)

pytorchmergebot pushed a commit that referenced this pull request Jul 17, 2025
ghstack-source-id: 8078659
Pull-Request-resolved: #152361
fadara01 added a commit that referenced this pull request Jul 17, 2025
ghstack-source-id: 8078659
Pull-Request-resolved: #152361
[ghstack-poisoned]
fadara01 added a commit that referenced this pull request Jul 17, 2025
ghstack-source-id: 4783465
Pull-Request-resolved: #152361
@fadara01
Collaborator Author

fadara01 commented Jul 18, 2025

We should not be in the business of building basic OS components, like the compiler, OpenMP runtime, etc., but rather rely on system vendors to provide them

I agree with you on this; I only did this because I couldn't find any other option.

I.e. it would be good to make a similar change upstream against https://github.com/pypa/manylinux (unless it's already there for later versions)

Yeah, later versions do - i.e. manylinux 2.34 (AlmaLinux 9) has a newer libgomp which yields the same gains over the older version, but I assumed PyTorch is not ready to move to that manylinux version yet because it won't be compatible with Ubuntu versions < 21.10.

Something tells me there's probably already a binary copy available inside /opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib64

Yeah, that was my impression too, but there's only a libgomp.so under /opt/rh/gcc-toolset-11/root/usr/lib/gcc/aarch64-redhat-linux/11, and it is a linker script pointing to the libgomp in /usr/lib64/libgomp.so.1.
If you uninstall libgomp and then try to install gcc-toolset-11-gcc, that will also install libgomp 8.5, since libgomp is listed as a dependency of gcc-toolset-11-gcc, indicating it's not part of that package:

[root@685358f9c0ad 11]# dnf install gcc-toolset-11-gcc
Last metadata expiration check: 18:51:04 ago on Thu 17 Jul 2025 01:51:09 PM UTC.
Dependencies resolved.
=================================================================================================================================================
 Package                               Architecture               Version                                     Repository                    Size
=================================================================================================================================================
Installing:
 gcc-toolset-11-gcc                    aarch64                    11.2.1-9.2.el8_6.alma.1                     appstream                     28 M
Installing dependencies:
 libgomp                               aarch64                    8.5.0-26.el8_10.alma.1                      baseos                       200 k

Transaction Summary
=================================================================================================================================================
Install  2 Packages

If we list all available libgomp packages in our AlmaLinux 8 image, we only get version 8.5:

[root@685358f9c0ad 11]# dnf list --showduplicates libgomp
Last metadata expiration check: 18:57:32 ago on Thu 17 Jul 2025 01:51:09 PM UTC.
Available Packages
libgomp.aarch64                                                   8.5.0-22.el8_10                                                          baseos
libgomp.aarch64                                                   8.5.0-23.el8_10.alma.1                                                   baseos
libgomp.aarch64                                                   8.5.0-24.el8_10.alma.1                                                   baseos
libgomp.aarch64                                                   8.5.0-26.el8_10.alma.1                                                   baseos
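
As a sanity check on the linker-script point above, something like the following (a sketch; paths match the gcc-toolset-11 layout mentioned earlier) shows that the toolset's libgomp.so is plain text redirecting to the system copy rather than an ELF shared object:

# Sketch: inspect the libgomp.so shipped with gcc-toolset-11 on the AlmaLinux 8 image.
TOOLSET_GOMP=/opt/rh/gcc-toolset-11/root/usr/lib/gcc/aarch64-redhat-linux/11/libgomp.so
file "${TOOLSET_GOMP}"   # expected to report ASCII text, not an ELF shared object
cat  "${TOOLSET_GOMP}"   # a GNU ld script referencing /usr/lib64/libgomp.so.1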

Also see my comment about systems with 64kb pagesize (RHEL I guess)

I replied in a review comment; the 64kb pagesize case will be handled by default.

If you are building from source to update to a later version of the library, why not update to the latest currently available?

Good point. When I raised this, we were (I think) using gcc-11 to build PyTorch, and I wanted the libgomp version to match the compiler version. I've updated to libgomp 13, since we now build with gcc-13.

To have better confidence in the update, it would be good to make sure that the same library is also used for all CI tests (@atalman do we run any AlmaLinux tests on aarch64?)

Unfortunately, as it currently stands, the linux-aarch64 workflow builds the PyTorch wheel on jammy-linux, which will not exercise this change, since the change targets wheels built with manylinux (i.e. release wheels).
We tested this in our downstream CI, which builds the PyTorch wheel in manylinux and runs all upstream tests on that wheel (shoutout to @robert-hardwick). We've also been testing this for a few months as part of https://github.com/ARM-software/Tool-Solutions and haven't seen any issues.
Is this enough to get this merged?

On a related note, we're planning to address the discrepancy between building wheels in manylinux for releases vs. building wheels in jammy for pre-commit testing. Please let me know if there's a good reason why CI doesn't currently build its wheel in manylinux.

@fadara01 fadara01 changed the title Build libgomp (gcc-11) from src on AArch64 Build libgomp (gcc-13) from src on AArch64 Jul 18, 2025
@robert-hardwick
Collaborator

robert-hardwick commented Aug 7, 2025

I'm not sure what is happening with this PR, but just to let you know that this stack of PRs I've created might affect it: #160079. I'm not sure whether this PR installs libgomp to a different location, or whether there are other libgomp versions that the linker might find.

Currently, AArch64 manylinux uses auditwheel repair to package shared object files into the wheel, but we are trying to remove AArch64-specific code, and to align with other platforms you will now be required to manually define these dependency locations.
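
For what it's worth, one way to check which libgomp an installed wheel actually ends up with (a sketch; it assumes torch is installed in the active Python environment and that libtorch_cpu.so is the library of interest):

# Sketch: check for a vendored libgomp inside the wheel and which copy the dynamic linker resolves.
TORCH_LIB="$(python -c 'import torch, os; print(os.path.join(os.path.dirname(torch.__file__), "lib"))')"
ls "${TORCH_LIB}" | grep -i gomp                    # vendored copy, if auditwheel repair added one
ldd "${TORCH_LIB}/libtorch_cpu.so" | grep -i gomp   # the copy resolved at load time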

Labels
ciflow/inductor, ciflow/linux-aarch64, module: arm, module: inductor, open source, topic: not user facing
7 participants