
Build libgomp (gcc-13) from src on AArch64 #152361


Open
wants to merge 6 commits into base: gh/fadara01/1/base

Conversation

fadara01
Collaborator

@fadara01 fadara01 commented Apr 28, 2025

Stack from ghstack (oldest at bottom):

Context: see #155795

To understand the effect of this libgomp version update, I benchmarked all the following combinations on Arm Neoverse-V1 CPUs:

Below are sample plots showing the percentage speedup (i.e. 100% means a 2x speedup) with the new libgomp (built from source) vs. the current state.
We can see that updating libgomp:

  • either improves performance or keeps it the same,
  • does not cause any meaningful regressions,
  • yields speedups that peak at small input lengths and high thread counts.

[Plot: old_vs_new_libgomp_speedup_perc - percentage speedup of the new libgomp vs. the current state]

benchmark script:

# SPDX-FileCopyrightText: Copyright 2025 Arm Limited and/or its affiliate <open-source-office@arm.com>
# SPDX-License-Identifier: BSD-3-Clause
import torch
from transformers import AutoModel, AutoConfig
import time
import numpy as np
from argparse import ArgumentParser
from contextlib import nullcontext


class ModelArgumentParser(ArgumentParser):
    def __init__(self) -> None:
        super().__init__(description="benchmark args")
        self.add_argument("--context_length", help="context length - number of input tokens", type=int, default=64)
        self.add_argument("--model", help="model checkpoint - i.e. 'bert-base-uncased'", type=str, default='bert-base-uncased')
        self.add_argument("--iters", help="benchmark iterations", type=int, default=500)
        self.add_argument('--dtype', choices=['bfloat16', 'float32', 'int8'], default='float32', help="datatype")
        self.add_argument('--mode', choices=['compile', 'eager'], default='eager', help="execution mode")

if __name__ == "__main__":
    parser = ModelArgumentParser()
    args = parser.parse_args()
    assert not (args.mode == "compile" and args.dtype == "int8"), "int8 is not supported with compile mode in this benchmark"
    
    model_name = args.model
    config = AutoConfig.from_pretrained(model_name)
    batch_size = 1
    model = AutoModel.from_pretrained(model_name)

    maybe_autocast_ctx = torch.autocast(device_type="cpu", dtype=torch.bfloat16) if args.dtype == "bfloat16" else nullcontext()
    maybe_disable_torch_function = torch._C.DisableTorchFunction() if args.dtype == "int8" else nullcontext()

    with torch.no_grad(), maybe_autocast_ctx, maybe_disable_torch_function:
        if args.dtype == "int8":
            model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
            
        model.eval()
        inputs = torch.randint(config.vocab_size, (batch_size, args.context_length), dtype=torch.long, device="cpu")

        times = []

        if args.mode == "compile":
            model = torch.compile(model, fullgraph=True)

        # warmup
        for _ in range(10):
            model(inputs)
        # benchmark
        for _ in range(args.iters):
            s = time.time_ns()
            model(inputs)
            times.append((time.time_ns() - s) / 1e6)

        avg_time = np.mean(times)
        print("Time: " + f"{avg_time:.2f} ms")

cc @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
@fadara01 fadara01 requested a review from jeffdaily as a code owner April 28, 2025 20:22

pytorch-bot bot commented Apr 28, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/152361

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit c4c280e with merge base d9426a8:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

fadara01 added a commit that referenced this pull request Apr 28, 2025
@pytorch-bot pytorch-bot bot added the `topic: not user facing` label Apr 28, 2025
@fadara01
Collaborator Author

@pytorchbot label "module: arm"

@pytorch-bot pytorch-bot bot added the `module: arm` label (Related to ARM architectures builds of PyTorch. Includes Apple M1) Apr 28, 2025
@fadara01
Collaborator Author

@pytorchbot label "ciflow/linux-aarch64"

@pytorch-bot pytorch-bot bot added the `ciflow/linux-aarch64` label (linux aarch64 CI workflow) Apr 28, 2025
@nikhil-arm nikhil-arm marked this pull request as draft April 28, 2025 20:27
@jondea
Contributor

jondea commented Apr 29, 2025

It generally looks good to me; I just have a few questions. What's the motivation for this? Also, it would be great to motivate the build/link flags with some comments. Did you get them from somewhere else? If so, it would be worth adding a link so that we can keep them updated.

[ghstack-poisoned]
fadara01 added a commit that referenced this pull request Jun 12, 2025
ghstack-source-id: 316c834
Pull-Request-resolved: #152361
@fadara01 fadara01 changed the title [Will This Work?] Build libgomp (gcc-11) from src on AArch64 Build libgomp (gcc-11) from src on AArch64 Jun 12, 2025
Contributor

@jondea jondea left a comment

A couple of minor comments, but generally: this looks great, thank you for the detailed example and graphs!

# rpm --eval '%{optflags}'
# rpm --eval '%{build_ldflags}'
#
# I had to remove the following flags because they didn't compile for this version of libgomp:
Contributor

Do we know what impact this might have?

@fadara01 fadara01 marked this pull request as ready for review June 12, 2025 12:18
[ghstack-poisoned]
@fadara01
Collaborator Author

@pytorchbot rebase -i


pytorch-bot bot commented Jun 12, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: -i

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.

@fadara01
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Successfully rebased gh/fadara01/1/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/152361)

pytorchmergebot pushed a commit that referenced this pull request Jun 12, 2025
ghstack-source-id: 7d53332
Pull-Request-resolved: #152361
@fadara01
Collaborator Author

Hi @malfet - it would be great to get your feedback/insights for this change.
#155795 contains context about the problem it aims to solve.

Contributor

@malfet malfet left a comment

I'm somewhat conflicted about this change. The perf benefits are clear, so we should update, but on the other hand:

  • We should not be in the business of building basic OS components, like the compiler, OpenMP runtime, etc., but rather rely on system vendors to provide them
  • I.e. it would be good to make a similar change upstream against https://github.com/pypa/manylinux (unless it's already there for later versions)
  • Something tells me there's probably already a binary copy available inside /opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib64
  • Also see my comment about systems with 64kb pagesize (RHEL I guess) (@atalman do we run any AlmaLinux tests on aarch64?)
  • If you are building from source to update to a later version of the library, why not update to the latest currently available?
  • To have better confidence in the update, it would be good to make sure that the same library is also used for all CI tests
  • And not to mention that it would be good to avoid version discrepancies across arches (i.e. arm, powerpc and x86 should all link against the same version of the OpenMP library)


cd /usr/local/src
# fetch source for gcc 11
curl -LO https://ftp.gnu.org/gnu/gcc/gcc-11.4.0/gcc-11.4.0.tar.xz
Contributor

ftp.gnu.org is known to be unstable and somewhat antagonistic towards CI systems (i.e. it often rejects download requests with HTTP 503, but this never happens when a user downloads the file manually)

Collaborator Author

Fair, it's very slow but it hasn't been failing in our downstream CI.
I now clone from https://gcc.gnu.org/git/gcc.git, which is much faster and more secure.
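
For illustration, a shallow clone of the relevant release branch from that mirror might look like this (a sketch; the exact branch and checkout location used in the PR may differ):

# Sketch: fetch the gcc sources from the upstream git mirror instead of ftp.gnu.org.
# releases/gcc-13 is the gcc 13 release branch; adjust to the version being built.
git clone --depth 1 --branch releases/gcc-13 https://gcc.gnu.org/git/gcc.git /usr/local/src/gcc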

@fadara01
Collaborator Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
@pytorchmergebot
Collaborator

Successfully rebased gh/fadara01/1/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/152361)

pytorchmergebot pushed a commit that referenced this pull request Jul 17, 2025
ghstack-source-id: 8078659
Pull-Request-resolved: #152361
fadara01 added a commit that referenced this pull request Jul 17, 2025
ghstack-source-id: 8078659
Pull-Request-resolved: #152361
[ghstack-poisoned]
fadara01 added a commit that referenced this pull request Jul 17, 2025
ghstack-source-id: 4783465
Pull-Request-resolved: #152361
@fadara01
Collaborator Author

fadara01 commented Jul 18, 2025

We should not be in the business of building basic OS components, like the compiler, OpenMP runtime, etc., but rather rely on system vendors to provide them

I agree with you on this; I only did this because I couldn't find any other option.

I.e. it would be good to make a similar change upstream against https://github.com/pypa/manylinux (unless it's already there for later versions)

Yeah, later versions do - i.e. manylinux 2.34 (AlmaLinux 9) has a newer libgomp which yields the same gains over the older version, but I assumed PyTorch is not ready to move to that manylinux version yet because it won't be compatible with Ubuntu versions < 21.10.

Something tells me there's probably already a binary copy available inside /opt/rh/gcc-toolset-${GCCTOOLSET_VERSION}/root/usr/lib64

Yeah, that was my impression too, but there's only a libgomp.so under /opt/rh/gcc-toolset-11/root/usr/lib/gcc/aarch64-redhat-linux/11, and it is a linker script pointing to the libgomp in /usr/lib64/libgomp.so.1.
If you uninstall libgomp and then try to install gcc-toolset-11-gcc, that will also install libgomp 8.5, since libgomp is listed as a dependency of gcc-toolset-11-gcc, indicating it's not part of that package:

[root@685358f9c0ad 11]# dnf install gcc-toolset-11-gcc
Last metadata expiration check: 18:51:04 ago on Thu 17 Jul 2025 01:51:09 PM UTC.
Dependencies resolved.
=================================================================================================================================================
 Package                               Architecture               Version                                     Repository                    Size
=================================================================================================================================================
Installing:
 gcc-toolset-11-gcc                    aarch64                    11.2.1-9.2.el8_6.alma.1                     appstream                     28 M
Installing dependencies:
 libgomp                               aarch64                    8.5.0-26.el8_10.alma.1                      baseos                       200 k

Transaction Summary
=================================================================================================================================================
Install  2 Packages

If we list all available libgomp packages in our AlmaLinux 8 image, we only get version 8.5:

[root@685358f9c0ad 11]# dnf list --showduplicates libgomp
Last metadata expiration check: 18:57:32 ago on Thu 17 Jul 2025 01:51:09 PM UTC.
Available Packages
libgomp.aarch64                                                   8.5.0-22.el8_10                                                          baseos
libgomp.aarch64                                                   8.5.0-23.el8_10.alma.1                                                   baseos
libgomp.aarch64                                                   8.5.0-24.el8_10.alma.1                                                   baseos
libgomp.aarch64                                                   8.5.0-26.el8_10.alma.1                                                   baseos
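
As a sanity check on the linker-script point above, something like the following (a sketch; paths match the gcc-toolset-11 layout mentioned earlier) shows that the toolset's libgomp.so is plain text redirecting to the system copy rather than an ELF shared object:

# Sketch: inspect the libgomp.so shipped with gcc-toolset-11 on the AlmaLinux 8 image.
TOOLSET_GOMP=/opt/rh/gcc-toolset-11/root/usr/lib/gcc/aarch64-redhat-linux/11/libgomp.so
file "${TOOLSET_GOMP}"   # expected to report ASCII text, not an ELF shared object
cat  "${TOOLSET_GOMP}"   # a GNU ld script referencing /usr/lib64/libgomp.so.1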

Also see my comment about systems with 64kb pagesize (RHEL I guess)

I replied in a review comment; the 64kb pagesize case will be handled by default.

If you are building from source to update to a later version of the library, why not update to the latest currently available?

Good point. When I raised this, we were (I think) using gcc-11 to build PyTorch, and I wanted the libgomp version to match the compiler version. I've updated to libgomp 13, since we now build with gcc-13.

To have better confidence in the update, it would be good to make sure that the same library is also used for all CI tests (@atalman do we run any AlmaLinux tests on aarch64?)

Unfortunately, as it currently stands, the linux-aarch64 workflow builds the PyTorch wheel on jammy-linux, which will not exercise this change, since the change targets wheels built with manylinux (i.e. release wheels).
We tested this in our downstream CI, which builds the PyTorch wheel in manylinux and runs all upstream tests on that wheel (shoutout to @robert-hardwick). We've also been testing this for a few months as part of https://github.com/ARM-software/Tool-Solutions and haven't seen any issues.
Is this enough to get this merged?

On a related note, we're planning to address the discrepancy between building wheels in manylinux for releases vs. building wheels in jammy for pre-commit testing. Please let me know if there's a good reason why CI doesn't currently build its wheel in manylinux.

@fadara01 fadara01 changed the title Build libgomp (gcc-11) from src on AArch64 Build libgomp (gcc-13) from src on AArch64 Jul 18, 2025
@robert-hardwick
Collaborator

robert-hardwick commented Aug 7, 2025

I'm not sure what is happening with this PR, but just to let you know that this stack of PRs I've created might affect it: #160079. I'm not sure whether this PR installs libgomp to a different location, or whether there are other libgomp versions that the linker might find.

Currently, AArch64 manylinux uses auditwheel repair to package shared object files into the wheel, but we are trying to remove AArch64-specific code, and to align with other platforms you will now be required to manually define these dependency locations.
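
For what it's worth, one way to check which libgomp an installed wheel actually ends up with (a sketch; it assumes torch is installed in the active Python environment and that libtorch_cpu.so is the library of interest):

# Sketch: check for a vendored libgomp inside the wheel and which copy the dynamic linker resolves.
TORCH_LIB="$(python -c 'import torch, os; print(os.path.join(os.path.dirname(torch.__file__), "lib"))')"
ls "${TORCH_LIB}" | grep -i gomp                    # vendored copy, if auditwheel repair added one
ldd "${TORCH_LIB}/libtorch_cpu.so" | grep -i gomp   # the copy resolved at load time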

Labels
ciflow/inductor, ciflow/linux-aarch64, module: arm, module: inductor, open source, topic: not user facing
7 participants