[AMDGPU][clang] provide device implementation for __builtin_logb and … #129347

choikwa · 2025-03-01T02:16:44Z

…__builtin_scalbn

Clang generates library calls for _builtin* functions which can be a problem for GPUs that cannot handle them. This patch generates call to device implementation for __builtin_logb and ldexp intrinsic for __builtin_scalbn.

llvmbot · 2025-03-01T02:17:16Z

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: choikwa (choikwa)

Changes

…__builtin_scalbn

Clang generates library calls for _builtin* functions which can be a problem for GPUs that cannot handle them. This patch generates call to device implementation for __builtin_logb and ldexp intrinsic for __builtin_scalbn.

Full diff: https://github.com/llvm/llvm-project/pull/129347.diff

2 Files Affected:

(modified) clang/lib/CodeGen/CGBuiltin.cpp (+48-1)
(modified) clang/lib/CodeGen/CodeGenModule.h (+5)

diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 03b8d16b76e0d..6a0497df7acfb 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -259,6 +259,26 @@ Value *readX18AsPtr(CodeGenFunction &CGF) {
   return CGF.Builder.CreateIntToPtr(X18, CGF.Int8PtrTy);
 }
 
+llvm::Constant *CodeGenModule::getDeviceLibFunction(const FunctionDecl *FD,
+                                                    unsigned BuiltinID) {
+  GlobalDecl D(FD);
+  llvm::SmallString<64> Name;
+  if (getTarget().getTriple().isAMDGCN()) {
+    switch (BuiltinID) {
+    default: return nullptr;
+    case Builtin::BIlogb:
+    case Builtin::BI__builtin_logb:
+      Name = "__ocml_logb_f64";
+    }
+  }
+  if (Name.empty())
+    return nullptr;
+
+  llvm::FunctionType *Ty =
+    cast<llvm::FunctionType>(getTypes().ConvertType(FD->getType()));
+  return GetOrCreateLLVMFunction(Name, Ty, D, /*ForVTable*/false);
+}
+
 /// getBuiltinLibFunction - Given a builtin id for a function like
 /// "__builtin_fabsf", return a Function* for "fabsf".
 llvm::Constant *CodeGenModule::getBuiltinLibFunction(const FunctionDecl *FD,
@@ -6579,10 +6599,32 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
   }
   }
 
+  // Some targets like GPUs do not support library call and must provide
+  // device overload implementation.
+  if (getTarget().getTriple().isAMDGCN())
+    // Emit library call to device-lib implementation
+    if (auto *DevLibFunc = CGM.getDeviceLibFunction(FD, BuiltinID))
+      return emitLibraryCall(*this, FD, E, DevLibFunc);
+
+  // These will be emitted as Intrinsic later.
+  auto NeedsDeviceOverloadToIntrin = [&](unsigned BuiltinID) {
+    if (getTarget().getTriple().isAMDGCN()) {
+      switch (BuiltinID) {
+      default:
+        return false;
+      case Builtin::BIscalbn:
+      case Builtin::BI__builtin_scalbn:
+        return true;
+      }
+    }
+    return false;
+  };
+
   // If this is an alias for a lib function (e.g. __builtin_sin), emit
   // the call using the normal call path, but using the unmangled
   // version of the function name.
-  if (getContext().BuiltinInfo.isLibFunction(BuiltinID))
+  if (!NeedsDeviceOverloadToIntrin(BuiltinID) &&
+      getContext().BuiltinInfo.isLibFunction(BuiltinID))
     return emitLibraryCall(*this, FD, E,
                            CGM.getBuiltinLibFunction(FD, BuiltinID));
 
@@ -20804,6 +20846,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_s_prefetch_data:
     return emitBuiltinWithOneOverloadedType<2>(
         *this, E, Intrinsic::amdgcn_s_prefetch_data);
+  case Builtin::BIscalbn:
+  case Builtin::BI__builtin_scalbn:
+    return emitBinaryExpMaybeConstrainedFPBuiltin(
+        *this, E, Intrinsic::ldexp,
+        Intrinsic::experimental_constrained_ldexp);
   default:
     return nullptr;
   }
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 4a269f622ece4..890dc8556cc1c 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1231,6 +1231,11 @@ class CodeGenModule : public CodeGenTypeCache {
       llvm::FunctionType *FnType = nullptr, bool DontDefer = false,
       ForDefinition_t IsForDefinition = NotForDefinition);
 
+  /// Given a builtin id for a function, return a Function* for device
+  /// overload implementation.
+  llvm::Constant *getDeviceLibFunction(const FunctionDecl *FD,
+                                       unsigned BuiltinID);
+
   /// Given a builtin id for a function like "__builtin_fabsf", return a
   /// Function* for "fabsf".
   llvm::Constant *getBuiltinLibFunction(const FunctionDecl *FD,

github-actions · 2025-03-01T02:20:10Z

✅ With the latest revision this PR passed the C/C++ code formatter.

jhuber6 · 2025-03-01T02:22:19Z

clang/lib/CodeGen/CGBuiltin.cpp

+    default: return nullptr;
+    case Builtin::BIlogb:
+    case Builtin::BI__builtin_logb:
+      Name = "__ocml_logb_f64";


I don't think the solution is to start hard-coding ROCm Device Libs names in the compiler. Coincidentally I actually created an RFC about these kinds of problems just a few minutes ago https://discourse.llvm.org/t/rfc-lto-handling-math-libcalls-with-lto/84884.

arsenm

We do not and should never emit ocml calls in clang. There is also no reason to ever use scalbn for binary float types, it is identical to llvm.ldexp

choikwa · 2025-03-01T03:53:49Z

The motivation came from CUDA cccl code containing __builtin_logb and __builtin_scalbn and NVPTX being able to handle them.

jhuber6 · 2025-03-01T03:55:42Z

The motivation came from CUDA cccl code containing __builtin_logb and __builtin_scalbn and being able to handle them.

In my ideal world, unhandled builtins would be lowered before the linker and we'd have a working library called libm.a that matches the platform. This is a similar problem with a lot of libc++ code that prefers to use the builtin versions because they're assumed to be 'fast'.

choikwa · 2025-03-01T04:08:04Z

I would have preferred to do the overloads in the target specific header file, but __builtin_* methods couldn't be overloaded. I noticed that hip math header already overloads 'double logb(double)' by calling ocml version, so I mirrored that in clang. If there is better approach, I am interested.

jhuber6 · 2025-03-01T04:09:39Z

I would have preferred to do the overloads in the target specific header file, but _builtin* methods couldn't be overloaded. I noticed that hip math header already overloads 'double logb(double)' by calling ocml version, so I mirrored that in clang. If there is better approach, I am interested.

Long term, I think it's as I outlined in my RFC that I linked earlier.

arsenm

clang should never emit a call to an ocml function

choikwa · 2025-03-01T06:25:16Z

clang should never emit a call to an ocml function

Would IR body of the ocml function be a suitable replacement to emit (for now)?

I understand once we have libm/libc implementation, clang can decide between the two.

arsenm · 2025-03-02T02:55:53Z

I understand once we have libm/libc implementation, clang can decide between the two.

The behavior should be to just emit the call to the libm function. The real problem is we don't link a libm, and have a shim in a header file which isn't a system property

choikwa · 2025-03-03T21:29:43Z

I understand once we have libm/libc implementation, clang can decide between the two.

The behavior should be to just emit the call to the libm function. The real problem is we don't link a libm, and have a shim in a header file which isn't a system property

There were some stakeholders who are trying to compile cccl on AMDGPU and were looking for a solution in clang. If this short term fix isn't desirable, is there any alternative solutions to explore? I understand long term solution keeping it as libcall and having libm, but we may be some time away from achieving that.

jhuber6 · 2025-03-03T21:37:55Z

There were some stakeholders who are trying to compile cccl on AMDGPU and were looking for a solution in clang. If this short term fix isn't desirable, is there any alternative solutions to explore? I understand long term solution keeping it as libcall and having libm, but we may be some time away from achieving that.

For now these don't seem to hit any LLVM-IR intrinsics https://godbolt.org/z/zasTefo53. So you should be able to link them with the libm.a I already have from the llvm libc I already provide. That might change in the future however since AFAIK we're going to have every math call as an intrinsic eventually.

There's libm.bc in the install which you can cram into your compile via your favorite method. Or just use -Xoffload-linker -lm if you're using the new driver on HIP (I'll make it the default eventually). Alternatively just make your own libm.o that contains the ROCm DL implementation and link that in. And if absolutely none of that works, make a custom IR lowering pass.

choikwa · 2025-03-05T17:22:20Z

There were some stakeholders who are trying to compile cccl on AMDGPU and were looking for a solution in clang. If this short term fix isn't desirable, is there any alternative solutions to explore? I understand long term solution keeping it as libcall and having libm, but we may be some time away from achieving that.

For now these don't seem to hit any LLVM-IR intrinsics https://godbolt.org/z/zasTefo53. So you should be able to link them with the libm.a I already have from the llvm libc I already provide. That might change in the future however since AFAIK we're going to have every math call as an intrinsic eventually.

There's libm.bc in the install which you can cram into your compile via your favorite method. Or just use -Xoffload-linker -lm if you're using the new driver on HIP (I'll make it the default eventually). Alternatively just make your own libm.o that contains the ROCm DL implementation and link that in. And if absolutely none of that works, make a custom IR lowering pass.

After clang emits builtins as LibCall, it just appears as an external call, and it would be odd for LLVM to reserve and transform certain functions even if it aliases libc builtin function names as -fno-builtin exists and user could've defined their own. I appreciate the suggestions given here and can forward it to the stakeholders. Is there no desirable, acceptable solution in clang-side then?

jhuber6

Are you trying to do an IR lowering? It'd probably make more sense to handle that in LLVM than clang, but I guess Matt would be the expert there.

choikwa · 2025-03-14T20:16:20Z

Are you trying to do an IR lowering? It'd probably make more sense to handle that in LLVM than clang, but I guess Matt would be the expert there.

I had considered it but it has the issue of name aliasing since it just appears as an external call in LLVM after it's emitted by clang.

choikwa · 2025-03-24T20:17:16Z

gentle ping

jhuber6

So this is just an IR lowering? We'd probably need some more conclusive tests that it works as expected for exceptional values, but I'm going to guess it was copied from an existing implementation so we could just refer to that one.

clang/test/CodeGenHIP/logb_scalbn.hip

jhuber6

I'll defer to @arsenm for the final verdict.

clang/test/CodeGenHIP/logb_scalbn.cpp

choikwa · 2025-03-31T20:23:22Z

gentle ping

arsenm · 2025-04-08T02:39:11Z

clang/lib/CodeGen/CGBuiltin.cpp

+  // These will be emitted as Intrinsic later.
+  auto NeedsDeviceOverload = [&](unsigned BuiltinID) {
+    if (getTarget().getTriple().isAMDGCN()) {
+      switch (BuiltinID) {
+      default:
+        return false;
+      case Builtin::BIlogb:
+      case Builtin::BI__builtin_logb:
+      case Builtin::BIscalbn:
+      case Builtin::BI__builtin_scalbn:
+        return true;
+      }
+    }
+    return false;
+  };


Should not be a lambda, and should not hardcode the triple. and needs documentation, and a fixme to remove it. Can we check if the libcall is supported or something? Really we should have libm

There is llvm::isLibFuncEmittable(const Module *M, const TargetLibraryInfo *TLI, LibFunc TheLibFunc) but unsure how to access TLI from CGBuiltin's CodeGenFunction::EmitBuiltinExpr.

clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp

clang/test/CodeGen/logb_scalbn.c

choikwa · 2025-04-08T06:47:39Z

Updated and addressed feedback except the triplet check. At the moment, I'm unsure if there is better way than to query using llvm::isLibFuncEmittable, but that requires TLI.

clang/test/CodeGen/logb_scalbn.c

clang/lib/CodeGen/TargetBuiltins/AMDGPU.cpp

clang/lib/CodeGen/CGBuiltin.cpp

…__builtin_scalbn Clang generates library calls for __builtin_* functions which can be a problem for GPUs that cannot handle them. This patch generates a device implementations for __builtin_logb and __builtin_scalbn by emitting LLVM IRs. Only emit IRs when FP exceptions are disabled and math-errno is unset.

choikwa · 2025-04-28T20:25:05Z

Gentle ping

yxsamliu · 2025-04-28T22:51:33Z

can we add a llvm-test-suite test for these builtin functions? we can compare their value with host results for typical and corner inputs, like the test added in this PR llvm/llvm-test-suite#230 . This will give us some confidence that the clang codegen works.

choikwa · 2025-04-29T16:56:55Z

Created llvm/llvm-test-suite#240

llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Mar 1, 2025

choikwa requested review from jhuber6 and arsenm March 1, 2025 02:17

jhuber6 reviewed Mar 1, 2025

View reviewed changes

arsenm requested changes Mar 1, 2025

View reviewed changes

choikwa force-pushed the builtin branch 5 times, most recently from a1448c6 to adfb9c0 Compare March 14, 2025 19:27

jhuber6 reviewed Mar 14, 2025

View reviewed changes

choikwa force-pushed the builtin branch from adfb9c0 to 2da966d Compare March 14, 2025 20:11

choikwa force-pushed the builtin branch 2 times, most recently from e4d3140 to 635cbb7 Compare March 14, 2025 21:03

choikwa requested a review from arsenm March 17, 2025 20:19

choikwa force-pushed the builtin branch from 635cbb7 to 16ce68d Compare March 21, 2025 22:59

llvmbot added the backend:AMDGPU label Mar 21, 2025

choikwa force-pushed the builtin branch from 16ce68d to 7c2e0cb Compare March 21, 2025 23:09

jhuber6 reviewed Mar 24, 2025

View reviewed changes

clang/test/CodeGenHIP/logb_scalbn.hip Outdated Show resolved Hide resolved

choikwa force-pushed the builtin branch from 7c2e0cb to 0901af8 Compare March 24, 2025 20:33

jhuber6 reviewed Mar 24, 2025

View reviewed changes

clang/test/CodeGenHIP/logb_scalbn.cpp Outdated Show resolved Hide resolved

choikwa force-pushed the builtin branch 3 times, most recently from 00827e0 to 54076ba Compare March 26, 2025 05:36

arsenm reviewed Apr 8, 2025

View reviewed changes

choikwa force-pushed the builtin branch 2 times, most recently from a7f2b79 to b8b389d Compare April 8, 2025 06:36

arsenm requested a review from yxsamliu April 8, 2025 09:08

arsenm reviewed Apr 8, 2025

View reviewed changes

choikwa force-pushed the builtin branch 3 times, most recently from 36bdf55 to 46da769 Compare April 8, 2025 19:13

yxsamliu reviewed Apr 9, 2025

View reviewed changes

clang/lib/CodeGen/CGBuiltin.cpp Outdated Show resolved Hide resolved

choikwa force-pushed the builtin branch from 46da769 to dfca8ec Compare April 9, 2025 15:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU][clang] provide device implementation for __builtin_logb and … #129347

[AMDGPU][clang] provide device implementation for __builtin_logb and … #129347

choikwa commented Mar 1, 2025 •

edited

Loading

llvmbot commented Mar 1, 2025 •

edited

Loading

github-actions bot commented Mar 1, 2025 •

edited

Loading

jhuber6 Mar 1, 2025 •

edited

Loading

arsenm left a comment

choikwa commented Mar 1, 2025 •

edited

Loading

jhuber6 commented Mar 1, 2025 •

edited

Loading

choikwa commented Mar 1, 2025 •

edited

Loading

jhuber6 commented Mar 1, 2025

arsenm left a comment

choikwa commented Mar 1, 2025

arsenm commented Mar 2, 2025

choikwa commented Mar 3, 2025

jhuber6 commented Mar 3, 2025 •

edited

Loading

choikwa commented Mar 5, 2025

jhuber6 left a comment

choikwa commented Mar 14, 2025

choikwa commented Mar 24, 2025

jhuber6 left a comment

jhuber6 left a comment

choikwa commented Mar 31, 2025

arsenm Apr 8, 2025

choikwa Apr 8, 2025

choikwa commented Apr 8, 2025

choikwa commented Apr 28, 2025

yxsamliu commented Apr 28, 2025

choikwa commented Apr 29, 2025

[AMDGPU][clang] provide device implementation for __builtin_logb and … #129347

Are you sure you want to change the base?

[AMDGPU][clang] provide device implementation for __builtin_logb and … #129347

Conversation

choikwa commented Mar 1, 2025 • edited Loading

llvmbot commented Mar 1, 2025 • edited Loading

github-actions bot commented Mar 1, 2025 • edited Loading

jhuber6 Mar 1, 2025 • edited Loading

Choose a reason for hiding this comment

arsenm left a comment

Choose a reason for hiding this comment

choikwa commented Mar 1, 2025 • edited Loading

jhuber6 commented Mar 1, 2025 • edited Loading

choikwa commented Mar 1, 2025 • edited Loading

jhuber6 commented Mar 1, 2025

arsenm left a comment

Choose a reason for hiding this comment

choikwa commented Mar 1, 2025

arsenm commented Mar 2, 2025

choikwa commented Mar 3, 2025

jhuber6 commented Mar 3, 2025 • edited Loading

choikwa commented Mar 5, 2025

jhuber6 left a comment

Choose a reason for hiding this comment

choikwa commented Mar 14, 2025

choikwa commented Mar 24, 2025

jhuber6 left a comment

Choose a reason for hiding this comment

jhuber6 left a comment

Choose a reason for hiding this comment

choikwa commented Mar 31, 2025

arsenm Apr 8, 2025

Choose a reason for hiding this comment

choikwa Apr 8, 2025

Choose a reason for hiding this comment

choikwa commented Apr 8, 2025

choikwa commented Apr 28, 2025

yxsamliu commented Apr 28, 2025

choikwa commented Apr 29, 2025

choikwa commented Mar 1, 2025 •

edited

Loading

llvmbot commented Mar 1, 2025 •

edited

Loading

github-actions bot commented Mar 1, 2025 •

edited

Loading

jhuber6 Mar 1, 2025 •

edited

Loading

choikwa commented Mar 1, 2025 •

edited

Loading

jhuber6 commented Mar 1, 2025 •

edited

Loading

choikwa commented Mar 1, 2025 •

edited

Loading

jhuber6 commented Mar 3, 2025 •

edited

Loading