Skip to content

[AMDGPU][clang] provide device implementation for __builtin_logb and … #129347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

choikwa
Copy link
Contributor

@choikwa choikwa commented Mar 1, 2025

…__builtin_scalbn

Clang generates library calls for _builtin* functions which can be a problem for GPUs that cannot handle them. This patch generates call to device implementation for __builtin_logb and ldexp intrinsic for __builtin_scalbn.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Mar 1, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 1, 2025

@llvm/pr-subscribers-backend-amdgpu
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: choikwa (choikwa)

Changes

…__builtin_scalbn

Clang generates library calls for _builtin* functions which can be a problem for GPUs that cannot handle them. This patch generates call to device implementation for __builtin_logb and ldexp intrinsic for __builtin_scalbn.


Full diff: https://github.com/llvm/llvm-project/pull/129347.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGBuiltin.cpp (+48-1)
  • (modified) clang/lib/CodeGen/CodeGenModule.h (+5)
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 03b8d16b76e0d..6a0497df7acfb 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -259,6 +259,26 @@ Value *readX18AsPtr(CodeGenFunction &CGF) {
   return CGF.Builder.CreateIntToPtr(X18, CGF.Int8PtrTy);
 }
 
+llvm::Constant *CodeGenModule::getDeviceLibFunction(const FunctionDecl *FD,
+                                                    unsigned BuiltinID) {
+  GlobalDecl D(FD);
+  llvm::SmallString<64> Name;
+  if (getTarget().getTriple().isAMDGCN()) {
+    switch (BuiltinID) {
+    default: return nullptr;
+    case Builtin::BIlogb:
+    case Builtin::BI__builtin_logb:
+      Name = "__ocml_logb_f64";
+    }
+  }
+  if (Name.empty())
+    return nullptr;
+
+  llvm::FunctionType *Ty =
+    cast<llvm::FunctionType>(getTypes().ConvertType(FD->getType()));
+  return GetOrCreateLLVMFunction(Name, Ty, D, /*ForVTable*/false);
+}
+
 /// getBuiltinLibFunction - Given a builtin id for a function like
 /// "__builtin_fabsf", return a Function* for "fabsf".
 llvm::Constant *CodeGenModule::getBuiltinLibFunction(const FunctionDecl *FD,
@@ -6579,10 +6599,32 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
   }
   }
 
+  // Some targets like GPUs do not support library call and must provide
+  // device overload implementation.
+  if (getTarget().getTriple().isAMDGCN())
+    // Emit library call to device-lib implementation
+    if (auto *DevLibFunc = CGM.getDeviceLibFunction(FD, BuiltinID))
+      return emitLibraryCall(*this, FD, E, DevLibFunc);
+
+  // These will be emitted as Intrinsic later.
+  auto NeedsDeviceOverloadToIntrin = [&](unsigned BuiltinID) {
+    if (getTarget().getTriple().isAMDGCN()) {
+      switch (BuiltinID) {
+      default:
+        return false;
+      case Builtin::BIscalbn:
+      case Builtin::BI__builtin_scalbn:
+        return true;
+      }
+    }
+    return false;
+  };
+
   // If this is an alias for a lib function (e.g. __builtin_sin), emit
   // the call using the normal call path, but using the unmangled
   // version of the function name.
-  if (getContext().BuiltinInfo.isLibFunction(BuiltinID))
+  if (!NeedsDeviceOverloadToIntrin(BuiltinID) &&
+      getContext().BuiltinInfo.isLibFunction(BuiltinID))
     return emitLibraryCall(*this, FD, E,
                            CGM.getBuiltinLibFunction(FD, BuiltinID));
 
@@ -20804,6 +20846,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
   case AMDGPU::BI__builtin_amdgcn_s_prefetch_data:
     return emitBuiltinWithOneOverloadedType<2>(
         *this, E, Intrinsic::amdgcn_s_prefetch_data);
+  case Builtin::BIscalbn:
+  case Builtin::BI__builtin_scalbn:
+    return emitBinaryExpMaybeConstrainedFPBuiltin(
+        *this, E, Intrinsic::ldexp,
+        Intrinsic::experimental_constrained_ldexp);
   default:
     return nullptr;
   }
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 4a269f622ece4..890dc8556cc1c 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1231,6 +1231,11 @@ class CodeGenModule : public CodeGenTypeCache {
       llvm::FunctionType *FnType = nullptr, bool DontDefer = false,
       ForDefinition_t IsForDefinition = NotForDefinition);
 
+  /// Given a builtin id for a function, return a Function* for device
+  /// overload implementation.
+  llvm::Constant *getDeviceLibFunction(const FunctionDecl *FD,
+                                       unsigned BuiltinID);
+
   /// Given a builtin id for a function like "__builtin_fabsf", return a
   /// Function* for "fabsf".
   llvm::Constant *getBuiltinLibFunction(const FunctionDecl *FD,

@choikwa choikwa requested review from jhuber6 and arsenm March 1, 2025 02:17
Copy link

github-actions bot commented Mar 1, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

default: return nullptr;
case Builtin::BIlogb:
case Builtin::BI__builtin_logb:
Name = "__ocml_logb_f64";
Copy link
Contributor

@jhuber6 jhuber6 Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the solution is to start hard-coding ROCm Device Libs names in the compiler. Coincidentally I actually created an RFC about these kinds of problems just a few minutes ago https://discourse.llvm.org/t/rfc-lto-handling-math-libcalls-with-lto/84884.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not and should never emit ocml calls in clang. There is also no reason to ever use scalbn for binary float types, it is identical to llvm.ldexp

@choikwa
Copy link
Contributor Author

choikwa commented Mar 1, 2025

The motivation came from CUDA cccl code containing __builtin_logb and __builtin_scalbn and NVPTX being able to handle them.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 1, 2025

The motivation came from CUDA cccl code containing __builtin_logb and __builtin_scalbn and being able to handle them.

In my ideal world, unhandled builtins would be lowered before the linker and we'd have a working library called libm.a that matches the platform. This is a similar problem with a lot of libc++ code that prefers to use the builtin versions because they're assumed to be 'fast'.

@choikwa
Copy link
Contributor Author

choikwa commented Mar 1, 2025

I would have preferred to do the overloads in the target specific header file, but __builtin_* methods couldn't be overloaded. I noticed that hip math header already overloads 'double logb(double)' by calling ocml version, so I mirrored that in clang. If there is better approach, I am interested.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 1, 2025

I would have preferred to do the overloads in the target specific header file, but _builtin* methods couldn't be overloaded. I noticed that hip math header already overloads 'double logb(double)' by calling ocml version, so I mirrored that in clang. If there is better approach, I am interested.

Long term, I think it's as I outlined in my RFC that I linked earlier.

Copy link
Contributor

@arsenm arsenm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang should never emit a call to an ocml function

@choikwa
Copy link
Contributor Author

choikwa commented Mar 1, 2025

clang should never emit a call to an ocml function

Would IR body of the ocml function be a suitable replacement to emit (for now)?

I understand once we have libm/libc implementation, clang can decide between the two.

@arsenm
Copy link
Contributor

arsenm commented Mar 2, 2025

I understand once we have libm/libc implementation, clang can decide between the two.

The behavior should be to just emit the call to the libm function. The real problem is we don't link a libm, and have a shim in a header file which isn't a system property

@choikwa
Copy link
Contributor Author

choikwa commented Mar 3, 2025

I understand once we have libm/libc implementation, clang can decide between the two.

The behavior should be to just emit the call to the libm function. The real problem is we don't link a libm, and have a shim in a header file which isn't a system property

There were some stakeholders who are trying to compile cccl on AMDGPU and were looking for a solution in clang. If this short term fix isn't desirable, is there any alternative solutions to explore? I understand long term solution keeping it as libcall and having libm, but we may be some time away from achieving that.

@jhuber6
Copy link
Contributor

jhuber6 commented Mar 3, 2025

There were some stakeholders who are trying to compile cccl on AMDGPU and were looking for a solution in clang. If this short term fix isn't desirable, is there any alternative solutions to explore? I understand long term solution keeping it as libcall and having libm, but we may be some time away from achieving that.

For now these don't seem to hit any LLVM-IR intrinsics https://godbolt.org/z/zasTefo53. So you should be able to link them with the libm.a I already have from the llvm libc I already provide. That might change in the future however since AFAIK we're going to have every math call as an intrinsic eventually.

There's libm.bc in the install which you can cram into your compile via your favorite method. Or just use -Xoffload-linker -lm if you're using the new driver on HIP (I'll make it the default eventually). Alternatively just make your own libm.o that contains the ROCm DL implementation and link that in. And if absolutely none of that works, make a custom IR lowering pass.

@choikwa
Copy link
Contributor Author

choikwa commented Mar 5, 2025

There were some stakeholders who are trying to compile cccl on AMDGPU and were looking for a solution in clang. If this short term fix isn't desirable, is there any alternative solutions to explore? I understand long term solution keeping it as libcall and having libm, but we may be some time away from achieving that.

For now these don't seem to hit any LLVM-IR intrinsics https://godbolt.org/z/zasTefo53. So you should be able to link them with the libm.a I already have from the llvm libc I already provide. That might change in the future however since AFAIK we're going to have every math call as an intrinsic eventually.

There's libm.bc in the install which you can cram into your compile via your favorite method. Or just use -Xoffload-linker -lm if you're using the new driver on HIP (I'll make it the default eventually). Alternatively just make your own libm.o that contains the ROCm DL implementation and link that in. And if absolutely none of that works, make a custom IR lowering pass.

After clang emits builtins as LibCall, it just appears as an external call, and it would be odd for LLVM to reserve and transform certain functions even if it aliases libc builtin function names as -fno-builtin exists and user could've defined their own. I appreciate the suggestions given here and can forward it to the stakeholders. Is there no desirable, acceptable solution in clang-side then?

@choikwa choikwa force-pushed the builtin branch 5 times, most recently from a1448c6 to adfb9c0 Compare March 14, 2025 19:27
Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to do an IR lowering? It'd probably make more sense to handle that in LLVM than clang, but I guess Matt would be the expert there.

@choikwa
Copy link
Contributor Author

choikwa commented Mar 14, 2025

Are you trying to do an IR lowering? It'd probably make more sense to handle that in LLVM than clang, but I guess Matt would be the expert there.

I had considered it but it has the issue of name aliasing since it just appears as an external call in LLVM after it's emitted by clang.

@choikwa choikwa force-pushed the builtin branch 2 times, most recently from e4d3140 to 635cbb7 Compare March 14, 2025 21:03
@choikwa choikwa requested a review from arsenm March 17, 2025 20:19
@choikwa
Copy link
Contributor Author

choikwa commented Mar 24, 2025

gentle ping

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is just an IR lowering? We'd probably need some more conclusive tests that it works as expected for exceptional values, but I'm going to guess it was copied from an existing implementation so we could just refer to that one.

Copy link
Contributor

@jhuber6 jhuber6 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to @arsenm for the final verdict.

@choikwa choikwa force-pushed the builtin branch 3 times, most recently from 00827e0 to 54076ba Compare March 26, 2025 05:36
@choikwa
Copy link
Contributor Author

choikwa commented Mar 31, 2025

gentle ping

Comment on lines 6014 to 6028
// These will be emitted as Intrinsic later.
auto NeedsDeviceOverload = [&](unsigned BuiltinID) {
if (getTarget().getTriple().isAMDGCN()) {
switch (BuiltinID) {
default:
return false;
case Builtin::BIlogb:
case Builtin::BI__builtin_logb:
case Builtin::BIscalbn:
case Builtin::BI__builtin_scalbn:
return true;
}
}
return false;
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be a lambda, and should not hardcode the triple. and needs documentation, and a fixme to remove it. Can we check if the libcall is supported or something? Really we should have libm

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is llvm::isLibFuncEmittable(const Module *M, const TargetLibraryInfo *TLI, LibFunc TheLibFunc) but unsure how to access TLI from CGBuiltin's CodeGenFunction::EmitBuiltinExpr.

@choikwa choikwa force-pushed the builtin branch 2 times, most recently from a7f2b79 to b8b389d Compare April 8, 2025 06:36
@choikwa
Copy link
Contributor Author

choikwa commented Apr 8, 2025

Updated and addressed feedback except the triplet check. At the moment, I'm unsure if there is better way than to query using llvm::isLibFuncEmittable, but that requires TLI.

@arsenm arsenm requested a review from yxsamliu April 8, 2025 09:08
@choikwa choikwa force-pushed the builtin branch 3 times, most recently from 36bdf55 to 46da769 Compare April 8, 2025 19:13
…__builtin_scalbn

Clang generates library calls for __builtin_* functions which can be a problem for GPUs that cannot handle them.
This patch generates a device implementations for __builtin_logb and __builtin_scalbn by emitting LLVM IRs.
Only emit IRs when FP exceptions are disabled and math-errno is unset.
@choikwa
Copy link
Contributor Author

choikwa commented Apr 28, 2025

Gentle ping

@yxsamliu
Copy link
Collaborator

can we add a llvm-test-suite test for these builtin functions? we can compare their value with host results for typical and corner inputs, like the test added in this PR llvm/llvm-test-suite#230 . This will give us some confidence that the clang codegen works.

@choikwa
Copy link
Contributor Author

choikwa commented Apr 29, 2025

Created llvm/llvm-test-suite#240

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants