-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[AMDGPU][clang] provide device implementation for __builtin_logb and … #129347
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-backend-amdgpu @llvm/pr-subscribers-clang-codegen Author: choikwa (choikwa) Changes…__builtin_scalbn Clang generates library calls for _builtin* functions which can be a problem for GPUs that cannot handle them. This patch generates call to device implementation for __builtin_logb and ldexp intrinsic for __builtin_scalbn. Full diff: https://github.com/llvm/llvm-project/pull/129347.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CGBuiltin.cpp b/clang/lib/CodeGen/CGBuiltin.cpp
index 03b8d16b76e0d..6a0497df7acfb 100644
--- a/clang/lib/CodeGen/CGBuiltin.cpp
+++ b/clang/lib/CodeGen/CGBuiltin.cpp
@@ -259,6 +259,26 @@ Value *readX18AsPtr(CodeGenFunction &CGF) {
return CGF.Builder.CreateIntToPtr(X18, CGF.Int8PtrTy);
}
+llvm::Constant *CodeGenModule::getDeviceLibFunction(const FunctionDecl *FD,
+ unsigned BuiltinID) {
+ GlobalDecl D(FD);
+ llvm::SmallString<64> Name;
+ if (getTarget().getTriple().isAMDGCN()) {
+ switch (BuiltinID) {
+ default: return nullptr;
+ case Builtin::BIlogb:
+ case Builtin::BI__builtin_logb:
+ Name = "__ocml_logb_f64";
+ }
+ }
+ if (Name.empty())
+ return nullptr;
+
+ llvm::FunctionType *Ty =
+ cast<llvm::FunctionType>(getTypes().ConvertType(FD->getType()));
+ return GetOrCreateLLVMFunction(Name, Ty, D, /*ForVTable*/false);
+}
+
/// getBuiltinLibFunction - Given a builtin id for a function like
/// "__builtin_fabsf", return a Function* for "fabsf".
llvm::Constant *CodeGenModule::getBuiltinLibFunction(const FunctionDecl *FD,
@@ -6579,10 +6599,32 @@ RValue CodeGenFunction::EmitBuiltinExpr(const GlobalDecl GD, unsigned BuiltinID,
}
}
+ // Some targets like GPUs do not support library call and must provide
+ // device overload implementation.
+ if (getTarget().getTriple().isAMDGCN())
+ // Emit library call to device-lib implementation
+ if (auto *DevLibFunc = CGM.getDeviceLibFunction(FD, BuiltinID))
+ return emitLibraryCall(*this, FD, E, DevLibFunc);
+
+ // These will be emitted as Intrinsic later.
+ auto NeedsDeviceOverloadToIntrin = [&](unsigned BuiltinID) {
+ if (getTarget().getTriple().isAMDGCN()) {
+ switch (BuiltinID) {
+ default:
+ return false;
+ case Builtin::BIscalbn:
+ case Builtin::BI__builtin_scalbn:
+ return true;
+ }
+ }
+ return false;
+ };
+
// If this is an alias for a lib function (e.g. __builtin_sin), emit
// the call using the normal call path, but using the unmangled
// version of the function name.
- if (getContext().BuiltinInfo.isLibFunction(BuiltinID))
+ if (!NeedsDeviceOverloadToIntrin(BuiltinID) &&
+ getContext().BuiltinInfo.isLibFunction(BuiltinID))
return emitLibraryCall(*this, FD, E,
CGM.getBuiltinLibFunction(FD, BuiltinID));
@@ -20804,6 +20846,11 @@ Value *CodeGenFunction::EmitAMDGPUBuiltinExpr(unsigned BuiltinID,
case AMDGPU::BI__builtin_amdgcn_s_prefetch_data:
return emitBuiltinWithOneOverloadedType<2>(
*this, E, Intrinsic::amdgcn_s_prefetch_data);
+ case Builtin::BIscalbn:
+ case Builtin::BI__builtin_scalbn:
+ return emitBinaryExpMaybeConstrainedFPBuiltin(
+ *this, E, Intrinsic::ldexp,
+ Intrinsic::experimental_constrained_ldexp);
default:
return nullptr;
}
diff --git a/clang/lib/CodeGen/CodeGenModule.h b/clang/lib/CodeGen/CodeGenModule.h
index 4a269f622ece4..890dc8556cc1c 100644
--- a/clang/lib/CodeGen/CodeGenModule.h
+++ b/clang/lib/CodeGen/CodeGenModule.h
@@ -1231,6 +1231,11 @@ class CodeGenModule : public CodeGenTypeCache {
llvm::FunctionType *FnType = nullptr, bool DontDefer = false,
ForDefinition_t IsForDefinition = NotForDefinition);
+ /// Given a builtin id for a function, return a Function* for device
+ /// overload implementation.
+ llvm::Constant *getDeviceLibFunction(const FunctionDecl *FD,
+ unsigned BuiltinID);
+
/// Given a builtin id for a function like "__builtin_fabsf", return a
/// Function* for "fabsf".
llvm::Constant *getBuiltinLibFunction(const FunctionDecl *FD,
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
default: return nullptr; | ||
case Builtin::BIlogb: | ||
case Builtin::BI__builtin_logb: | ||
Name = "__ocml_logb_f64"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the solution is to start hard-coding ROCm Device Libs names in the compiler. Coincidentally I actually created an RFC about these kinds of problems just a few minutes ago https://discourse.llvm.org/t/rfc-lto-handling-math-libcalls-with-lto/84884.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not and should never emit ocml calls in clang. There is also no reason to ever use scalbn for binary float types, it is identical to llvm.ldexp
The motivation came from CUDA cccl code containing __builtin_logb and __builtin_scalbn and NVPTX being able to handle them. |
In my ideal world, unhandled builtins would be lowered before the linker and we'd have a working library called |
I would have preferred to do the overloads in the target specific header file, but __builtin_* methods couldn't be overloaded. I noticed that hip math header already overloads 'double logb(double)' by calling ocml version, so I mirrored that in clang. If there is better approach, I am interested. |
Long term, I think it's as I outlined in my RFC that I linked earlier. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang should never emit a call to an ocml function
Would IR body of the ocml function be a suitable replacement to emit (for now)? I understand once we have libm/libc implementation, clang can decide between the two. |
The behavior should be to just emit the call to the libm function. The real problem is we don't link a libm, and have a shim in a header file which isn't a system property |
There were some stakeholders who are trying to compile cccl on AMDGPU and were looking for a solution in clang. If this short term fix isn't desirable, is there any alternative solutions to explore? I understand long term solution keeping it as libcall and having libm, but we may be some time away from achieving that. |
For now these don't seem to hit any LLVM-IR intrinsics https://godbolt.org/z/zasTefo53. So you should be able to link them with the There's |
After clang emits builtins as LibCall, it just appears as an external call, and it would be odd for LLVM to reserve and transform certain functions even if it aliases libc builtin function names as -fno-builtin exists and user could've defined their own. I appreciate the suggestions given here and can forward it to the stakeholders. Is there no desirable, acceptable solution in clang-side then? |
a1448c6
to
adfb9c0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to do an IR lowering? It'd probably make more sense to handle that in LLVM than clang, but I guess Matt would be the expert there.
I had considered it but it has the issue of name aliasing since it just appears as an external call in LLVM after it's emitted by clang. |
e4d3140
to
635cbb7
Compare
gentle ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is just an IR lowering? We'd probably need some more conclusive tests that it works as expected for exceptional values, but I'm going to guess it was copied from an existing implementation so we could just refer to that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll defer to @arsenm for the final verdict.
00827e0
to
54076ba
Compare
gentle ping |
clang/lib/CodeGen/CGBuiltin.cpp
Outdated
// These will be emitted as Intrinsic later. | ||
auto NeedsDeviceOverload = [&](unsigned BuiltinID) { | ||
if (getTarget().getTriple().isAMDGCN()) { | ||
switch (BuiltinID) { | ||
default: | ||
return false; | ||
case Builtin::BIlogb: | ||
case Builtin::BI__builtin_logb: | ||
case Builtin::BIscalbn: | ||
case Builtin::BI__builtin_scalbn: | ||
return true; | ||
} | ||
} | ||
return false; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not be a lambda, and should not hardcode the triple. and needs documentation, and a fixme to remove it. Can we check if the libcall is supported or something? Really we should have libm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is llvm::isLibFuncEmittable(const Module *M, const TargetLibraryInfo *TLI, LibFunc TheLibFunc)
but unsure how to access TLI from CGBuiltin's CodeGenFunction::EmitBuiltinExpr.
a7f2b79
to
b8b389d
Compare
Updated and addressed feedback except the triplet check. At the moment, I'm unsure if there is better way than to query using llvm::isLibFuncEmittable, but that requires TLI. |
36bdf55
to
46da769
Compare
…__builtin_scalbn Clang generates library calls for __builtin_* functions which can be a problem for GPUs that cannot handle them. This patch generates a device implementations for __builtin_logb and __builtin_scalbn by emitting LLVM IRs. Only emit IRs when FP exceptions are disabled and math-errno is unset.
Gentle ping |
can we add a llvm-test-suite test for these builtin functions? we can compare their value with host results for typical and corner inputs, like the test added in this PR llvm/llvm-test-suite#230 . This will give us some confidence that the clang codegen works. |
Created llvm/llvm-test-suite#240 |
…__builtin_scalbn
Clang generates library calls for _builtin* functions which can be a problem for GPUs that cannot handle them. This patch generates call to device implementation for __builtin_logb and ldexp intrinsic for __builtin_scalbn.