[clang][CodeGen] Emit atomic IR in place of optimized libcalls. #73176
Conversation
@llvm/pr-subscribers-clang-codegen @llvm/pr-subscribers-clang

Author: None (Logikable)

Changes

…inters. Calling __atomic_fetch_op_n is undefined for misaligned pointers. Since the backend can handle atomic IR on misaligned pointers, emit that instead. To keep things simple, we make this change for all misaligned operations, not just the integral ones.

There is an additional consequence of this change. Previously, libcalls were emitted for misaligned, misshapen (size != 2^n), and oversized objects. Since optimized libcalls only operate on 2^n-sized objects, removing the misaligned case means optimized libcalls will never be emitted, and all relevant codepaths can be cleaned up.

A simple correctness test is to have one thread perform an arithmetic operation on a misaligned integer and another thread perform a non-arithmetic operation (e.g. xchg) on the same value. Such a test currently exhibits incorrect behaviour.

Patch is 62.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73176.diff

11 Files Affected:
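As an illustration of the correctness test described above, here is a minimal sketch (not the actual test from this PR; the buffer layout, iteration counts, and names are made up): one thread performs a fetch-add on a deliberately misaligned int while another performs an exchange on the same object.

```cpp
#include <cstring>
#include <thread>

// Carve a misaligned int out of a character buffer. Strictly speaking this is
// not well-defined C++, but it reproduces the situation the description covers.
alignas(8) static char buffer[sizeof(int) + 4];
static int *misaligned = reinterpret_cast<int *>(buffer + 1);

int main() {
  std::memset(buffer, 0, sizeof(buffer));
  std::thread adder([] {
    for (int i = 0; i < 100000; ++i)
      __atomic_fetch_add(misaligned, 1, __ATOMIC_SEQ_CST);   // arithmetic op
  });
  std::thread swapper([] {
    for (int i = 0; i < 100000; ++i)
      __atomic_exchange_n(misaligned, 0, __ATOMIC_SEQ_CST);  // non-arithmetic op
  });
  adder.join();
  swapper.join();
}
```

If the fetch-add is emitted as an alignment-assuming optimized libcall while the exchange is lowered differently, the two operations are not atomic with respect to each other; emitting atomic IR for both lets the backend pick one consistent lowering.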
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 6005d5c51c0e1ac..bc432afb6ee288b 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -785,27 +785,76 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
Builder.SetInsertPoint(ContBB);
}
-static void
-AddDirectArgument(CodeGenFunction &CGF, CallArgList &Args,
- bool UseOptimizedLibcall, llvm::Value *Val, QualType ValTy,
- SourceLocation Loc, CharUnits SizeInChars) {
- if (UseOptimizedLibcall) {
- // Load value and pass it to the function directly.
- CharUnits Align = CGF.getContext().getTypeAlignInChars(ValTy);
- int64_t SizeInBits = CGF.getContext().toBits(SizeInChars);
- ValTy =
- CGF.getContext().getIntTypeForBitwidth(SizeInBits, /*Signed=*/false);
- llvm::Type *ITy = llvm::IntegerType::get(CGF.getLLVMContext(), SizeInBits);
- Address Ptr = Address(Val, ITy, Align);
- Val = CGF.EmitLoadOfScalar(Ptr, false,
- CGF.getContext().getPointerType(ValTy),
- Loc);
- // Coerce the value into an appropriately sized integer type.
- Args.add(RValue::get(Val), ValTy);
- } else {
- // Non-optimized functions always take a reference.
- Args.add(RValue::get(Val), CGF.getContext().VoidPtrTy);
+static bool
+isArithmeticOp(AtomicExpr::AtomicOp op) {
+ switch (op) {
+ case AtomicExpr::AO__atomic_add_fetch:
+ case AtomicExpr::AO__atomic_fetch_add:
+ case AtomicExpr::AO__c11_atomic_fetch_add:
+ case AtomicExpr::AO__hip_atomic_fetch_add:
+ case AtomicExpr::AO__opencl_atomic_fetch_add:
+ case AtomicExpr::AO__atomic_and_fetch:
+ case AtomicExpr::AO__atomic_fetch_and:
+ case AtomicExpr::AO__c11_atomic_fetch_and:
+ case AtomicExpr::AO__hip_atomic_fetch_and:
+ case AtomicExpr::AO__opencl_atomic_fetch_and:
+ case AtomicExpr::AO__atomic_or_fetch:
+ case AtomicExpr::AO__atomic_fetch_or:
+ case AtomicExpr::AO__c11_atomic_fetch_or:
+ case AtomicExpr::AO__hip_atomic_fetch_or:
+ case AtomicExpr::AO__opencl_atomic_fetch_or:
+ case AtomicExpr::AO__atomic_sub_fetch:
+ case AtomicExpr::AO__atomic_fetch_sub:
+ case AtomicExpr::AO__c11_atomic_fetch_sub:
+ case AtomicExpr::AO__hip_atomic_fetch_sub:
+ case AtomicExpr::AO__opencl_atomic_fetch_sub:
+ case AtomicExpr::AO__atomic_xor_fetch:
+ case AtomicExpr::AO__atomic_fetch_xor:
+ case AtomicExpr::AO__c11_atomic_fetch_xor:
+ case AtomicExpr::AO__hip_atomic_fetch_xor:
+ case AtomicExpr::AO__opencl_atomic_fetch_xor:
+ case AtomicExpr::AO__atomic_nand_fetch:
+ case AtomicExpr::AO__atomic_fetch_nand:
+ case AtomicExpr::AO__c11_atomic_fetch_nand:
+ case AtomicExpr::AO__atomic_min_fetch:
+ case AtomicExpr::AO__atomic_fetch_min:
+ case AtomicExpr::AO__c11_atomic_fetch_min:
+ case AtomicExpr::AO__hip_atomic_fetch_min:
+ case AtomicExpr::AO__opencl_atomic_fetch_min:
+ case AtomicExpr::AO__atomic_max_fetch:
+ case AtomicExpr::AO__atomic_fetch_max:
+ case AtomicExpr::AO__c11_atomic_fetch_max:
+ case AtomicExpr::AO__hip_atomic_fetch_max:
+ case AtomicExpr::AO__opencl_atomic_fetch_max:
+ return true;
+ case AtomicExpr::AO__c11_atomic_init:
+ case AtomicExpr::AO__opencl_atomic_init:
+ case AtomicExpr::AO__atomic_compare_exchange:
+ case AtomicExpr::AO__atomic_compare_exchange_n:
+ case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
+ case AtomicExpr::AO__c11_atomic_compare_exchange_strong:
+ case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
+ case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
+ case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
+ case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
+ case AtomicExpr::AO__atomic_exchange:
+ case AtomicExpr::AO__atomic_exchange_n:
+ case AtomicExpr::AO__c11_atomic_exchange:
+ case AtomicExpr::AO__hip_atomic_exchange:
+ case AtomicExpr::AO__opencl_atomic_exchange:
+ case AtomicExpr::AO__atomic_store:
+ case AtomicExpr::AO__atomic_store_n:
+ case AtomicExpr::AO__c11_atomic_store:
+ case AtomicExpr::AO__hip_atomic_store:
+ case AtomicExpr::AO__opencl_atomic_store:
+ case AtomicExpr::AO__atomic_load:
+ case AtomicExpr::AO__atomic_load_n:
+ case AtomicExpr::AO__c11_atomic_load:
+ case AtomicExpr::AO__hip_atomic_load:
+ case AtomicExpr::AO__opencl_atomic_load:
+ return false;
}
+ llvm_unreachable("All atomic ops should be handled!");
}
RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
@@ -833,7 +882,8 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
bool Oversized = getContext().toBits(TInfo.Width) > MaxInlineWidthInBits;
bool Misaligned = (Ptr.getAlignment() % TInfo.Width) != 0;
- bool UseLibcall = Misaligned | Oversized;
+ bool PowerOf2Size = (Size & (Size - 1)) == 0;
+ bool UseLibcall = (!PowerOf2Size | Oversized) & !isArithmeticOp(E->getOp());
bool ShouldCastToIntPtrTy = true;
CharUnits MaxInlineWidth =
@@ -994,98 +1044,16 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
Dest = Atomics.castToAtomicIntPointer(Dest);
}
- // Use a library call. See: http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary .
+ // Use a library call. See: http://gcc.gnu.org/wiki/Atomic/GCCMM/Library.
+ // Clang should never generate an optimized libcall -- it's better the backend
+ // handle it.
if (UseLibcall) {
- bool UseOptimizedLibcall = false;
- switch (E->getOp()) {
- case AtomicExpr::AO__c11_atomic_init:
- case AtomicExpr::AO__opencl_atomic_init:
- llvm_unreachable("Already handled above with EmitAtomicInit!");
-
- case AtomicExpr::AO__atomic_fetch_add:
- case AtomicExpr::AO__atomic_fetch_and:
- case AtomicExpr::AO__atomic_fetch_max:
- case AtomicExpr::AO__atomic_fetch_min:
- case AtomicExpr::AO__atomic_fetch_nand:
- case AtomicExpr::AO__atomic_fetch_or:
- case AtomicExpr::AO__atomic_fetch_sub:
- case AtomicExpr::AO__atomic_fetch_xor:
- case AtomicExpr::AO__atomic_add_fetch:
- case AtomicExpr::AO__atomic_and_fetch:
- case AtomicExpr::AO__atomic_max_fetch:
- case AtomicExpr::AO__atomic_min_fetch:
- case AtomicExpr::AO__atomic_nand_fetch:
- case AtomicExpr::AO__atomic_or_fetch:
- case AtomicExpr::AO__atomic_sub_fetch:
- case AtomicExpr::AO__atomic_xor_fetch:
- case AtomicExpr::AO__c11_atomic_fetch_add:
- case AtomicExpr::AO__c11_atomic_fetch_and:
- case AtomicExpr::AO__c11_atomic_fetch_max:
- case AtomicExpr::AO__c11_atomic_fetch_min:
- case AtomicExpr::AO__c11_atomic_fetch_nand:
- case AtomicExpr::AO__c11_atomic_fetch_or:
- case AtomicExpr::AO__c11_atomic_fetch_sub:
- case AtomicExpr::AO__c11_atomic_fetch_xor:
- case AtomicExpr::AO__hip_atomic_fetch_add:
- case AtomicExpr::AO__hip_atomic_fetch_and:
- case AtomicExpr::AO__hip_atomic_fetch_max:
- case AtomicExpr::AO__hip_atomic_fetch_min:
- case AtomicExpr::AO__hip_atomic_fetch_or:
- case AtomicExpr::AO__hip_atomic_fetch_sub:
- case AtomicExpr::AO__hip_atomic_fetch_xor:
- case AtomicExpr::AO__opencl_atomic_fetch_add:
- case AtomicExpr::AO__opencl_atomic_fetch_and:
- case AtomicExpr::AO__opencl_atomic_fetch_max:
- case AtomicExpr::AO__opencl_atomic_fetch_min:
- case AtomicExpr::AO__opencl_atomic_fetch_or:
- case AtomicExpr::AO__opencl_atomic_fetch_sub:
- case AtomicExpr::AO__opencl_atomic_fetch_xor:
- // For these, only library calls for certain sizes exist.
- UseOptimizedLibcall = true;
- break;
-
- case AtomicExpr::AO__atomic_load:
- case AtomicExpr::AO__atomic_store:
- case AtomicExpr::AO__atomic_exchange:
- case AtomicExpr::AO__atomic_compare_exchange:
- // Use the generic version if we don't know that the operand will be
- // suitably aligned for the optimized version.
- if (Misaligned)
- break;
- [[fallthrough]];
- case AtomicExpr::AO__atomic_load_n:
- case AtomicExpr::AO__atomic_store_n:
- case AtomicExpr::AO__atomic_exchange_n:
- case AtomicExpr::AO__atomic_compare_exchange_n:
- case AtomicExpr::AO__c11_atomic_load:
- case AtomicExpr::AO__c11_atomic_store:
- case AtomicExpr::AO__c11_atomic_exchange:
- case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
- case AtomicExpr::AO__c11_atomic_compare_exchange_strong:
- case AtomicExpr::AO__hip_atomic_load:
- case AtomicExpr::AO__hip_atomic_store:
- case AtomicExpr::AO__hip_atomic_exchange:
- case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
- case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
- case AtomicExpr::AO__opencl_atomic_load:
- case AtomicExpr::AO__opencl_atomic_store:
- case AtomicExpr::AO__opencl_atomic_exchange:
- case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
- case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
- // Only use optimized library calls for sizes for which they exist.
- // FIXME: Size == 16 optimized library functions exist too.
- if (Size == 1 || Size == 2 || Size == 4 || Size == 8)
- UseOptimizedLibcall = true;
- break;
- }
-
CallArgList Args;
- if (!UseOptimizedLibcall) {
- // For non-optimized library calls, the size is the first parameter
- Args.add(RValue::get(llvm::ConstantInt::get(SizeTy, Size)),
- getContext().getSizeType());
- }
- // Atomic address is the first or second parameter
+ // For non-optimized library calls, the size is the first parameter.
+ Args.add(RValue::get(llvm::ConstantInt::get(SizeTy, Size)),
+ getContext().getSizeType());
+
+ // The atomic address is the second parameter.
// The OpenCL atomic library functions only accept pointer arguments to
// generic address space.
auto CastToGenericAddrSpace = [&](llvm::Value *V, QualType PT) {
@@ -1100,18 +1068,15 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
return getTargetHooks().performAddrSpaceCast(
*this, V, AS, LangAS::opencl_generic, DestType, false);
};
-
Args.add(RValue::get(CastToGenericAddrSpace(Ptr.getPointer(),
E->getPtr()->getType())),
getContext().VoidPtrTy);
+ // The next 1-3 parameters are op-dependent.
std::string LibCallName;
- QualType LoweredMemTy =
- MemTy->isPointerType() ? getContext().getIntPtrType() : MemTy;
QualType RetTy;
bool HaveRetTy = false;
- llvm::Instruction::BinaryOps PostOp = (llvm::Instruction::BinaryOps)0;
- bool PostOpMinMax = false;
+
switch (E->getOp()) {
case AtomicExpr::AO__c11_atomic_init:
case AtomicExpr::AO__opencl_atomic_init:
@@ -1122,8 +1087,6 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
// and exchange.
// bool __atomic_compare_exchange(size_t size, void *mem, void *expected,
// void *desired, int success, int failure)
- // bool __atomic_compare_exchange_N(T *mem, T *expected, T desired,
- // int success, int failure)
case AtomicExpr::AO__atomic_compare_exchange:
case AtomicExpr::AO__atomic_compare_exchange_n:
case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
@@ -1138,25 +1101,25 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
E->getVal1()->getType())),
getContext().VoidPtrTy);
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val2.getPointer(),
- MemTy, E->getExprLoc(), TInfo.Width);
+ Args.add(RValue::get(CastToGenericAddrSpace(Val2.getPointer(),
+ E->getVal2()->getType())),
+ getContext().VoidPtrTy);
Args.add(RValue::get(Order), getContext().IntTy);
Order = OrderFail;
break;
// void __atomic_exchange(size_t size, void *mem, void *val, void *return,
// int order)
- // T __atomic_exchange_N(T *mem, T val, int order)
case AtomicExpr::AO__atomic_exchange:
case AtomicExpr::AO__atomic_exchange_n:
case AtomicExpr::AO__c11_atomic_exchange:
case AtomicExpr::AO__hip_atomic_exchange:
case AtomicExpr::AO__opencl_atomic_exchange:
LibCallName = "__atomic_exchange";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- MemTy, E->getExprLoc(), TInfo.Width);
+ Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
+ E->getVal1()->getType())),
+ getContext().VoidPtrTy);
break;
// void __atomic_store(size_t size, void *mem, void *val, int order)
- // void __atomic_store_N(T *mem, T val, int order)
case AtomicExpr::AO__atomic_store:
case AtomicExpr::AO__atomic_store_n:
case AtomicExpr::AO__c11_atomic_store:
@@ -1165,11 +1128,11 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
LibCallName = "__atomic_store";
RetTy = getContext().VoidTy;
HaveRetTy = true;
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- MemTy, E->getExprLoc(), TInfo.Width);
+ Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
+ E->getVal1()->getType())),
+ getContext().VoidPtrTy);
break;
// void __atomic_load(size_t size, void *mem, void *return, int order)
- // T __atomic_load_N(T *mem, int order)
case AtomicExpr::AO__atomic_load:
case AtomicExpr::AO__atomic_load_n:
case AtomicExpr::AO__c11_atomic_load:
@@ -1177,108 +1140,45 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
case AtomicExpr::AO__opencl_atomic_load:
LibCallName = "__atomic_load";
break;
- // T __atomic_add_fetch_N(T *mem, T val, int order)
- // T __atomic_fetch_add_N(T *mem, T val, int order)
case AtomicExpr::AO__atomic_add_fetch:
- PostOp = llvm::Instruction::Add;
- [[fallthrough]];
case AtomicExpr::AO__atomic_fetch_add:
case AtomicExpr::AO__c11_atomic_fetch_add:
case AtomicExpr::AO__hip_atomic_fetch_add:
case AtomicExpr::AO__opencl_atomic_fetch_add:
- LibCallName = "__atomic_fetch_add";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- LoweredMemTy, E->getExprLoc(), TInfo.Width);
- break;
- // T __atomic_and_fetch_N(T *mem, T val, int order)
- // T __atomic_fetch_and_N(T *mem, T val, int order)
case AtomicExpr::AO__atomic_and_fetch:
- PostOp = llvm::Instruction::And;
- [[fallthrough]];
case AtomicExpr::AO__atomic_fetch_and:
case AtomicExpr::AO__c11_atomic_fetch_and:
case AtomicExpr::AO__hip_atomic_fetch_and:
case AtomicExpr::AO__opencl_atomic_fetch_and:
- LibCallName = "__atomic_fetch_and";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- MemTy, E->getExprLoc(), TInfo.Width);
- break;
- // T __atomic_or_fetch_N(T *mem, T val, int order)
- // T __atomic_fetch_or_N(T *mem, T val, int order)
case AtomicExpr::AO__atomic_or_fetch:
- PostOp = llvm::Instruction::Or;
- [[fallthrough]];
case AtomicExpr::AO__atomic_fetch_or:
case AtomicExpr::AO__c11_atomic_fetch_or:
case AtomicExpr::AO__hip_atomic_fetch_or:
case AtomicExpr::AO__opencl_atomic_fetch_or:
- LibCallName = "__atomic_fetch_or";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- MemTy, E->getExprLoc(), TInfo.Width);
- break;
- // T __atomic_sub_fetch_N(T *mem, T val, int order)
- // T __atomic_fetch_sub_N(T *mem, T val, int order)
case AtomicExpr::AO__atomic_sub_fetch:
- PostOp = llvm::Instruction::Sub;
- [[fallthrough]];
case AtomicExpr::AO__atomic_fetch_sub:
case AtomicExpr::AO__c11_atomic_fetch_sub:
case AtomicExpr::AO__hip_atomic_fetch_sub:
case AtomicExpr::AO__opencl_atomic_fetch_sub:
- LibCallName = "__atomic_fetch_sub";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- LoweredMemTy, E->getExprLoc(), TInfo.Width);
- break;
- // T __atomic_xor_fetch_N(T *mem, T val, int order)
- // T __atomic_fetch_xor_N(T *mem, T val, int order)
case AtomicExpr::AO__atomic_xor_fetch:
- PostOp = llvm::Instruction::Xor;
- [[fallthrough]];
case AtomicExpr::AO__atomic_fetch_xor:
case AtomicExpr::AO__c11_atomic_fetch_xor:
case AtomicExpr::AO__hip_atomic_fetch_xor:
case AtomicExpr::AO__opencl_atomic_fetch_xor:
- LibCallName = "__atomic_fetch_xor";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- MemTy, E->getExprLoc(), TInfo.Width);
- break;
+ case AtomicExpr::AO__atomic_nand_fetch:
+ case AtomicExpr::AO__atomic_fetch_nand:
+ case AtomicExpr::AO__c11_atomic_fetch_nand:
case AtomicExpr::AO__atomic_min_fetch:
- PostOpMinMax = true;
- [[fallthrough]];
case AtomicExpr::AO__atomic_fetch_min:
case AtomicExpr::AO__c11_atomic_fetch_min:
case AtomicExpr::AO__hip_atomic_fetch_min:
case AtomicExpr::AO__opencl_atomic_fetch_min:
- LibCallName = E->getValueType()->isSignedIntegerType()
- ? "__atomic_fetch_min"
- : "__atomic_fetch_umin";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- LoweredMemTy, E->getExprLoc(), TInfo.Width);
- break;
case AtomicExpr::AO__atomic_max_fetch:
- PostOpMinMax = true;
- [[fallthrough]];
case AtomicExpr::AO__atomic_fetch_max:
case AtomicExpr::AO__c11_atomic_fetch_max:
case AtomicExpr::AO__hip_atomic_fetch_max:
case AtomicExpr::AO__opencl_atomic_fetch_max:
- LibCallName = E->getValueType()->isSignedIntegerType()
- ? "__atomic_fetch_max"
- : "__atomic_fetch_umax";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- LoweredMemTy, E->getExprLoc(), TInfo.Width);
- break;
- // T __atomic_nand_fetch_N(T *mem, T val, int order)
- // T __atomic_fetch_nand_N(T *mem, T val, int order)
- case AtomicExpr::AO__atomic_nand_fetch:
- PostOp = llvm::Instruction::And; // the NOT is special cased below
- [[fallthrough]];
- case AtomicExpr::AO__atomic_fetch_nand:
- case AtomicExpr::AO__c11_atomic_fetch_nand:
- LibCallName = "__atomic_fetch_nand";
- AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
- MemTy, E->getExprLoc(), TInfo.Width);
- break;
+ llvm_unreachable("Integral atomic operations always become atomicrmw!");
}
if (E->isOpenCL()) {
@@ -1286,57 +1186,24 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
StringRef(LibCallName).drop_front(1).str();
}
- // Optimized functions have the size in their name.
- if (UseOptimizedLibcall)
- LibCallName += "_" + llvm::utostr(Size);
// By default, assume we return a value of the atomic type.
if (!HaveRetTy) {
- if (UseOptimizedLibcall) {
- // Value is returned directly.
- // The function returns an appropriately sized integer type.
- RetTy = getContext().getIntTypeForBitwidth(
- getContext().toBits(TInfo.Width), /*Signed=*/false);
- } else {
- // Value is returned through parameter before the order.
- RetTy = getContext().VoidTy;
- Args.add(RValue::get(Dest.getPointer()), getContext().VoidPtrTy);
- }
+ // Value is returned through parameter before the order.
+ RetTy = getContext().VoidTy;
+ Args.add(RValue::get(CastToGenericAddrSpace(Dest.getPointer(), RetTy)),
+ getContext().VoidPtrTy);
}
- // order is always the last parameter
+ // Order is always the last parameter.
Args.add(RValue::get(Order),
getContext().IntTy);
if (E->isOpenCL())
Args.add(RValue::get(Scope), getContext().IntTy);
- // PostOp is only needed for the atomic_*_fetch...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
Force-pushed from d0f164d to dd6964d.
When I first came across this code I found it quite surprising that both clang and LLVM need handling for atomic libcalls. I feel it would be a lot nicer if this could all be handled inside the LLVM backend logic. Would it also be possible to emit the unoptimized libcall from LLVM instead of Clang, i.e. always emit the atomic IR operation and let the backend deal with lowering, or am I missing something?
I believe Clang currently does not properly propagate alignment information for certain IR (e.g. atomicrmw and cmpxchg). Look at the definitions of
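For reference, here is a hedged sketch (illustrative only, not code from this PR) of what carrying the real alignment on the emitted IR looks like, assuming the IRBuilder::CreateAtomicRMW overload in recent LLVM that takes an explicit alignment. The backend's AtomicExpand pass then uses that alignment when deciding how to lower the operation.

```cpp
#include "llvm/IR/IRBuilder.h"

// Illustrative helper, not part of the patch: emit an atomic add whose
// alignment reflects the (possibly misaligned) source pointer rather than
// assuming natural alignment.
llvm::Value *emitAtomicAdd(llvm::IRBuilder<> &Builder, llvm::Value *Ptr,
                           llvm::Value *Val, llvm::Align PtrAlign) {
  return Builder.CreateAtomicRMW(llvm::AtomicRMWInst::Add, Ptr, Val, PtrAlign,
                                 llvm::AtomicOrdering::SequentiallyConsistent);
}
```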
As a historical note, when I first wrote the support for atomics, it was under a model where LLVM IR only modeled legal atomics, and clang lowered illegal atomics to libcalls. This was changed a few years later to generalize atomics in LLVM IR to support arbitrary atomic ops lowered to libcalls. But there are still a few inconsistencies resulting from this, I think.
Force-pushed from dd6964d to 5d4b6ca.
I've rewritten a part of this PR to incorporate @arichardson's suggestion. I wasn't able to find inconsistencies through a cursory review of the backend's AtomicExpand pass, nor through a small set of tests built for common targets. How can I find the gaps I missed?
When I said "inconsistencies", I just meant the way the responsibility for lowering atomics is split between LLVM and clang; I didn't mean anything was actually broken. |
Overall this looks great to me; very happy to see reduced duplication between clang and llvm.
It would be great if we could just emit the atomic IR in all cases, but unfortunately non-power-of-two sizes result in a verifier error. We could probably emit the atomic IR for power-of-two sizes and let the expand pass deal with it, but since we still need the fallback for other sizes we might as well keep it.
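As a made-up illustration (not from the PR's tests): a non-power-of-two-sized object still goes through the generic, size-parameterized libcall, while a power-of-two-sized one can be emitted as atomic IR and left to the backend.

```cpp
struct Odd3 { char bytes[3]; }; // sizeof == 3, not a power of two

void exchange_examples(Odd3 *p, Odd3 *val, Odd3 *ret, int *q) {
  // Non-power-of-two size: still lowered to the generic libcall
  //   void __atomic_exchange(size_t size, void *mem, void *val, void *ret, int order)
  __atomic_exchange(p, val, ret, __ATOMIC_SEQ_CST);

  // Power-of-two size (assuming it fits within the target's inline atomic
  // width): emitted as atomic IR, e.g. an xchg-style atomicrmw.
  __atomic_exchange_n(q, 42, __ATOMIC_SEQ_CST);
}
```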
In the beginning, Clang only emitted atomic IR for operations it knew the underlying microarch had instructions for, meaning it required significant knowledge of the target. Later, the backend acquired the ability to lower IR to libcalls. To avoid duplicating logic and improve logic locality, we'd like to move as much as possible to the backend. There are many ways to describe this change. For example, this change reduces the variables Clang uses to decide whether to emit libcalls or IR, down to only the atomic's size.
Updated with suggestions. How does this look?
There exist atomic IR unit tests and libatomic unit tests, but neither can test the atomicity and interoperability of atomic builtins and compiler-rt's atomic library. These tests aim to approximate behaviour encountered in user code. These tests have caught issues in Clang. See llvm/llvm-project#74349 and llvm/llvm-project#73176 for LLVM changes inspired by these tests.
BTW, please don't rebase/amend commits and force-push, it makes it extremely difficult to review the changes since the previous review -- especially if the new changes are commingled with a rebase. Much better to simply push new commits on top of your existing branch. And if you need an updated baseline, you can
Gentle bump on this. Is there anything I can do now to make this easier to review?
Sorry for the delay. This looks good to me now!
I'm pretty certain this patch broke the Solaris/sparcv9 buildbot.
Confirmed: reverting the patch locally restores the build. |
https://reviews.llvm.org/D118021 likely related, looking into this. |
The libcalls per se wouldn't be a problem since on 32-bit Solaris/sparc |
Looks like a bug in the SPARC backend. #73176 (review) indicated that atomic expansion was working correctly for all backends, but I guess it isn't working correctly on SPARC.
What config is this bot using? Is it "-target sparc-solaris -mcpu=v9" (that is: 32-bit sparc, but with a v9 cpu available)? I see that SparcV9TargetInfo sets MaxAtomicInlineWidth to 64 unconditionally. But, the message

Yet, even then, SparcV8TargetInfo appears to intend to configure MaxAtomicInlineWidth to 64 when using a V9 CPU. That doesn't seem to be actually happening, though.

OK, so the frontend is apparently buggily configured to only support 32-bit atomics in 32-bit mode, even when a V9 CPU is available. That, then, was covering up a bug in the backend. The backend claims to (and should be able to!) support 64-bit atomics in 32-bit mode. But it doesn't actually, right now.

I'll send a patch to fix this by limiting atomics to 32-bit in the backend when
Part of that confusion comes from SPARC's own naming. V9 is the CPU, but a V9 CPU being used for 32-bit code is called V8+...
When in 32-bit mode, the backend doesn't currently implement 64-bit atomics, even though the hardware is capable if you have specified a V9 CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO. This fixes a regression triggered by PR llvm#73176.
Sparc. (llvm#81655) When in 32-bit mode, the backend doesn't currently implement 64-bit atomics, even though the hardware is capable if you have specified a V9 CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO. This fixes a regression triggered by PR llvm#73176. (cherry picked from commit c1a99b2)
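For reference, a rough sketch of what such a backend-side limit looks like in the SparcTargetLowering constructor (paraphrased, not necessarily the exact committed code). setMaxAtomicSizeInBitsSupported is the hook the AtomicExpand pass consults; anything wider is expanded to __atomic_* libcalls.

```cpp
// Sketch of the constructor logic, not the verbatim patch.
if (Subtarget->is64Bit()) {
  // 64-bit mode can lower 64-bit atomics inline.
  setMaxAtomicSizeInBitsSupported(64);
} else {
  // TODO: 32-bit code on a V9 CPU (V8+) could support 64-bit atomics, but
  // that lowering is not implemented yet, so cap at 32 bits for now.
  setMaxAtomicSizeInBitsSupported(32);
}
```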
In the beginning, Clang only emitted atomic IR for operations it knew the
underlying microarch had instructions for, meaning it required significant
knowledge of the target. Later, the backend acquired the ability to lower
IR to libcalls. To avoid duplicating logic and improve logic locality,
we'd like to move as much as possible to the backend.
There are many ways to describe this change. For example, this change
reduces the variables Clang uses to decide whether to emit libcalls or
IR, down to only the atomic's size.
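Paraphrasing the resulting check in CodeGenFunction::EmitAtomicExpr (variable names taken from the diff above; the intermediate revision shown there additionally special-cased arithmetic ops, while the end state described here keys only off the atomic's size):

```cpp
uint64_t Size = TInfo.Width.getQuantity();
bool Oversized = getContext().toBits(TInfo.Width) > MaxInlineWidthInBits;
bool PowerOf2Size = (Size & (Size - 1)) == 0;
// Non-power-of-two or oversized objects keep the generic __atomic_* libcalls;
// everything else becomes atomic IR that the backend lowers inline or to a
// libcall as it sees fit.
bool UseLibcall = !PowerOf2Size || Oversized;
```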