
[clang][CodeGen] Emit atomic IR in place of optimized libcalls. #73176


Merged: 1 commit merged into llvm:main on Feb 12, 2024

Conversation

@Logikable (Contributor) commented Nov 22, 2023

Originally, Clang only emitted atomic IR for operations it knew the
underlying microarchitecture had instructions for, which required significant
knowledge of the target. Later, the backend acquired the ability to lower
IR to libcalls. To avoid duplicating logic and to improve locality, we'd like
to move as much of this lowering as possible to the backend.

One way to describe this change: it reduces the information Clang uses to
decide whether to emit libcalls or atomic IR down to just the atomic
operation's size.

@llvmbot added labels: clang, backend:RISC-V, clang:codegen (Nov 22, 2023)
@llvmbot (Member) commented Nov 22, 2023

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-clang

Author: Logikable

Changes

Calling __atomic_fetch_op_n is undefined for misaligned pointers.

Since the backend can handle atomic IR on misaligned pointers, emit that instead. To keep things simple, we make this change for all misaligned operations, not just the integral ones.

There is an additional consequence of this change. Previously, libcalls were emitted for misaligned, misshapen (size != 2^n), and oversized objects. Since optimized libcalls only operate on 2^n sized members, removing the misaligned case means optimized libcalls will never be emitted, and all relevant codepaths can be cleaned up.

A simple correctness test is to have one thread perform an arithmetic operation (e.g. fetch_add) on a misaligned integer while another thread performs a non-arithmetic operation (e.g. xchg) on the same value. Such a test currently exhibits incorrect behaviour.
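
A rough reproducer sketch of that scenario (illustrative only, not the actual test suite addition; the buffer layout and iteration counts here are made up):

```cpp
// One thread does an arithmetic atomic op (__atomic_fetch_add) on a
// deliberately misaligned int while another does a non-arithmetic one
// (__atomic_exchange_n). If the two operations go through different lowering
// strategies (optimized libcall vs. lock-based runtime call), they do not
// synchronize with each other and updates are lost.
#include <cstdio>
#include <thread>

alignas(8) static unsigned char buffer[16];
static int *p = reinterpret_cast<int *>(buffer + 1); // misaligned for a 4-byte int

int main() {
  __atomic_store_n(p, 0, __ATOMIC_SEQ_CST);
  std::thread adder([] {
    for (int i = 0; i < 100000; ++i)
      __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
  });
  std::thread swapper([] {
    for (int i = 0; i < 100000; ++i) {
      int old = __atomic_exchange_n(p, 0, __ATOMIC_SEQ_CST); // take the count
      __atomic_fetch_add(p, old, __ATOMIC_SEQ_CST);          // put it back
    }
  });
  adder.join();
  swapper.join();
  // With fully interoperable atomics this always prints 100000.
  std::printf("final value: %d\n", __atomic_load_n(p, __ATOMIC_SEQ_CST));
}
```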


Patch is 62.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73176.diff

11 Files Affected:

  • (modified) clang/lib/CodeGen/CGAtomic.cpp (+99-232)
  • (modified) clang/test/CodeGen/LoongArch/atomics.c (+3-3)
  • (modified) clang/test/CodeGen/PowerPC/quadword-atomics.c (+1-1)
  • (modified) clang/test/CodeGen/RISCV/riscv-atomics.c (+21-21)
  • (modified) clang/test/CodeGen/arm-atomics-m.c (+4-4)
  • (modified) clang/test/CodeGen/arm-atomics-m0.c (+8-8)
  • (modified) clang/test/CodeGen/atomic-ops-libcall.c (+73-46)
  • (modified) clang/test/CodeGen/atomic-ops.c (+15-10)
  • (modified) clang/test/CodeGen/atomics-inlining.c (+14-14)
  • (modified) clang/test/CodeGen/c11atomics.c (+6-12)
  • (modified) clang/test/CodeGenOpenCL/atomic-ops-libcall.cl (+27-27)
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 6005d5c51c0e1ac..bc432afb6ee288b 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -785,27 +785,76 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
   Builder.SetInsertPoint(ContBB);
 }
 
-static void
-AddDirectArgument(CodeGenFunction &CGF, CallArgList &Args,
-                  bool UseOptimizedLibcall, llvm::Value *Val, QualType ValTy,
-                  SourceLocation Loc, CharUnits SizeInChars) {
-  if (UseOptimizedLibcall) {
-    // Load value and pass it to the function directly.
-    CharUnits Align = CGF.getContext().getTypeAlignInChars(ValTy);
-    int64_t SizeInBits = CGF.getContext().toBits(SizeInChars);
-    ValTy =
-        CGF.getContext().getIntTypeForBitwidth(SizeInBits, /*Signed=*/false);
-    llvm::Type *ITy = llvm::IntegerType::get(CGF.getLLVMContext(), SizeInBits);
-    Address Ptr = Address(Val, ITy, Align);
-    Val = CGF.EmitLoadOfScalar(Ptr, false,
-                               CGF.getContext().getPointerType(ValTy),
-                               Loc);
-    // Coerce the value into an appropriately sized integer type.
-    Args.add(RValue::get(Val), ValTy);
-  } else {
-    // Non-optimized functions always take a reference.
-    Args.add(RValue::get(Val), CGF.getContext().VoidPtrTy);
+static bool
+isArithmeticOp(AtomicExpr::AtomicOp op) {
+  switch (op) {
+    case AtomicExpr::AO__atomic_add_fetch:
+    case AtomicExpr::AO__atomic_fetch_add:
+    case AtomicExpr::AO__c11_atomic_fetch_add:
+    case AtomicExpr::AO__hip_atomic_fetch_add:
+    case AtomicExpr::AO__opencl_atomic_fetch_add:
+    case AtomicExpr::AO__atomic_and_fetch:
+    case AtomicExpr::AO__atomic_fetch_and:
+    case AtomicExpr::AO__c11_atomic_fetch_and:
+    case AtomicExpr::AO__hip_atomic_fetch_and:
+    case AtomicExpr::AO__opencl_atomic_fetch_and:
+    case AtomicExpr::AO__atomic_or_fetch:
+    case AtomicExpr::AO__atomic_fetch_or:
+    case AtomicExpr::AO__c11_atomic_fetch_or:
+    case AtomicExpr::AO__hip_atomic_fetch_or:
+    case AtomicExpr::AO__opencl_atomic_fetch_or:
+    case AtomicExpr::AO__atomic_sub_fetch:
+    case AtomicExpr::AO__atomic_fetch_sub:
+    case AtomicExpr::AO__c11_atomic_fetch_sub:
+    case AtomicExpr::AO__hip_atomic_fetch_sub:
+    case AtomicExpr::AO__opencl_atomic_fetch_sub:
+    case AtomicExpr::AO__atomic_xor_fetch:
+    case AtomicExpr::AO__atomic_fetch_xor:
+    case AtomicExpr::AO__c11_atomic_fetch_xor:
+    case AtomicExpr::AO__hip_atomic_fetch_xor:
+    case AtomicExpr::AO__opencl_atomic_fetch_xor:
+    case AtomicExpr::AO__atomic_nand_fetch:
+    case AtomicExpr::AO__atomic_fetch_nand:
+    case AtomicExpr::AO__c11_atomic_fetch_nand:
+    case AtomicExpr::AO__atomic_min_fetch:
+    case AtomicExpr::AO__atomic_fetch_min:
+    case AtomicExpr::AO__c11_atomic_fetch_min:
+    case AtomicExpr::AO__hip_atomic_fetch_min:
+    case AtomicExpr::AO__opencl_atomic_fetch_min:
+    case AtomicExpr::AO__atomic_max_fetch:
+    case AtomicExpr::AO__atomic_fetch_max:
+    case AtomicExpr::AO__c11_atomic_fetch_max:
+    case AtomicExpr::AO__hip_atomic_fetch_max:
+    case AtomicExpr::AO__opencl_atomic_fetch_max:
+      return true;
+    case AtomicExpr::AO__c11_atomic_init:
+    case AtomicExpr::AO__opencl_atomic_init:
+    case AtomicExpr::AO__atomic_compare_exchange:
+    case AtomicExpr::AO__atomic_compare_exchange_n:
+    case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
+    case AtomicExpr::AO__c11_atomic_compare_exchange_strong:
+    case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
+    case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
+    case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
+    case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
+    case AtomicExpr::AO__atomic_exchange:
+    case AtomicExpr::AO__atomic_exchange_n:
+    case AtomicExpr::AO__c11_atomic_exchange:
+    case AtomicExpr::AO__hip_atomic_exchange:
+    case AtomicExpr::AO__opencl_atomic_exchange:
+    case AtomicExpr::AO__atomic_store:
+    case AtomicExpr::AO__atomic_store_n:
+    case AtomicExpr::AO__c11_atomic_store:
+    case AtomicExpr::AO__hip_atomic_store:
+    case AtomicExpr::AO__opencl_atomic_store:
+    case AtomicExpr::AO__atomic_load:
+    case AtomicExpr::AO__atomic_load_n:
+    case AtomicExpr::AO__c11_atomic_load:
+    case AtomicExpr::AO__hip_atomic_load:
+    case AtomicExpr::AO__opencl_atomic_load:
+      return false;
   }
+  llvm_unreachable("All atomic ops should be handled!");
 }
 
 RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
@@ -833,7 +882,8 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
 
   bool Oversized = getContext().toBits(TInfo.Width) > MaxInlineWidthInBits;
   bool Misaligned = (Ptr.getAlignment() % TInfo.Width) != 0;
-  bool UseLibcall = Misaligned | Oversized;
+  bool PowerOf2Size = (Size & (Size - 1)) == 0;
+  bool UseLibcall = (!PowerOf2Size | Oversized) & !isArithmeticOp(E->getOp());
   bool ShouldCastToIntPtrTy = true;
 
   CharUnits MaxInlineWidth =
@@ -994,98 +1044,16 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       Dest = Atomics.castToAtomicIntPointer(Dest);
   }
 
-  // Use a library call.  See: http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary .
+  // Use a library call.  See: http://gcc.gnu.org/wiki/Atomic/GCCMM/Library.
+  // Clang should never generate an optimized libcall -- it's better the backend
+  // handle it.
   if (UseLibcall) {
-    bool UseOptimizedLibcall = false;
-    switch (E->getOp()) {
-    case AtomicExpr::AO__c11_atomic_init:
-    case AtomicExpr::AO__opencl_atomic_init:
-      llvm_unreachable("Already handled above with EmitAtomicInit!");
-
-    case AtomicExpr::AO__atomic_fetch_add:
-    case AtomicExpr::AO__atomic_fetch_and:
-    case AtomicExpr::AO__atomic_fetch_max:
-    case AtomicExpr::AO__atomic_fetch_min:
-    case AtomicExpr::AO__atomic_fetch_nand:
-    case AtomicExpr::AO__atomic_fetch_or:
-    case AtomicExpr::AO__atomic_fetch_sub:
-    case AtomicExpr::AO__atomic_fetch_xor:
-    case AtomicExpr::AO__atomic_add_fetch:
-    case AtomicExpr::AO__atomic_and_fetch:
-    case AtomicExpr::AO__atomic_max_fetch:
-    case AtomicExpr::AO__atomic_min_fetch:
-    case AtomicExpr::AO__atomic_nand_fetch:
-    case AtomicExpr::AO__atomic_or_fetch:
-    case AtomicExpr::AO__atomic_sub_fetch:
-    case AtomicExpr::AO__atomic_xor_fetch:
-    case AtomicExpr::AO__c11_atomic_fetch_add:
-    case AtomicExpr::AO__c11_atomic_fetch_and:
-    case AtomicExpr::AO__c11_atomic_fetch_max:
-    case AtomicExpr::AO__c11_atomic_fetch_min:
-    case AtomicExpr::AO__c11_atomic_fetch_nand:
-    case AtomicExpr::AO__c11_atomic_fetch_or:
-    case AtomicExpr::AO__c11_atomic_fetch_sub:
-    case AtomicExpr::AO__c11_atomic_fetch_xor:
-    case AtomicExpr::AO__hip_atomic_fetch_add:
-    case AtomicExpr::AO__hip_atomic_fetch_and:
-    case AtomicExpr::AO__hip_atomic_fetch_max:
-    case AtomicExpr::AO__hip_atomic_fetch_min:
-    case AtomicExpr::AO__hip_atomic_fetch_or:
-    case AtomicExpr::AO__hip_atomic_fetch_sub:
-    case AtomicExpr::AO__hip_atomic_fetch_xor:
-    case AtomicExpr::AO__opencl_atomic_fetch_add:
-    case AtomicExpr::AO__opencl_atomic_fetch_and:
-    case AtomicExpr::AO__opencl_atomic_fetch_max:
-    case AtomicExpr::AO__opencl_atomic_fetch_min:
-    case AtomicExpr::AO__opencl_atomic_fetch_or:
-    case AtomicExpr::AO__opencl_atomic_fetch_sub:
-    case AtomicExpr::AO__opencl_atomic_fetch_xor:
-      // For these, only library calls for certain sizes exist.
-      UseOptimizedLibcall = true;
-      break;
-
-    case AtomicExpr::AO__atomic_load:
-    case AtomicExpr::AO__atomic_store:
-    case AtomicExpr::AO__atomic_exchange:
-    case AtomicExpr::AO__atomic_compare_exchange:
-      // Use the generic version if we don't know that the operand will be
-      // suitably aligned for the optimized version.
-      if (Misaligned)
-        break;
-      [[fallthrough]];
-    case AtomicExpr::AO__atomic_load_n:
-    case AtomicExpr::AO__atomic_store_n:
-    case AtomicExpr::AO__atomic_exchange_n:
-    case AtomicExpr::AO__atomic_compare_exchange_n:
-    case AtomicExpr::AO__c11_atomic_load:
-    case AtomicExpr::AO__c11_atomic_store:
-    case AtomicExpr::AO__c11_atomic_exchange:
-    case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
-    case AtomicExpr::AO__c11_atomic_compare_exchange_strong:
-    case AtomicExpr::AO__hip_atomic_load:
-    case AtomicExpr::AO__hip_atomic_store:
-    case AtomicExpr::AO__hip_atomic_exchange:
-    case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
-    case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
-    case AtomicExpr::AO__opencl_atomic_load:
-    case AtomicExpr::AO__opencl_atomic_store:
-    case AtomicExpr::AO__opencl_atomic_exchange:
-    case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
-    case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
-      // Only use optimized library calls for sizes for which they exist.
-      // FIXME: Size == 16 optimized library functions exist too.
-      if (Size == 1 || Size == 2 || Size == 4 || Size == 8)
-        UseOptimizedLibcall = true;
-      break;
-    }
-
     CallArgList Args;
-    if (!UseOptimizedLibcall) {
-      // For non-optimized library calls, the size is the first parameter
-      Args.add(RValue::get(llvm::ConstantInt::get(SizeTy, Size)),
-               getContext().getSizeType());
-    }
-    // Atomic address is the first or second parameter
+    // For non-optimized library calls, the size is the first parameter.
+    Args.add(RValue::get(llvm::ConstantInt::get(SizeTy, Size)),
+              getContext().getSizeType());
+
+    // The atomic address is the second parameter.
     // The OpenCL atomic library functions only accept pointer arguments to
     // generic address space.
     auto CastToGenericAddrSpace = [&](llvm::Value *V, QualType PT) {
@@ -1100,18 +1068,15 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       return getTargetHooks().performAddrSpaceCast(
           *this, V, AS, LangAS::opencl_generic, DestType, false);
     };
-
     Args.add(RValue::get(CastToGenericAddrSpace(Ptr.getPointer(),
                                                 E->getPtr()->getType())),
              getContext().VoidPtrTy);
 
+    // The next 1-3 parameters are op-dependent.
     std::string LibCallName;
-    QualType LoweredMemTy =
-      MemTy->isPointerType() ? getContext().getIntPtrType() : MemTy;
     QualType RetTy;
     bool HaveRetTy = false;
-    llvm::Instruction::BinaryOps PostOp = (llvm::Instruction::BinaryOps)0;
-    bool PostOpMinMax = false;
+
     switch (E->getOp()) {
     case AtomicExpr::AO__c11_atomic_init:
     case AtomicExpr::AO__opencl_atomic_init:
@@ -1122,8 +1087,6 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     // and exchange.
     // bool __atomic_compare_exchange(size_t size, void *mem, void *expected,
     //                                void *desired, int success, int failure)
-    // bool __atomic_compare_exchange_N(T *mem, T *expected, T desired,
-    //                                  int success, int failure)
     case AtomicExpr::AO__atomic_compare_exchange:
     case AtomicExpr::AO__atomic_compare_exchange_n:
     case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
@@ -1138,25 +1101,25 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
                                                   E->getVal1()->getType())),
                getContext().VoidPtrTy);
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val2.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
+      Args.add(RValue::get(CastToGenericAddrSpace(Val2.getPointer(),
+                                                  E->getVal2()->getType())),
+               getContext().VoidPtrTy);
       Args.add(RValue::get(Order), getContext().IntTy);
       Order = OrderFail;
       break;
     // void __atomic_exchange(size_t size, void *mem, void *val, void *return,
     //                        int order)
-    // T __atomic_exchange_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_exchange:
     case AtomicExpr::AO__atomic_exchange_n:
     case AtomicExpr::AO__c11_atomic_exchange:
     case AtomicExpr::AO__hip_atomic_exchange:
     case AtomicExpr::AO__opencl_atomic_exchange:
       LibCallName = "__atomic_exchange";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
+      Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
+                                                  E->getVal1()->getType())),
+               getContext().VoidPtrTy);
       break;
     // void __atomic_store(size_t size, void *mem, void *val, int order)
-    // void __atomic_store_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_store:
     case AtomicExpr::AO__atomic_store_n:
     case AtomicExpr::AO__c11_atomic_store:
@@ -1165,11 +1128,11 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       LibCallName = "__atomic_store";
       RetTy = getContext().VoidTy;
       HaveRetTy = true;
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
+      Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
+                                                  E->getVal1()->getType())),
+               getContext().VoidPtrTy);
       break;
     // void __atomic_load(size_t size, void *mem, void *return, int order)
-    // T __atomic_load_N(T *mem, int order)
     case AtomicExpr::AO__atomic_load:
     case AtomicExpr::AO__atomic_load_n:
     case AtomicExpr::AO__c11_atomic_load:
@@ -1177,108 +1140,45 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     case AtomicExpr::AO__opencl_atomic_load:
       LibCallName = "__atomic_load";
       break;
-    // T __atomic_add_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_add_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_add_fetch:
-      PostOp = llvm::Instruction::Add;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_add:
     case AtomicExpr::AO__c11_atomic_fetch_add:
     case AtomicExpr::AO__hip_atomic_fetch_add:
     case AtomicExpr::AO__opencl_atomic_fetch_add:
-      LibCallName = "__atomic_fetch_add";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_and_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_and_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_and_fetch:
-      PostOp = llvm::Instruction::And;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_and:
     case AtomicExpr::AO__c11_atomic_fetch_and:
     case AtomicExpr::AO__hip_atomic_fetch_and:
     case AtomicExpr::AO__opencl_atomic_fetch_and:
-      LibCallName = "__atomic_fetch_and";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_or_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_or_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_or_fetch:
-      PostOp = llvm::Instruction::Or;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_or:
     case AtomicExpr::AO__c11_atomic_fetch_or:
     case AtomicExpr::AO__hip_atomic_fetch_or:
     case AtomicExpr::AO__opencl_atomic_fetch_or:
-      LibCallName = "__atomic_fetch_or";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_sub_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_sub_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_sub_fetch:
-      PostOp = llvm::Instruction::Sub;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_sub:
     case AtomicExpr::AO__c11_atomic_fetch_sub:
     case AtomicExpr::AO__hip_atomic_fetch_sub:
     case AtomicExpr::AO__opencl_atomic_fetch_sub:
-      LibCallName = "__atomic_fetch_sub";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_xor_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_xor_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_xor_fetch:
-      PostOp = llvm::Instruction::Xor;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_xor:
     case AtomicExpr::AO__c11_atomic_fetch_xor:
     case AtomicExpr::AO__hip_atomic_fetch_xor:
     case AtomicExpr::AO__opencl_atomic_fetch_xor:
-      LibCallName = "__atomic_fetch_xor";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
+    case AtomicExpr::AO__atomic_nand_fetch:
+    case AtomicExpr::AO__atomic_fetch_nand:
+    case AtomicExpr::AO__c11_atomic_fetch_nand:
     case AtomicExpr::AO__atomic_min_fetch:
-      PostOpMinMax = true;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_min:
     case AtomicExpr::AO__c11_atomic_fetch_min:
     case AtomicExpr::AO__hip_atomic_fetch_min:
     case AtomicExpr::AO__opencl_atomic_fetch_min:
-      LibCallName = E->getValueType()->isSignedIntegerType()
-                        ? "__atomic_fetch_min"
-                        : "__atomic_fetch_umin";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
     case AtomicExpr::AO__atomic_max_fetch:
-      PostOpMinMax = true;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_max:
     case AtomicExpr::AO__c11_atomic_fetch_max:
     case AtomicExpr::AO__hip_atomic_fetch_max:
     case AtomicExpr::AO__opencl_atomic_fetch_max:
-      LibCallName = E->getValueType()->isSignedIntegerType()
-                        ? "__atomic_fetch_max"
-                        : "__atomic_fetch_umax";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_nand_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_nand_N(T *mem, T val, int order)
-    case AtomicExpr::AO__atomic_nand_fetch:
-      PostOp = llvm::Instruction::And; // the NOT is special cased below
-      [[fallthrough]];
-    case AtomicExpr::AO__atomic_fetch_nand:
-    case AtomicExpr::AO__c11_atomic_fetch_nand:
-      LibCallName = "__atomic_fetch_nand";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
+      llvm_unreachable("Integral atomic operations always become atomicrmw!");
     }
 
     if (E->isOpenCL()) {
@@ -1286,57 +1186,24 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
           StringRef(LibCallName).drop_front(1).str();
 
     }
-    // Optimized functions have the size in their name.
-    if (UseOptimizedLibcall)
-      LibCallName += "_" + llvm::utostr(Size);
     // By default, assume we return a value of the atomic type.
     if (!HaveRetTy) {
-      if (UseOptimizedLibcall) {
-        // Value is returned directly.
-        // The function returns an appropriately sized integer type.
-        RetTy = getContext().getIntTypeForBitwidth(
-            getContext().toBits(TInfo.Width), /*Signed=*/false);
-      } else {
-        // Value is returned through parameter before the order.
-        RetTy = getContext().VoidTy;
-        Args.add(RValue::get(Dest.getPointer()), getContext().VoidPtrTy);
-      }
+      // Value is returned through parameter before the order.
+      RetTy = getContext().VoidTy;
+      Args.add(RValue::get(CastToGenericAddrSpace(Dest.getPointer(), RetTy)),
+               getContext().VoidPtrTy);
     }
-    // order is always the last parameter
+    // Order is always the last parameter.
     Args.add(RValue::get(Order),
              getContext().IntTy);
     if (E->isOpenCL())
       Args.add(RValue::get(Scope), getContext().IntTy);
 
-    // PostOp is only needed for the atomic_*_fetch...
[truncated]

github-actions bot commented Nov 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@Logikable force-pushed the main branch 3 times, most recently from d0f164d to dd6964d (November 22, 2023 22:10)
@arichardson (Member)

When I first came across this code I found it quite surprising that both clang and LLVM need handling for atomic libcalls. I feel it would be a lot nicer if this could all be handled inside the LLVM backend logic. Would it also be possible to emit the unoptimized libcall from LLVM instead of Clang, i.e. always emit the atomic IR operation and let the backend deal with lowering, or am I missing something?

@Logikable (Contributor, Author)

I believe Clang currently does not properly propagate alignment information for certain IR (e.g. atomicrmw and cmpxchg). Look at the definitions of CreateAtomic* in Clang. The patches in https://bugs.llvm.org/show_bug.cgi?id=27168 did not address this (thanks @MaskRay for helping me find this).
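
To make the "propagate alignment" point concrete, here is a minimal standalone IRBuilder sketch (not Clang's CodeGen; the function and variable names are invented): whatever MaybeAlign the frontend passes to CreateAtomicRMW is what ends up on the atomicrmw instruction, so passing the type's natural alignment for a pointer that is really only 1-byte aligned misleads the backend.

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  llvm::LLVMContext Ctx;
  llvm::Module M("align-demo", Ctx);
  llvm::Type *I32 = llvm::Type::getInt32Ty(Ctx);
  llvm::Type *PtrTy = llvm::PointerType::get(Ctx, /*AddressSpace=*/0);
  llvm::Function *Fn = llvm::Function::Create(
      llvm::FunctionType::get(I32, {PtrTy}, /*isVarArg=*/false),
      llvm::Function::ExternalLinkage, "misaligned_add", M);
  llvm::IRBuilder<> B(llvm::BasicBlock::Create(Ctx, "entry", Fn));
  // The alignment passed here is the one the backend will trust when it
  // decides whether to expand this to a native instruction or a libcall.
  llvm::Value *Old = B.CreateAtomicRMW(
      llvm::AtomicRMWInst::Add, Fn->getArg(0), llvm::ConstantInt::get(I32, 1),
      llvm::MaybeAlign(1), llvm::AtomicOrdering::SequentiallyConsistent);
  B.CreateRet(Old);
  M.print(llvm::outs(), nullptr); // prints an atomicrmw add ... align 1
}
```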

@efriedma-quic (Collaborator)

As a historical note, when I first wrote the support for atomics, it was under a model where LLVM IR only modeled legal atomics, and clang lowered illegal atomics to libcalls. This was changed a few years later to generalize atomics in LLVM IR to support arbitrary atomic ops lowered to libcalls. But there are still a few inconsistencies resulting from this, I think.

@Logikable force-pushed the main branch 2 times, most recently from dd6964d to 5d4b6ca (December 1, 2023 17:49)
@Logikable changed the title from "[clang][CodeGen] Emit atomic IR instead of libcalls for misaligned po…" to "[clang][CodeGen] Emit atomic IR in place of optimized libcalls." (Dec 21, 2023)
@Logikable (Contributor, Author)

I've rewritten a part of this PR to incorporate @arichardson's suggestion. I wasn't able to find inconsistencies through a cursory review of the backend's AtomicExpand pass, nor through a small set of tests built for common targets.

How can I find the gaps I missed?

@efriedma-quic (Collaborator)

When I said "inconsistencies", I just meant the way the responsibility for lowering atomics is split between LLVM and clang; I didn't mean anything was actually broken.

@jyknight (Member) left a comment

Overall, I think this is a great improvement.

And I believe it should be correct to do now, after fixing the backends' MaxAtomicSizeInBitsSupported in #75703, #75185, #75112, #74385, and #74389, and fixing Clang's propagation of alignment to atomicrmw/cmpxchg in #74349.

Just a few more comments.

@arichardson (Member) left a comment

Overall this looks great to me; I'm very happy to see reduced duplication between clang and llvm.

It would be great if we could just emit the atomic IR in all cases, but unfortunately non-power-of-two sizes result in a verifier error. We could probably emit the atomic IR for power-of-two sizes and let the expand pass deal with it, but since we still need the fallback for other sizes we might as well keep it.
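
For instance (illustrative only, the type and function here are invented), a non-power-of-two-sized object has no matching integer width, so it cannot be expressed as an integer atomicrmw/cmpxchg and still needs the generic, size-prefixed runtime call:

```cpp
// A 3-byte type: there is no power-of-two integer atomic the verifier will
// accept for it, so Clang keeps emitting the generic libcall, whose shape the
// comments in CGAtomic.cpp give as:
//   void __atomic_exchange(size_t size, void *mem, void *val, void *return, int order)
struct Packet { unsigned char bytes[3]; };

Packet g;

Packet swap_packet(Packet desired) {
  Packet previous;
  __atomic_exchange(&g, &desired, &previous, __ATOMIC_SEQ_CST);
  return previous;
}
```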

@Logikable (Contributor, Author)

Updated with suggestions. How does this look?

Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Jan 19, 2024
There exist atomic IR unit tests and libatomic unit tests, but neither
can test the atomicity and interoperability of atomic builtins and
compiler-rt's atomic library. These tests aim to approximate behaviour
encountered in user code.

These tests have caught issues in Clang. See
llvm/llvm-project#74349 and
llvm/llvm-project#73176 for LLVM changes
inspired by these tests.
Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Jan 19, 2024
@jyknight (Member)

BTW, please don't rebase/amend commits and force-push; it makes it extremely difficult to review the changes since the previous review, especially if the new changes are commingled with a rebase.

Much better to simply push new commits on top of your existing branch. And if you need an updated baseline, you can git merge origin/main in your PR branch.

Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Jan 23, 2024
@Logikable (Contributor, Author)

Gentle bump on this. Is there anything I can do now to make this easier to review?

@jyknight (Member) left a comment

Sorry for the delay. This looks good to me now!

@MaskRay MaskRay merged commit 5fdd094 into llvm:main Feb 12, 2024
@rorth (Collaborator) commented Feb 13, 2024

I'm pretty certain this patch broke the Solaris/sparcv9 buildbot.

@rorth (Collaborator) commented Feb 13, 2024

Confirmed: reverting the patch locally restores the build.

@Logikable (Contributor, Author)

https://reviews.llvm.org/D118021 is likely related; looking into this.

@rorth (Collaborator) commented Feb 13, 2024

The libcalls per se wouldn't be a problem: on 32-bit Solaris/sparc, -latomic is always linked for exactly that reason. However, the __sync_* libcalls aren't defined in libatomic.so, hence the link failure.

@efriedma-quic (Collaborator)

Looks like a bug in the SPARC backend. #73176 (review) indicated that atomic expansion was working correctly for all backends, but I guess it isn't working correctly on SPARC.

Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Feb 13, 2024
@jyknight (Member)

What config is this bot using? Is it "-target sparc-solaris -mcpu=v9" (that is: 32-bit sparc, but with v9 cpu available)?

I see that SparcV9TargetInfo sets MaxAtomicInlineWidth to 64 unconditionally. But the message "warning: large atomic operation may incur significant performance penalty; the access size (8 bytes) exceeds the max lock-free size (4 bytes) [-Watomic-alignment]" indicates that MaxAtomicInlineWidth is 32. OK, wait, this class is poorly named: SparcV8TargetInfo really means "32-bit sparc" and SparcV9TargetInfo really means "64-bit sparc". So this is actually using SparcV8TargetInfo instead.

Yet, even then, SparcV8TargetInfo appears to intend to configure MaxAtomicInlineWidth to 64 when using a V9 CPU. That doesn't seem to be actually happening, though.

OK, so the frontend is apparently buggily configured to only support 32-bit atomics in 32-bit mode, even when a V9 CPU is available.

That, then, was covering up a bug in the backend. The backend claims to (and should be able to!) support 64-bit atomics in 32-bit mode. But it doesn't actually, right now.

I'll send a patch to fix this by limiting atomics to 32-bit in the backend when !Subtarget->is64Bit(), which will get things back to working, though it doesn't address those deeper issues.
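
A minimal sketch of that idea (not the verbatim patch; it assumes the code lives in SparcTargetLowering's constructor):

```cpp
// Only advertise lock-free 64-bit atomics when targeting 64-bit SPARC.
// Anything wider than the supported size is then expanded by AtomicExpand
// into __atomic_* libcalls rather than falling back to __sync_* calls that
// libatomic on Solaris does not provide.
if (Subtarget->is64Bit())
  setMaxAtomicSizeInBitsSupported(64);
else
  setMaxAtomicSizeInBitsSupported(32); // TODO: 64-bit CAS exists on V8+/V9 hardware.
```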

@jrtc27 (Collaborator) commented Feb 13, 2024

Part of that confusion comes from SPARC's own naming. V9 is the CPU, but a V9 CPU being used for 32-bit code is called V8+...

jyknight added a commit to jyknight/llvm-project that referenced this pull request Feb 13, 2024
When in 32-bit mode, the backend doesn't currently implement 64-bit
atomics, even though the hardware is capable if you have specified a
V9 CPU. Thus, limit the width to 32-bit, for now, leaving behind a
TODO.

This fixes a regression triggered by PR llvm#73176.
jyknight added a commit that referenced this pull request Feb 13, 2024
…81655)

When in 32-bit mode, the backend doesn't currently implement 64-bit
atomics, even though the hardware is capable if you have specified a V9
CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO.

This fixes a regression triggered by PR #73176.
brad0 added a commit to brad0/llvm-project that referenced this pull request Feb 14, 2024
 Sparc. (llvm#81655)


(cherry picked from commit c1a99b2)
Labels: backend:RISC-V, clang:codegen, clang
8 participants