
[clang][CodeGen] Emit atomic IR in place of optimized libcalls. #73176


Merged: 1 commit merged into llvm:main on Feb 12, 2024

Conversation

@Logikable (Contributor) commented Nov 22, 2023

Originally, Clang only emitted atomic IR for operations it knew the
underlying microarchitecture had instructions for, which required significant
knowledge of the target. Later, the backend acquired the ability to lower
IR to libcalls. To avoid duplicating logic and to improve locality, we'd like
to move as much of this lowering as possible to the backend.

One way to describe this change: it reduces the information Clang uses to
decide whether to emit libcalls or atomic IR down to just the atomic
operation's size.

@llvmbot added labels: clang, backend:RISC-V, clang:codegen (Nov 22, 2023)
@llvmbot (Member) commented Nov 22, 2023

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-clang

Author: Logikable

Changes

Calling __atomic_fetch_op_n is undefined for misaligned pointers.

Since the backend can handle atomic IR on misaligned pointers, emit that instead. To keep things simple, we make this change for all misaligned operations, not just the integral ones.

There is an additional consequence of this change. Previously, libcalls were emitted for misaligned, misshapen (size != 2^n), and oversized objects. Since optimized libcalls only operate on 2^n sized members, removing the misaligned case means optimized libcalls will never be emitted, and all relevant codepaths can be cleaned up.

A simple correctness test is to have one thread perform an arithmetic operation (e.g. fetch_add) on a misaligned integer while another thread performs a non-arithmetic operation (e.g. xchg) on the same value. Such a test currently exhibits incorrect behaviour.
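
A rough reproducer sketch of that scenario (illustrative only, not the actual test suite addition; the buffer layout and iteration counts here are made up):

```cpp
// One thread does an arithmetic atomic op (__atomic_fetch_add) on a
// deliberately misaligned int while another does a non-arithmetic one
// (__atomic_exchange_n). If the two operations go through different lowering
// strategies (optimized libcall vs. lock-based runtime call), they do not
// synchronize with each other and updates are lost.
#include <cstdio>
#include <thread>

alignas(8) static unsigned char buffer[16];
static int *p = reinterpret_cast<int *>(buffer + 1); // misaligned for a 4-byte int

int main() {
  __atomic_store_n(p, 0, __ATOMIC_SEQ_CST);
  std::thread adder([] {
    for (int i = 0; i < 100000; ++i)
      __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
  });
  std::thread swapper([] {
    for (int i = 0; i < 100000; ++i) {
      int old = __atomic_exchange_n(p, 0, __ATOMIC_SEQ_CST); // take the count
      __atomic_fetch_add(p, old, __ATOMIC_SEQ_CST);          // put it back
    }
  });
  adder.join();
  swapper.join();
  // With fully interoperable atomics this always prints 100000.
  std::printf("final value: %d\n", __atomic_load_n(p, __ATOMIC_SEQ_CST));
}
```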


Patch is 62.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/73176.diff

11 Files Affected:

  • (modified) clang/lib/CodeGen/CGAtomic.cpp (+99-232)
  • (modified) clang/test/CodeGen/LoongArch/atomics.c (+3-3)
  • (modified) clang/test/CodeGen/PowerPC/quadword-atomics.c (+1-1)
  • (modified) clang/test/CodeGen/RISCV/riscv-atomics.c (+21-21)
  • (modified) clang/test/CodeGen/arm-atomics-m.c (+4-4)
  • (modified) clang/test/CodeGen/arm-atomics-m0.c (+8-8)
  • (modified) clang/test/CodeGen/atomic-ops-libcall.c (+73-46)
  • (modified) clang/test/CodeGen/atomic-ops.c (+15-10)
  • (modified) clang/test/CodeGen/atomics-inlining.c (+14-14)
  • (modified) clang/test/CodeGen/c11atomics.c (+6-12)
  • (modified) clang/test/CodeGenOpenCL/atomic-ops-libcall.cl (+27-27)
diff --git a/clang/lib/CodeGen/CGAtomic.cpp b/clang/lib/CodeGen/CGAtomic.cpp
index 6005d5c51c0e1ac..bc432afb6ee288b 100644
--- a/clang/lib/CodeGen/CGAtomic.cpp
+++ b/clang/lib/CodeGen/CGAtomic.cpp
@@ -785,27 +785,76 @@ static void EmitAtomicOp(CodeGenFunction &CGF, AtomicExpr *Expr, Address Dest,
   Builder.SetInsertPoint(ContBB);
 }
 
-static void
-AddDirectArgument(CodeGenFunction &CGF, CallArgList &Args,
-                  bool UseOptimizedLibcall, llvm::Value *Val, QualType ValTy,
-                  SourceLocation Loc, CharUnits SizeInChars) {
-  if (UseOptimizedLibcall) {
-    // Load value and pass it to the function directly.
-    CharUnits Align = CGF.getContext().getTypeAlignInChars(ValTy);
-    int64_t SizeInBits = CGF.getContext().toBits(SizeInChars);
-    ValTy =
-        CGF.getContext().getIntTypeForBitwidth(SizeInBits, /*Signed=*/false);
-    llvm::Type *ITy = llvm::IntegerType::get(CGF.getLLVMContext(), SizeInBits);
-    Address Ptr = Address(Val, ITy, Align);
-    Val = CGF.EmitLoadOfScalar(Ptr, false,
-                               CGF.getContext().getPointerType(ValTy),
-                               Loc);
-    // Coerce the value into an appropriately sized integer type.
-    Args.add(RValue::get(Val), ValTy);
-  } else {
-    // Non-optimized functions always take a reference.
-    Args.add(RValue::get(Val), CGF.getContext().VoidPtrTy);
+static bool
+isArithmeticOp(AtomicExpr::AtomicOp op) {
+  switch (op) {
+    case AtomicExpr::AO__atomic_add_fetch:
+    case AtomicExpr::AO__atomic_fetch_add:
+    case AtomicExpr::AO__c11_atomic_fetch_add:
+    case AtomicExpr::AO__hip_atomic_fetch_add:
+    case AtomicExpr::AO__opencl_atomic_fetch_add:
+    case AtomicExpr::AO__atomic_and_fetch:
+    case AtomicExpr::AO__atomic_fetch_and:
+    case AtomicExpr::AO__c11_atomic_fetch_and:
+    case AtomicExpr::AO__hip_atomic_fetch_and:
+    case AtomicExpr::AO__opencl_atomic_fetch_and:
+    case AtomicExpr::AO__atomic_or_fetch:
+    case AtomicExpr::AO__atomic_fetch_or:
+    case AtomicExpr::AO__c11_atomic_fetch_or:
+    case AtomicExpr::AO__hip_atomic_fetch_or:
+    case AtomicExpr::AO__opencl_atomic_fetch_or:
+    case AtomicExpr::AO__atomic_sub_fetch:
+    case AtomicExpr::AO__atomic_fetch_sub:
+    case AtomicExpr::AO__c11_atomic_fetch_sub:
+    case AtomicExpr::AO__hip_atomic_fetch_sub:
+    case AtomicExpr::AO__opencl_atomic_fetch_sub:
+    case AtomicExpr::AO__atomic_xor_fetch:
+    case AtomicExpr::AO__atomic_fetch_xor:
+    case AtomicExpr::AO__c11_atomic_fetch_xor:
+    case AtomicExpr::AO__hip_atomic_fetch_xor:
+    case AtomicExpr::AO__opencl_atomic_fetch_xor:
+    case AtomicExpr::AO__atomic_nand_fetch:
+    case AtomicExpr::AO__atomic_fetch_nand:
+    case AtomicExpr::AO__c11_atomic_fetch_nand:
+    case AtomicExpr::AO__atomic_min_fetch:
+    case AtomicExpr::AO__atomic_fetch_min:
+    case AtomicExpr::AO__c11_atomic_fetch_min:
+    case AtomicExpr::AO__hip_atomic_fetch_min:
+    case AtomicExpr::AO__opencl_atomic_fetch_min:
+    case AtomicExpr::AO__atomic_max_fetch:
+    case AtomicExpr::AO__atomic_fetch_max:
+    case AtomicExpr::AO__c11_atomic_fetch_max:
+    case AtomicExpr::AO__hip_atomic_fetch_max:
+    case AtomicExpr::AO__opencl_atomic_fetch_max:
+      return true;
+    case AtomicExpr::AO__c11_atomic_init:
+    case AtomicExpr::AO__opencl_atomic_init:
+    case AtomicExpr::AO__atomic_compare_exchange:
+    case AtomicExpr::AO__atomic_compare_exchange_n:
+    case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
+    case AtomicExpr::AO__c11_atomic_compare_exchange_strong:
+    case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
+    case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
+    case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
+    case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
+    case AtomicExpr::AO__atomic_exchange:
+    case AtomicExpr::AO__atomic_exchange_n:
+    case AtomicExpr::AO__c11_atomic_exchange:
+    case AtomicExpr::AO__hip_atomic_exchange:
+    case AtomicExpr::AO__opencl_atomic_exchange:
+    case AtomicExpr::AO__atomic_store:
+    case AtomicExpr::AO__atomic_store_n:
+    case AtomicExpr::AO__c11_atomic_store:
+    case AtomicExpr::AO__hip_atomic_store:
+    case AtomicExpr::AO__opencl_atomic_store:
+    case AtomicExpr::AO__atomic_load:
+    case AtomicExpr::AO__atomic_load_n:
+    case AtomicExpr::AO__c11_atomic_load:
+    case AtomicExpr::AO__hip_atomic_load:
+    case AtomicExpr::AO__opencl_atomic_load:
+      return false;
   }
+  llvm_unreachable("All atomic ops should be handled!");
 }
 
 RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
@@ -833,7 +882,8 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
 
   bool Oversized = getContext().toBits(TInfo.Width) > MaxInlineWidthInBits;
   bool Misaligned = (Ptr.getAlignment() % TInfo.Width) != 0;
-  bool UseLibcall = Misaligned | Oversized;
+  bool PowerOf2Size = (Size & (Size - 1)) == 0;
+  bool UseLibcall = (!PowerOf2Size | Oversized) & !isArithmeticOp(E->getOp());
   bool ShouldCastToIntPtrTy = true;
 
   CharUnits MaxInlineWidth =
@@ -994,98 +1044,16 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       Dest = Atomics.castToAtomicIntPointer(Dest);
   }
 
-  // Use a library call.  See: http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary .
+  // Use a library call.  See: http://gcc.gnu.org/wiki/Atomic/GCCMM/Library.
+  // Clang should never generate an optimized libcall -- it's better the backend
+  // handle it.
   if (UseLibcall) {
-    bool UseOptimizedLibcall = false;
-    switch (E->getOp()) {
-    case AtomicExpr::AO__c11_atomic_init:
-    case AtomicExpr::AO__opencl_atomic_init:
-      llvm_unreachable("Already handled above with EmitAtomicInit!");
-
-    case AtomicExpr::AO__atomic_fetch_add:
-    case AtomicExpr::AO__atomic_fetch_and:
-    case AtomicExpr::AO__atomic_fetch_max:
-    case AtomicExpr::AO__atomic_fetch_min:
-    case AtomicExpr::AO__atomic_fetch_nand:
-    case AtomicExpr::AO__atomic_fetch_or:
-    case AtomicExpr::AO__atomic_fetch_sub:
-    case AtomicExpr::AO__atomic_fetch_xor:
-    case AtomicExpr::AO__atomic_add_fetch:
-    case AtomicExpr::AO__atomic_and_fetch:
-    case AtomicExpr::AO__atomic_max_fetch:
-    case AtomicExpr::AO__atomic_min_fetch:
-    case AtomicExpr::AO__atomic_nand_fetch:
-    case AtomicExpr::AO__atomic_or_fetch:
-    case AtomicExpr::AO__atomic_sub_fetch:
-    case AtomicExpr::AO__atomic_xor_fetch:
-    case AtomicExpr::AO__c11_atomic_fetch_add:
-    case AtomicExpr::AO__c11_atomic_fetch_and:
-    case AtomicExpr::AO__c11_atomic_fetch_max:
-    case AtomicExpr::AO__c11_atomic_fetch_min:
-    case AtomicExpr::AO__c11_atomic_fetch_nand:
-    case AtomicExpr::AO__c11_atomic_fetch_or:
-    case AtomicExpr::AO__c11_atomic_fetch_sub:
-    case AtomicExpr::AO__c11_atomic_fetch_xor:
-    case AtomicExpr::AO__hip_atomic_fetch_add:
-    case AtomicExpr::AO__hip_atomic_fetch_and:
-    case AtomicExpr::AO__hip_atomic_fetch_max:
-    case AtomicExpr::AO__hip_atomic_fetch_min:
-    case AtomicExpr::AO__hip_atomic_fetch_or:
-    case AtomicExpr::AO__hip_atomic_fetch_sub:
-    case AtomicExpr::AO__hip_atomic_fetch_xor:
-    case AtomicExpr::AO__opencl_atomic_fetch_add:
-    case AtomicExpr::AO__opencl_atomic_fetch_and:
-    case AtomicExpr::AO__opencl_atomic_fetch_max:
-    case AtomicExpr::AO__opencl_atomic_fetch_min:
-    case AtomicExpr::AO__opencl_atomic_fetch_or:
-    case AtomicExpr::AO__opencl_atomic_fetch_sub:
-    case AtomicExpr::AO__opencl_atomic_fetch_xor:
-      // For these, only library calls for certain sizes exist.
-      UseOptimizedLibcall = true;
-      break;
-
-    case AtomicExpr::AO__atomic_load:
-    case AtomicExpr::AO__atomic_store:
-    case AtomicExpr::AO__atomic_exchange:
-    case AtomicExpr::AO__atomic_compare_exchange:
-      // Use the generic version if we don't know that the operand will be
-      // suitably aligned for the optimized version.
-      if (Misaligned)
-        break;
-      [[fallthrough]];
-    case AtomicExpr::AO__atomic_load_n:
-    case AtomicExpr::AO__atomic_store_n:
-    case AtomicExpr::AO__atomic_exchange_n:
-    case AtomicExpr::AO__atomic_compare_exchange_n:
-    case AtomicExpr::AO__c11_atomic_load:
-    case AtomicExpr::AO__c11_atomic_store:
-    case AtomicExpr::AO__c11_atomic_exchange:
-    case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
-    case AtomicExpr::AO__c11_atomic_compare_exchange_strong:
-    case AtomicExpr::AO__hip_atomic_load:
-    case AtomicExpr::AO__hip_atomic_store:
-    case AtomicExpr::AO__hip_atomic_exchange:
-    case AtomicExpr::AO__hip_atomic_compare_exchange_weak:
-    case AtomicExpr::AO__hip_atomic_compare_exchange_strong:
-    case AtomicExpr::AO__opencl_atomic_load:
-    case AtomicExpr::AO__opencl_atomic_store:
-    case AtomicExpr::AO__opencl_atomic_exchange:
-    case AtomicExpr::AO__opencl_atomic_compare_exchange_weak:
-    case AtomicExpr::AO__opencl_atomic_compare_exchange_strong:
-      // Only use optimized library calls for sizes for which they exist.
-      // FIXME: Size == 16 optimized library functions exist too.
-      if (Size == 1 || Size == 2 || Size == 4 || Size == 8)
-        UseOptimizedLibcall = true;
-      break;
-    }
-
     CallArgList Args;
-    if (!UseOptimizedLibcall) {
-      // For non-optimized library calls, the size is the first parameter
-      Args.add(RValue::get(llvm::ConstantInt::get(SizeTy, Size)),
-               getContext().getSizeType());
-    }
-    // Atomic address is the first or second parameter
+    // For non-optimized library calls, the size is the first parameter.
+    Args.add(RValue::get(llvm::ConstantInt::get(SizeTy, Size)),
+              getContext().getSizeType());
+
+    // The atomic address is the second parameter.
     // The OpenCL atomic library functions only accept pointer arguments to
     // generic address space.
     auto CastToGenericAddrSpace = [&](llvm::Value *V, QualType PT) {
@@ -1100,18 +1068,15 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       return getTargetHooks().performAddrSpaceCast(
           *this, V, AS, LangAS::opencl_generic, DestType, false);
     };
-
     Args.add(RValue::get(CastToGenericAddrSpace(Ptr.getPointer(),
                                                 E->getPtr()->getType())),
              getContext().VoidPtrTy);
 
+    // The next 1-3 parameters are op-dependent.
     std::string LibCallName;
-    QualType LoweredMemTy =
-      MemTy->isPointerType() ? getContext().getIntPtrType() : MemTy;
     QualType RetTy;
     bool HaveRetTy = false;
-    llvm::Instruction::BinaryOps PostOp = (llvm::Instruction::BinaryOps)0;
-    bool PostOpMinMax = false;
+
     switch (E->getOp()) {
     case AtomicExpr::AO__c11_atomic_init:
     case AtomicExpr::AO__opencl_atomic_init:
@@ -1122,8 +1087,6 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     // and exchange.
     // bool __atomic_compare_exchange(size_t size, void *mem, void *expected,
     //                                void *desired, int success, int failure)
-    // bool __atomic_compare_exchange_N(T *mem, T *expected, T desired,
-    //                                  int success, int failure)
     case AtomicExpr::AO__atomic_compare_exchange:
     case AtomicExpr::AO__atomic_compare_exchange_n:
     case AtomicExpr::AO__c11_atomic_compare_exchange_weak:
@@ -1138,25 +1101,25 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
                                                   E->getVal1()->getType())),
                getContext().VoidPtrTy);
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val2.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
+      Args.add(RValue::get(CastToGenericAddrSpace(Val2.getPointer(),
+                                                  E->getVal2()->getType())),
+               getContext().VoidPtrTy);
       Args.add(RValue::get(Order), getContext().IntTy);
       Order = OrderFail;
       break;
     // void __atomic_exchange(size_t size, void *mem, void *val, void *return,
     //                        int order)
-    // T __atomic_exchange_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_exchange:
     case AtomicExpr::AO__atomic_exchange_n:
     case AtomicExpr::AO__c11_atomic_exchange:
     case AtomicExpr::AO__hip_atomic_exchange:
     case AtomicExpr::AO__opencl_atomic_exchange:
       LibCallName = "__atomic_exchange";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
+      Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
+                                                  E->getVal1()->getType())),
+               getContext().VoidPtrTy);
       break;
     // void __atomic_store(size_t size, void *mem, void *val, int order)
-    // void __atomic_store_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_store:
     case AtomicExpr::AO__atomic_store_n:
     case AtomicExpr::AO__c11_atomic_store:
@@ -1165,11 +1128,11 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
       LibCallName = "__atomic_store";
       RetTy = getContext().VoidTy;
       HaveRetTy = true;
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
+      Args.add(RValue::get(CastToGenericAddrSpace(Val1.getPointer(),
+                                                  E->getVal1()->getType())),
+               getContext().VoidPtrTy);
       break;
     // void __atomic_load(size_t size, void *mem, void *return, int order)
-    // T __atomic_load_N(T *mem, int order)
     case AtomicExpr::AO__atomic_load:
     case AtomicExpr::AO__atomic_load_n:
     case AtomicExpr::AO__c11_atomic_load:
@@ -1177,108 +1140,45 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
     case AtomicExpr::AO__opencl_atomic_load:
       LibCallName = "__atomic_load";
       break;
-    // T __atomic_add_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_add_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_add_fetch:
-      PostOp = llvm::Instruction::Add;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_add:
     case AtomicExpr::AO__c11_atomic_fetch_add:
     case AtomicExpr::AO__hip_atomic_fetch_add:
     case AtomicExpr::AO__opencl_atomic_fetch_add:
-      LibCallName = "__atomic_fetch_add";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_and_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_and_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_and_fetch:
-      PostOp = llvm::Instruction::And;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_and:
     case AtomicExpr::AO__c11_atomic_fetch_and:
     case AtomicExpr::AO__hip_atomic_fetch_and:
     case AtomicExpr::AO__opencl_atomic_fetch_and:
-      LibCallName = "__atomic_fetch_and";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_or_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_or_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_or_fetch:
-      PostOp = llvm::Instruction::Or;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_or:
     case AtomicExpr::AO__c11_atomic_fetch_or:
     case AtomicExpr::AO__hip_atomic_fetch_or:
     case AtomicExpr::AO__opencl_atomic_fetch_or:
-      LibCallName = "__atomic_fetch_or";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_sub_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_sub_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_sub_fetch:
-      PostOp = llvm::Instruction::Sub;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_sub:
     case AtomicExpr::AO__c11_atomic_fetch_sub:
     case AtomicExpr::AO__hip_atomic_fetch_sub:
     case AtomicExpr::AO__opencl_atomic_fetch_sub:
-      LibCallName = "__atomic_fetch_sub";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_xor_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_xor_N(T *mem, T val, int order)
     case AtomicExpr::AO__atomic_xor_fetch:
-      PostOp = llvm::Instruction::Xor;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_xor:
     case AtomicExpr::AO__c11_atomic_fetch_xor:
     case AtomicExpr::AO__hip_atomic_fetch_xor:
     case AtomicExpr::AO__opencl_atomic_fetch_xor:
-      LibCallName = "__atomic_fetch_xor";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
+    case AtomicExpr::AO__atomic_nand_fetch:
+    case AtomicExpr::AO__atomic_fetch_nand:
+    case AtomicExpr::AO__c11_atomic_fetch_nand:
     case AtomicExpr::AO__atomic_min_fetch:
-      PostOpMinMax = true;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_min:
     case AtomicExpr::AO__c11_atomic_fetch_min:
     case AtomicExpr::AO__hip_atomic_fetch_min:
     case AtomicExpr::AO__opencl_atomic_fetch_min:
-      LibCallName = E->getValueType()->isSignedIntegerType()
-                        ? "__atomic_fetch_min"
-                        : "__atomic_fetch_umin";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
     case AtomicExpr::AO__atomic_max_fetch:
-      PostOpMinMax = true;
-      [[fallthrough]];
     case AtomicExpr::AO__atomic_fetch_max:
     case AtomicExpr::AO__c11_atomic_fetch_max:
     case AtomicExpr::AO__hip_atomic_fetch_max:
     case AtomicExpr::AO__opencl_atomic_fetch_max:
-      LibCallName = E->getValueType()->isSignedIntegerType()
-                        ? "__atomic_fetch_max"
-                        : "__atomic_fetch_umax";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        LoweredMemTy, E->getExprLoc(), TInfo.Width);
-      break;
-    // T __atomic_nand_fetch_N(T *mem, T val, int order)
-    // T __atomic_fetch_nand_N(T *mem, T val, int order)
-    case AtomicExpr::AO__atomic_nand_fetch:
-      PostOp = llvm::Instruction::And; // the NOT is special cased below
-      [[fallthrough]];
-    case AtomicExpr::AO__atomic_fetch_nand:
-    case AtomicExpr::AO__c11_atomic_fetch_nand:
-      LibCallName = "__atomic_fetch_nand";
-      AddDirectArgument(*this, Args, UseOptimizedLibcall, Val1.getPointer(),
-                        MemTy, E->getExprLoc(), TInfo.Width);
-      break;
+      llvm_unreachable("Integral atomic operations always become atomicrmw!");
     }
 
     if (E->isOpenCL()) {
@@ -1286,57 +1186,24 @@ RValue CodeGenFunction::EmitAtomicExpr(AtomicExpr *E) {
           StringRef(LibCallName).drop_front(1).str();
 
     }
-    // Optimized functions have the size in their name.
-    if (UseOptimizedLibcall)
-      LibCallName += "_" + llvm::utostr(Size);
     // By default, assume we return a value of the atomic type.
     if (!HaveRetTy) {
-      if (UseOptimizedLibcall) {
-        // Value is returned directly.
-        // The function returns an appropriately sized integer type.
-        RetTy = getContext().getIntTypeForBitwidth(
-            getContext().toBits(TInfo.Width), /*Signed=*/false);
-      } else {
-        // Value is returned through parameter before the order.
-        RetTy = getContext().VoidTy;
-        Args.add(RValue::get(Dest.getPointer()), getContext().VoidPtrTy);
-      }
+      // Value is returned through parameter before the order.
+      RetTy = getContext().VoidTy;
+      Args.add(RValue::get(CastToGenericAddrSpace(Dest.getPointer(), RetTy)),
+               getContext().VoidPtrTy);
     }
-    // order is always the last parameter
+    // Order is always the last parameter.
     Args.add(RValue::get(Order),
              getContext().IntTy);
     if (E->isOpenCL())
       Args.add(RValue::get(Scope), getContext().IntTy);
 
-    // PostOp is only needed for the atomic_*_fetch...
[truncated]

github-actions bot commented Nov 22, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@Logikable force-pushed the main branch 3 times, most recently from d0f164d to dd6964d (November 22, 2023 22:10)
@arichardson (Member)

When I first came across this code I found it quite surprising that both clang and LLVM need handling for atomic libcalls. I feel it would be a lot nicer if this could all be handled inside the LLVM backend logic. Would it also be possible to emit the unoptimized libcall from LLVM instead of Clang, i.e. always emit the atomic IR operation and let the backend deal with lowering, or am I missing something?

@Logikable (Contributor, Author)

I believe Clang currently does not properly propagate alignment information for certain IR (e.g. atomicrmw and cmpxchg). Look at the definitions of CreateAtomic* in Clang. The patches in https://bugs.llvm.org/show_bug.cgi?id=27168 did not address this (thanks @MaskRay for helping me find this).
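
To make the "propagate alignment" point concrete, here is a minimal standalone IRBuilder sketch (not Clang's CodeGen; the function and variable names are invented): whatever MaybeAlign the frontend passes to CreateAtomicRMW is what ends up on the atomicrmw instruction, so passing the type's natural alignment for a pointer that is really only 1-byte aligned misleads the backend.

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

int main() {
  llvm::LLVMContext Ctx;
  llvm::Module M("align-demo", Ctx);
  llvm::Type *I32 = llvm::Type::getInt32Ty(Ctx);
  llvm::Type *PtrTy = llvm::PointerType::get(Ctx, /*AddressSpace=*/0);
  llvm::Function *Fn = llvm::Function::Create(
      llvm::FunctionType::get(I32, {PtrTy}, /*isVarArg=*/false),
      llvm::Function::ExternalLinkage, "misaligned_add", M);
  llvm::IRBuilder<> B(llvm::BasicBlock::Create(Ctx, "entry", Fn));
  // The alignment passed here is the one the backend will trust when it
  // decides whether to expand this to a native instruction or a libcall.
  llvm::Value *Old = B.CreateAtomicRMW(
      llvm::AtomicRMWInst::Add, Fn->getArg(0), llvm::ConstantInt::get(I32, 1),
      llvm::MaybeAlign(1), llvm::AtomicOrdering::SequentiallyConsistent);
  B.CreateRet(Old);
  M.print(llvm::outs(), nullptr); // prints an atomicrmw add ... align 1
}
```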

@efriedma-quic (Collaborator)

As a historical note, when I first wrote the support for atomics, it was under a model where LLVM IR only modeled legal atomics, and clang lowered illegal atomics to libcalls. This was changed a few years later to generalize atomics in LLVM IR to support arbitrary atomic ops lowered to libcalls. But there are still a few inconsistencies resulting from this, I think.

@Logikable force-pushed the main branch 2 times, most recently from dd6964d to 5d4b6ca (December 1, 2023 17:49)
@Logikable changed the title from "[clang][CodeGen] Emit atomic IR instead of libcalls for misaligned po…" to "[clang][CodeGen] Emit atomic IR in place of optimized libcalls." (Dec 21, 2023)
@Logikable (Contributor, Author)

I've rewritten a part of this PR to incorporate @arichardson's suggestion. I wasn't able to find inconsistencies through a cursory review of the backend's AtomicExpand pass, nor through a small set of tests built for common targets.

How can I find the gaps I missed?

@efriedma-quic (Collaborator)

When I said "inconsistencies", I just meant the way the responsibility for lowering atomics is split between LLVM and clang; I didn't mean anything was actually broken.

@jyknight (Member) left a comment

Overall, I think this is a great improvement.

And I believe it should be correct to do now, after fixing the backends' MaxAtomicSizeInBitsSupported in #75703, #75185, #75112, #74385, and #74389, and fixing Clang's propagation of alignment to atomicrmw/cmpxchg in #74349.

Just a few more comments.

@arichardson (Member) left a comment

Overall this looks great to me; I'm very happy to see reduced duplication between clang and llvm.

It would be great if we could just emit the atomic IR in all cases, but unfortunately non-power-of-two sizes result in a verifier error. We could probably emit the atomic IR for power-of-two sizes and let the expand pass deal with it, but since we still need the fallback for other sizes we might as well keep it.
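
For instance (illustrative only, the type and function here are invented), a non-power-of-two-sized object has no matching integer width, so it cannot be expressed as an integer atomicrmw/cmpxchg and still needs the generic, size-prefixed runtime call:

```cpp
// A 3-byte type: there is no power-of-two integer atomic the verifier will
// accept for it, so Clang keeps emitting the generic libcall, whose shape the
// comments in CGAtomic.cpp give as:
//   void __atomic_exchange(size_t size, void *mem, void *val, void *return, int order)
struct Packet { unsigned char bytes[3]; };

Packet g;

Packet swap_packet(Packet desired) {
  Packet previous;
  __atomic_exchange(&g, &desired, &previous, __ATOMIC_SEQ_CST);
  return previous;
}
```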

@Logikable (Contributor, Author)

Updated with suggestions. How does this look?

Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Jan 19, 2024
There exist atomic IR unit tests and libatomic unit tests, but neither
can test the atomicity and interoperability of atomic builtins and
compiler-rt's atomic library. These tests aim to approximate behaviour
encountered in user code.

These tests have caught issues in Clang. See
llvm/llvm-project#74349 and
llvm/llvm-project#73176 for LLVM changes
inspired by these tests.
Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Jan 19, 2024
@jyknight (Member)

BTW, please don't rebase/amend commits and force-push; it makes it extremely difficult to review the changes since the previous review, especially if the new changes are commingled with a rebase.

Much better to simply push new commits on top of your existing branch. And if you need an updated baseline, you can git merge origin/main in your PR branch.

Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Jan 23, 2024
@Logikable (Contributor, Author)

Gentle bump on this. Is there anything I can do now to make this easier to review?

@jyknight (Member) left a comment

Sorry for the delay. This looks good to me now!

@MaskRay MaskRay merged commit 5fdd094 into llvm:main Feb 12, 2024
@rorth (Collaborator) commented Feb 13, 2024

I'm pretty certain this patch broke the Solaris/sparcv9 buildbot.

@rorth (Collaborator) commented Feb 13, 2024

Confirmed: reverting the patch locally restores the build.

@Logikable (Contributor, Author)

https://reviews.llvm.org/D118021 is likely related; looking into this.

@rorth (Collaborator) commented Feb 13, 2024

The libcalls per se wouldn't be a problem: on 32-bit Solaris/sparc, -latomic is always linked for exactly that reason. However, the __sync_* libcalls aren't defined in libatomic.so, hence the link failure.

@efriedma-quic (Collaborator)

Looks like a bug in the SPARC backend. #73176 (review) indicated that atomic expansion was working correctly for all backends, but I guess it isn't working correctly on SPARC.

Logikable added a commit to Logikable/llvm-test-suite that referenced this pull request Feb 13, 2024
@jyknight (Member)

What config is this bot using? Is it "-target sparc-solaris -mcpu=v9" (that is: 32-bit sparc, but with v9 cpu available)?

I see that SparcV9TargetInfo sets MaxAtomicInlineWidth to 64 unconditionally. But the message "warning: large atomic operation may incur significant performance penalty; the access size (8 bytes) exceeds the max lock-free size (4 bytes) [-Watomic-alignment]" indicates that MaxAtomicInlineWidth is 32. OK, wait, this class is poorly named: SparcV8TargetInfo really means "32-bit sparc" and SparcV9TargetInfo really means "64-bit sparc". So this is actually using SparcV8TargetInfo instead.

Yet, even then, SparcV8TargetInfo appears to intend to configure MaxAtomicInlineWidth to 64 when using a V9 CPU. That doesn't seem to be actually happening, though.

OK, so the frontend is apparently buggily configured to only support 32-bit atomics in 32-bit mode, even when a V9 CPU is available.

That, then, was covering up a bug in the backend. The backend claims to (and should be able to!) support 64-bit atomics in 32-bit mode. But it doesn't actually, right now.

I'll send a patch to fix this by limiting atomics to 32-bit in the backend when !Subtarget->is64Bit(), which will get things back to working, though it doesn't address those deeper issues.
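
A minimal sketch of that idea (not the verbatim patch; it assumes the code lives in SparcTargetLowering's constructor):

```cpp
// Only advertise lock-free 64-bit atomics when targeting 64-bit SPARC.
// Anything wider than the supported size is then expanded by AtomicExpand
// into __atomic_* libcalls rather than falling back to __sync_* calls that
// libatomic on Solaris does not provide.
if (Subtarget->is64Bit())
  setMaxAtomicSizeInBitsSupported(64);
else
  setMaxAtomicSizeInBitsSupported(32); // TODO: 64-bit CAS exists on V8+/V9 hardware.
```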

@jrtc27 (Collaborator) commented Feb 13, 2024

Part of that confusion comes from SPARC's own naming. V9 is the CPU, but a V9 CPU being used for 32-bit code is called V8+...

jyknight added a commit to jyknight/llvm-project that referenced this pull request Feb 13, 2024
When in 32-bit mode, the backend doesn't currently implement 64-bit
atomics, even though the hardware is capable if you have specified a
V9 CPU. Thus, limit the width to 32-bit, for now, leaving behind a
TODO.

This fixes a regression triggered by PR llvm#73176.
jyknight added a commit that referenced this pull request Feb 13, 2024
…81655)

When in 32-bit mode, the backend doesn't currently implement 64-bit
atomics, even though the hardware is capable if you have specified a V9
CPU. Thus, limit the width to 32-bit, for now, leaving behind a TODO.

This fixes a regression triggered by PR #73176.
brad0 added a commit to brad0/llvm-project that referenced this pull request Feb 14, 2024
 Sparc. (llvm#81655)


(cherry picked from commit c1a99b2)
Labels: backend:RISC-V, clang:codegen, clang
8 participants