[AMDGPU] Implement IR expansion for frem instruction #130988

frederik-h · 2025-03-12T16:31:42Z

This patch implements a correctly rounded expansion of the frem instruction in LLVM IR. This is useful for target architectures for which such an expansion is too involved to be implement in ISel Lowering. The expansion is based on the code from the AMD device libs and has been tested successfully against the OpenCL conformance tests on amdgpu. The expansion is implemented in the preexisting "expand-fp" pass. It replaces the expansion of "frem" in ISel for the amdgpu target; it is enabled for targets which do not directly support "frem" and for which no matching "fmod" LibCall is available.

This patch implements a correctly rounded expansion of the frem instruction in LLVM IR. This is useful for target architectures where such an expansion is too involved to be implement on ISel Lowering. The expansion is based on the code from the AMD device libs and has been tested successfully against the OpenCL conformance tests on AMDGPU. The expansion is implemented in the preexisting "expand-large-fp-convert" pass. It is enabled by a new "shouldExpandFRemInIR" function in TargetLowering.

llvmbot · 2025-03-12T16:32:28Z

@llvm/pr-subscribers-backend-amdgpu

Author: Frederik Harwath (frederik-h)

Changes

This patch implements a correctly rounded expansion of the frem instruction in LLVM IR. This is useful for target architectures for which such an expansion is too involved to be implement on ISel Lowering. The expansion is based on the code from the AMD device libs and has been tested successfully against the OpenCL conformance tests on AMDGPU. The expansion is implemented in the preexisting "expand-large-fp-convert" pass. It is enabled by a new "shouldExpandFRemInIR" function in TargetLowering.

Patch is 245.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/130988.diff

5 Files Affected:

(modified) llvm/include/llvm/CodeGen/TargetLowering.h (+4)
(modified) llvm/lib/CodeGen/ExpandLargeFpConvert.cpp (+386-3)
(modified) llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h (+2)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll (+3944-603)
(modified) llvm/test/CodeGen/AMDGPU/wave32.ll (+322-62)

diff --git a/llvm/include/llvm/CodeGen/TargetLowering.h b/llvm/include/llvm/CodeGen/TargetLowering.h
index 2089d47e9cbc8..b64c57fdba992 100644
--- a/llvm/include/llvm/CodeGen/TargetLowering.h
+++ b/llvm/include/llvm/CodeGen/TargetLowering.h
@@ -5671,6 +5671,10 @@ class TargetLowering : public TargetLoweringBase {
                                        LoadSDNode *OriginalLoad,
                                        SelectionDAG &DAG) const;
 
+  /// Indicates whether the FRem instruction should be expanded before
+  /// ISel in the LLVM IR.
+  virtual bool shouldExpandFRemInIR() const { return false; };
+
 private:
   SDValue foldSetCCWithAnd(EVT VT, SDValue N0, SDValue N1, ISD::CondCode Cond,
                            const SDLoc &DL, DAGCombinerInfo &DCI) const;
diff --git a/llvm/lib/CodeGen/ExpandLargeFpConvert.cpp b/llvm/lib/CodeGen/ExpandLargeFpConvert.cpp
index ee583a25214ef..31d3779eb7c9f 100644
--- a/llvm/lib/CodeGen/ExpandLargeFpConvert.cpp
+++ b/llvm/lib/CodeGen/ExpandLargeFpConvert.cpp
@@ -6,11 +6,12 @@
 //
 //===----------------------------------------------------------------------===//
 //
-
 // This pass expands ‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’,
 // ‘sitofp .. to’ instructions with a bitwidth above a threshold into
 // auto-generated functions. This is useful for targets like x86_64 that cannot
 // lower fp convertions with more than 128 bits.
+// Furthermore, the pass can expand FRem instructions if requested in the
+// TargetLowering for the current target.
 //
 //===----------------------------------------------------------------------===//
 
@@ -21,6 +22,7 @@
 #include "llvm/CodeGen/TargetLowering.h"
 #include "llvm/CodeGen/TargetPassConfig.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
+#include "llvm/IR/FMF.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/InstIterator.h"
 #include "llvm/IR/PassManager.h"
@@ -28,6 +30,9 @@
 #include "llvm/Pass.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Target/TargetMachine.h"
+#include "llvm/Transforms/Utils/BasicBlockUtils.h"
+
+#define DEBUG_TYPE "expand-large-fp-convert"
 
 using namespace llvm;
 
@@ -37,6 +42,376 @@ static cl::opt<unsigned>
                      cl::desc("fp convert instructions on integers with "
                               "more than <N> bits are expanded."));
 
+namespace {
+/// This class implements a precise expansion of the frem instruction.
+/// The generated code is based on the fmod implementation in the AMD device
+/// libs.
+class FRemExpander {
+  /// The IRBuilder to use for the expansion.
+  IRBuilder<> &B;
+
+  /// Floating point type of the return value and the arguments of the FRem
+  /// instructions that should be expanded.
+  Type *FremTy;
+
+  /// Floating point type to use for the computation.  This may be
+  /// wider than the \p FremTy.
+  Type *ComputeFpTy;
+
+  /// Integer type that can hold floating point values of type \p FremTY.
+  Type *IntTy;
+
+  /// Integer type used to hold the exponents returned by frexp.
+  Type *ExTy;
+
+  /// How many bits of the quotient to compute per iteration of the
+  /// algorithm, stored as a value of type \p ExTy.
+  Value *Bits;
+
+  /// Constant 1 of type \p ExTy.
+  Value *One;
+
+  /// The sign bit for floating point values of type \p FremTy.
+  const unsigned long Signbit;
+
+public:
+  static std::optional<FRemExpander> create(IRBuilder<> &B, Type *Ty) {
+    if (Ty->is16bitFPTy())
+      return FRemExpander{B, Ty, 11, 0x8000, B.getFloatTy(), B.getInt16Ty()};
+    if (Ty->isFloatTy() || Ty->isHalfTy())
+      return FRemExpander{B, Ty, 12, 0x80000000L, Ty, B.getInt32Ty()};
+    if (Ty->isDoubleTy())
+      return FRemExpander{B, Ty, 26, 0x8000000000000000L, Ty, B.getInt64Ty()};
+
+    return std::nullopt;
+  }
+
+  /// Build the FRem expansion for the numerator \p X and the
+  /// denumerator \p Y using the builder \p B.  The type of X and Y
+  /// must match the type for which the class instance has been
+  /// created. The code will be generated at the insertion point of \p
+  /// B and the insertion point will be reset at exit.
+  Value *buildFRem(Value *X, Value *Y) const;
+
+private:
+  FRemExpander(IRBuilder<> &B, Type *FremTy, short Bits, unsigned long Signbit,
+               Type *ComputeFpTy, Type *IntTy)
+      : B(B), FremTy(FremTy), ComputeFpTy(ComputeFpTy), IntTy(IntTy),
+        ExTy(B.getInt32Ty()), Bits(ConstantInt::get(ExTy, Bits)),
+        One(ConstantInt::get(ExTy, 1)), Signbit(Signbit) {};
+
+  Value *createLdexp(Value *Base, Value *Exp, const Twine &Name) const {
+    return B.CreateIntrinsic(Intrinsic::ldexp, {ComputeFpTy, B.getInt32Ty()},
+                             {Base, Exp}, {}, Name);
+  }
+
+  Value *createRcp(Value *V, const Twine &Name) const {
+    return B.CreateFDiv(ConstantFP::get(ComputeFpTy, 1.0), V, Name);
+  }
+
+  // Helper function to build the UPDATE_AX code which is common to the
+  // loop body and the "final iteration".
+  Value *buildUpdateAx(Value *Ax, Value *Ay, Value *Ayinv) const {
+    // Build:
+    //   float q = BUILTIN_RINT_ComputeFpTy(ax * ayinv);
+    //   ax = fnma(q, ay, ax);
+    //   int clt = ax < 0.0f;
+    //   float axp = ax + ay;
+    //   ax = clt ? axp : ax;
+    Value *Q = B.CreateUnaryIntrinsic(Intrinsic::rint, B.CreateFMul(Ax, Ayinv),
+                                      {}, "q");
+    Value *AxUpdate = B.CreateIntrinsic(Intrinsic::fma, {ComputeFpTy},
+                                        {B.CreateFNeg(Q), Ay, Ax}, {}, "ax");
+    Value *Clt = B.CreateFCmp(CmpInst::FCMP_OLT, AxUpdate,
+                              ConstantFP::get(ComputeFpTy, 0.0), "clt");
+    Value *Axp = B.CreateFAdd(AxUpdate, Ay, "axp");
+    AxUpdate = B.CreateSelect(Clt, Axp, AxUpdate, "ax");
+
+    return AxUpdate;
+  }
+
+  /// Build code to extract the exponent and mantissa of \p Src.
+  /// Return the exponent minus one for use as a loop bound and
+  /// the mantissa taken to the given \p NewExp power.
+  std::pair<Value *, Value *> buildExpAndPower(Value *Src, Value *NewExp,
+                                               const Twine &ExName,
+                                               const Twine &PowName) const {
+    // Build:
+    //   ExName = BUILTIN_FREXP_EXP_ComputeFpTy(Src) - 1;
+    //   PowName =
+    //   BUILTIN_FLDEXP_ComputeFpTy(BUILTIN_FREXP_MANT_ComputeFpTy(ExName),
+    //   NewExp);
+    Type *Ty = Src->getType();
+    Type *ExTy = B.getInt32Ty();
+    Value *Frexp = B.CreateIntrinsic(Intrinsic::frexp, {Ty, ExTy}, Src);
+    Value *Mant = B.CreateExtractValue(Frexp, {0});
+    Value *Exp = B.CreateExtractValue(Frexp, {1});
+
+    Exp = B.CreateSub(Exp, One, ExName);
+    Value *Pow = createLdexp(Mant, NewExp, PowName);
+
+    return {Pow, Exp};
+  }
+
+  /// Build the main computation of the remainder for the case in which
+  /// Ax > Ay, where Ax = |X|, Ay = |Y|, and X is the numerator and Y the
+  /// denumerator. Add the incoming edge from the computation result
+  /// to \p RetPhi.
+  void buildRemainderComputation(Value *AxInitial, Value *AyInitial, Value *X,
+                                 PHINode *RetPhi) const {
+    // Build:
+    // ex = BUILTIN_FREXP_EXP_ComputeFpTy(ax) - 1;
+    // ax = BUILTIN_FLDEXP_ComputeFpTy(BUILTIN_FREXP_MANT_ComputeFpTy(ax),
+    // bits); ey = BUILTIN_FREXP_EXP_ComputeFpTy(ay) - 1; ay =
+    // BUILTIN_FLDEXP_ComputeFpTy(BUILTIN_FREXP_MANT_ComputeFpTy(ay), 1); auto
+    // [Ax, Ex]{getFrexpResults(B, AxInitial)};
+    auto [Ax, Ex] = buildExpAndPower(AxInitial, Bits, "ex", "ax");
+    auto [Ay, Ey] = buildExpAndPower(AyInitial, One, "ey", "ay");
+
+    // Build:
+    //   int nb = ex - ey;
+    //   float ayinv = MATH_FAST_RCP(ay);
+    Value *Nb = B.CreateSub(Ex, Ey, "nb");
+    Value *Ayinv = createRcp(Ay, "ayinv");
+
+    // Build: while (nb > bits)
+    BasicBlock *PreheaderBB = B.GetInsertBlock();
+    Function *Fun = PreheaderBB->getParent();
+    auto *LoopBB = BasicBlock::Create(B.getContext(), "frem.loop_body", Fun);
+    auto *ExitBB = BasicBlock::Create(B.getContext(), "frem.loop_exit", Fun);
+
+    B.CreateCondBr(B.CreateICmp(CmpInst::ICMP_SGT, Nb, Bits), LoopBB, ExitBB);
+
+    // Build loop body:
+    //   UPDATE_AX
+    //   ax = BUILTIN_FLDEXP_ComputeFpTy(ax, bits);
+    //   nb -= bits;
+    // One iteration of the loop is factored out.  The code shared by
+    // the loop and this "iteration" is denoted by UPDATE_AX.
+    B.SetInsertPoint(LoopBB);
+    auto *NbIv = B.CreatePHI(Nb->getType(), 2, "nb_iv");
+    NbIv->addIncoming(Nb, PreheaderBB);
+
+    auto *AxPhi = B.CreatePHI(ComputeFpTy, 2, "ax_loop_phi");
+    AxPhi->addIncoming(Ax, PreheaderBB);
+
+    Value *AxPhiUpdate = buildUpdateAx(AxPhi, Ay, Ayinv);
+    AxPhiUpdate = createLdexp(AxPhiUpdate, Bits, "ax_update");
+    AxPhi->addIncoming(AxPhiUpdate, LoopBB);
+    NbIv->addIncoming(B.CreateSub(NbIv, Bits, "nb_update"), LoopBB);
+
+    B.CreateCondBr(B.CreateICmp(CmpInst::ICMP_SGT, NbIv, Bits), LoopBB, ExitBB);
+
+    // Build final iteration
+    //   ax = BUILTIN_FLDEXP_ComputeFpTy(ax, nb - bits + 1);
+    //   UPDATE_AX
+    B.SetInsertPoint(ExitBB);
+
+    auto *AxPhiExit = B.CreatePHI(ComputeFpTy, 2, "ax_exit_phi");
+    AxPhiExit->addIncoming(Ax, PreheaderBB);
+    AxPhiExit->addIncoming(AxPhi, LoopBB);
+    auto *NbExitPhi = B.CreatePHI(Nb->getType(), 2, "nb_exit_phi");
+    NbExitPhi->addIncoming(NbIv, LoopBB);
+    NbExitPhi->addIncoming(Nb, PreheaderBB);
+
+    Value *AxFinal = createLdexp(
+        AxPhiExit, B.CreateAdd(B.CreateSub(NbExitPhi, Bits), One), "ax");
+    AxFinal = buildUpdateAx(AxFinal, Ay, Ayinv);
+
+    // Adjust exponent and sign
+    //    ax = BUILTIN_FLDEXP_ComputeFpTy(ax, ey);
+    //    ret = AS_FLOAT((AS_INT(x) & SIGNBIT_SP32) ^ AS_INT(ax));
+    AxFinal = createLdexp(AxFinal, Ey, "ax");
+
+    Value *XAsInt = B.CreateBitCast(X, IntTy, "x_as_int");
+    if (ComputeFpTy != X->getType())
+      AxFinal = B.CreateFPTrunc(AxFinal, X->getType());
+
+    Value *AxAsInt = B.CreateBitCast(AxFinal, IntTy, "ax_as_int");
+
+    Value *Ret =
+        B.CreateXor(B.CreateAnd(XAsInt, Signbit), AxAsInt, "Remainder");
+    Ret = B.CreateBitCast(Ret, X->getType());
+
+    RetPhi->addIncoming(Ret, ExitBB);
+  }
+
+  /// Build the else-branch of the conditional in the FRem
+  /// expansion, i.e. the case in wich Ax <= Ay, where Ax = |X|, Ay
+  /// = |Y|, and X is the numerator and Y the denumerator. Add the
+  /// incoming edge from the result to \p RetPhi.
+  void buildElseBranch(Value *Ax, Value *Ay, Value *X, PHINode *RetPhi) const {
+    // Build:
+    // ret = ax == ay ? BUILTIN_COPYSIGN_ComputeFpTy(0.0f, x) : x;
+    Value *ZeroWithXSign = B.CreateIntrinsic(
+        Intrinsic::copysign, {FremTy}, {ConstantFP::get(FremTy, 0.0), X}, {});
+
+    Value *Ret = B.CreateSelect(B.CreateFCmpOEQ(Ax, Ay), ZeroWithXSign, X);
+
+    RetPhi->addIncoming(Ret, B.GetInsertBlock());
+  }
+
+  /// Adjust the result of the main computation from the FRem expansion
+  /// if NaNs or infinite values are possible.
+  Value *buildNanAndInfHandling(Value *Ret, Value *X, Value *Y) const {
+    // Build:
+    //   ret = y == 0.0f ? QNAN_ComputeFpTy : ret;
+    //   bool c = !BUILTIN_ISNAN_ComputeFpTy(y) &&
+    //   BUILTIN_ISFINITE_ComputeFpTy(x); ret = c ? ret : QNAN_ComputeFpTy;
+    // TODO Handle NaN and infinity fast math flags separately here?
+    Value *Nan = ConstantFP::getQNaN(FremTy);
+
+    Ret = B.CreateSelect(B.createIsFPClass(Y, FPClassTest::fcZero), Nan, Ret);
+    Value *C = B.CreateLogicalAnd(
+        B.CreateNot(B.createIsFPClass(Y, FPClassTest::fcNan)),
+        B.createIsFPClass(X, FPClassTest::fcFinite));
+    Ret = B.CreateSelect(C, Ret, Nan);
+
+    return Ret;
+  }
+};
+
+Value *FRemExpander::buildFRem(Value *X, Value *Y) const {
+  assert(X->getType() == FremTy && Y->getType() == FremTy);
+
+  FastMathFlags FMF = B.getFastMathFlags();
+
+  // This function generates the following code structure:
+  //   if (abs(x) > abs(y))
+  //   { ret = compute remainder }
+  //   else
+  //   { ret = x or 0 with sign of x }
+  //   Adjust ret to NaN/inf in input
+  //   return ret
+  Value *Ax = B.CreateUnaryIntrinsic(Intrinsic::fabs, X, {}, "ax");
+  Value *Ay = B.CreateUnaryIntrinsic(Intrinsic::fabs, Y, {}, "ay");
+  if (ComputeFpTy != X->getType()) {
+    Ax = B.CreateFPExt(Ax, ComputeFpTy, "ax");
+    Ay = B.CreateFPExt(Ay, ComputeFpTy, "ay");
+  }
+  Value *AxAyCmp = B.CreateFCmpOGT(Ax, Ay);
+
+  PHINode *RetPhi = B.CreatePHI(FremTy, 2, "ret");
+  Value *Ret = RetPhi;
+
+  if (!FMF.noNaNs() || !FMF.noInfs())
+    Ret = buildNanAndInfHandling(Ret, X, Y);
+
+  Function *Fun = B.GetInsertBlock()->getParent();
+  auto *ThenBB = BasicBlock::Create(B.getContext(), "frem.compute", Fun);
+  auto *ElseBB = BasicBlock::Create(B.getContext(), "frem.else", Fun);
+  SplitBlockAndInsertIfThenElse(AxAyCmp, RetPhi, &ThenBB, &ElseBB);
+
+  auto SavedInsertPt = B.GetInsertPoint();
+
+  // Build remainder computation for "then" branch
+  //
+  // The ordered comparison ensures that ax and ay are not NaNs
+  // in the then-branch. Furthermore, y cannot be an infinity and the
+  // check at the end of the function ensures that the result will not
+  // be used if x is an infinity.
+  FastMathFlags ComputeFMF = FMF;
+  ComputeFMF.setNoInfs();
+  ComputeFMF.setNoNaNs();
+
+  B.SetInsertPoint(ThenBB);
+  B.setFastMathFlags(ComputeFMF);
+  buildRemainderComputation(Ax, Ay, X, RetPhi);
+  B.setFastMathFlags(FMF);
+  B.CreateBr(RetPhi->getParent());
+
+  // Build "else"-branch
+  B.SetInsertPoint(ElseBB);
+  buildElseBranch(Ax, Ay, X, RetPhi);
+  B.CreateBr(RetPhi->getParent());
+
+  B.SetInsertPoint(SavedInsertPt);
+
+  return Ret;
+}
+} // namespace
+
+/// Return true if \p Op either is a constant or a selection
+/// instruction with constant operands.
+static bool isConstOrConstSelectOp(Value *Op) {
+  if (isa<Constant>(Op))
+    return true;
+
+  auto *S = dyn_cast<SelectInst>(Op);
+  if (!S)
+    return false;
+
+  return isa<Constant>(S->getTrueValue()) && isa<Constant>(S->getFalseValue());
+}
+
+/// Returns true if \p I should not be expanded because
+/// it will be eliminated during ISel.
+static bool shouldSkipExpandFRem(BinaryOperator &I) {
+  // This condition should be sufficient for DAGCombiner::visitFREM to
+  // eliminate the instruction.
+  return isConstOrConstSelectOp(I.getOperand(0)) &&
+         isConstOrConstSelectOp(I.getOperand(1));
+}
+
+static bool expandFRem(BinaryOperator &I) {
+  LLVM_DEBUG(dbgs() << "Expanding instruction: " << I << '\n');
+  if (shouldSkipExpandFRem(I)) {
+    LLVM_DEBUG(
+        dbgs() << "Skipping 'frem' instruction that should be removed by "
+                  "DAGCombiner.\n");
+    return false;
+  }
+
+  Type *ReturnTy = I.getType();
+  assert(ReturnTy->isFPOrFPVectorTy());
+
+  FastMathFlags FMF = I.getFastMathFlags();
+  // TODO Make use of those flags for optimization?
+  FMF.setAllowReciprocal(false);
+  FMF.setAllowContract(false);
+  FMF.setApproxFunc(false);
+
+  IRBuilder<> B(&I);
+  B.setFastMathFlags(FMF);
+  B.SetCurrentDebugLocation(I.getDebugLoc());
+
+  Type *ElemTy = ReturnTy->getScalarType();
+  const std::optional<FRemExpander> Expander = FRemExpander::create(B, ElemTy);
+
+  if (!Expander || isa<ScalableVectorType>(ReturnTy)) {
+    LLVM_DEBUG(dbgs() << "Cannot expand 'frem' of type " << ReturnTy << ".\n");
+    return false;
+  }
+
+  Value *Ret;
+  if (ReturnTy->isFloatingPointTy())
+    Ret = Expander->buildFRem(I.getOperand(0), I.getOperand(1));
+  else {
+    auto VecTy = cast<FixedVectorType>(ReturnTy);
+
+    // This could use SplitBlockAndInsertForEachLane but the interface
+    // is a bit awkward for a constant number of elements and it will
+    // boil down to the same code.
+    // TODO Expand the FRem instruction only once and reuse the code.
+    Value *Nums = I.getOperand(0);
+    Value *Denums = I.getOperand(1);
+    Ret = PoisonValue::get(I.getType());
+    for (int I = 0, E = VecTy->getNumElements(); I != E; ++I) {
+      Value *Num = B.CreateExtractElement(Nums, I);
+      Value *Denum = B.CreateExtractElement(Denums, I);
+      Value *Rem = Expander->buildFRem(Num, Denum);
+      Ret = B.CreateInsertElement(Ret, Rem, I);
+    }
+  }
+
+  I.replaceAllUsesWith(Ret);
+  Ret->takeName(&I);
+  I.removeFromParent();
+  I.dropAllReferences();
+
+  return true;
+}
+
 /// Generate code to convert a fp number to integer, replacing FPToS(U)I with
 /// the generated code. This currently generates code similarly to compiler-rt's
 /// implementations.
@@ -604,6 +979,12 @@ static bool runImpl(Function &F, const TargetLowering &TLI) {
 
   for (auto &I : instructions(F)) {
     switch (I.getOpcode()) {
+    case Instruction::FRem:
+      if (TLI.shouldExpandFRemInIR()) {
+        Replace.push_back(&I);
+        Modified = true;
+      }
+      break;
     case Instruction::FPToUI:
     case Instruction::FPToSI: {
       // TODO: This pass doesn't handle scalable vectors.
@@ -654,8 +1035,10 @@ static bool runImpl(Function &F, const TargetLowering &TLI) {
 
   while (!Replace.empty()) {
     Instruction *I = Replace.pop_back_val();
-    if (I->getOpcode() == Instruction::FPToUI ||
-        I->getOpcode() == Instruction::FPToSI) {
+    if (I->getOpcode() == Instruction::FRem)
+      expandFRem(llvm::cast<BinaryOperator>(*I));
+    else if (I->getOpcode() == Instruction::FPToUI ||
+             I->getOpcode() == Instruction::FPToSI) {
       expandFPToI(I);
     } else {
       expandIToFP(I);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
index c74dc7942f52c..b2b136c984bf4 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUISelLowering.h
@@ -387,6 +387,8 @@ class AMDGPUTargetLowering : public TargetLowering {
   MVT getFenceOperandTy(const DataLayout &DL) const override {
     return MVT::i32;
   }
+  bool shouldExpandFRemInIR() const override { return true; };
+
 };
 
 namespace AMDGPUISD {
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
index e4e6c44b051c3..e40d9690d832b 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/frem.ll
@@ -7,59 +7,206 @@ define amdgpu_kernel void @frem_f16(ptr addrspace(1) %out, ptr addrspace(1) %in1
 ; CI:       ; %bb.0:
 ; CI-NEXT:    s_load_dwordx4 s[0:3], s[4:5], 0x9
 ; CI-NEXT:    s_load_dwordx2 s[4:5], s[4:5], 0xd
+; CI-NEXT:    ; implicit-def: $vgpr0
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
 ; CI-NEXT:    s_load_dword s2, s[2:3], 0x0
 ; CI-NEXT:    s_load_dword s3, s[4:5], 0x2
+; CI-NEXT:    s_mov_b32 s4, 1
 ; CI-NEXT:    s_waitcnt lgkmcnt(0)
-; CI-NEXT:    v_cvt_f32_f16_e32 v0, s2
-; CI-NEXT:    v_cvt_f32_f16_e32 v1, s3
-; CI-NEXT:    v_div_scale_f32 v2, s[2:3], v1, v1, v0
-; CI-NEXT:    v_div_scale_f32 v3, vcc, v0, v1, v0
-; CI-NEXT:    v_rcp_f32_e32 v4, v2
+; CI-NEXT:    v_cvt_f32_f16_e64 v2, |s2|
+; CI-NEXT:    v_cvt_f32_f16_e64 v1, |s3|
+; CI-NEXT:    v_cmp_ngt_f32_e32 vcc, v2, v1
+; CI-NEXT:    s_cbranch_vccz .LBB0_2
+; CI-NEXT:  ; %bb.1: ; %frem.else
+; CI-NEXT:    s_and_b32 s4, s2, 0xffff8000
+; CI-NEXT:    v_cmp_eq_f32_e32 vcc, v2, v1
+; CI-NEXT:    v_mov_b32_e32 v0, s4
+; CI-NEXT:    v_mov_b32_e32 v3, s2
+; CI-NEXT:    v_cndmask_b32_e32 v0, v3, v0, vcc
+; CI-NEXT:    s_mov_b32 s4, 0
+; CI-NEXT:  .LBB0_2: ; %Flow18
+; CI-NEXT:    s_xor_b32 s4, s4, 1
+; CI-NEXT:    s_and_b32 s4, s4, 1
+; CI-NEXT:    s_cmp_lg_u32 s4, 0
+; CI-NEXT:    s_cbranch_scc1 .LBB0_8
+; CI-NEXT:  ; %bb.3: ; %frem.compute
+; CI-NEXT:    v_frexp_mant_f32_e32 v3, v1
+; CI-NEXT:    v_frexp_exp_i32_f32_e32 v6, v1
+; CI-NEXT:    v_ldexp_f32_e64 v1, v3, 1
+; CI-NEXT:    v_div_scale_f32 v3, s[4:5], v1, v1, 1.0
+; CI-NEXT:    v_frexp_mant_f32_e32 v0, v2
+; CI-NEXT:    v_frexp_exp_i32_f32_e32 v5, v2
+; CI-NEXT:    v_add_i32_e32 v2, vcc, -1, v5
+; CI-NEXT:    v_ldexp_f32_e64 v4, v0, 11
+; CI-NEXT:    v_add_i32_e32 v0, vcc, -1, v6
+; CI-NEXT:    v_sub_i32_e32 v2, vcc, v2, v0
+; CI-NEXT:    v_div_scale_f32 v7, vcc, 1.0, v1, 1.0
+; CI-NEXT:    v_rcp_f32_e32 v8, v3
 ; CI-NEXT:    s_setreg_imm32_b32 hwreg(HW_REG_MODE, 4, 2), 3
-; CI-NEXT:    v_fma_f32 v5, -v2, v4, 1.0
-; CI-NEXT:    v_fma_f32 v4, v5, v4, v4
-; CI-NEXT:    v_mul_f32_e32 v5, v3, v4
-; CI-NEXT:    v_fma_f32 v6, -v2, v5, v3
-; CI-NEXT:    v_fma_f32 v5, v6, v4, v5
-; CI-NEXT:    v_fma_f32 v2, -v2, v5, v3
+; CI-NEXT:    v_fma_f32 v9, -v3, v8, 1.0
+; CI-NEXT:    v_fma_f32 v8, v9, v8, v8
+; CI-NEXT...
[truncated]

github-actions · 2025-03-12T16:35:18Z

✅ With the latest revision this PR passed the C/C++ code formatter.

arsenm

Missing test changes. We probably need to replace whatever few frem tests already exist. Also need checking of the IR output of the expansion

llvm/include/llvm/CodeGen/TargetLowering.h

llvm/lib/CodeGen/ExpandLargeFpConvert.cpp

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

This is meant as a preparation for PR llvm#130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it. The renaming of the source files happens in the following commit for a better diff.

This is meant as a preparation for PR #130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it.

This is meant as a preparation for PR llvm#130988 "[AMDGPU] Implement IR expansion for frem instruction" which implements the expansion of another instruction in this pass. The more general name seems more appropriate given this change and quite reasonable even without it. The renaming of the source files happens in the following commit for a better diff. * llvm/lib/CodeGen/ExpandLargeFpConvert.cpp:

Adjust line length

This commit adds a function for creating fma intrinsic calls to the IRBuilder. If the "IsFPConstrained" flag of the builder is set, the function creates a call to "experimental.constrained.fma" instead of "llvm.fma" . To support the creation of the constrained intrinsic, a function "CreateConstrainedFPIntrinsic" is introduced.

Use IRBuilder functions for creating Ldexp, FMA intrinsic call Use different condition for controlling frem expansion Update test Fixup after merge NaN handling fixes Update tests

Round out the AMDGPU codegen test to all the generations to cover the illegal f16 targets.

... only if optimizations are enabled. Disable Value Tracking analysis if optimizations are disabled.

frederik-h · 2025-06-02T16:17:44Z

@arsenm I was wondering what needs to be done before we can move forward with this PR? Do you think we need to add some afn-based optimization first (see https://github.com/llvm/llvm-project/pull/130988/files#r2032988749)?

True16 tests had to be disabled because of issue: [AMDGPU][True16] si-fix-sgpr-copies: invalid sgpr_lo16 copy destination llvm#144561

frederik-h · 2025-09-02T13:08:38Z

@arsenm ping

arsenm

LGTM with some test cleanups

llvm/lib/CodeGen/ExpandFp.cpp

llvm/test/CodeGen/AMDGPU/dagcombine-select-frem.ll

llvm/test/Transforms/ExpandFp/AMDGPU/frem.ll

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

* Add TODO concerning propagation of fmf * Remove attributes from ExpandFp test * Remove xfailed dagcombine-select-frem.ll and add TODO to corresponding test in dagcombine-select.ll.

The CI build has started failing because of this ; which has been there for quite some time: /home/gha/actions-runner/_work/llvm-project/llvm-project/llvm/lib/CodeGen/ExpandFp.cpp:1134:36: error: extra ';' outside of a function is incompatible with C++98 [-Werror,-Wc++98-compat-extra-sem 1134 | : TM(TM), OptLevel(OptLevel) {}; Remove it.

mikaelholmen · 2025-09-04T10:34:40Z

Hi @frederik-h

The following starts crashing with this patch:

llc -mtriple x86_64-unknown-linux-gnu bbi-110248.ll

It fails like

llc: ../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:5858: SDValue llvm::DAGTypeLegalizer::ExpandIntOp_XINT_TO_FP(SDNode *): Assertion `LC != RTLIB::UNKNOWN_LIBCALL && "Don't know how to expand this XINT_TO_FP!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/llc -mtriple x86_64-unknown-linux-gnu bbi-110248.ll
1.	Running pass 'Function Pass Manager' on module 'bbi-110248.ll'.
2.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@main'
 #0 0x0000556515aa33a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/llc+0x7bd33a6)
 #1 0x0000556515aa0af5 llvm::sys::RunSignalHandlers() (build-all/bin/llc+0x7bd0af5)
 #2 0x0000556515aa41b9 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f42a15a4990 __restore_rt (/lib64/libpthread.so.0+0x12990)
 #4 0x00007f429ef4452f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f429ef17e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f429ef17d39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f429ef3ce86 (/lib64/libc.so.6+0x46e86)
 #8 0x00005565158c03f5 llvm::DAGTypeLegalizer::ExpandIntOp_XINT_TO_FP(llvm::SDNode*) LegalizeIntegerTypes.cpp:0:0
 #9 0x00005565158bf15f llvm::DAGTypeLegalizer::ExpandIntegerOperand(llvm::SDNode*, unsigned int) LegalizeIntegerTypes.cpp:0:0
#10 0x0000556515884c8d llvm::DAGTypeLegalizer::run() LegalizeTypes.cpp:0:0
#11 0x000055651588a449 llvm::SelectionDAG::LegalizeTypes() (build-all/bin/llc+0x79ba449)
#12 0x0000556515870350 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (build-all/bin/llc+0x79a0350)
#13 0x000055651586ed9e llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (build-all/bin/llc+0x799ed9e)
#14 0x000055651586c371 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x799c371)
#15 0x0000556515869d39 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x7999d39)
#16 0x0000556514ab5ee7 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (build-all/bin/llc+0x6be5ee7)
#17 0x000055651500ee4c llvm::FPPassManager::runOnFunction(llvm::Function&) (build-all/bin/llc+0x713ee4c)
#18 0x00005565150170f2 llvm::FPPassManager::runOnModule(llvm::Module&) (build-all/bin/llc+0x71470f2)
#19 0x000055651500f918 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build-all/bin/llc+0x713f918)
#20 0x00005565129cef87 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x00005565129cc500 main (build-all/bin/llc+0x4afc500)
#22 0x00007f429ef307e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#23 0x00005565129cb96e _start (build-all/bin/llc+0x4afb96e)
Abort (core dumped)

I don't know what x86 thinks about i224 as input arguments but this didn't crash before.
We see this all over the place in downstream fuzzy testing.
bbi-110248.ll.gz

bjope · 2025-09-04T11:32:58Z

llvm/lib/Passes/PassRegistry.def

+    [TM = TM](CodeGenOptLevel OL) {
+      return ExpandFpPass(TM, OL);
+    },
+    parseExpandFpOptions, "opt-level")


I think this could say "opt-level=N" to indicate that the syntax is like that, and not that opt-level should be replaced by a value. Similar to "guard=N" for bounds-checking.

bjope · 2025-09-04T11:33:16Z

llvm/lib/Passes/PassRegistry.def

@@ -720,6 +719,13 @@ FUNCTION_PASS_WITH_PARAMS(
    },
    parseBoundsCheckingOptions,
    "trap;rt;rt-abort;min-rt;min-rt-abort;merge;guard=N")
+    FUNCTION_PASS_WITH_PARAMS(


Wrong indents.

bjope · 2025-09-04T11:41:18Z

llvm/lib/CodeGen/ExpandFp.cpp

-    return runImpl(F, *TLI);
+    AssumptionCache *AC = nullptr;
+
+    if (OptLevel != CodeGenOptLevel::None || F.hasOptNone())


Isn't this a bit weird? (If OptLevel is O1/O2/O3 or if function attributes say O0.)

If F.hasOptNone() is true then normally skipFunction would return true so we wouldn't even get here.

It is invalid to use skipFunction in mandatory lowering passes

It is invalid to use skipFunction in mandatory lowering passes

Ah yes, the skipFunction handling at the top of this function is responsible for the regression that @mikaelholmen reported. I'll open a bugfix PR ...

mikaelholmen · 2025-09-04T12:04:27Z

Hi @frederik-h

The following starts crashing with this patch:

llc -mtriple x86_64-unknown-linux-gnu bbi-110248.ll

It fails like

llc: ../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:5858: SDValue llvm::DAGTypeLegalizer::ExpandIntOp_XINT_TO_FP(SDNode *): Assertion `LC != RTLIB::UNKNOWN_LIBCALL && "Don't know how to expand this XINT_TO_FP!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/llc -mtriple x86_64-unknown-linux-gnu bbi-110248.ll
1.	Running pass 'Function Pass Manager' on module 'bbi-110248.ll'.
2.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@main'
 #0 0x0000556515aa33a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/llc+0x7bd33a6)
 #1 0x0000556515aa0af5 llvm::sys::RunSignalHandlers() (build-all/bin/llc+0x7bd0af5)
 #2 0x0000556515aa41b9 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f42a15a4990 __restore_rt (/lib64/libpthread.so.0+0x12990)
 #4 0x00007f429ef4452f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f429ef17e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f429ef17d39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f429ef3ce86 (/lib64/libc.so.6+0x46e86)
 #8 0x00005565158c03f5 llvm::DAGTypeLegalizer::ExpandIntOp_XINT_TO_FP(llvm::SDNode*) LegalizeIntegerTypes.cpp:0:0
 #9 0x00005565158bf15f llvm::DAGTypeLegalizer::ExpandIntegerOperand(llvm::SDNode*, unsigned int) LegalizeIntegerTypes.cpp:0:0
#10 0x0000556515884c8d llvm::DAGTypeLegalizer::run() LegalizeTypes.cpp:0:0
#11 0x000055651588a449 llvm::SelectionDAG::LegalizeTypes() (build-all/bin/llc+0x79ba449)
#12 0x0000556515870350 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (build-all/bin/llc+0x79a0350)
#13 0x000055651586ed9e llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (build-all/bin/llc+0x799ed9e)
#14 0x000055651586c371 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x799c371)
#15 0x0000556515869d39 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x7999d39)
#16 0x0000556514ab5ee7 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (build-all/bin/llc+0x6be5ee7)
#17 0x000055651500ee4c llvm::FPPassManager::runOnFunction(llvm::Function&) (build-all/bin/llc+0x713ee4c)
#18 0x00005565150170f2 llvm::FPPassManager::runOnModule(llvm::Module&) (build-all/bin/llc+0x71470f2)
#19 0x000055651500f918 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build-all/bin/llc+0x713f918)
#20 0x00005565129cef87 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x00005565129cc500 main (build-all/bin/llc+0x4afc500)
#22 0x00007f429ef307e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#23 0x00005565129cb96e _start (build-all/bin/llc+0x4afb96e)
Abort (core dumped)

I don't know what x86 thinks about i224 as input arguments but this didn't crash before. We see this all over the place in downstream fuzzy testing. bbi-110248.ll.gz

When running "expand-fp" it just says

Skipping pass 'Expand fp' on function main

with this patch, but before it expanded it to a lot of stuff.

frederik-h · 2025-09-04T13:10:42Z

Hi @frederik-h
The following starts crashing with this patch:

llc -mtriple x86_64-unknown-linux-gnu bbi-110248.ll

It fails like

llc: ../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:5858: SDValue llvm::DAGTypeLegalizer::ExpandIntOp_XINT_TO_FP(SDNode *): Assertion `LC != RTLIB::UNKNOWN_LIBCALL && "Don't know how to expand this XINT_TO_FP!"' failed.
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: build-all/bin/llc -mtriple x86_64-unknown-linux-gnu bbi-110248.ll
1.	Running pass 'Function Pass Manager' on module 'bbi-110248.ll'.
2.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@main'
 #0 0x0000556515aa33a6 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (build-all/bin/llc+0x7bd33a6)
 #1 0x0000556515aa0af5 llvm::sys::RunSignalHandlers() (build-all/bin/llc+0x7bd0af5)
 #2 0x0000556515aa41b9 SignalHandler(int, siginfo_t*, void*) Signals.cpp:0:0
 #3 0x00007f42a15a4990 __restore_rt (/lib64/libpthread.so.0+0x12990)
 #4 0x00007f429ef4452f raise (/lib64/libc.so.6+0x4e52f)
 #5 0x00007f429ef17e65 abort (/lib64/libc.so.6+0x21e65)
 #6 0x00007f429ef17d39 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x21d39)
 #7 0x00007f429ef3ce86 (/lib64/libc.so.6+0x46e86)
 #8 0x00005565158c03f5 llvm::DAGTypeLegalizer::ExpandIntOp_XINT_TO_FP(llvm::SDNode*) LegalizeIntegerTypes.cpp:0:0
 #9 0x00005565158bf15f llvm::DAGTypeLegalizer::ExpandIntegerOperand(llvm::SDNode*, unsigned int) LegalizeIntegerTypes.cpp:0:0
#10 0x0000556515884c8d llvm::DAGTypeLegalizer::run() LegalizeTypes.cpp:0:0
#11 0x000055651588a449 llvm::SelectionDAG::LegalizeTypes() (build-all/bin/llc+0x79ba449)
#12 0x0000556515870350 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (build-all/bin/llc+0x79a0350)
#13 0x000055651586ed9e llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (build-all/bin/llc+0x799ed9e)
#14 0x000055651586c371 llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x799c371)
#15 0x0000556515869d39 llvm::SelectionDAGISelLegacy::runOnMachineFunction(llvm::MachineFunction&) (build-all/bin/llc+0x7999d39)
#16 0x0000556514ab5ee7 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (build-all/bin/llc+0x6be5ee7)
#17 0x000055651500ee4c llvm::FPPassManager::runOnFunction(llvm::Function&) (build-all/bin/llc+0x713ee4c)
#18 0x00005565150170f2 llvm::FPPassManager::runOnModule(llvm::Module&) (build-all/bin/llc+0x71470f2)
#19 0x000055651500f918 llvm::legacy::PassManagerImpl::run(llvm::Module&) (build-all/bin/llc+0x713f918)
#20 0x00005565129cef87 compileModule(char**, llvm::LLVMContext&) llc.cpp:0:0
#21 0x00005565129cc500 main (build-all/bin/llc+0x4afc500)
#22 0x00007f429ef307e5 __libc_start_main (/lib64/libc.so.6+0x3a7e5)
#23 0x00005565129cb96e _start (build-all/bin/llc+0x4afb96e)
Abort (core dumped)

I don't know what x86 thinks about i224 as input arguments but this didn't crash before. We see this all over the place in downstream fuzzy testing. bbi-110248.ll.gz

When running "expand-fp" it just says

Skipping pass 'Expand fp' on function main

with this patch, but before it expanded it to a lot of stuff.

That's probably related to the things @bjope mentioned. Looking into it right now.

frederik-h added the backend:AMDGPU label Mar 12, 2025

frederik-h added 2 commits March 12, 2025 12:36

Adjust some comments, remove include

193957c

clang-format changes

58a2edb

nikic changed the title ~~Implement IR expansion for frem instruction~~ [AMDGPU] Implement IR expansion for frem instruction Mar 12, 2025

arsenm reviewed Mar 13, 2025

View reviewed changes

arsenm added the floating-point Floating-point math label Mar 13, 2025

Apply suggestions from code review

f3f0a8f

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

frederik-h mentioned this pull request Mar 13, 2025

Rename ExpandLargeFpConvertPass to ExpandFpPass #131128

Merged

frederik-h added 3 commits March 18, 2025 05:38

Rename ExpandLargeFpConvert.{cpp,h}

ea49f29

clang-format changes

80301aa

Adjust line length

frederik-h mentioned this pull request Mar 18, 2025

DAG: Fix promote of half freeze #131844

Merged

frederik-h and others added 4 commits March 18, 2025 13:34

Review changes

5be756a

Use IRBuilder functions for creating Ldexp, FMA intrinsic call Use different condition for controlling frem expansion Update test Fixup after merge NaN handling fixes Update tests

AMDGPU: Add more freeze codegen tests

f38f769

DAG: Fix promote of half freeze

56369a8

Round out the AMDGPU codegen test to all the generations to cover the illegal f16 targets.

frederik-h force-pushed the frem-expansion branch from 7390437 to 56369a8 Compare March 18, 2025 17:41

arsenm and others added 6 commits March 19, 2025 02:55

Fix losing the actual freeze

6acfd7a

Adjust expand-fp pass DEBUG_TYPE

3d3d5fb

Use AssumptionCache

22ba910

... only if optimizations are enabled. Disable Value Tracking analysis if optimizations are disabled.

Adjust freeze.ll tst

05e90cd

Remove exceptional handling to dagcombine

485f446

Adjust test

8559b6d

frederik-h added 4 commits April 25, 2025 01:52

Add opt-level parameter to ExpandFp pass

54cf635

Merge remote-tracking branch 'upstream/main' into frem-expansion

2e2612d

Update tests

e9763c1

clang-format changes

a43871f

frederik-h requested a review from arsenm April 28, 2025 08:47

frederik-h added 5 commits June 16, 2025 04:43

Merge remote-tracking branch 'upstream' into frem-expansion

f4ad156

Adjust to isKnownNeverInfinity changes

8a72d2f

Merge remote-tracking branch 'upstream' into frem-expansion

7ee8aa9

Move check for ScalableVectorType earlier

d1bf4cb

Adjust tests

9ab5edd

True16 tests had to be disabled because of issue: [AMDGPU][True16] si-fix-sgpr-copies: invalid sgpr_lo16 copy destination llvm#144561

arsenm mentioned this pull request Jul 7, 2025

[NVPTX] Don't propagate ninf and nnan in lowerFREM #147125

Open

frederik-h added 3 commits August 15, 2025 06:17

Merge remote-tracking branch 'upstream/main' into frem-expansion

6cb045c

Merge remote-tracking branch 'upstream/main' into frem-expansion

74cf44c

Introduce approximate FRem expansion

fc6effa

frederik-h force-pushed the frem-expansion branch from fd9563d to fc6effa Compare August 25, 2025 16:24

arsenm approved these changes Sep 3, 2025

View reviewed changes

Apply suggestions from code review

3d81f67

Co-authored-by: Matt Arsenault <arsenm2@gmail.com>

frederik-h mentioned this pull request Sep 3, 2025

[AMDGPU] Add fold for "fneg (rint(x)) -> rint(fneg(x))" #156634

Closed

frederik-h added 2 commits September 3, 2025 06:08

Merge remote-tracking branch 'upstream/main' into frem-expansion

93e863f

Review changes

9b3f155

* Add TODO concerning propagation of fmf * Remove attributes from ExpandFp test * Remove xfailed dagcombine-select-frem.ll and add TODO to corresponding test in dagcombine-select.ll.

arsenm approved these changes Sep 3, 2025

View reviewed changes

frederik-h merged commit 47793f9 into llvm:main Sep 3, 2025
9 checks passed

frederik-h deleted the frem-expansion branch September 3, 2025 14:27

bjope reviewed Sep 4, 2025

View reviewed changes

[AMDGPU] Implement IR expansion for frem instruction #130988

[AMDGPU] Implement IR expansion for frem instruction #130988

Conversation

frederik-h commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Mar 12, 2025

Uh oh!

github-actions bot commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

frederik-h commented Jun 2, 2025

Uh oh!

frederik-h commented Sep 2, 2025

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mikaelholmen commented Sep 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bjope Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

bjope Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

bjope Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

arsenm Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

frederik-h Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

mikaelholmen commented Sep 4, 2025

Uh oh!

frederik-h commented Sep 4, 2025

Uh oh!

Uh oh!

frederik-h commented Mar 12, 2025 •

edited

Loading

github-actions bot commented Mar 12, 2025 •

edited

Loading

mikaelholmen commented Sep 4, 2025 •

edited

Loading