[SDAG] Fix deferring constrained function calls #153029

spavloff · 2025-08-11T14:56:43Z

Selection DAG has a more sophisticated execution order representation than the simple sequence used in IR, so building the DAG can take into account specific properties of the nodes to better express possible parallelism. The existing implementation does this for constrained function calls, some of them are considered as independent, which can potentially improve the generated code. However this mechanism incorrectly implies that the calls with exception behavior 'ebIgnore' cannot raise floating-point exception. The purpose of this change is to fix the implementation.

In the current implementation, constrained function calls don't immediately update the DAG root. Instead, the DAG builder collects their output chains and flushes them when the root is required. Constrained function calls cannot be moved across calls of external functions and intrinsics that access floating-point environment, they work as barriers. Between the barriers, constrained function calls can be reordered, they may be considered independent from viewpoint of raising exceptions. For strictfp functions this is possible only if floating-point trapping is disabled.

This change introduces a new restriction - the calls with default exception handling cannot not be moved between strictfp function calls. Otherwise the exceptions raised by such call can disturb the expected exception sequence. It means that constrained function calls with strict exception behavior act as barriers for the calls with non-strict behavior and vice versa. Effectively it means that the entire sequence of constrained calls in IR is split into "strict" and "non-strict" regions, in which restrictions on the order of constrained calls are relaxed, but move from one region to another is not allowed. It agrees with the representation of strictfp code in high-level languages. For example, C/C++ strictfp code correspond to blocks where pragma STDC FENV_ACCESS ON is in effect, this restriction should help preserving the intended semantics.

When floating-point exception trapping is enabled, constrained intrinsics with 'ebStrict' cannot be reordered, their sequence must be identical to the original source order. The current implementation does not distinguish between strictfp modes with trapping and without it. This change make assumption that the trapping is disabled. It is not correct in the general case, but is compatible with the existing implementation.

Two test cases added in 21056a4 have been removed. These tests verified correct merging of constrained function calls that differed only in their exception handling:

%div = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.strict") #0
%div2 = call double @llvm.experimental.constrained.fdiv.f64(double %a, double %b, metadata !"round.dynamic", metadata !"fpexcept.ignore") #0

Previously one of such calls was eliminated by CSE during DAG construction. It was possible because the calls had the same income chain, as they were in the same group made by TokenFactor. Now these calls are no longer merged by CSE, and the test cases become unnecessary. Removal of one of such calls is still possible and profitable, but should be made elsewhere, probably at IR level.

Selection DAG has a more sophisticated execution order representation than the simple sequence used in IR, so building the DAG can take into account specific properties of the nodes to better express possible parallelism. The existing implementation does this for constrained function calls, some of them are considered as independent, which can potentially improve the generated code. However this mechanism incorrectly implies that the calls with exception behavior 'ebIgnore' cannot raise floating-point exception. The purpose of this change is to fix the implementation. In the current implementation, constrained function calls don't immediately update the DAG root. Instead, the DAG builder collects their output chains and flushes them when the root is required. Constrained function calls cannot be moved across calls of external functions and intrinsics that access floating-point environment, they work as barriers. Between the barriers, constrained function calls can be reordered, they may be considered independent from viewpoint of raising exceptions. For strictfp functions this is possible only if floating-point trapping is disabled. This change introduces a new restriction - the calls with default exception handling cannot not be moved between strictfp function calls. Otherwise the exceptions raised by such call can disturb the expected exception sequence. It means that constrained function calls with strict exception behavior act as barriers for the calls with non-strict behavior and vice versa. Effectively it means that the entire sequence of constrained calls in IR is split into "strict" and "non-strict" regions, in which restrictions on the order of constrained calls are relaxed, but move from one region to another is not allowed. It agrees with the representation of strictfp code in high-level languages. For example, C/C++ strictfp code correspond to blocks where pragma `STDC FENV_ACCESS ON` is in effect, this restriction should help preserving the intended semantics. When floating-point exception trapping is enabled, constrained intrinsics with 'ebStrinc' cannot be reordered, their sequence must be identical to the original source order. The current implementation does not distinguish between strictfp modes with trapping and without it. This change make assumption that the trapping is disabled. It is not correct in the general case, but is compatible with the existing implementation, which makes this change NFC.

llvmbot · 2025-08-11T14:57:20Z

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-selectiondag

Author: Serge Pavlov (spavloff)

Changes

Selection DAG has a more sophisticated execution order representation than the simple sequence used in IR, so building the DAG can take into account specific properties of the nodes to better express possible parallelism. The existing implementation does this for constrained function calls, some of them are considered as independent, which can potentially improve the generated code. However this mechanism incorrectly implies that the calls with exception behavior 'ebIgnore' cannot raise floating-point exception. The purpose of this change is to fix the implementation.

In the current implementation, constrained function calls don't immediately update the DAG root. Instead, the DAG builder collects their output chains and flushes them when the root is required. Constrained function calls cannot be moved across calls of external functions and intrinsics that access floating-point environment, they work as barriers. Between the barriers, constrained function calls can be reordered, they may be considered independent from viewpoint of raising exceptions. For strictfp functions this is possible only if floating-point trapping is disabled.

This change introduces a new restriction - the calls with default exception handling cannot not be moved between strictfp function calls. Otherwise the exceptions raised by such call can disturb the expected exception sequence. It means that constrained function calls with strict exception behavior act as barriers for the calls with non-strict behavior and vice versa. Effectively it means that the entire sequence of constrained calls in IR is split into "strict" and "non-strict" regions, in which restrictions on the order of constrained calls are relaxed, but move from one region to another is not allowed. It agrees with the representation of strictfp code in high-level languages. For example, C/C++ strictfp code correspond to blocks where pragma STDC FENV_ACCESS ON is in effect, this restriction should help preserving the intended semantics.

When floating-point exception trapping is enabled, constrained intrinsics with 'ebStrinc' cannot be reordered, their sequence must be identical to the original source order. The current implementation does not distinguish between strictfp modes with trapping and without it. This change make assumption that the trapping is disabled. It is not correct in the general case, but is compatible with the existing implementation, which makes this change NFC.

Full diff: https://github.com/llvm/llvm-project/pull/153029.diff

2 Files Affected:

(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp (+50-28)
(modified) llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h (+5)

diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
index 7aa1fadd10dfc..c2c5d3caad30a 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -8280,6 +8280,54 @@ void SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I,
   }
 }
 
+void SelectionDAGBuilder::pushFPOpOutChain(SDValue Result,
+                                           fp::ExceptionBehavior EB) {
+  assert(Result.getNode()->getNumValues() == 2);
+  SDValue OutChain = Result.getValue(1);
+  assert(OutChain.getValueType() == MVT::Other);
+
+  // Instead of updating the root immediately, push the produced chain to the
+  // appropriate list, deferring the update until the root is requested. In this
+  // case, the nodes from the lists are chained using TokenFactor, indicating
+  // that the operations are independent.
+  //
+  // In particular, the root is updated before any call that might access the
+  // floating-point environment, except for constrained intrinsics.
+  switch (EB) {
+  case fp::ExceptionBehavior::ebMayTrap:
+  case fp::ExceptionBehavior::ebIgnore:
+    // Floating-point exceptions produced by such operations are not intended
+    // to be observed, so the sequence of these operations does not need to be
+    // preserved.
+    //
+    // They however must not be mixed with the instructions that have strict
+    // exception behavior. Placing an operation with 'ebIgnore' behavior between
+    // 'ebStrict' operations could distort the observed exception behavior.
+    if (!PendingConstrainedFPStrict.empty()) {
+      assert(PendingConstrainedFP.empty());
+      updateRoot(PendingConstrainedFPStrict);
+    }
+    PendingConstrainedFP.push_back(OutChain);
+    break;
+  case fp::ExceptionBehavior::ebStrict:
+    // Floating-point exception produced by these operations may be observed, so
+    // they must be correctly chained. If trapping on FP exceptions is
+    // disabled, the exceptions can be observed only by functions that read
+    // exception flags, like 'llvm.get_fpenv' or 'fetestexcept'. It means that
+    // the order of operations is not significant between barriers.
+    //
+    // If trapping is enabled, each operation becomes an implicit observation
+    // point, so the operations must be sequenced according their original
+    // source order.
+    if (!PendingConstrainedFP.empty()) {
+      assert(PendingConstrainedFPStrict.empty());
+      updateRoot(PendingConstrainedFP);
+    }
+    PendingConstrainedFPStrict.push_back(OutChain);
+    // TODO: Add support for trapping-enabled scenarios.
+  }
+}
+
 void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
     const ConstrainedFPIntrinsic &FPI) {
   SDLoc sdl = getCurSDLoc();
@@ -8293,32 +8341,6 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
   for (unsigned I = 0, E = FPI.getNonMetadataArgCount(); I != E; ++I)
     Opers.push_back(getValue(FPI.getArgOperand(I)));
 
-  auto pushOutChain = [this](SDValue Result, fp::ExceptionBehavior EB) {
-    assert(Result.getNode()->getNumValues() == 2);
-
-    // Push node to the appropriate list so that future instructions can be
-    // chained up correctly.
-    SDValue OutChain = Result.getValue(1);
-    switch (EB) {
-    case fp::ExceptionBehavior::ebIgnore:
-      // The only reason why ebIgnore nodes still need to be chained is that
-      // they might depend on the current rounding mode, and therefore must
-      // not be moved across instruction that may change that mode.
-      [[fallthrough]];
-    case fp::ExceptionBehavior::ebMayTrap:
-      // These must not be moved across calls or instructions that may change
-      // floating-point exception masks.
-      PendingConstrainedFP.push_back(OutChain);
-      break;
-    case fp::ExceptionBehavior::ebStrict:
-      // These must not be moved across calls or instructions that may change
-      // floating-point exception masks or read floating-point exception flags.
-      // In addition, they cannot be optimized out even if unused.
-      PendingConstrainedFPStrict.push_back(OutChain);
-      break;
-    }
-  };
-
   const TargetLowering &TLI = DAG.getTargetLoweringInfo();
   EVT VT = TLI.getValueType(DAG.getDataLayout(), FPI.getType());
   SDVTList VTs = DAG.getVTList(VT, MVT::Other);
@@ -8346,7 +8368,7 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
         !TLI.isFMAFasterThanFMulAndFAdd(DAG.getMachineFunction(), VT)) {
       Opers.pop_back();
       SDValue Mul = DAG.getNode(ISD::STRICT_FMUL, sdl, VTs, Opers, Flags);
-      pushOutChain(Mul, EB);
+      pushFPOpOutChain(Mul, EB);
       Opcode = ISD::STRICT_FADD;
       Opers.clear();
       Opers.push_back(Mul.getValue(1));
@@ -8377,7 +8399,7 @@ void SelectionDAGBuilder::visitConstrainedFPIntrinsic(
   }
 
   SDValue Result = DAG.getNode(Opcode, sdl, VTs, Opers, Flags);
-  pushOutChain(Result, EB);
+  pushFPOpOutChain(Result, EB);
 
   SDValue FPResult = Result.getValue(0);
   setValue(&FPI, FPResult);
diff --git a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
index e0835e6310357..e1d7804ce12cd 100644
--- a/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ b/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -195,6 +195,11 @@ class SelectionDAGBuilder {
   /// Update root to include all chains from the Pending list.
   SDValue updateRoot(SmallVectorImpl<SDValue> &Pending);
 
+  /// Given a node representing a floating-point operation and its specified
+  /// exception behavior, this either updates the root or stores the node in
+  /// a list to be added to chains latter.
+  void pushFPOpOutChain(SDValue Result, fp::ExceptionBehavior EB);
+
   /// A unique monotonically increasing number used to order the SDNodes we
   /// create.
   unsigned SDNodeOrder;

kpneal · 2025-08-12T12:25:37Z

No tests?

kpneal · 2025-08-12T13:15:39Z

How are we handling rounding mode? When we're ignoring exceptions we might still be changing the rounding mode so we probably shouldn't be reordering constrained intrinsics without checking it as well.

spavloff · 2025-08-14T18:25:17Z

Trying to prepare a test I realized that pending calls must be flushed prior to node creation, the PR is updated accordingly.

However I could not create a test. I wanted to use code like:

declare float@join(float, float, float, float, float, float);
define float @dag(float %a, float %b) strictfp {
entry:
  %v0 = call float @llvm.experimental.constrained.fadd.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.ignore")
  %v1 = call float @llvm.experimental.constrained.fadd.f32(float %b, float %b, metadata !"round.dynamic", metadata !"fpexcept.ignore")
  %v2 = call float @llvm.experimental.constrained.fmul.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
  %v3 = call float @llvm.experimental.constrained.fmul.f32(float %a, float %a, metadata !"round.dynamic", metadata !"fpexcept.strict")
  %v4 = call float @llvm.experimental.constrained.fsub.f32(float %a, float %b, metadata !"round.dynamic", metadata !"fpexcept.strict")
  %v5 = call float @llvm.experimental.constrained.fsub.f32(float %b, float %a, metadata !"round.dynamic", metadata !"fpexcept.strict")

  %ret = call float @join(float %v0, float %v1, float %v2, float %v3, float %v4, float %v5)
  ret float %ret
}

This code should producee three TokenFactor groups, which would differ from how the DAG is constructed now. The DAG indeed has structure as expected but the produced code is not. Machine Instruction Scheduler reorders the code breaking the expected order. In the machine representation the code for "strict" calls looks like:

  %4:fr32 = MULSSrr %0:fr32(tied-def 0), %1:fr32, implicit $mxcsr
  %5:fr32 = MULSSrr %0:fr32(tied-def 0), %0:fr32, implicit $mxcsr

The instructions have implicit use of MXCSR, which is sufficient for dynamic rounding, but correct exception handling require also implicit def of MXCSR. I checked X86, AArch64, SystemZ, PPC and RISCV, - none have implicit def of a status register. Correct code generation for exception-aware code is not possible now.

How are we handling rounding mode? When we're ignoring exceptions we might still be changing the rounding mode so we probably shouldn't be reordering constrained intrinsics without checking it as well.

The is no problem with rounding mode. It can be changed only by some functions, which are already barriers for constrained function. Exception handling is special because the exceptions can be raised by any FP instructions.

kpneal · 2025-08-18T16:30:19Z

llvm/test/CodeGen/X86/fp-intrinsics-flags.ll

-  store double %result2, ptr %y
-  ret void
-}
-


What's the issue with this test? I'd hate for us to lose test coverage because a bug was discovered. Wouldn't a FIXME and a good comment be better?

I put back the tests and slightly modified them to check changes introduced by this patch. Previously the two operation in the test were merged because the nodes were identical, due to the same value of chain argument. If the instructions are sequenced, which this patch implements, the chain arguments become different and the nodes are not merged.

It is not an error that the nodes were merged previously, but it was made using CSE mechanism. With sequenced node it does not work. Merging such nodes require another mechanism, probably at IR level.

kpneal · 2025-08-18T16:33:49Z

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

+      assert(PendingConstrainedFPStrict.empty());
+      updateRoot(PendingConstrainedFP);
+    }
+    // TODO: Add support for trapping-enabled scenarios.


Are we 100% certain that this change won't create cases where instructions expecting traps to be off will be moved into a chunk of instructions that run with FP traps on?

If some part of a function runs with traps enabled and the other - with traps disabled, somewhere between them must be a function call, that changes FP control register. Such function is either an external function or an intrinsic with flag IntrInaccessibleMemOnly. Both are barriers for FP instructions.

If some part of a function runs with traps enabled and the other - with traps disabled, somewhere between them must be a function call, that changes FP control register. Such function is either an external function or an intrinsic with flag IntrInaccessibleMemOnly. Both are barriers for FP instructions.

Is this specific to SelectionDAG? I thought we didn't have barriers for FP instructions. And since we can have multiple function calls in a basic block, it's possible for traps to be enabled and disabled in different parts of the same basic block. Thus my question about certainty around movement of instructions.

kpneal · 2025-08-18T16:38:26Z

Rounding mode can be changed by an arbitrary function called from inside a basic block. Thus, we can't model it perfectly and need to take it into account when reordering.

And I still think tests are worthwhile even if all they do is demonstrate a bug. It seems like a good comment on the issue including a FIXME and perhaps a link to a bug report would be helpful. Do we have a stated policy on this?

spavloff · 2025-08-22T16:31:27Z

Rounding mode can be changed by an arbitrary function called from inside a basic block. Thus, we can't model it perfectly and need to take it into account when reordering.

What's wrong with the current way FP operations are modeled? Each operation that may depend on dynamic rounding has some memory effect, which indicates that this operation reads FP control register. Functions that changes the control register must have memory effect, that this function writes FP control register. Any instruction movement must honor this relations, just as for real memory access.

kpneal · 2025-08-29T17:51:29Z

Rounding mode can be changed by an arbitrary function called from inside a basic block. Thus, we can't model it perfectly and need to take it into account when reordering.

What's wrong with the current way FP operations are modeled? Each operation that may depend on dynamic rounding has some memory effect, which indicates that this operation reads FP control register. Functions that changes the control register must have memory effect, that this function writes FP control register. Any instruction movement must honor this relations, just as for real memory access.

How about a function where rounding mode matters and is different in different parts of the function calling a function that calls a function that changes the rounding mode? I don't think we can model that, especially if the functions are in different compilations.

spavloff requested review from arsenm, uweigand, mcinally, topperc, andykaylor and kpneal August 11, 2025 14:56

spavloff self-assigned this Aug 11, 2025

llvmbot added the llvm:SelectionDAG SelectionDAGISel as well label Aug 11, 2025

arsenm approved these changes Aug 12, 2025

View reviewed changes

Flush pending calls before a call emission

b977f50

llvmbot added the backend:X86 label Aug 14, 2025

kpneal reviewed Aug 18, 2025

View reviewed changes

Restore test and adapt it for this change

1f57d21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SDAG] Fix deferring constrained function calls #153029

[SDAG] Fix deferring constrained function calls #153029

Uh oh!

spavloff commented Aug 11, 2025 •

edited

Loading

Uh oh!

llvmbot commented Aug 11, 2025 •

edited

Loading

Uh oh!

kpneal commented Aug 12, 2025

Uh oh!

kpneal commented Aug 12, 2025

Uh oh!

spavloff commented Aug 14, 2025

Uh oh!

kpneal Aug 18, 2025

Uh oh!

spavloff Aug 22, 2025

Uh oh!

kpneal Aug 18, 2025

Uh oh!

spavloff Aug 22, 2025

Uh oh!

kpneal Aug 29, 2025

Uh oh!

kpneal commented Aug 18, 2025

Uh oh!

spavloff commented Aug 22, 2025

Uh oh!

kpneal commented Aug 29, 2025

Uh oh!

Uh oh!

[SDAG] Fix deferring constrained function calls #153029

Are you sure you want to change the base?

[SDAG] Fix deferring constrained function calls #153029

Uh oh!

Conversation

spavloff commented Aug 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kpneal commented Aug 12, 2025

Uh oh!

kpneal commented Aug 12, 2025

Uh oh!

spavloff commented Aug 14, 2025

Uh oh!

kpneal Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

spavloff Aug 22, 2025

Choose a reason for hiding this comment

Uh oh!

kpneal Aug 18, 2025

Choose a reason for hiding this comment

Uh oh!

spavloff Aug 22, 2025

Choose a reason for hiding this comment

Uh oh!

kpneal Aug 29, 2025

Choose a reason for hiding this comment

Uh oh!

kpneal commented Aug 18, 2025

Uh oh!

spavloff commented Aug 22, 2025

Uh oh!

kpneal commented Aug 29, 2025

Uh oh!

Uh oh!

spavloff commented Aug 11, 2025 •

edited

Loading

llvmbot commented Aug 11, 2025 •

edited

Loading