Skip to content

Conversation

perlfu
Copy link
Contributor

@perlfu perlfu commented May 7, 2025

Remove recursion to avoid stack overflow on large CFGs.
Avoid worklist for hazard search within single MachineBasicBlock.
Ensure predecessors are visited for all state combinations.

@perlfu perlfu requested review from jayfoad and rampitec May 7, 2025 10:22
@llvmbot
Copy link
Member

llvmbot commented May 7, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Carl Ritson (perlfu)

Changes

Remove recursion to avoid stack overflow on large CFGs.
Avoid worklist for hazard search within single MachineBasicBlock.
Ensure predecessors are visited for all state combinations.


Full diff: https://github.com/llvm/llvm-project/pull/138841.diff

1 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp (+48-31)
diff --git a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
index aaefe27b1324f..644fbb77a495a 100644
--- a/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
@@ -436,42 +436,55 @@ using IsExpiredFn = function_ref<bool(const MachineInstr &, int WaitStates)>;
 using GetNumWaitStatesFn = function_ref<unsigned int(const MachineInstr &)>;
 
 // Search for a hazard in a block and its predecessors.
+// StateT must implement getHashValue().
 template <typename StateT>
 static bool
-hasHazard(StateT State,
+hasHazard(StateT InitialState,
           function_ref<HazardFnResult(StateT &, const MachineInstr &)> IsHazard,
           function_ref<void(StateT &, const MachineInstr &)> UpdateState,
-          const MachineBasicBlock *MBB,
-          MachineBasicBlock::const_reverse_instr_iterator I,
-          DenseSet<const MachineBasicBlock *> &Visited) {
-  for (auto E = MBB->instr_rend(); I != E; ++I) {
-    // No need to look at parent BUNDLE instructions.
-    if (I->isBundle())
-      continue;
+          const MachineBasicBlock *InitialMBB,
+          MachineBasicBlock::const_reverse_instr_iterator InitialI) {
+  SmallVector<std::pair<const MachineBasicBlock *, StateT>> Worklist;
+  DenseSet<std::pair<const MachineBasicBlock *, unsigned>> Visited;
+  const MachineBasicBlock *MBB = InitialMBB;
+  StateT State = InitialState;
+  auto I = InitialI;
+
+  for (;;) {
+    bool Expired = false;
+    for (auto E = MBB->instr_rend(); I != E; ++I) {
+      // No need to look at parent BUNDLE instructions.
+      if (I->isBundle())
+        continue;
 
-    switch (IsHazard(State, *I)) {
-    case HazardFound:
-      return true;
-    case HazardExpired:
-      return false;
-    default:
-      // Continue search
-      break;
-    }
+      auto Result = IsHazard(State, *I);
+      if (Result == HazardFound)
+        return true;
+      if (Result == HazardExpired) {
+        Expired = true;
+        break;
+      }
 
-    if (I->isInlineAsm() || I->isMetaInstruction())
-      continue;
+      if (I->isInlineAsm() || I->isMetaInstruction())
+        continue;
 
-    UpdateState(State, *I);
-  }
+      UpdateState(State, *I);
+    }
 
-  for (MachineBasicBlock *Pred : MBB->predecessors()) {
-    if (!Visited.insert(Pred).second)
-      continue;
+    if (!Expired) {
+      unsigned StateHash = State.getHashValue();
+      for (MachineBasicBlock *Pred : MBB->predecessors()) {
+        if (!Visited.insert(std::pair(Pred, StateHash)).second)
+          continue;
+        Worklist.emplace_back(Pred, State);
+      }
+    }
 
-    if (hasHazard(State, IsHazard, UpdateState, Pred, Pred->instr_rbegin(),
-                  Visited))
-      return true;
+    if (Worklist.empty())
+      break;
+
+    std::tie(MBB, State) = Worklist.pop_back_val();
+    I = MBB->instr_rbegin();
   }
 
   return false;
@@ -1624,6 +1637,10 @@ bool GCNHazardRecognizer::fixVALUPartialForwardingHazard(MachineInstr *MI) {
     SmallDenseMap<Register, int, 4> DefPos;
     int ExecPos = std::numeric_limits<int>::max();
     int VALUs = 0;
+
+    unsigned getHashValue() const {
+      return hash_combine(ExecPos, VALUs, hash_combine_range(DefPos));
+    }
   };
 
   StateType State;
@@ -1718,9 +1735,8 @@ bool GCNHazardRecognizer::fixVALUPartialForwardingHazard(MachineInstr *MI) {
       State.VALUs += 1;
   };
 
-  DenseSet<const MachineBasicBlock *> Visited;
   if (!hasHazard<StateType>(State, IsHazardFn, UpdateStateFn, MI->getParent(),
-                            std::next(MI->getReverseIterator()), Visited))
+                            std::next(MI->getReverseIterator())))
     return false;
 
   BuildMI(*MI->getParent(), MI, MI->getDebugLoc(),
@@ -1761,6 +1777,8 @@ bool GCNHazardRecognizer::fixVALUTransUseHazard(MachineInstr *MI) {
   struct StateType {
     int VALUs = 0;
     int TRANS = 0;
+
+    unsigned getHashValue() const { return hash_combine(VALUs, TRANS); }
   };
 
   StateType State;
@@ -1796,9 +1814,8 @@ bool GCNHazardRecognizer::fixVALUTransUseHazard(MachineInstr *MI) {
       State.TRANS += 1;
   };
 
-  DenseSet<const MachineBasicBlock *> Visited;
   if (!hasHazard<StateType>(State, IsHazardFn, UpdateStateFn, MI->getParent(),
-                            std::next(MI->getReverseIterator()), Visited))
+                            std::next(MI->getReverseIterator())))
     return false;
 
   // Hazard is observed - insert a wait on va_dst counter to ensure hazard is

if (!Expired) {
unsigned StateHash = State.getHashValue();
for (MachineBasicBlock *Pred : MBB->predecessors()) {
if (!Visited.insert(std::pair(Pred, StateHash)).second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't safe in the presence of hash collisions.

As an alternative, could StateT have a "merge" or "min" function which takes two states and chooses the most constrained one, or the most contraining combination of them? Then if we reach the same predecessor twice by different paths we can merge the states and continue with the merged state.

Then perhaps the worklist could be a MapVector<const MachineBasicBlock *, StateT> with no need for a separate Visited set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote this on the basis that is safer than current solution that never revisits a predecessor.

I had considered a solution similar to the one you propose.
But had performance concerns.

I'll work it through and test performance, but I do think we should consider a partial solution such as the one thing patch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this more and I concluded a merge or min state is not viable.
The min state would have to be "unsafer" of the two, which could create convergence problems.
Merging states would lead to visiting blocks with synthetic states that don't actually exist.
For some hazards it might not be possible to derive a merged state or determine the least safe state.

I have reworked the implementation to address hash collisions.
This now keeps a deduplicated vector or states, which is the only place states are retained.
Worklist and visited set retain indexes into this vector which are constant and unique for execution of the search.
Hashing is used to map states to vector indexes with a slow path provided for collisions.
(I have seen no collisions in my in depth testing.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this more and I concluded a merge or min state is not viable.
The min state would have to be "unsafer" of the two, which could create convergence problems.
Merging states would lead to visiting blocks with synthetic states that don't actually exist.
For some hazards it might not be possible to derive a merged state or determine the least safe state.

I disagree with your conclusions, but I guess your way is OK for now, since you've implemented it. Just to be clear: the tradeoff you're making here is that you will potentially revisit the same MBB multiple times with different states, corresponding to different paths back through the CFG that can reach that block (even in the absence of loops).

@perlfu
Copy link
Contributor Author

perlfu commented May 22, 2025

Ping

@perlfu
Copy link
Contributor Author

perlfu commented Jun 2, 2025

Ping

@perlfu perlfu requested review from piotrAMD and arsenm June 16, 2025 01:16
@perlfu
Copy link
Contributor Author

perlfu commented Jun 30, 2025

Ping

@perlfu perlfu force-pushed the amdgpu-has-hazard-rework branch from c8c36ab to 4d6dded Compare July 15, 2025 10:39
@perlfu
Copy link
Contributor Author

perlfu commented Jul 16, 2025

Can anyone take another look at this?
I'd like to land this and #138663

Remove recursion to avoid stack overflow on large CFGs.
Avoid worklist for hazard search within single MachineBasicBlock.
Ensure predecessors are visited for all state combinations.
- Handle hashing collisions
@perlfu perlfu force-pushed the amdgpu-has-hazard-rework branch from 4d6dded to 9532454 Compare September 4, 2025 05:53
Copy link
Collaborator

@rovka rovka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me with a couple readability nits, but please wait a few days in case Jay wants to have another look.

MachineBasicBlock::const_reverse_instr_iterator InitialI) {
SmallVector<std::pair<const MachineBasicBlock *, unsigned>> Worklist;
SmallDenseSet<std::pair<const MachineBasicBlock *, unsigned>> Visited;
SmallVector<std::pair<unsigned, unsigned>, 1> Collisions;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Maybe typedef a HashTy and IdxTy, to make all these definitions easier to grok.

StateIdx = StateHash2Idx[StateHash];
if (LLVM_UNLIKELY(!StateT::isEqual(State, States[StateIdx]))) {
// Hash collision
auto Collision = llvm::find_if(Collisions, [&](auto &C) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're doing the previous nit, also spell out the type of C here (The other auto can stay).

Copy link
Contributor

@jayfoad jayfoad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the behaviour you've imlemented is OK but I'd really like to see the implementation cleaned up as mentioned inline.

Comment on lines 448 to 449
function_ref<HazardFnResult(StateT &, const MachineInstr &)> IsHazard,
function_ref<void(StateT &, const MachineInstr &)> UpdateState,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not something to address in the current patch, but I think this IsHazard/UpdateState interface is confusing because IsHazard also updates the state. So technically I think there is no need for the separate UpdateState function, it could just become part of what IsHazard does before returning NoHazardFound.

SmallDenseMap<unsigned, unsigned> StateHash2Idx;
SmallVector<StateT> States;

// States contains a vector of unique state structures.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels overcomplicated and it's also similar to what MapVector does internally. I think you could probably reimplement this with a MapVector<StateT, void>, using MapVector::getArrayRef to convert between states and StateIdxs. This would have the (I claim) benefit that StateT would have to support standard DenseMap traits instead of the less standard getHashValue and isEqual methods.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stepping back a bit, why do you want this level of indirection in the first place? Is it "just" a performance thing, to avoid copying around states which could be large, and avoid storing multiple copies of identical states? The MapVector solution would still store two copies of each state, since MapVector has a SmallVector of keys+values as well as a DenseMap of the keys.

Another possibility might be to have a DenseMap of pointers to states hashed by the state itself not the pointer value, a bit like this does for MachineInstrs: **

/// Special DenseMapInfo traits to compare MachineInstr* by *value* of the

Comment on lines +521 to +523
if (!Visited.insert(std::pair(Pred, StateIdx)).second)
continue;
Worklist.emplace_back(Pred, StateIdx);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having separate Visited and Worklist, you could implement this with a single SetVector Worklist which you iterate through with a loop that keeps going while new items are inserted, like for (unsigned I = 0; I < Worklist.size(); ++I).

if (!Expired) {
unsigned StateHash = State.getHashValue();
for (MachineBasicBlock *Pred : MBB->predecessors()) {
if (!Visited.insert(std::pair(Pred, StateHash)).second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this more and I concluded a merge or min state is not viable.
The min state would have to be "unsafer" of the two, which could create convergence problems.
Merging states would lead to visiting blocks with synthetic states that don't actually exist.
For some hazards it might not be possible to derive a merged state or determine the least safe state.

I disagree with your conclusions, but I guess your way is OK for now, since you've implemented it. Just to be clear: the tradeoff you're making here is that you will potentially revisit the same MBB multiple times with different states, corresponding to different paths back through the CFG that can reach that block (even in the absence of loops).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants