Skip to content

Conversation

tzuralon
Copy link

@tzuralon tzuralon commented Jul 20, 2025

Fixes #148035

The problem
3 cpp files from #148035 that are based on cppreference coroutines tutorial (the actual code, and 2 minor changes on top of it - of using std::unique_ptr as byval argument[s] to coroutine) are failing to compile under (+assertions) clang-cl with the parameters: /c /EHa /std:c++latest /Os

Why?
It seems that there are 2 broken invariants during the coroutine split pass

  • Unwind edges out of a funclet pad must have the same unwind dest
  • Instruction does not dominate all uses!

How?
The solution here aims to restore those 2 invariants:

  • When the pass tries to add an invoke within another cleanuppad scope, it now unwinds to the same location
  • New unwind paths are introduced for unwinding nodes before CoroBegin, those are based on the original unwind pre-split paths

tzuralon added 3 commits July 20, 2025 08:17
…a new cleanupret instruction was added, make it unwind to the same edge of the genuine cleanupret
…Begin on async-exception ready function, it created instruction dominance issues. the fix is to clone those bbs and keep unwinding path that happens before CoroBegin
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot
Copy link
Member

llvmbot commented Jul 20, 2025

@llvm/pr-subscribers-coroutines

@llvm/pr-subscribers-llvm-transforms

Author: None (tzuralon)

Changes

Fixes #148035


Patch is 502.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149691.diff

7 Files Affected:

  • (modified) llvm/lib/Transforms/Coroutines/CoroFrame.cpp (+156)
  • (modified) llvm/lib/Transforms/Coroutines/CoroSplit.cpp (+10-1)
  • (added) llvm/test/Transforms/Coroutines/pr148035_0_coroutine.ll (+7609)
  • (added) llvm/test/Transforms/Coroutines/pr148035_1_coroutine_w_std_unique.ll (+866)
  • (added) llvm/test/Transforms/Coroutines/pr148035_2_coroutine_w_arg_w_move_ctor_and_deleter.ll (+727)
  • (added) llvm/test/Transforms/Coroutines/pr148035_inst_does_not_dominate.ll (+76)
  • (added) llvm/test/Transforms/Coroutines/pr148035_unwind_edges_not_having_same_dest.ll (+74)
diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index a65d0fb54c212..a3ec5d3cb06ea 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -35,6 +35,7 @@
 #include "llvm/Transforms/Coroutines/SpillUtils.h"
 #include "llvm/Transforms/Coroutines/SuspendCrossingInfo.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
+#include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/Local.h"
 #include "llvm/Transforms/Utils/PromoteMemToReg.h"
 #include <algorithm>
@@ -974,6 +975,159 @@ static StructType *buildFrameType(Function &F, coro::Shape &Shape,
   return FrameTy;
 }
 
+// Fixer for the "Instruction does not dominate all uses!" bug
+// The fix consists of mapping problematic paths (where CoroBegin does not
+// dominate cleanup BBs) and clones them to 2 flows - the one that insertSpills
+// intended to create (using the spill) and another one, preserving the logics
+// of pre-splitting, which would be triggered if unwinding happened before
+// CoroBegin
+static void
+splitBasicBlocksNotDominatedByCoroBegin(const FrameDataInfo &FrameData,
+                                        coro::Shape &Shape, Function *F,
+                                        DominatorTree &DT) {
+  ValueToValueMapTy VMap;
+  DenseMap<BasicBlock *, BasicBlock *>
+ProcessedSpillBlockToAlternativeUnspilledBlockMap;
+  bool FunctionHasSomeBlockNotDominatedByCoroBegin;
+  SmallVector<BasicBlock *> SpillUserBlocks;
+  
+  for (const auto &E : FrameData.Spills) {
+    for (auto *U : E.second) {
+      if (std::find(SpillUserBlocks.begin(), SpillUserBlocks.end(),
+U->getParent()) == SpillUserBlocks.end()) {
+        SpillUserBlocks.push_back(U->getParent());
+      }
+    }
+  }
+  SpillUserBlocks.push_back(Shape.AllocaSpillBlock);
+
+  do {
+    FunctionHasSomeBlockNotDominatedByCoroBegin = false;
+    // We want to traverse the function post-order (predecessors first),
+    // and check dominance starting CoroBegin
+    bool HaveTraversedCoroBegin = false;
+    for (BasicBlock *CurrentBlock : ReversePostOrderTraversal<Function *>(F)) {
+      if (!HaveTraversedCoroBegin &&
+CurrentBlock != Shape.CoroBegin->getParent()) {
+        continue;
+      }
+      HaveTraversedCoroBegin = true;
+      
+      // Focus on 2 types of users that produce errors - those in
+      // FrameData.Spills, and decendants of Shape.AllocaSpillBlocks
+      if (!DT.dominates(Shape.CoroBegin, CurrentBlock) &&
+std::find(SpillUserBlocks.begin(), SpillUserBlocks.end(),
+CurrentBlock) != SpillUserBlocks.end()) {
+        // Mark another iteration of the loop is needed, to verify that no more
+        // dominance issues after current run
+        FunctionHasSomeBlockNotDominatedByCoroBegin = true;
+
+        // Clone (preserve) the current basic block, before it will be modified
+        // by insertSpills
+        auto UnspilledAlternativeBlock =
+CloneBasicBlock(CurrentBlock, VMap, ".unspilled_alternative", F);
+
+        // Remap the instructions, VMap here aggregates instructions across
+        // multiple BasicBlocks, and we assume that traversal is post-order,
+        // therefore successor blocks (for example instructions having funclet
+        // tags) will be mapped correctly to the new cloned cleanuppad
+        for (Instruction &I : *UnspilledAlternativeBlock) {
+          RemapInstruction(&I, VMap,
+RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
+        }
+
+        // Keep track between the processed spill basic block and the cloned
+        // alternative unspilled basic block Will help us fix instructions that
+        // their context is complex (for example cleanuppad of funclet is
+        // defined in another BB)
+        ProcessedSpillBlockToAlternativeUnspilledBlockMap[CurrentBlock] =
+UnspilledAlternativeBlock;
+
+        SmallVector<Instruction *> FixUpPredTerminators;
+
+        // Find the specific predecessors that does not dominated by
+        // Shape.CoroBegin We don't fix them here but later because it's not
+        // safe while using predecessors as iterator function
+        for (BasicBlock *Pred : predecessors(CurrentBlock)) {
+          if (!DT.dominates(Shape.CoroBegin, Pred)) {
+            FixUpPredTerminators.push_back(Pred->getTerminator());
+          }
+        }
+
+        // Fixups for current block terminator
+        const auto &CurrentBlockTerminator = CurrentBlock->getTerminator();
+
+        // If it's cleanupret, find the correspondant cleanuppad (use the map to
+        // find it)
+        if (auto CurrentBlockCleanupReturnTerminator =
+dyn_cast<CleanupReturnInst>(CurrentBlockTerminator)) {
+          BasicBlock *CBCPBB =
+CurrentBlockCleanupReturnTerminator->getCleanupPad()->getParent();
+          CleanupReturnInst *DBT = dyn_cast<CleanupReturnInst>(
+UnspilledAlternativeBlock->getTerminator());
+
+          // Again assuming post-order traversal - if we mapped the predecessing
+          // cleanuppad block before, we should find it here If not, do nothing
+          if (ProcessedSpillBlockToAlternativeUnspilledBlockMap.contains(
+CBCPBB)) {
+            Instruction *DCPr =
+&ProcessedSpillBlockToAlternativeUnspilledBlockMap[CBCPBB]
+->front();           
+            CleanupPadInst *DCP = cast<CleanupPadInst>(DCPr);
+            DBT->setCleanupPad(DCP);
+          }
+
+        // If it's a branch/invoke, keep track of its successors, we want to
+          // calculate dominance between CoroBegin and them also They might need
+          // clone as well
+        } else if (auto CurrentBlockBranchTerminator =
+dyn_cast<BranchInst>(CurrentBlockTerminator)) {
+          for (unsigned int successorIdx = 0;
+successorIdx < CurrentBlockBranchTerminator->getNumSuccessors();
+++successorIdx) {
+            SpillUserBlocks.push_back(
+CurrentBlockBranchTerminator->getSuccessor(successorIdx));
+          }
+        } else if (auto CurrentBlockInvokeTerminator =
+dyn_cast<InvokeInst>(CurrentBlockTerminator)) {
+          SpillUserBlocks.push_back(
+CurrentBlockInvokeTerminator->getUnwindDest());
+        } else {
+          report_fatal_error("Not implemented terminator for this specific "
+                             "instruction fixup in current block fixups");
+        }
+
+        // Fixups on the predecessors terminator - direct them to out untouched
+        // alternative block to break dominance error.
+        for (auto FixUpPredTerminator : FixUpPredTerminators) {
+          if (auto FixUpPredTerminatorInvoke =
+dyn_cast<InvokeInst>(FixUpPredTerminator)) {
+            FixUpPredTerminatorInvoke->setUnwindDest(UnspilledAlternativeBlock);
+          } else if (auto FixUpPredTerminatorBranch =
+dyn_cast<BranchInst>(FixUpPredTerminator)) {
+            for (unsigned int successorIdx = 0;
+                successorIdx < FixUpPredTerminatorBranch->getNumSuccessors();
+                ++successorIdx) {
+              if (CurrentBlock ==
+FixUpPredTerminatorBranch->getSuccessor(successorIdx)) {
+                FixUpPredTerminatorBranch->setSuccessor(
+successorIdx, UnspilledAlternativeBlock);
+              }
+            }
+          } else {
+            report_fatal_error("Not implemented terminator for this specific "
+                               "instruction in pred fixups");
+          }
+        }
+        
+        // We changed dominance tree, so recalculate.
+        DT.recalculate(*F);
+        continue;
+      }
+    }
+  } while (FunctionHasSomeBlockNotDominatedByCoroBegin);
+}
+
 // Replace all alloca and SSA values that are accessed across suspend points
 // with GetElementPointer from coroutine frame + loads and stores. Create an
 // AllocaSpillBB that will become the new entry block for the resume parts of
@@ -1053,6 +1207,8 @@ static void insertSpills(const FrameDataInfo &FrameData, coro::Shape &Shape) {
     return GEP;
   };
 
+  splitBasicBlocksNotDominatedByCoroBegin(FrameData, Shape, F, DT);
+
   for (auto const &E : FrameData.Spills) {
     Value *Def = E.first;
     auto SpillAlignment = Align(FrameData.getAlign(Def));
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index 64b33e46404f0..37b3d61c9f8f2 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -376,7 +376,16 @@ static void replaceUnwindCoroEnd(AnyCoroEndInst *End, const coro::Shape &Shape,
   // If coro.end has an associated bundle, add cleanupret instruction.
   if (auto Bundle = End->getOperandBundle(LLVMContext::OB_funclet)) {
     auto *FromPad = cast<CleanupPadInst>(Bundle->Inputs[0]);
-    auto *CleanupRet = Builder.CreateCleanupRet(FromPad, nullptr);
+
+    // If the terminator is an invoke, 
+    // set the cleanupret unwind destination the same as the other edges, to
+    // avoid validation errors
+    BasicBlock *UBB = nullptr;
+    if (auto II = dyn_cast<InvokeInst>(FromPad->getParent()->getTerminator())) {
+      UBB = II->getUnwindDest();
+    }
+
+    auto *CleanupRet = Builder.CreateCleanupRet(FromPad, UBB);
     End->getParent()->splitBasicBlock(End);
     CleanupRet->getParent()->getTerminator()->eraseFromParent();
   }
diff --git a/llvm/test/Transforms/Coroutines/pr148035_0_coroutine.ll b/llvm/test/Transforms/Coroutines/pr148035_0_coroutine.ll
new file mode 100644
index 0000000000000..2457bc3045cec
--- /dev/null
+++ b/llvm/test/Transforms/Coroutines/pr148035_0_coroutine.ll
@@ -0,0 +1,7609 @@
+; This is the cppreference example for coroutines, in llvm IR form, built with async exceptions flag.
+; crashed before fix because of the validation mismatch of Unwind edges out of a funclet pad must have the same unwind dest
+; RUN: opt < %s -passes=coro-split -S | FileCheck %s
+; CHECK: define
+
+; ModuleID = 'coroutine.cpp'
+source_filename = "coroutine.cpp"
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-windows-msvc19.38.33135"
+
+%"class.std::basic_ostream" = type { ptr, [4 x i8], i32, %"class.std::basic_ios" }
+%"class.std::basic_ios" = type { %"class.std::ios_base", ptr, ptr, i8 }
+%"class.std::ios_base" = type { ptr, i64, i32, i32, i32, i64, i64, ptr, ptr, ptr }
+%rtti.TypeDescriptor23 = type { ptr, ptr, [24 x i8] }
+%eh.CatchableType = type { i32, i32, i32, i32, i32, i32, i32 }
+%rtti.TypeDescriptor19 = type { ptr, ptr, [20 x i8] }
+%eh.CatchableTypeArray.2 = type { i32, [2 x i32] }
+%eh.ThrowInfo = type { i32, i32, i32, i32 }
+%rtti.CompleteObjectLocator = type { i32, i32, i32, i32, i32, i32 }
+%rtti.ClassHierarchyDescriptor = type { i32, i32, i32, i32 }
+%rtti.BaseClassDescriptor = type { i32, i32, i32, i32, i32, i32, i32 }
+%"struct.std::nostopstate_t" = type { i8 }
+%rtti.TypeDescriptor26 = type { ptr, ptr, [27 x i8] }
+%rtti.TypeDescriptor22 = type { ptr, ptr, [23 x i8] }
+%eh.CatchableTypeArray.5 = type { i32, [5 x i32] }
+%"union.std::error_category::_Addr_storage" = type { i64 }
+%rtti.TypeDescriptor35 = type { ptr, ptr, [36 x i8] }
+%rtti.TypeDescriptor24 = type { ptr, ptr, [25 x i8] }
+%"struct.std::_Fake_allocator" = type { i8 }
+%rtti.TypeDescriptor30 = type { ptr, ptr, [31 x i8] }
+%eh.CatchableTypeArray.3 = type { i32, [3 x i32] }
+%struct.awaitable = type { ptr }
+%struct.task = type { i8 }
+%"struct.task::promise_type" = type { i8 }
+%"struct.std::suspend_never" = type { i8 }
+%"class.std::thread::id" = type { i32 }
+%"struct.std::coroutine_handle" = type { ptr }
+%"struct.std::coroutine_handle.0" = type { ptr }
+%"class.std::basic_ostream<char>::sentry" = type { %"class.std::basic_ostream<char>::_Sentry_base", i8 }
+%"class.std::basic_ostream<char>::_Sentry_base" = type { ptr }
+%"class.std::runtime_error" = type { %"class.std::exception" }
+%"class.std::exception" = type { ptr, %struct.__std_exception_data }
+%struct.__std_exception_data = type { ptr, i8 }
+%"class.std::jthread" = type { %"class.std::thread", %"class.std::stop_source" }
+%"class.std::thread" = type { %struct._Thrd_t }
+%struct._Thrd_t = type { ptr, i32 }
+%"class.std::stop_source" = type { ptr }
+%class.anon = type { %"struct.std::coroutine_handle" }
+%"class.std::unique_ptr" = type { %"class.std::_Compressed_pair" }
+%"class.std::_Compressed_pair" = type { ptr }
+%"struct.std::_Stop_state" = type { %"struct.std::atomic", %"struct.std::atomic", %"class.std::_Locked_pointer", %"struct.std::atomic.6", i32 }
+%"struct.std::atomic" = type { %"struct.std::_Atomic_integral_facade" }
+%"struct.std::_Atomic_integral_facade" = type { %"struct.std::_Atomic_integral" }
+%"struct.std::_Atomic_integral" = type { %"struct.std::_Atomic_storage" }
+%"struct.std::_Atomic_storage" = type { %"struct.std::_Atomic_padded" }
+%"struct.std::_Atomic_padded" = type { i32 }
+%"class.std::_Locked_pointer" = type { %"struct.std::atomic.1" }
+%"struct.std::atomic.1" = type { %"struct.std::_Atomic_integral_facade.2" }
+%"struct.std::_Atomic_integral_facade.2" = type { %"struct.std::_Atomic_integral.3" }
+%"struct.std::_Atomic_integral.3" = type { %"struct.std::_Atomic_storage.4" }
+%"struct.std::_Atomic_storage.4" = type { %"struct.std::_Atomic_padded.5" }
+%"struct.std::_Atomic_padded.5" = type { i64 }
+%"struct.std::atomic.6" = type { %"struct.std::_Atomic_pointer" }
+%"struct.std::_Atomic_pointer" = type { %"struct.std::_Atomic_storage.7" }
+%"struct.std::_Atomic_storage.7" = type { %"struct.std::_Atomic_padded.8" }
+%"struct.std::_Atomic_padded.8" = type { ptr }
+%"struct.std::_Exact_args_t" = type { i8 }
+%"struct.std::_Zero_then_variadic_args_t" = type { i8 }
+%"class.std::tuple" = type { %"struct.std::_Tuple_val" }
+%"struct.std::_Tuple_val" = type { %class.anon }
+%"class.std::_Stop_callback_base" = type { ptr, ptr, ptr, ptr }
+%"class.std::basic_streambuf" = type { ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, i32, i32, ptr, ptr, ptr }
+%"class.std::ios_base::failure" = type { %"class.std::system_error" }
+%"class.std::system_error" = type { %"class.std::_System_error" }
+%"class.std::_System_error" = type { %"class.std::runtime_error", %"class.std::error_code" }
+%"class.std::error_code" = type { i32, ptr }
+%"class.std::basic_string" = type { %"class.std::_Compressed_pair.10" }
+%"class.std::_Compressed_pair.10" = type { %"class.std::_String_val" }
+%"class.std::_String_val" = type { %"union.std::_String_val<std::_Simple_types<char>>::_Bxty", i64, i64 }
+%"union.std::_String_val<std::_Simple_types<char>>::_Bxty" = type { ptr, [8 x i8] }
+%"class.std::error_condition" = type { i32, ptr }
+%"struct.std::_Fake_proxy_ptr_impl" = type { i8 }
+%"class.std::bad_array_new_length" = type { %"class.std::bad_alloc" }
+%"class.std::bad_alloc" = type { %"class.std::exception" }
+%"class.std::error_category" = type { ptr, %"union.std::error_category::_Addr_storage" }
+%"class.std::allocator" = type { i8 }
+%"struct.std::_One_then_variadic_args_t" = type { i8 }
+%class.anon.11 = type { i8 }
+
+$"?get_return_object@promise_type@task@@QEAA?AU2@XZ" = comdat any
+
+$"?initial_suspend@promise_type@task@@QEAA?AUsuspend_never@std@@XZ" = comdat any
+
+$"?await_ready@suspend_never@std@@QEBA_NXZ" = comdat any
+
+$"?await_suspend@suspend_never@std@@QEBAXU?$coroutine_handle@X@2@@Z" = comdat any
+
+$"?from_address@?$coroutine_handle@Upromise_type@task@@@std@@SA?AU12@QEAX@Z" = comdat any
+
+$"??B?$coroutine_handle@Upromise_type@task@@@std@@QEBA?AU?$coroutine_handle@X@1@XZ" = comdat any
+
+$"?await_resume@suspend_never@std@@QEBAXXZ" = comdat any
+
+$"??$?6U?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@D@Z" = comdat any
+
+$"??$?6DU?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@Vid@thread@0@@Z" = comdat any
+
+$"??$?6U?$char_traits@D@std@@@std@@YAAEAV?$basic_ostream@DU?$char_traits@D@std@@@0@AEAV10@PEBD@Z" = comdat any
+
+$"?get_id@this_thread@std@@YA?AVid@thread@2@XZ" = comdat any
+
+$"?return_void@promise_type@task@@QEAAXXZ" = comdat any
+
+$"?unhandled_exception@promise_type@task@@QEAAXXZ" = comdat any
+
+$"?final_suspend@promise_type@task@@QEAA?AUsuspend_never@std@@XZ" = comdat any
+
+$"??0jthread@std@@QEAA@XZ" = comdat any
+
+$"??1jthread@std@@QEAA@XZ" = comdat any
+
+$"??0?$coroutine_handle@Upromise_type@task@@@std@@QEAA@XZ" = comdat any
+
+$"?from_address@?$coroutine_handle@X@std@@SA?AU12@QEAX@Z" = comdat any
+
+$"??0?$coroutine_handle@X@std@@QEAA@XZ" = comdat any
+
+$"??0id@thread@std@@AEAA@I@Z" = comdat any
+
+$"?joinable@jthread@std@@QEBA_NXZ" = comdat any
+
+$"??0runtime_error@std@@QEAA@PEBD@Z" = comdat any
+
+$"??0runtime_error@std@@QEAA@AEBV01@@Z" = comdat any
+
+$"??0exception@std@@QEAA@AEBV01@@Z" = comdat any
+
+$"??1runtime_error@std@@UEAA@XZ" = comdat any
+
+$"??4jthread@std@@QEAAAEAV01@$$QEAV01@@Z" = comdat any
+
+$"?get_id@jthread@std@@QEBA?AVid@thread@2@XZ" = comdat any
+
+$"?joinable@thread@std@@QEBA_NXZ" = comdat any
+
+$"??0exception@std@@QEAA@QEBD@Z" = comdat any
+
+$"??1exception@std@@UEAA@XZ" = comdat any
+
+$"??_Gruntime_error@std@@UEAAPEAXI@Z" = comdat any
+
+$"?what@exception@std@@UEBAPEBDXZ" = comdat any
+
+$"??_Gexception@std@@UEAAPEAXI@Z" = comdat any
+
+$"??0thread@std@@QEAA@XZ" = comdat any
+
+$"??0stop_source@std@@QEAA@XZ" = comdat any
+
+$"??1stop_source@std@@QEAA@XZ" = comdat any
+
+$"??1thread@std@@QEAA@XZ" = comdat any
+
+$"??0_Stop_state@std@@QEAA@XZ" = comdat any
+
+$"??0?$_Locked_pointer@V_Stop_callback_base@std@@@std@@QEAA@XZ" = comdat any
+
+$"??0?$_Atomic_storage@I$03@std@@QEAA@I@Z" = comdat any
+
+$"??0?$atomic@_K@std@@QEAA@XZ" = comdat any
+
+$"??0?$_Atomic_storage@PEBV_Stop_callback_base@std@@$07@std@@QEAA@QEBV_Stop_callback_base@1@@Z" = comdat any
+
+$"??$?0U_Exact_args_t@std@@$0A@@?$tuple@$$V@std@@QEAA@U_Exact_args_t@1@@Z" = comdat any
+
+$"?resume@?$coroutine_handle@X@std@@QEBAXXZ" = comdat any
+
+$"?fetch_sub@?$_Atomic_integral_facade@I@std@@QEAAIIW4memory_order@2@@Z" = comdat any
+
+$"?fetch_add@?$_Atomic_integral@I$03@std@@QEAAIIW4memory_order@2@@Z" = comdat any
+
+$"?_Negate@?$_Atomic_integral_facade@I@std@@SAII@Z" = comdat any
+
+$_Check_memory_order = comdat any
+
+$"??$_Atomic_address_as@JU?$_Atomic_padded@I@std@@@std@@YAPECJAEAU?$_Atomic_padded@I@0@@Z" = comdat any
+
+$"?_Try_cancel_and_join@jthread@std@@AEAAXXZ" = comdat any
+
+$"??4thread@std@@QEAAAEAV01@$$QEAV01@@Z" = comdat any
+
+$"??4stop_source@std@@QEAAAEAV01@$$QEAV01@@Z" = comdat any
+
+$"?request_stop@stop_source@std@@QEAA_NXZ" = comdat any
+
+$"?join@thread@std@@QEAAXXZ" = comdat any
+
+$"?_Request_stop@_Stop_state@std@@QEAA_NXZ" = comdat any
+
+$"?fetch_or@?$_Atomic_integral@I$03@std@@QEAAIIW4memory_order@2@@Z" = comdat any
+
+$"?_Lock_and_load@?$_Locked_pointer@V_Stop_callback_base@std@@@std@@QEAAPEAV_Stop_callback_base@2@XZ" = comdat any
+
+$"?store@?$_Atomic_storage@PEBV_Stop_callback_base@std@@$07@std@@QEAAXQEBV_Stop_callback_base@2@W4memory_order@2@@Z" = comdat any
+
+$"?notify_all@?$_Atomic_storage@PEBV_Stop_callback_base@std@@$07@std@@QEAAXXZ" = comdat any
+
+$"?_Store_and_unlock@?$_Locked_pointer@V_Stop_callback_base@std@@@std@@QEAAXQEAV_Stop_callback_base@2@@Z" = comdat any
+
+$"??$exchange@PEAV_Stop_callback_base@std@@$$T@std@@YAPEAV_Stop_callback_base@0@AEAPEAV10@$$QEA$$T@Z" = comdat any
+
+$"?load@?$_Atomic_storage@_K$07@std@@QEBA_KW4memory_order@2@@Z" = comdat any
+
+$"?compare_exchange_weak@?$atomic@_K@std@@QEAA_NAEA_K_K@Z" = comdat any
+
+$"?wait@?$_Atomic_storage@_K$07@std@@QEBAX_KW4memory_order@2@@Z" = comdat any
+
+$"??$_Atomic_address_as@_JU?$_Atomic_padded@_K@std@@@std@@YAPED_JAEBU?$_Atomic_padded@_K@0@@Z" = comdat any
+
+$"?compare_exchange_strong@?$_Atomic_storage@_K$07@std@@QEAA_NAEA_K_KW4memory_order@2@@Z" = comdat any
+
+$"??$_Atomic_reinterpret_as@_J_K@std@@YA_JAEB_K@Z" = comdat any
+
+$"??$_Atomic_address_as@_JU?$_Atomic_padded@_K@std@@@std@@YAPEC_JAEAU?$_Atomic_padded@_K@0@@Z" = comdat any
+
+$"??$_Atomic_wait_direct@_K_J@std@@YAXQEBU?$_Atomic_storage@_K$07@0@_JW4memory_order@0@@Z" = comdat any
+
+$"??$_Atomic_address_as@_JU?$_Atomic_padded@PEBV_Stop_callback_base@std@@@std@@@std@@YAPEC_JAEAU?$_Atomic_padded@PEBV_Stop_callback_base@std@@@0@@Z" = comdat any
+
+$"??$_Atomic_reinterpret_as@_JPEBV_Stop_callback_base@std@@@std@@YA_JAEBQEBV_Stop_callback_base@0@@Z" = comdat any
+
+$"?store@?$_Atomic_storage@PEBV_Stop_callback_base@std@@$07@std@@QEAAXQEBV_Stop_callback_base@2@@Z" = comdat any
+
+$"?exchange@?$_Atomic_storage@_K$07@std@@QEAA_K_KW4memory_order@2@@Z" = comdat any
+
+$"?notify_all@?$_Atomic_storage@_K...
[truncated]

@tzuralon tzuralon changed the title [Coroutines] fix coroutines + std::unique_ptr with async exceptions validation errors [Coroutines] fix coroutines + std::unique_ptr with async exceptions validation errors [WIP] Jul 20, 2025
@tzuralon tzuralon changed the title [Coroutines] fix coroutines + std::unique_ptr with async exceptions validation errors [WIP] [WIP] [Coroutines] fix coroutines + std::unique_ptr with async exceptions validation errors Jul 20, 2025
Copy link
Contributor

@NewSigma NewSigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments on styles. Could you please explain the problem and your solution in the pr description?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please further reduce this test? A test file of ~7000 lines is much larger than the others.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a non-reduced version, I guess I can remove it

continue;
}
}
} while (FunctionHasSomeBlockNotDominatedByCoroBegin);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks expensive. Is it possible to avoid this do-while?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this while was redundant and I removed it.


; Function Attrs: nocallback nofree nosync nounwind willreturn memory(argmem: read)
declare token @llvm.coro.id(i32, ptr readnone, ptr nocapture readonly, ptr) #2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intrinsic function declarations are redundant. Could you please drop them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

@tzuralon tzuralon changed the title [WIP] [Coroutines] fix coroutines + std::unique_ptr with async exceptions validation errors [Coroutines] fix coroutines + std::unique_ptr with async exceptions validation errors Aug 3, 2025
@efriedma-quic efriedma-quic requested review from rnk and ChuanqiXu9 August 7, 2025 16:50
Copy link

github-actions bot commented Aug 7, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Transforms/Coroutines/CoroFrame.cpp llvm/lib/Transforms/Coroutines/CoroSplit.cpp
View the diff from clang-format here.
diff --git a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
index bfe69f589..133f3b400 100644
--- a/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroFrame.cpp
@@ -987,21 +987,21 @@ static void finalizeBasicBlockCloneAndTrackSuccessors(
   //  it will use VMap to do so
   // in addition, it will add the node successors to SuccessorBlocksSet
 
-        // Remap the instructions, VMap here aggregates instructions across
-        // multiple BasicBlocks, and we assume that traversal is reversed post-order,
-        // therefore successor blocks (for example instructions having funclet
-        // tags) will be mapped correctly to the new cloned cleanuppad
-        for (Instruction &ClonedBlockInstruction : *ClonedBlock) {
-          RemapInstruction(&ClonedBlockInstruction, VMap,
-RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
-        }
+  // Remap the instructions, VMap here aggregates instructions across
+  // multiple BasicBlocks, and we assume that traversal is reversed post-order,
+  // therefore successor blocks (for example instructions having funclet
+  // tags) will be mapped correctly to the new cloned cleanuppad
+  for (Instruction &ClonedBlockInstruction : *ClonedBlock) {
+    RemapInstruction(&ClonedBlockInstruction, VMap,
+                     RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
+  }
 
-        const auto &InitialBlockTerminator = InitialBlock->getTerminator();
+  const auto &InitialBlockTerminator = InitialBlock->getTerminator();
 
-        // If it's cleanupret, find the correspondant cleanuppad (use the VMap to
-        // find it).
-        if (auto *InitialBlockTerminatorCleanupReturn =
-dyn_cast<CleanupReturnInst>(InitialBlockTerminator)) {
+  // If it's cleanupret, find the correspondant cleanuppad (use the VMap to
+  // find it).
+  if (auto *InitialBlockTerminatorCleanupReturn =
+          dyn_cast<CleanupReturnInst>(InitialBlockTerminator)) {
     // if none found do nothing
     if (VMap.find(InitialBlockTerminatorCleanupReturn->getCleanupPad()) ==
         VMap.end()) {
@@ -1009,32 +1009,32 @@ dyn_cast<CleanupReturnInst>(InitialBlockTerminator)) {
     }
 
     auto *ClonedBlockTerminatorCleanupReturn =
-cast<CleanupReturnInst>(ClonedBlock->getTerminator());
+        cast<CleanupReturnInst>(ClonedBlock->getTerminator());
 
-          // Assuming reversed post-order traversal
+    // Assuming reversed post-order traversal
     llvm::Value *ClonedBlockCleanupPadValue =
         VMap[InitialBlockTerminatorCleanupReturn->getCleanupPad()];
     auto *ClonedBlockCleanupPad =
-cast<CleanupPadInst>(ClonedBlockCleanupPadValue);
-            ClonedBlockTerminatorCleanupReturn->setCleanupPad(ClonedBlockCleanupPad);
-          
-        // If it's a branch/invoke, keep track of its successors, we want to
-          // calculate dominance between CoroBegin and them also
-        } else if (auto *InitialBlockTerminatorBranch =
-dyn_cast<BranchInst>(InitialBlockTerminator)) {
-          for (unsigned int successorIdx = 0;
-successorIdx < InitialBlockTerminatorBranch->getNumSuccessors();
-++successorIdx) {
-            SuccessorBlocksSet.insert(
+        cast<CleanupPadInst>(ClonedBlockCleanupPadValue);
+    ClonedBlockTerminatorCleanupReturn->setCleanupPad(ClonedBlockCleanupPad);
+
+    // If it's a branch/invoke, keep track of its successors, we want to
+    // calculate dominance between CoroBegin and them also
+  } else if (auto *InitialBlockTerminatorBranch =
+                 dyn_cast<BranchInst>(InitialBlockTerminator)) {
+    for (unsigned int successorIdx = 0;
+         successorIdx < InitialBlockTerminatorBranch->getNumSuccessors();
+         ++successorIdx) {
+      SuccessorBlocksSet.insert(
           InitialBlockTerminatorBranch->getSuccessor(successorIdx));
-          }
-        } else if (auto *InitialBlockTerminatorInvoke =
-dyn_cast<InvokeInst>(InitialBlockTerminator)) {
+    }
+  } else if (auto *InitialBlockTerminatorInvoke =
+                 dyn_cast<InvokeInst>(InitialBlockTerminator)) {
     SuccessorBlocksSet.insert(InitialBlockTerminatorInvoke->getUnwindDest());
   } else if (isa<ReturnInst>(InitialBlockTerminator)) {
     // No action needed
-        } else {
-          InitialBlockTerminator->print(dbgs());
+  } else {
+    InitialBlockTerminator->print(dbgs());
     report_fatal_error("Terminator is not implemented in "
                        "finalizeBasicBlockCloneAndTrackSuccessors");
   }
@@ -1051,12 +1051,12 @@ void splitIfBasicBlockPredecessors(
   std::copy_if(Predecessors.begin(), Predecessors.end(),
                std::back_inserter(InitialBlockPredecessors), Predicate);
 
-        // Fixups on the predecessors terminator - point them to ReplacementBlock.
+  // Fixups on the predecessors terminator - point them to ReplacementBlock.
   for (auto *InitialBlockPredecessor : InitialBlockPredecessors) {
     auto *InitialBlockPredecessorTerminator =
         InitialBlockPredecessor->getTerminator();
     if (auto *InitialBlockPredecessorTerminatorInvoke =
-dyn_cast<InvokeInst>(InitialBlockPredecessorTerminator)) {
+            dyn_cast<InvokeInst>(InitialBlockPredecessorTerminator)) {
       if (InitialBlock ==
           InitialBlockPredecessorTerminatorInvoke->getUnwindDest()) {
         InitialBlockPredecessorTerminatorInvoke->setUnwindDest(
@@ -1066,16 +1066,16 @@ dyn_cast<InvokeInst>(InitialBlockPredecessorTerminator)) {
             ReplacementBlock);
       }
     } else if (auto *InitialBlockPredecessorTerminatorBranch =
-dyn_cast<BranchInst>(InitialBlockPredecessorTerminator)) {
-            for (unsigned int successorIdx = 0;
-                successorIdx <
+                   dyn_cast<BranchInst>(InitialBlockPredecessorTerminator)) {
+      for (unsigned int successorIdx = 0;
+           successorIdx <
            InitialBlockPredecessorTerminatorBranch->getNumSuccessors();
-                ++successorIdx) {
-              if (InitialBlock ==
-InitialBlockPredecessorTerminatorBranch->getSuccessor(
-successorIdx)) {
-                InitialBlockPredecessorTerminatorBranch->setSuccessor(
-successorIdx, ReplacementBlock);
+           ++successorIdx) {
+        if (InitialBlock ==
+            InitialBlockPredecessorTerminatorBranch->getSuccessor(
+                successorIdx)) {
+          InitialBlockPredecessorTerminatorBranch->setSuccessor(
+              successorIdx, ReplacementBlock);
         }
       }
     } else if (auto *InitialBlockPredecessorTerminatorCleanupReturn =
@@ -1083,8 +1083,8 @@ successorIdx, ReplacementBlock);
                        InitialBlockPredecessorTerminator)) {
       InitialBlockPredecessorTerminatorCleanupReturn->setUnwindDest(
           ReplacementBlock);
-          } else {
-            InitialBlockPredecessorTerminator->print(dbgs());
+    } else {
+      InitialBlockPredecessorTerminator->print(dbgs());
       report_fatal_error(
           "Terminator is not implemented in splitIfBasicBlockPredecessors");
     }
@@ -1111,15 +1111,15 @@ splitBasicBlocksNotDominatedByCoroBegin(const FrameDataInfo &FrameData,
       if (!DT.dominates(Shape.CoroBegin, CurrentBlock)) {
         SpillUserBlocksSet.insert(CurrentBlock);
       }
-          }
-        }
-        
-        // Run is in reversed post order, to enforce visiting predecessors before
+    }
+  }
+
+  // Run is in reversed post order, to enforce visiting predecessors before
   // successors
   for (BasicBlock *CurrentBlock : ReversePostOrderTraversal<Function *>(F)) {
     if (!SpillUserBlocksSet.contains(CurrentBlock)) {
-        continue;
-      }
+      continue;
+    }
     SpillUserBlocksSet.erase(CurrentBlock);
 
     // Preserve the current node. the duplicate will become the unspilled
diff --git a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
index f95d214f5..ceac1cc71 100644
--- a/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
+++ b/llvm/lib/Transforms/Coroutines/CoroSplit.cpp
@@ -377,7 +377,7 @@ static void replaceUnwindCoroEnd(AnyCoroEndInst *End, const coro::Shape &Shape,
   if (auto Bundle = End->getOperandBundle(LLVMContext::OB_funclet)) {
     auto *FromPad = cast<CleanupPadInst>(Bundle->Inputs[0]);
 
-    // If the terminator is an invoke, 
+    // If the terminator is an invoke,
     // set the cleanupret unwind destination the same as the other edges, to
     // avoid validation errors
     BasicBlock *UBB = nullptr;

@tzuralon
Copy link
Author

Fixed formatter comments

@tzuralon tzuralon requested a review from NewSigma August 17, 2025 05:53
Copy link
Contributor

@NewSigma NewSigma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It looks much better. Just a few questions about implementation.

}

// Dominance issue fixer for each predecessor satisfying predicate function
void splitIfBasicBlockPredecessors(
Copy link
Contributor

@NewSigma NewSigma Aug 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static void?

// mapping to their corresponding clones
static void finalizeBasicBlockCloneAndTrackSuccessors(
BasicBlock *InitialBlock, BasicBlock *ClonedBlock, ValueToValueMapTy &VMap,
SmallSet<BasicBlock *, 20> &SuccessorBlocksSet) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use SmallSetImpl to avoid explicitly passing 20.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you meant SmallPtrSetImpl? It is also bound to storage (in ctor), it expects the caller to pass a storage buffer

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, SmallPtrSetImpl is correct.

const coro::Shape &Shape, Function *F,
DominatorTree &DT) {
ValueToValueMapTy VMap;
SmallSet<BasicBlock *, 20> SpillUserBlocksSet;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the value 20 intentional? I suspect the number of basic blocks that are not dominated by coro.begin is small.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all my tests, I found <=2, so I will change it to 3


// If the terminator is an invoke,
// set the cleanupret unwind destination the same as the other edges, to
// avoid validation errors
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "validation errors" is confusing to maintainers that not familiar to this bug. Maybe we could drop it.

BTW, there are some similar comments that are context-dependent. It would be great if you could reword them.

// Preserve the current node. the duplicate will become the unspilled
// alternative
auto UnspilledAlternativeBlock =
CloneBasicBlock(CurrentBlock, VMap, ".unspilled_alternative", F);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clear VMap before reuse it?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's mandatory to keep it across traverse (when tested it previously), but I don't find an actual example to prove it, so I will clear it for now.

; In coro-split, this coroutine code reduced IR, produced using clang with async-exceptions
; crashed before fix because of the validation mismatch of Instruction does not dominate all uses!
; RUN: opt < %s -passes='coro-split' -S 2>&1 | FileCheck %s --implicit-check-not="Instruction does not dominate all uses!"
; RUN: opt < %s -passes='default<Os>,coro-split' -S 2>&1 | FileCheck %s --implicit-check-not="Instruction does not dominate all uses!"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain why do we need coro-split after default<Os>?

; In coro-split, this coroutine code reduced IR, produced using clang with async-exceptions
; crashed before fix because of the validation mismatch of Instruction does not dominate all uses!
; RUN: opt < %s -passes='coro-split' -S 2>&1 | FileCheck %s --implicit-check-not="Instruction does not dominate all uses!"
; RUN: opt < %s -passes='default<Os>,coro-split' -S 2>&1 | FileCheck %s --implicit-check-not="Instruction does not dominate all uses!"
Copy link
Contributor

@NewSigma NewSigma Aug 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking for crashes/errors is already handled implicitly, so it doesn't need explicit mention. Perhaps we could go further by using utils/update_test_checks.py to generate more robust tests?

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a impression that this patch tries to fix problematic patterns after the conversion. This is generally unwanted. We prefer to make the checks before the conversion. e.g., in certain cases, let's avoid adding something as the spills or vice versa.

It will be helpful to me to understand the problem if you can give more description to the issue with IR examples.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clang-cl /EHa /Os crash while compiling coroutines + std::unique_ptr, assertion hit even without std::unique_ptr, since 17.x.x
4 participants