Conversation

RossBrunton
Contributor

No description provided.

@llvmbot
Member

llvmbot commented Aug 18, 2025

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-backend-amdgpu

Author: Ross Brunton (RossBrunton)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/154131.diff

1 file affected:

  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+27-17)
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 83280fe0a49c9..f7cf09da7a62d 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -1063,18 +1063,19 @@ struct AMDGPUStreamTy {
   /// Indicate to spread data transfers across all available SDMAs
   bool UseMultipleSdmaEngines;
 
+  struct CallbackDataType {
+    void (*UserFn)(void *);
+    void *UserData;
+    AMDGPUSignalTy *OutputSignal;
+  };
   /// Wrapper function for implementing host callbacks
-  static void CallbackWrapper(AMDGPUSignalTy *InputSignal,
-                              AMDGPUSignalTy *OutputSignal,
-                              void (*Callback)(void *), void *UserData) {
-    // The wait call will not error in this context.
-    if (InputSignal)
-      if (auto Err = InputSignal->wait())
-        reportFatalInternalError(std::move(Err));
-
-    Callback(UserData);
-
-    OutputSignal->signal();
+  static bool callbackWrapper([[maybe_unused]] hsa_signal_value_t Signal,
+                              void *UserData) {
+    auto CallbackData = reinterpret_cast<CallbackDataType *>(UserData);
+    CallbackData->UserFn(CallbackData->UserData);
+    CallbackData->OutputSignal->signal();
+    delete CallbackData;
+    return false;
   }
 
   /// Return the current number of asynchronous operations on the stream.
@@ -1525,13 +1526,22 @@ struct AMDGPUStreamTy {
       InputSignal = consume(OutputSignal).second;
     }
 
-    // "Leaking" the thread here is consistent with other work added to the
-    // queue. The input and output signals will remain valid until the output is
-    // signaled.
-    std::thread(CallbackWrapper, InputSignal, OutputSignal, Callback, UserData)
-        .detach();
+    auto *CallbackData = new CallbackDataType{Callback, UserData, OutputSignal};
+    if (InputSignal && InputSignal->load()) {
+      hsa_status_t Status = hsa_amd_signal_async_handler(
+          InputSignal->get(), HSA_SIGNAL_CONDITION_EQ, 0, callbackWrapper,
+          CallbackData);
 
-    return Plugin::success();
+      return Plugin::check(Status, "error in hsa_amd_signal_async_handler: %s");
+    } else {
+      // No dependencies - schedule it now.
+      // Using a separate thread because this function should run asynchronously
+      // and not block the main thread.
+      std::thread([](void *CallbackData) { callbackWrapper(0, CallbackData); },
+                  CallbackData)
+          .detach();
+      return Plugin::success();
+    }
   }
 
   /// Synchronize with the stream. The current thread waits until all operations

@RossBrunton
Contributor Author

RossBrunton commented Aug 18, 2025

So it turns out that callbacks used by the AMD RTL actually do run in response to the jobs completing. The docs I was looking at didn't include hsa_amd_signal_async_handler, so I wasn't aware such a thing existed.

EDIT: Some context for why we can't use the existing Callbacks/ActionArgs system. For each task we submit to the queue, we expect the task to send a signal when it completes; this is what the Callbacks/ActionArgs listen for. However, when scheduling a user-provided function, that function (or rather, its wrapper) needs to send that signal itself, so it can't be grouped with the rest of the callbacks. Hence the need for a different flow for it.
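To make that concrete, here is a minimal sketch of the pattern the patch adopts, written against the public HSA headers rather than the plugin's own wrappers. The names HostCallbackPayload, hostCallbackHandler, and enqueueHostCallback are illustrative, and the assumption that the output signal is completed by decrementing it from 1 mirrors how the plugin uses its signals; this is not the plugin's actual code.

#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>

// Hypothetical payload, analogous to the PR's CallbackDataType: it carries the
// user's function plus the output signal that the handler itself must complete.
struct HostCallbackPayload {
  void (*UserFn)(void *);
  void *UserData;
  hsa_signal_t OutputSignal; // later operations on the stream wait on this
};

// Invoked by the HSA runtime once the input signal drops to 0, i.e. once the
// preceding work on the stream has finished.
static bool hostCallbackHandler(hsa_signal_value_t /*Value*/, void *Arg) {
  auto *Payload = static_cast<HostCallbackPayload *>(Arg);
  Payload->UserFn(Payload->UserData);
  // Unlike a kernel or memcpy, nothing else will complete the output signal,
  // so the handler decrements it here (assumes the signal was initialized to 1).
  hsa_signal_subtract_screlease(Payload->OutputSignal, 1);
  delete Payload;
  return false; // one-shot: do not re-arm the handler
}

// Chain a host callback after InputSignal completes; OutputSignal is what the
// next operation enqueued on the stream will wait on.
static hsa_status_t enqueueHostCallback(hsa_signal_t InputSignal,
                                        hsa_signal_t OutputSignal,
                                        void (*UserFn)(void *), void *UserData) {
  auto *Payload = new HostCallbackPayload{UserFn, UserData, OutputSignal};
  return hsa_amd_signal_async_handler(InputSignal, HSA_SIGNAL_CONDITION_EQ, 0,
                                      hostCallbackHandler, Payload);
}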

RossBrunton requested a review from arsenm on August 20, 2025, 10:12
@RossBrunton
Contributor Author

@arsenm @jhuber6 Can I get this looked at?
