Conversation

RossBrunton
Contributor

No description provided.

@llvmbot
Member

llvmbot commented Aug 18, 2025

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-backend-amdgpu

Author: Ross Brunton (RossBrunton)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/154131.diff

1 file affected:

  • (modified) offload/plugins-nextgen/amdgpu/src/rtl.cpp (+27-17)
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 83280fe0a49c9..f7cf09da7a62d 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -1063,18 +1063,19 @@ struct AMDGPUStreamTy {
   /// Indicate to spread data transfers across all available SDMAs
   bool UseMultipleSdmaEngines;
 
+  struct CallbackDataType {
+    void (*UserFn)(void *);
+    void *UserData;
+    AMDGPUSignalTy *OutputSignal;
+  };
   /// Wrapper function for implementing host callbacks
-  static void CallbackWrapper(AMDGPUSignalTy *InputSignal,
-                              AMDGPUSignalTy *OutputSignal,
-                              void (*Callback)(void *), void *UserData) {
-    // The wait call will not error in this context.
-    if (InputSignal)
-      if (auto Err = InputSignal->wait())
-        reportFatalInternalError(std::move(Err));
-
-    Callback(UserData);
-
-    OutputSignal->signal();
+  static bool callbackWrapper([[maybe_unused]] hsa_signal_value_t Signal,
+                              void *UserData) {
+    auto CallbackData = reinterpret_cast<CallbackDataType *>(UserData);
+    CallbackData->UserFn(CallbackData->UserData);
+    CallbackData->OutputSignal->signal();
+    delete CallbackData;
+    return false;
   }
 
   /// Return the current number of asynchronous operations on the stream.
@@ -1525,13 +1526,22 @@ struct AMDGPUStreamTy {
       InputSignal = consume(OutputSignal).second;
     }
 
-    // "Leaking" the thread here is consistent with other work added to the
-    // queue. The input and output signals will remain valid until the output is
-    // signaled.
-    std::thread(CallbackWrapper, InputSignal, OutputSignal, Callback, UserData)
-        .detach();
+    auto *CallbackData = new CallbackDataType{Callback, UserData, OutputSignal};
+    if (InputSignal && InputSignal->load()) {
+      hsa_status_t Status = hsa_amd_signal_async_handler(
+          InputSignal->get(), HSA_SIGNAL_CONDITION_EQ, 0, callbackWrapper,
+          CallbackData);
 
-    return Plugin::success();
+      return Plugin::check(Status, "error in hsa_amd_signal_async_handler: %s");
+    } else {
+      // No dependencies - schedule it now.
+      // Using a separate thread because this function should run asynchronously
+      // and not block the main thread.
+      std::thread([](void *CallbackData) { callbackWrapper(0, CallbackData); },
+                  CallbackData)
+          .detach();
+      return Plugin::success();
+    }
   }
 
   /// Synchronize with the stream. The current thread waits until all operations

@RossBrunton
Contributor Author

RossBrunton commented Aug 18, 2025

So it turns out that callbacks used by the AMD RTL actually do run in response to the jobs completing. The docs I was looking at didn't include hsa_amd_signal_async_handler, so I wasn't aware such a thing existed.

EDIT: Some context for why we can't use the existing Callbacks/ActionArgs system. For each task we submit to the queue, we expect the task to send a signal when it completes; this is what the Callbacks/ActionArgs listen for. However, when scheduling a user-provided function, that function (or rather, its wrapper) needs to send that signal itself, so it can't be grouped with the rest of the callbacks. Hence the need for a different flow for it.
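To make that concrete, here is a minimal sketch of the pattern the patch adopts, written against the public HSA headers rather than the plugin's own wrappers. The names HostCallbackPayload, hostCallbackHandler, and enqueueHostCallback are illustrative, and the assumption that the output signal is completed by decrementing it from 1 mirrors how the plugin uses its signals; this is not the plugin's actual code.

#include <hsa/hsa.h>
#include <hsa/hsa_ext_amd.h>

// Hypothetical payload, analogous to the PR's CallbackDataType: it carries the
// user's function plus the output signal that the handler itself must complete.
struct HostCallbackPayload {
  void (*UserFn)(void *);
  void *UserData;
  hsa_signal_t OutputSignal; // later operations on the stream wait on this
};

// Invoked by the HSA runtime once the input signal drops to 0, i.e. once the
// preceding work on the stream has finished.
static bool hostCallbackHandler(hsa_signal_value_t /*Value*/, void *Arg) {
  auto *Payload = static_cast<HostCallbackPayload *>(Arg);
  Payload->UserFn(Payload->UserData);
  // Unlike a kernel or memcpy, nothing else will complete the output signal,
  // so the handler decrements it here (assumes the signal was initialized to 1).
  hsa_signal_subtract_screlease(Payload->OutputSignal, 1);
  delete Payload;
  return false; // one-shot: do not re-arm the handler
}

// Chain a host callback after InputSignal completes; OutputSignal is what the
// next operation enqueued on the stream will wait on.
static hsa_status_t enqueueHostCallback(hsa_signal_t InputSignal,
                                        hsa_signal_t OutputSignal,
                                        void (*UserFn)(void *), void *UserData) {
  auto *Payload = new HostCallbackPayload{UserFn, UserData, OutputSignal};
  return hsa_amd_signal_async_handler(InputSignal, HSA_SIGNAL_CONDITION_EQ, 0,
                                      hostCallbackHandler, Payload);
}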

RossBrunton requested a review from arsenm on August 20, 2025, 10:12
@RossBrunton
Contributor Author

@arsenm @jhuber6 Can I get this looked at?
