-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[Offload] Use amd_signal_async_handler
for host function calls
#154131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-offload @llvm/pr-subscribers-backend-amdgpu Author: Ross Brunton (RossBrunton) ChangesFull diff: https://github.com/llvm/llvm-project/pull/154131.diff 1 Files Affected:
diff --git a/offload/plugins-nextgen/amdgpu/src/rtl.cpp b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
index 83280fe0a49c9..f7cf09da7a62d 100644
--- a/offload/plugins-nextgen/amdgpu/src/rtl.cpp
+++ b/offload/plugins-nextgen/amdgpu/src/rtl.cpp
@@ -1063,18 +1063,19 @@ struct AMDGPUStreamTy {
/// Indicate to spread data transfers across all available SDMAs
bool UseMultipleSdmaEngines;
+ struct CallbackDataType {
+ void (*UserFn)(void *);
+ void *UserData;
+ AMDGPUSignalTy *OutputSignal;
+ };
/// Wrapper function for implementing host callbacks
- static void CallbackWrapper(AMDGPUSignalTy *InputSignal,
- AMDGPUSignalTy *OutputSignal,
- void (*Callback)(void *), void *UserData) {
- // The wait call will not error in this context.
- if (InputSignal)
- if (auto Err = InputSignal->wait())
- reportFatalInternalError(std::move(Err));
-
- Callback(UserData);
-
- OutputSignal->signal();
+ static bool callbackWrapper([[maybe_unused]] hsa_signal_value_t Signal,
+ void *UserData) {
+ auto CallbackData = reinterpret_cast<CallbackDataType *>(UserData);
+ CallbackData->UserFn(CallbackData->UserData);
+ CallbackData->OutputSignal->signal();
+ delete CallbackData;
+ return false;
}
/// Return the current number of asynchronous operations on the stream.
@@ -1525,13 +1526,22 @@ struct AMDGPUStreamTy {
InputSignal = consume(OutputSignal).second;
}
- // "Leaking" the thread here is consistent with other work added to the
- // queue. The input and output signals will remain valid until the output is
- // signaled.
- std::thread(CallbackWrapper, InputSignal, OutputSignal, Callback, UserData)
- .detach();
+ auto *CallbackData = new CallbackDataType{Callback, UserData, OutputSignal};
+ if (InputSignal && InputSignal->load()) {
+ hsa_status_t Status = hsa_amd_signal_async_handler(
+ InputSignal->get(), HSA_SIGNAL_CONDITION_EQ, 0, callbackWrapper,
+ CallbackData);
- return Plugin::success();
+ return Plugin::check(Status, "error in hsa_amd_signal_async_handler: %s");
+ } else {
+ // No dependencies - schedule it now.
+ // Using a seperate thread because this function should run asynchronously
+ // and not block the main thread.
+ std::thread([](void *CallbackData) { callbackWrapper(0, CallbackData); },
+ CallbackData)
+ .detach();
+ return Plugin::success();
+ }
}
/// Synchronize with the stream. The current thread waits until all operations
|
So it turns out that callbacks used by the AMD RTL actually do run in response to the jobs. The docs I was looking at didn't include EDIT: Some context for why we can't use the existing Callbacks/ActionArgs system. For each task, we submit to the queue, we expect the task to send a signal when it is completed. This is what the Callbacks/ActionsArgs listen for. However, when scheduling a user-provided function, that function (or rather, the wrapper function) needs to send that signal, so it can't be grouped with the rest of the callbacks. Hence the need for a different flow for it. |
No description provided.