[Offload] Full AMD support for olMemFill #154958
Hunk `@@ -89,6 +89,40 @@` (after `template <typename Fn> inline void threadify(Fn body) { ... }`) adds:

```cpp
/// Enqueues a task to the queue that can be manually resolved.
/// It will block until `trigger` is called.
struct ManuallyTriggeredTask {
  std::mutex M;
  std::condition_variable CV;
  bool Flag = false;
  ol_event_handle_t CompleteEvent;

  ol_result_t enqueue(ol_queue_handle_t Queue) {
    if (auto Err = olLaunchHostFunction(
            Queue,
            [](void *That) {
              static_cast<ManuallyTriggeredTask *>(That)->wait();
            },
            this))
      return Err;

    return olCreateEvent(Queue, &CompleteEvent);
  }

  void wait() {
    std::unique_lock<std::mutex> lk(M);
    CV.wait_for(lk, std::chrono::milliseconds(1000), [&] { return Flag; });
    EXPECT_TRUE(Flag);
  }

  ol_result_t trigger() {
    {
      // Set the flag under the mutex so the write cannot race with the
      // predicate check inside `wait`.
      std::lock_guard<std::mutex> lk(M);
      Flag = true;
    }
    CV.notify_one();

    return olSyncEvent(CompleteEvent);
  }
};

struct OffloadTest : ::testing::Test {
  ol_device_handle_t Host = TestEnvironment::getHostDevice();
};
```

Review comment on `ManuallyTriggeredTask`: Not entirely sure why we need this, but I'll leave it be.

Reply: The AMD driver has two different code paths depending on whether the queue is empty or not. This struct allows the test framework to "hold" a task in the queue, making it non-empty, so that the non-empty path can be tested.
We should try as hard as possible not to just roll over and die inside of the plugin. We don't do a great job of it so far.
I agree, but that would require liboffload/PluginInterface to have some kind of async error handling. Maybe an `olPlatformPopAsyncError()` that pulls from a queue in the platform that the async handlers can push to? Anyway, I think that should be a separate task. What I'm doing here is the same as `asyncActionCallback`, and such a solution would touch both.
That's probably something we should try to figure out at some point. I forget if we discussed it before, but we definitely don't want anything like `cudaGetLastError`. I could see a stream resulting in an error and allowing users to check for it through an event or something.
If we have discussed it before, that memory is lost to me. What are your particular grievances with `cudaGetLastError`? Would having `olSyncEvent`/`Queue` return any pending async errors work? What about adding an error callback via something like `olSetAsyncErrorHandler([](ol_result_t Result) {...})`?
The same reason that `errno` is a nightmare to deal with. Global errors aren't a good fit for an API that is supposed to be asynchronous. It's confusing who should own them and difficult to reason about, hence why CUDA had to provide both `cudaGetLastError` and `cudaPeekAtLastError` to let people figure out whether they should even be using the error on the stack.

I think HSA uses callbacks in a similar way, but in my head it would probably be best if we just made it an event or something on the stream the user can query if needed.
I like the idea of linking errors to streams (e.g. the stream is put into an error state and `syncQueue` somehow returns an error code), but that has problems with cross-queue waits. For example, if queue A waits on queue B, and queue B reports an error, what happens with queue A?

I think this is something that warrants its own thought and discussion. Can I merge this as-is (assuming no other issues) just to unblock AMD, and look into a design for error handling as a separate change?