-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[SYCL] SYCL host kernel launch support for the sycl_kernel_entry_point attribute. #152403
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[SYCL] SYCL host kernel launch support for the sycl_kernel_entry_point attribute. #152403
Conversation
// FIXME: An extern variable declaration with assignment to the kernel | ||
// name expression is added to Stmts as a temporary measure to see results. | ||
// reflected in tests. The kernel name expression will need to be passed as | ||
// the first function argument in a call to sycl_enqueue_kernel_launch. | ||
QualType ExternVarType = Ctx.getPointerType(Ctx.CharTy.withConst()); | ||
const IdentifierInfo *ExternVarName = | ||
SemaRef.getPreprocessor().getIdentifierInfo("kernel_name"); | ||
VarDecl *ExternVarDecl = VarDecl::Create( | ||
Ctx, FD, SourceLocation(), SourceLocation(), ExternVarName, ExternVarType, | ||
/*TInfo*/ nullptr, SC_Extern); | ||
DeclStmt *ExternVarDeclStmt = new (Ctx) | ||
DeclStmt(DeclGroupRef(ExternVarDecl), SourceLocation(), SourceLocation()); | ||
Stmts.push_back(ExternVarDeclStmt); | ||
DeclRefExpr *ExternVarDeclRef = new (Ctx) DeclRefExpr( | ||
Ctx, ExternVarDecl, /*RefersToEnclosingVariableOrCapture*/ false, | ||
ExternVarType, VK_LValue, SourceLocation()); | ||
ImplicitCastExpr *KernelNameArrayDecayExpr = new (Ctx) ImplicitCastExpr( | ||
ImplicitCastExpr::OnStack, ExternVarType, CK_ArrayToPointerDecay, | ||
KernelNameExpr, VK_PRValue, FPOptionsOverride()); | ||
BinaryOperator *AssignmentExpr = BinaryOperator::Create( | ||
Ctx, ExternVarDeclRef, KernelNameArrayDecayExpr, BO_Assign, ExternVarType, | ||
VK_LValue, OK_Ordinary, SourceLocation(), FPOptionsOverride()); | ||
Stmts.push_back(AssignmentExpr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fznamznon, I added this just for temporary testing purposes so that I could validate that code generation was working as expected. Some of this might be useful to you; the creation of the implicit array-to-pointer-decay cast for example. Use whatever you find useful and remove the rest.
// Perform overload resolution for a call to an accessible (member) function | ||
// template named 'sycl_enqueue_kernel_launch' from within the definition of | ||
// FD where: | ||
// - The kernel name type is passed as the first template argument. | ||
// - Any remaining template parameters are deduced from the function arguments | ||
// or assigned by default template arguments. | ||
// - 'this' is passed as the implicit function argument if 'FD' is a | ||
// non-static member function. | ||
// - The name of the kernel, expressed as a string literal, is passed as the | ||
// first function argument. | ||
// - The parameters of FD are forwarded as-if by 'std::forward()' as the | ||
// remaining explicit function arguments. | ||
// - Any remaining function arguments are initialized by default arguments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Fznamznon, this is the major work that is left to do. Note that name lookup and overload resolution should be performed from the point of definition of the sycl_kernel_entry_point
attributed function or, if that function is an instantiated function template specialization, the point of instantiation. I'm hopeful that SemaRef.CurContext
reflects this point; see the associated assert
in SemaSYCL::BuildSYCLKernelCallStmt()
below.
We should consider whether to support implicit object parameter declarations or, at a minimum, add a test to ensure a proper diagnostic is issued for an example like this:
struct sycl_kernel_launcher {
template<typename KN, typename... Ts>
void sycl_enqueue_kernel_launch(this S* self, const char *name, Ts...);
template<typename KN, typename KT>
[[clang::sycl_kernel_entry_point(KN)]]
void kernel_entry_point(KT kernel) { ... }
};
There is a design decision we can consider here too. Should we allow the sycl_enqueue_kernel_launch
name to be a default name that can be customized by an additional argument to the sycl_kernel_entry_point
attribute? Let me know if you have opinions on it. I'm otherwise content to leave such considerations to future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are going to go with sycl_enqueue_kernel_launch
as a default name, it should either be reserved by SYCL spec or we need to add __ to it, otherwise it may/will clash with user's code. Same goes with class's name sycl_kernel_launcher
. I don't have a strong preference on whether it should be possible to customize sycl_enqueue_kernel_launch
name, I think people from Runtime library side may have a better understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reserved name isn't needed since the SYCL RT will provide the sycl_enqueue_kernel_launch
declaration. We aren't implicitly declaring any entities with this approach; we're just performing lookup for entities declared by the SYCL RT.
The sycl_kernel_launcher
name isn't special in any way; I just used that name as an example. In this example, lookup for sycl_enqueue_kernel_launch
would be performed from the definition of sycl_kernel_launcher::kernel_entry_point()
and sycl_kernel_launcher::sycl_enqueue_kernel_launch()
is in scope for name lookup at that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't implicitly declaring any entities with this approach; we're just performing lookup for entities declared by the SYCL RT.
I understand that. Still it may be unexpected for the users that SYCL RT declares that name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We'll need to ensure the diagnostic issued when overload resolution fails is clear. I hope we can do better than what Clang does for CUDA when the __cudaPushCallConfiguration()
declaration is missing:
t.cu:6:9: error: use of undeclared identifier __cudaPushCallConfiguration
6 | kernel<<<1,1>>>([] __attribute__((device)) {});
| ^
56407fb
to
5b42f6b
Compare
// CHECK-HOST-LINUX-NEXT: ret void | ||
// CHECK-HOST-LINUX-NEXT: } | ||
// | ||
// CHECK-HOST-LINUX: define internal void @"_Z18kernel_single_taskI6\CE\B4\CF\84\CF\87Z4mainEUliE_EvT0_"() #{{[0-9]+}} { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tahonermann , shouldn't that function and others like it accept kernel object parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per offline discussion, yes it should be kind of maybe. I think what is happening here is that the kernel object is passed by value and, since the kernel object type is an empty class, the target ABI specifies that no arguments actually be passed; the (stateless) parameter object is constructed in the body of the function.
We could add state to the kernel object types to force an argument to be provided. I don't think it matters much though; all we really need to validate here is that the synthesized call to the sycl_enqueue_kernel_launch
function is present and looks correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another thing that we should validate is exception handling. I don't know if any actual code changes will be required. Scenarios to be considered:
- The
sycl_kernel_entry_point
function is declarednoexcept
; an exception propagating from the call to thesycl_enqueue_kernel_launch()
function (including from construction of arguments) should result in a call tostd::terminate()
. - The
sycl_kernel_entry_point
function is not declarednoexcept
; if the calledsycl_enqueue_kernel_launch()
function is also not declarednoexcept
, then exception handling scaffolding might be required (e.g., to destruct value parameters that have non-trivial destruction) and a thrown exception should propagate. - The
sycl_kernel_entry_point
function is not declarednoexcept
; if the calledsycl_enqueue_kernel_launch()
function is declarednoexcept
, then exception handling scaffolding can be omitted.
…t attribute. The `sycl_kernel_entry_point` attribute facilitates the generation of an offload kernel entry point function with parameters corresponding to the (potentially decomposed) kernel arguments and a body that (potentially reconstructs the arguments and) executes the kernel. This change adds symmetric support for the SYCL host through an interface that provides symbol names and (potentially decomposed) kernel arguments to the SYCL library. Consider the following function declared with the `sycl_kernel_entry_point` attribute with a call to this function occurring in the implementation of a SYCL kernel invocation function such as `sycl::handler::single_task()`. template<typename KernelNameType, typename KernelType> [[clang::sycl_kernel_entry_point(KernelNameType)]] void kernel_entry_point(KernelType kerne) { kernel(); } The body of the above function specifies the parameters and body of the generated offload kernel entry point. Clearly, a call to the above function by a SYCL kernel invocation function is not intended to execute the body as written. Previously, code generation emitted an empty function body so that calls to the function had no effect other than to trigger the generation of the offload kernel entry point. The function body is therefore available to hook for SYCL library support and is now substituted with a call to a (SYCL library provided) function template named `sycl_enqueue_kernel_launch()` with the kernel name type passed as the first template argument, the symbol name of the offload kernel entry point passed as a string literal for the first function argument, and the (possibly decomposed) parameters passed as the remaining explicit function arguments. Given a call like this: kernel_entry_point<struct KN>([]{}) the body of the instantiated `kernel_entry_point()` specialization would be substituted as follows with "kernel-symbol-name" substituted for the generated symbol name and `kernel` forwarded (This assumes no kernel argument decomposition; if decomposition was required, `kernel` would be replaced with its corresponding decomposed arguments). sycl_enqueue_kernel_launch<KN>("kernel-symbol-name", kernel) Name lookup and overload resolution for the `sycl_enqueue_kernel_launch()` function is performed at the point of definition of the `sycl_kernel_entry_point` attributed function (or the point of instantiation for an instantiated function template specialization). If overload resolution fails, the program is ill-formed. Implementation of the `sycl_enqueue_kernel_launch()` function might require additional information provided by the SYCL library. This is facilitated by removing the previous prohibition against use of the `sycl_kernel_entry_point` attribute with a non-static member function. If the `sycl_kernel_entry_point` attributed function is a non-static member function, then overload resolution for the `sycl_enqueue_kernel_launch()` function template may select a non-static member function in which case, `this` will be implicitly passed as the implicit object argument. If a `sycl_kernel_entry_point` attributed function is a non-static member function, use of `this` in a potentially evaluated expression is prohibited in the definition (since `this` is not a kernel argument and will not be available within the generated offload kernel entry point function). Support for kernel argument decomposition and reconstruction is not yet implemented.
7a913b2
to
70f34c3
Compare
The
sycl_kernel_entry_point
attribute facilitates the generation of an offload kernel entry point function with parameters corresponding to the (potentially decomposed) kernel arguments and a body that (potentially reconstructs the arguments and) executes the kernel. This change adds symmetric support for the SYCL host through an interface that provides symbol names and (potentially decomposed) kernel arguments to the SYCL library.Consider the following function declared with the
sycl_kernel_entry_point
attribute with a call to this function occurring in the implementation of a SYCL kernel invocation function such assycl::handler::single_task()
.The body of the above function specifies the parameters and body of the generated offload kernel entry point. Clearly, a call to the above function by a SYCL kernel invocation function is not intended to execute the body as written. Previously, code generation emitted an empty function body so that calls to the function had no effect other than to trigger the generation of the offload kernel entry point. The function body is therefore available to hook for SYCL library support and is now substituted with a call to a (SYCL library provided) function template named
sycl_enqueue_kernel_launch()
with the kernel name type passed as the first template argument, the symbol name of the offload kernel entry point passed as a string literal for the first function argument, and the (possibly decomposed) parameters passed as the remaining explicit function arguments. Given a call like this:the body of the instantiated
kernel_entry_point()
specialization would be substituted as follows with "kernel-symbol-name" substituted for the generated symbol name andkernel
forwarded (This assumes no kernel argument decomposition; if decomposition was required,kernel
would be replaced with its corresponding decomposed arguments).Name lookup and overload resolution for the
sycl_enqueue_kernel_launch()
function is performed at the point of definition of thesycl_kernel_entry_point
attributed function (or the point of instantiation for an instantiated function template specialization). If overload resolution fails, the program is ill-formed.Implementation of the
sycl_enqueue_kernel_launch()
function might require additional information provided by the SYCL library. This is facilitated by removing the previous prohibition against use of thesycl_kernel_entry_point
attribute with a non-static member function. If thesycl_kernel_entry_point
attributed function is a non-static member function, then overload resolution for thesycl_enqueue_kernel_launch()
function template may select a non-static member function in which case,this
will be implicitly passed as the implicit object argument.If a
sycl_kernel_entry_point
attributed function is a non-static member function, use ofthis
in a potentially evaluated expression is prohibited in the definition (sincethis
is not a kernel argument and will not be available within the generated offload kernel entry point function).Support for kernel argument decomposition and reconstruction is not yet implemented.