Skip to content

Conversation

tahonermann
Copy link
Contributor

The sycl_kernel_entry_point attribute facilitates the generation of an offload kernel entry point function with parameters corresponding to the (potentially decomposed) kernel arguments and a body that (potentially reconstructs the arguments and) executes the kernel. This change adds symmetric support for the SYCL host through an interface that provides symbol names and (potentially decomposed) kernel arguments to the SYCL library.

Consider the following function declared with the sycl_kernel_entry_point attribute with a call to this function occurring in the implementation of a SYCL kernel invocation function such as sycl::handler::single_task().

  template<typename KernelNameType, typename KernelType>
  [[clang::sycl_kernel_entry_point(KernelNameType)]]
  void kernel_entry_point(KernelType kerne) {
    kernel();
  }

The body of the above function specifies the parameters and body of the generated offload kernel entry point. Clearly, a call to the above function by a SYCL kernel invocation function is not intended to execute the body as written. Previously, code generation emitted an empty function body so that calls to the function had no effect other than to trigger the generation of the offload kernel entry point. The function body is therefore available to hook for SYCL library support and is now substituted with a call to a (SYCL library provided) function template named sycl_enqueue_kernel_launch() with the kernel name type passed as the first template argument, the symbol name of the offload kernel entry point passed as a string literal for the first function argument, and the (possibly decomposed) parameters passed as the remaining explicit function arguments. Given a call like this:

  kernel_entry_point<struct KN>([]{})

the body of the instantiated kernel_entry_point() specialization would be substituted as follows with "kernel-symbol-name" substituted for the generated symbol name and kernel forwarded (This assumes no kernel argument decomposition; if decomposition was required, kernel would be replaced with its corresponding decomposed arguments).

  sycl_enqueue_kernel_launch<KN>("kernel-symbol-name", kernel)

Name lookup and overload resolution for the sycl_enqueue_kernel_launch() function is performed at the point of definition of the sycl_kernel_entry_point attributed function (or the point of instantiation for an instantiated function template specialization). If overload resolution fails, the program is ill-formed.

Implementation of the sycl_enqueue_kernel_launch() function might require additional information provided by the SYCL library. This is facilitated by removing the previous prohibition against use of the sycl_kernel_entry_point attribute with a non-static member function. If the sycl_kernel_entry_point attributed function is a non-static member function, then overload resolution for the sycl_enqueue_kernel_launch() function template may select a non-static member function in which case, this will be implicitly passed as the implicit object argument.

If a sycl_kernel_entry_point attributed function is a non-static member function, use of this in a potentially evaluated expression is prohibited in the definition (since this is not a kernel argument and will not be available within the generated offload kernel entry point function).

Support for kernel argument decomposition and reconstruction is not yet implemented.

@tahonermann tahonermann added the SYCL https://registry.khronos.org/SYCL label Aug 6, 2025
@tahonermann tahonermann requested a review from bader August 6, 2025 22:57
Comment on lines +391 to +432
// FIXME: An extern variable declaration with assignment to the kernel
// name expression is added to Stmts as a temporary measure to see results.
// reflected in tests. The kernel name expression will need to be passed as
// the first function argument in a call to sycl_enqueue_kernel_launch.
QualType ExternVarType = Ctx.getPointerType(Ctx.CharTy.withConst());
const IdentifierInfo *ExternVarName =
SemaRef.getPreprocessor().getIdentifierInfo("kernel_name");
VarDecl *ExternVarDecl = VarDecl::Create(
Ctx, FD, SourceLocation(), SourceLocation(), ExternVarName, ExternVarType,
/*TInfo*/ nullptr, SC_Extern);
DeclStmt *ExternVarDeclStmt = new (Ctx)
DeclStmt(DeclGroupRef(ExternVarDecl), SourceLocation(), SourceLocation());
Stmts.push_back(ExternVarDeclStmt);
DeclRefExpr *ExternVarDeclRef = new (Ctx) DeclRefExpr(
Ctx, ExternVarDecl, /*RefersToEnclosingVariableOrCapture*/ false,
ExternVarType, VK_LValue, SourceLocation());
ImplicitCastExpr *KernelNameArrayDecayExpr = new (Ctx) ImplicitCastExpr(
ImplicitCastExpr::OnStack, ExternVarType, CK_ArrayToPointerDecay,
KernelNameExpr, VK_PRValue, FPOptionsOverride());
BinaryOperator *AssignmentExpr = BinaryOperator::Create(
Ctx, ExternVarDeclRef, KernelNameArrayDecayExpr, BO_Assign, ExternVarType,
VK_LValue, OK_Ordinary, SourceLocation(), FPOptionsOverride());
Stmts.push_back(AssignmentExpr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fznamznon, I added this just for temporary testing purposes so that I could validate that code generation was working as expected. Some of this might be useful to you; the creation of the implicit array-to-pointer-decay cast for example. Use whatever you find useful and remove the rest.

Comment on lines +415 to +446
// Perform overload resolution for a call to an accessible (member) function
// template named 'sycl_enqueue_kernel_launch' from within the definition of
// FD where:
// - The kernel name type is passed as the first template argument.
// - Any remaining template parameters are deduced from the function arguments
// or assigned by default template arguments.
// - 'this' is passed as the implicit function argument if 'FD' is a
// non-static member function.
// - The name of the kernel, expressed as a string literal, is passed as the
// first function argument.
// - The parameters of FD are forwarded as-if by 'std::forward()' as the
// remaining explicit function arguments.
// - Any remaining function arguments are initialized by default arguments.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fznamznon, this is the major work that is left to do. Note that name lookup and overload resolution should be performed from the point of definition of the sycl_kernel_entry_point attributed function or, if that function is an instantiated function template specialization, the point of instantiation. I'm hopeful that SemaRef.CurContext reflects this point; see the associated assert in SemaSYCL::BuildSYCLKernelCallStmt() below.

We should consider whether to support implicit object parameter declarations or, at a minimum, add a test to ensure a proper diagnostic is issued for an example like this:

struct sycl_kernel_launcher {
  template<typename KN, typename... Ts>
  void sycl_enqueue_kernel_launch(this S* self, const char *name, Ts...);
  template<typename KN, typename KT>
  [[clang::sycl_kernel_entry_point(KN)]]
  void kernel_entry_point(KT kernel) { ... }
};

There is a design decision we can consider here too. Should we allow the sycl_enqueue_kernel_launch name to be a default name that can be customized by an additional argument to the sycl_kernel_entry_point attribute? Let me know if you have opinions on it. I'm otherwise content to leave such considerations to future work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to go with sycl_enqueue_kernel_launch as a default name, it should either be reserved by SYCL spec or we need to add __ to it, otherwise it may/will clash with user's code. Same goes with class's name sycl_kernel_launcher. I don't have a strong preference on whether it should be possible to customize sycl_enqueue_kernel_launch name, I think people from Runtime library side may have a better understanding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reserved name isn't needed since the SYCL RT will provide the sycl_enqueue_kernel_launch declaration. We aren't implicitly declaring any entities with this approach; we're just performing lookup for entities declared by the SYCL RT.

The sycl_kernel_launcher name isn't special in any way; I just used that name as an example. In this example, lookup for sycl_enqueue_kernel_launch would be performed from the definition of sycl_kernel_launcher::kernel_entry_point() and sycl_kernel_launcher::sycl_enqueue_kernel_launch() is in scope for name lookup at that point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't implicitly declaring any entities with this approach; we're just performing lookup for entities declared by the SYCL RT.

I understand that. Still it may be unexpected for the users that SYCL RT declares that name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We'll need to ensure the diagnostic issued when overload resolution fails is clear. I hope we can do better than what Clang does for CUDA when the __cudaPushCallConfiguration() declaration is missing:

t.cu:6:9: error: use of undeclared identifier __cudaPushCallConfiguration
   6 |   kernel<<<1,1>>>([] __attribute__((device)) {});
     |         ^

@tahonermann tahonermann force-pushed the sycl-upstream-fe-sycl_kernel_entry_point-host branch from 56407fb to 5b42f6b Compare August 7, 2025 02:27
// CHECK-HOST-LINUX-NEXT: ret void
// CHECK-HOST-LINUX-NEXT: }
//
// CHECK-HOST-LINUX: define internal void @"_Z18kernel_single_taskI6\CE\B4\CF\84\CF\87Z4mainEUliE_EvT0_"() #{{[0-9]+}} {
Copy link
Contributor

@Fznamznon Fznamznon Aug 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tahonermann , shouldn't that function and others like it accept kernel object parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, yes it should be kind of maybe. I think what is happening here is that the kernel object is passed by value and, since the kernel object type is an empty class, the target ABI specifies that no arguments actually be passed; the (stateless) parameter object is constructed in the body of the function.

We could add state to the kernel object types to force an argument to be provided. I don't think it matters much though; all we really need to validate here is that the synthesized call to the sycl_enqueue_kernel_launch function is present and looks correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing that we should validate is exception handling. I don't know if any actual code changes will be required. Scenarios to be considered:

  • The sycl_kernel_entry_point function is declared noexcept; an exception propagating from the call to the sycl_enqueue_kernel_launch() function (including from construction of arguments) should result in a call to std::terminate().
  • The sycl_kernel_entry_point function is not declared noexcept; if the called sycl_enqueue_kernel_launch() function is also not declared noexcept, then exception handling scaffolding might be required (e.g., to destruct value parameters that have non-trivial destruction) and a thrown exception should propagate.
  • The sycl_kernel_entry_point function is not declared noexcept; if the called sycl_enqueue_kernel_launch() function is declared noexcept, then exception handling scaffolding can be omitted.

…t attribute.

The `sycl_kernel_entry_point` attribute facilitates the generation of an
offload kernel entry point function with parameters corresponding to the
(potentially decomposed) kernel arguments and a body that (potentially
reconstructs the arguments and) executes the kernel. This change adds
symmetric support for the SYCL host through an interface that provides
symbol names and (potentially decomposed) kernel arguments to the SYCL
library.

Consider the following function declared with the `sycl_kernel_entry_point`
attribute with a call to this function occurring in the implementation of
a SYCL kernel invocation function such as `sycl::handler::single_task()`.
  template<typename KernelNameType, typename KernelType>
  [[clang::sycl_kernel_entry_point(KernelNameType)]]
  void kernel_entry_point(KernelType kerne) {
    kernel();
  }

The body of the above function specifies the parameters and body of the
generated offload kernel entry point. Clearly, a call to the above function
by a SYCL kernel invocation function is not intended to execute the body
as written. Previously, code generation emitted an empty function body so
that calls to the function had no effect other than to trigger the generation
of the offload kernel entry point. The function body is therefore available
to hook for SYCL library support and is now substituted with a call to a
(SYCL library provided) function template named `sycl_enqueue_kernel_launch()`
with the kernel name type passed as the first template argument, the
symbol name of the offload kernel entry point passed as a string literal for
the first function argument, and the (possibly decomposed) parameters passed
as the remaining explicit function arguments. Given a call like this:
  kernel_entry_point<struct KN>([]{})
the body of the instantiated `kernel_entry_point()` specialization would be
substituted as follows with "kernel-symbol-name" substituted for the
generated symbol name and `kernel` forwarded (This assumes no kernel
argument decomposition; if decomposition was required, `kernel` would be
replaced with its corresponding decomposed arguments).
  sycl_enqueue_kernel_launch<KN>("kernel-symbol-name", kernel)

Name lookup and overload resolution for the `sycl_enqueue_kernel_launch()`
function is performed at the point of definition of the
`sycl_kernel_entry_point` attributed function (or the point of instantiation
for an instantiated function template specialization). If overload
resolution fails, the program is ill-formed.

Implementation of the `sycl_enqueue_kernel_launch()` function might require
additional information provided by the SYCL library. This is facilitated by
removing the previous prohibition against use of the `sycl_kernel_entry_point`
attribute with a non-static member function. If the `sycl_kernel_entry_point`
attributed function is a non-static member function, then overload resolution
for the `sycl_enqueue_kernel_launch()` function template may select a
non-static member function in which case, `this` will be implicitly passed
as the implicit object argument.

If a `sycl_kernel_entry_point` attributed function is a non-static member
function, use of `this` in a potentially evaluated expression is prohibited
in the definition (since `this` is not a kernel argument and will not be
available within the generated offload kernel entry point function).

Support for kernel argument decomposition and reconstruction is not yet
implemented.
@tahonermann tahonermann force-pushed the sycl-upstream-fe-sycl_kernel_entry_point-host branch from 7a913b2 to 70f34c3 Compare August 25, 2025 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
SYCL https://registry.khronos.org/SYCL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants