[OpenMP][Offload] Add SPMD-No-Loop mode to OpenMP offload runtime #154105

DominikAdamski · 2025-08-18T12:12:47Z

Kernels marked as SPMD-No-Loop should be launched with enough teams and threads to cover the loop iteration space.

No-Loop mode is described in the RFC:
https://discourse.llvm.org/t/rfc-no-loop-mode-for-openmp-gpu-kernels/87517/

llvmbot · 2025-08-18T12:13:19Z

@llvm/pr-subscribers-offload

@llvm/pr-subscribers-flang-openmp

Author: Dominik Adamski (DominikAdamski)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/154105.diff

3 Files Affected:

(modified) llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h (+2-1)
(modified) offload/plugins-nextgen/common/include/PluginInterface.h (+11-1)
(modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+22)

diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h b/llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h
index 3ae447b14f320..c41b4d1e9844c 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPDeviceConstants.h
@@ -23,7 +23,8 @@ enum OMPTgtExecModeFlags : unsigned char {
   OMP_TGT_EXEC_MODE_GENERIC = 1 << 0,
   OMP_TGT_EXEC_MODE_SPMD = 1 << 1,
   OMP_TGT_EXEC_MODE_GENERIC_SPMD =
-      OMP_TGT_EXEC_MODE_GENERIC | OMP_TGT_EXEC_MODE_SPMD
+      OMP_TGT_EXEC_MODE_GENERIC | OMP_TGT_EXEC_MODE_SPMD,
+  OMP_TGT_EXEC_MODE_SPMD_NO_LOOP = 1 << 2 | OMP_TGT_EXEC_MODE_SPMD
 };
 
 } // end namespace omp
diff --git a/offload/plugins-nextgen/common/include/PluginInterface.h b/offload/plugins-nextgen/common/include/PluginInterface.h
index a448721755a6f..47e72147b1cc3 100644
--- a/offload/plugins-nextgen/common/include/PluginInterface.h
+++ b/offload/plugins-nextgen/common/include/PluginInterface.h
@@ -431,6 +431,8 @@ struct GenericKernelTy {
       return "Generic";
     case OMP_TGT_EXEC_MODE_GENERIC_SPMD:
       return "Generic-SPMD";
+    case OMP_TGT_EXEC_MODE_SPMD_NO_LOOP:
+      return "SPMD-No-Loop";
     }
     llvm_unreachable("Unknown execution mode!");
   }
@@ -468,7 +470,8 @@ struct GenericKernelTy {
                         uint32_t BlockLimitClause[3], uint64_t LoopTripCount,
                         uint32_t &NumThreads, bool IsNumThreadsFromUser) const;
 
-  /// Indicate if the kernel works in Generic SPMD, Generic or SPMD mode.
+  /// Indicate if the kernel works in Generic SPMD, Generic, No-Loop
+  /// or SPMD mode.
   bool isGenericSPMDMode() const {
     return KernelEnvironment.Configuration.ExecMode ==
            OMP_TGT_EXEC_MODE_GENERIC_SPMD;
@@ -483,6 +486,10 @@ struct GenericKernelTy {
   bool isBareMode() const {
     return KernelEnvironment.Configuration.ExecMode == OMP_TGT_EXEC_MODE_BARE;
   }
+  bool isNoLoopMode() const {
+    return KernelEnvironment.Configuration.ExecMode ==
+           OMP_TGT_EXEC_MODE_SPMD_NO_LOOP;
+  }
 
   /// The kernel name.
   std::string Name;
@@ -1152,6 +1159,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
   /// deallocated by the allocator.
   llvm::SmallVector<DeviceImageTy *> LoadedImages;
 
+  /// Return value of OMP_TEAMS_THREAD_LIMIT environment variable
+  int32_t getOMPTeamsThreadLimit() const { return OMP_TeamsThreadLimit; }
+
 private:
   /// Get and set the stack size and heap size for the device. If not used, the
   /// plugin can implement the setters as no-op and setting the output
diff --git a/offload/plugins-nextgen/common/src/PluginInterface.cpp b/offload/plugins-nextgen/common/src/PluginInterface.cpp
index c06c35e1e6a5b..72d75010d9657 100644
--- a/offload/plugins-nextgen/common/src/PluginInterface.cpp
+++ b/offload/plugins-nextgen/common/src/PluginInterface.cpp
@@ -640,6 +640,18 @@ uint32_t GenericKernelTy::getNumThreads(GenericDeviceTy &GenericDevice,
   if (ThreadLimitClause[0] > 0 && isGenericMode())
     ThreadLimitClause[0] += GenericDevice.getWarpSize();
 
+  // Honor OMP_TEAMS_THREAD_LIMIT environment variable and
+  // num_threads/thread_limit clause for NoLoop kernel types.
+  int32_t TeamsThreadLimitEnvVar = GenericDevice.getOMPTeamsThreadLimit();
+  uint16_t ConstWGSize = GenericDevice.getDefaultNumThreads();
+  if (isNoLoopMode()) {
+    if (TeamsThreadLimitEnvVar > 0)
+      return std::min(static_cast<int32_t>(ConstWGSize),
+                      TeamsThreadLimitEnvVar);
+    if ((ThreadLimitClause[0] > 0) && (ThreadLimitClause[0] != (uint32_t)-1))
+      return std::min(static_cast<uint32_t>(ConstWGSize), ThreadLimitClause[0]);
+    return ConstWGSize;
+  }
   return std::min(MaxNumThreads, (ThreadLimitClause[0] > 0)
                                      ? ThreadLimitClause[0]
                                      : PreferredNumThreads);
@@ -662,6 +674,16 @@ uint32_t GenericKernelTy::getNumBlocks(GenericDeviceTy &GenericDevice,
     return std::min(NumTeamsClause[0], GenericDevice.getBlockLimit());
   }
 
+  const auto getNumGroupsFromThreadsAndTripCount =
+      [](const uint64_t TripCount, const uint32_t NumThreads) {
+        return ((TripCount - 1) / NumThreads) + 1;
+      };
+  if (isNoLoopMode()) {
+    return LoopTripCount > 0
+               ? getNumGroupsFromThreadsAndTripCount(LoopTripCount, NumThreads)
+               : 1;
+  }
+
   uint64_t DefaultNumBlocks = GenericDevice.getDefaultNumBlocks();
   uint64_t TripCountNumBlocks = std::numeric_limits<uint64_t>::max();
   if (LoopTripCount > 0) {
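
The new enumerator combines bit 2 with the existing SPMD bit, while the predicates visible in the diff compare the whole mode value for equality. The following is a minimal sketch of that relationship, assuming a local copy of the flag values shown above; the names `ExecModeSketch`, `hasSPMDBit` and `isNoLoop` are hypothetical and are not part of LLVM.

```cpp
#include <cstdint>

// Local copy of the flag values from the diff above (assumption: the real
// enum in OMPDeviceConstants.h may contain further enumerators).
enum ExecModeSketch : unsigned char {
  SKETCH_GENERIC = 1 << 0,
  SKETCH_SPMD = 1 << 1,
  SKETCH_GENERIC_SPMD = SKETCH_GENERIC | SKETCH_SPMD,
  SKETCH_SPMD_NO_LOOP = 1 << 2 | SKETCH_SPMD, // parsed as (1 << 2) | SPMD
};

// Hypothetical helper: true for every mode that has the SPMD bit set.
constexpr bool hasSPMDBit(ExecModeSketch Mode) {
  return (Mode & SKETCH_SPMD) != 0;
}

// Equality-based check in the style of isNoLoopMode() from the diff.
constexpr bool isNoLoop(ExecModeSketch Mode) {
  return Mode == SKETCH_SPMD_NO_LOOP;
}

static_assert(SKETCH_SPMD_NO_LOOP == 6, "bit 2 plus the SPMD bit");
static_assert(hasSPMDBit(SKETCH_SPMD_NO_LOOP), "No-Loop carries the SPMD bit");
static_assert(!isNoLoop(SKETCH_SPMD), "plain SPMD is not No-Loop");
```

Under these assumptions a bit test treats a No-Loop kernel as SPMD, whereas an equality test against the plain SPMD value does not.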

DominikAdamski · 2025-08-18T12:14:14Z

PR for changes in the device RTL: #151959

jhuber6

Did you miss the one in the DeviceRTL?

DominikAdamski · 2025-08-18T12:18:37Z

> Did you miss the one in the DeviceRTL?

Please see #151959. I am planning to add support for No-Loop mode initially for Fortran OpenMP kernels.

jhuber6 · 2025-08-18T12:19:41Z

> > Did you miss the one in the DeviceRTL?
>
> Please see #151959. I am planning to add support for No-Loop mode initially for Fortran OpenMP kernels.

I meant

llvm-project/offload/DeviceRTL/src/Kernel.cpp, line 28 in c6fe567:

enum OMPTgtExecModeFlags : unsigned char {

DominikAdamski · 2025-08-18T12:28:15Z

> > > Did you miss the one in the DeviceRTL?
> >
> > Please see #151959. I am planning to add support for No-Loop mode initially for Fortran OpenMP kernels.
>
> I meant
>
> llvm-project/offload/DeviceRTL/src/Kernel.cpp, line 28 in c6fe567:
>
> enum OMPTgtExecModeFlags : unsigned char {

Added.

Meinersbur

LGTM

dhruvachak · 2025-08-25T16:47:33Z

offload/plugins-nextgen/common/src/PluginInterface.cpp

@@ -640,6 +640,18 @@ uint32_t GenericKernelTy::getNumThreads(GenericDeviceTy &GenericDevice,
  if (ThreadLimitClause[0] > 0 && isGenericMode())
    ThreadLimitClause[0] += GenericDevice.getWarpSize();

+  // Honor OMP_TEAMS_THREAD_LIMIT environment variable and


Taking a step back, what is the reason for special-casing getNumThreads handling for No-Loop kernels? I believe the existing logic (involving MaxNumThreads and PreferredNumThreads) handles both thread-related envars and OpenMP clauses. Is there a test case for No-Loop that does not work with the existing logic?

The primary change required for No-Loop kernels is making sure that the grid size is appropriate and that is ensured by the change in getNumBlocks. I am wondering whether this special handling in getNumThreads can be removed altogether.

DominikAdamski · 2025-08-27

> Is there a test case for No-Loop that does not work with the existing logic?

I haven't found one, so I removed the unnecessary code.
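
The "existing logic" referred to in this thread ends in the `std::min` expression visible as context in the `getNumThreads` hunk. A minimal sketch of that final expression follows, assuming the three inputs are already known; how `MaxNumThreads` and `PreferredNumThreads` are derived is not shown in this PR, and the function name `pickNumThreads` is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical standalone version of the final return expression in
// getNumThreads: a positive thread_limit clause value wins over the
// preferred size, and the result never exceeds MaxNumThreads.
constexpr uint32_t pickNumThreads(uint32_t MaxNumThreads,
                                  uint32_t PreferredNumThreads,
                                  uint32_t ThreadLimitClause) {
  return std::min(MaxNumThreads, ThreadLimitClause > 0 ? ThreadLimitClause
                                                       : PreferredNumThreads);
}

static_assert(pickNumThreads(1024, 256, 0) == 256, "no clause: preferred size");
static_assert(pickNumThreads(1024, 256, 128) == 128, "clause below the maximum");
```

Because this expression already accounts for the clause value, a separate No-Loop branch would mostly repeat the same capping, which is consistent with the outcome of the thread.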

shiltian · 2025-08-27T04:41:08Z

offload/plugins-nextgen/common/src/PluginInterface.cpp

+    if (TeamsThreadLimitEnvVar > 0)
+      return std::min(static_cast<int32_t>(ConstWGSize),
+                      TeamsThreadLimitEnvVar);
+    if ((ThreadLimitClause[0] > 0) && (ThreadLimitClause[0] != (uint32_t)-1))


Suggested change

-    if ((ThreadLimitClause[0] > 0) && (ThreadLimitClause[0] != (uint32_t)-1))
+    if ((ThreadLimitClause[0] > 0) && (ThreadLimitClause[0] != ~0U))
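
The two spellings in this suggestion denote the same value, so the change is stylistic. A small sketch, assuming `unsigned int` is 32 bits wide as on the platforms LLVM commonly targets; the helper name `isUsableThreadLimit` is hypothetical, and what the all-ones value means to the runtime is not stated in this PR.

```cpp
#include <cstdint>
#include <limits>

// Three ways to write the all-ones 32-bit value.
constexpr uint32_t CastSentinel = (uint32_t)-1; // form in the original patch
constexpr uint32_t ComplementSentinel = ~0U;    // form suggested in review
constexpr uint32_t LimitsSentinel = std::numeric_limits<uint32_t>::max();

static_assert(CastSentinel == ComplementSentinel, "same bit pattern");
static_assert(ComplementSentinel == LimitsSentinel, "equals UINT32_MAX");

// Hypothetical helper mirroring the condition in the suggestion: the clause
// value is used only if it is positive and not the all-ones value.
constexpr bool isUsableThreadLimit(uint32_t ThreadLimit) {
  return ThreadLimit > 0 && ThreadLimit != ~0U;
}
```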

shiltian · 2025-08-27T04:41:52Z

offload/plugins-nextgen/common/src/PluginInterface.cpp

@@ -662,6 +674,16 @@ uint32_t GenericKernelTy::getNumBlocks(GenericDeviceTy &GenericDevice,
    return std::min(NumTeamsClause[0], GenericDevice.getBlockLimit());
  }

+  const auto getNumGroupsFromThreadsAndTripCount =


What is the point of doing a lambda here?

DominikAdamski · 2025-08-27

I removed the lambda.

shiltian · 2025-08-27T16:05:55Z

offload/plugins-nextgen/common/include/PluginInterface.h

@@ -1167,6 +1174,9 @@ struct GenericDeviceTy : public DeviceAllocatorTy {
  /// deallocated by the allocator.
  llvm::SmallVector<DeviceImageTy *> LoadedImages;

+  /// Return value of OMP_TEAMS_THREAD_LIMIT environment variable
+  int32_t getOMPTeamsThreadLimit() const { return OMP_TeamsThreadLimit; }


This is not used anymore.

shiltian · 2025-08-27T16:06:07Z

offload/plugins-nextgen/common/src/PluginInterface.cpp

+  if (isNoLoopMode()) {
+    return LoopTripCount > 0 ? (((LoopTripCount - 1) / NumThreads) + 1) : 1;
+  }


Suggested change

-  if (isNoLoopMode()) {
-    return LoopTripCount > 0 ? (((LoopTripCount - 1) / NumThreads) + 1) : 1;
-  }
+  if (isNoLoopMode())
+    return LoopTripCount > 0 ? (((LoopTripCount - 1) / NumThreads) + 1) : 1;
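
The expression in this suggestion is a ceiling division: it yields the smallest number of teams such that teams times threads covers every loop iteration. A minimal sketch as a standalone function, assuming `NumThreads` is non-zero; the name `numTeamsForNoLoop` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical standalone version of the No-Loop grid size computation.
// The LoopTripCount > 0 guard matters because LoopTripCount is unsigned:
// without it, 0 - 1 would wrap around to a very large value.
constexpr uint64_t numTeamsForNoLoop(uint64_t LoopTripCount,
                                     uint32_t NumThreads) {
  return LoopTripCount > 0 ? ((LoopTripCount - 1) / NumThreads) + 1 : 1;
}

static_assert(numTeamsForNoLoop(1000, 256) == 4, "4 * 256 covers 1000");
static_assert(numTeamsForNoLoop(1024, 256) == 4, "exact multiple");
static_assert(numTeamsForNoLoop(1025, 256) == 5, "one extra iteration");
```

For example, 1000 iterations with 256 threads per team need 4 teams, and a trip count of 0 still launches a single team.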

dhruvachak

LGTM

DominikAdamski requested review from Meinersbur, skatrak, kevinsala, shiltian, jhuber6 and dhruvachak August 18, 2025 12:12

llvmbot added the flang:openmp, clang:openmp (OpenMP related changes to Clang) and offload labels Aug 18, 2025

jhuber6 reviewed Aug 18, 2025

Applied remarks (2beb3b4)

jhuber6 approved these changes Aug 18, 2025

Meinersbur approved these changes Aug 19, 2025

DominikAdamski requested a review from jdoerfert August 19, 2025 14:04

dhruvachak reviewed Aug 22, 2025 (comment on offload/plugins-nextgen/common/src/PluginInterface.cpp, now outdated)

DominikAdamski added 2 commits August 25, 2025 05:48

applied_fix (6e549d9)

Merge branch 'main' into omp_no_loop_host_side (935cd3b)

dhruvachak reviewed Aug 25, 2025

Merge branch 'main' into omp_no_loop_host_side (4e95a3c)

shiltian reviewed Aug 27, 2025

Applied remarks (b5d8b41)

shiltian approved these changes Aug 27, 2025

Applied remarks (0aee4df)

dhruvachak approved these changes Aug 27, 2025

kevinsala approved these changes Aug 27, 2025

DominikAdamski merged commit 87db8e9 into llvm:main Aug 28, 2025
9 checks passed

DominikAdamski mentioned this pull request Aug 28, 2025

[Flang][OpenMP] Enable no-loop kernels #155818 (Open)
