-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[SYCL] Add offload wrapping for SYCL kind. #147508
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-llvm-binary-utilities @llvm/pr-subscribers-clang Author: Maksim Sabianin (maksimsab) ChangesPatch is 34.84 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147508.diff 11 Files Affected:
diff --git a/clang/test/Driver/linker-wrapper-image.c b/clang/test/Driver/linker-wrapper-image.c
index c0de56d58196a..67bb21bfe49b4 100644
--- a/clang/test/Driver/linker-wrapper-image.c
+++ b/clang/test/Driver/linker-wrapper-image.c
@@ -1,6 +1,7 @@
// REQUIRES: x86-registered-target
// REQUIRES: nvptx-registered-target
// REQUIRES: amdgpu-registered-target
+// REQUIRES: spirv-registered-target
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
@@ -263,3 +264,37 @@
// HIP: while.end:
// HIP-NEXT: ret void
// HIP-NEXT: }
+
+// RUN: clang-offload-packager -o %t.out --image=file=%t.elf.o,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
+// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o \
+// RUN: -fembed-offload-object=%t.out
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu \
+// RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=SYCL
+// RUN: clang-linker-wrapper --print-wrapped-module --dry-run --host-triple=x86_64-unknown-linux-gnu -r \
+// RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefixes=SYCL
+
+// SYCL: %__sycl.tgt_device_image = type { i16, i8, i8, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr, ptr }
+// SYCL-NEXT: %__sycl.tgt_bin_desc = type { i16, i16, ptr, ptr, ptr }
+
+// SYCL: @.sycl_offloading.target.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.opts.compile.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.opts.link.0 = internal unnamed_addr constant [1 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.0.data = internal unnamed_addr constant [0 x i8] zeroinitializer
+// SYCL-NEXT: @.sycl_offloading.0.info = internal local_unnamed_addr constant [2 x i64] [i64 ptrtoint (ptr @.sycl_offloading.0.data to i64), i64 0], section ".tgtimg", align 16
+// SYCL-NEXT: @llvm.used = appending global [1 x ptr] [ptr @.sycl_offloading.0.info], section "llvm.metadata"
+// SYCL-NEXT: @.sycl_offloading.device_images = internal unnamed_addr constant [1 x %__sycl.tgt_device_image] [%__sycl.tgt_device_image { i16 3, i8 8, i8 0, ptr @.sycl_offloading.target.0, ptr @.sycl_offloading.opts.compile.0, ptr @.sycl_offloading.opts.link.0, ptr @.sycl_offloading.0.data, ptr @.sycl_offloading.0.data, ptr null, ptr null, ptr null, ptr null }]
+// SYCL-NEXT: @.sycl_offloading.descriptor = internal constant %__sycl.tgt_bin_desc { i16 1, i16 1, ptr @.sycl_offloading.device_images, ptr null, ptr null }
+// SYCL-NEXT: @llvm.global_ctors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @sycl.descriptor_reg, ptr null }]
+// SYCL-NEXT: @llvm.global_dtors = appending global [1 x { i32, ptr, ptr }] [{ i32, ptr, ptr } { i32 1, ptr @sycl.descriptor_unreg, ptr null }]
+
+// SYCL: define internal void @sycl.descriptor_reg() section ".text.startup" {
+// SYCL-NEXT: entry:
+// SYCL-NEXT: call void @__sycl_register_lib(ptr @.sycl_offloading.descriptor)
+// SYCL-NEXT: ret void
+// SYCL-NEXT: }
+
+// SYCL: define internal void @sycl.descriptor_unreg() section ".text.startup" {
+// SYCL-NEXT: entry:
+// SYCL-NEXT: call void @__sycl_unregister_lib(ptr @.sycl_offloading.descriptor)
+// SYCL-NEXT: ret void
+// SYCL-NEXT: }
diff --git a/clang/test/Driver/linker-wrapper.c b/clang/test/Driver/linker-wrapper.c
index 80b1a5745a123..5ab8a09660e57 100644
--- a/clang/test/Driver/linker-wrapper.c
+++ b/clang/test/Driver/linker-wrapper.c
@@ -54,7 +54,7 @@ __attribute__((visibility("protected"), used)) int x;
// RUN: clang-offload-packager -o %t.out \
// RUN: --image=file=%t.spirv.bc,kind=sycl,triple=spirv64-unknown-unknown,arch=generic
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t.out
-// RUN: not clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
+// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
// RUN: --linker-path=/usr/bin/ld %t.o -o a.out 2>&1 | FileCheck %s --check-prefix=SPIRV-LINK
// SPIRV-LINK: clang{{.*}} -o {{.*}}.img --target=spirv64-unknown-unknown {{.*}}.o --sycl-link -Xlinker -triple=spirv64-unknown-unknown -Xlinker -arch=
diff --git a/clang/tools/clang-linker-wrapper/CMakeLists.txt b/clang/tools/clang-linker-wrapper/CMakeLists.txt
index bf37d8031025e..741e3fbbefb74 100644
--- a/clang/tools/clang-linker-wrapper/CMakeLists.txt
+++ b/clang/tools/clang-linker-wrapper/CMakeLists.txt
@@ -16,6 +16,7 @@ set(LLVM_LINK_COMPONENTS
CodeGen
LTO
FrontendOffloading
+ FrontendSYCL
)
set(LLVM_TARGET_DEFINITIONS LinkerWrapperOpts.td)
diff --git a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
index 0f1fa8b329fd6..9a466d6e69c31 100644
--- a/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
+++ b/clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
@@ -22,6 +22,7 @@
#include "llvm/CodeGen/CommandFlags.h"
#include "llvm/Frontend/Offloading/OffloadWrapper.h"
#include "llvm/Frontend/Offloading/Utility.h"
+#include "llvm/Frontend/SYCL/OffloadWrapper.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DiagnosticPrinter.h"
#include "llvm/IR/Module.h"
@@ -711,6 +712,13 @@ wrapDeviceImages(ArrayRef<std::unique_ptr<MemoryBuffer>> Buffers,
M, BuffersToWrap.front(), offloading::getOffloadEntryArray(M)))
return std::move(Err);
break;
+ case OFK_SYCL: {
+ offloading::sycl::SYCLWrappingOptions WrappingOptions;
+ if (Error Err = offloading::sycl::wrapSYCLBinaries(M, BuffersToWrap,
+ WrappingOptions))
+ return Err;
+ break;
+ }
default:
return createStringError(getOffloadKindName(Kind) +
" wrapping is not supported");
@@ -748,6 +756,36 @@ bundleOpenMP(ArrayRef<OffloadingImage> Images) {
return std::move(Buffers);
}
+Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
+bundleSYCL(ArrayRef<OffloadingImage> Images) {
+ SmallVector<std::unique_ptr<MemoryBuffer>> Buffers;
+ if (DryRun) {
+ // In dry-run mode there is an empty input which is insufficient for
+ // the testing. Therefore, we insert a stub value.
+ OffloadBinary::OffloadingImage Image;
+ Image.TheOffloadKind = OffloadKind::OFK_SYCL;
+ Image.Image = MemoryBuffer::getMemBufferCopy("");
+ SmallString<0> SerializedImage = OffloadBinary::write(Image);
+ Buffers.emplace_back(MemoryBuffer::getMemBufferCopy(SerializedImage));
+ return Buffers;
+ }
+
+ for (const OffloadingImage &TheImage : Images) {
+ SmallVector<OffloadFile> OffloadBinaries;
+ if (Error E = extractOffloadBinaries(*TheImage.Image, OffloadBinaries))
+ return E;
+
+ for (const OffloadFile &File : OffloadBinaries) {
+ const OffloadBinary &Binary = *File.getBinary();
+ SmallString<0> SerializedImage =
+ OffloadBinary::write(Binary.getOffloadingImage());
+ Buffers.emplace_back(MemoryBuffer::getMemBufferCopy(SerializedImage));
+ }
+ }
+
+ return Buffers;
+}
+
Expected<SmallVector<std::unique_ptr<MemoryBuffer>>>
bundleCuda(ArrayRef<OffloadingImage> Images, const ArgList &Args) {
SmallVector<std::pair<StringRef, StringRef>, 4> InputFiles;
@@ -800,8 +838,9 @@ bundleLinkedOutput(ArrayRef<OffloadingImage> Images, const ArgList &Args,
llvm::TimeTraceScope TimeScope("Bundle linked output");
switch (Kind) {
case OFK_OpenMP:
- case OFK_SYCL:
return bundleOpenMP(Images);
+ case OFK_SYCL:
+ return bundleSYCL(Images);
case OFK_Cuda:
return bundleCuda(Images, Args);
case OFK_HIP:
diff --git a/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h b/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h
new file mode 100644
index 0000000000000..f89411c86984d
--- /dev/null
+++ b/llvm/include/llvm/Frontend/SYCL/OffloadWrapper.h
@@ -0,0 +1,44 @@
+//===----- OffloadWrapper.h -------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
+#define LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
+
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/Object/OffloadBinary.h"
+
+#include <string>
+
+namespace llvm {
+
+class Module;
+
+namespace offloading {
+namespace sycl {
+
+struct SYCLWrappingOptions {
+ // target/compiler specific options what are suggested to use to "compile"
+ // program at runtime.
+ std::string CompileOptions;
+ // Target/Compiler specific options that are suggested to use to "link"
+ // program at runtime.
+ std::string LinkOptions;
+};
+
+/// Wraps OffloadBinaries in the given \p Buffers into the module \p M
+/// as global symbols and registers the images with the SYCL Runtime.
+/// \param Options Settings that allows to turn on optional data and settings.
+llvm::Error
+wrapSYCLBinaries(llvm::Module &M, llvm::ArrayRef<llvm::ArrayRef<char>> Buffers,
+ SYCLWrappingOptions Options = SYCLWrappingOptions());
+
+} // namespace sycl
+} // namespace offloading
+} // namespace llvm
+
+#endif // LLVM_FRONTEND_SYCL_OFFLOAD_WRAPPER_H
diff --git a/llvm/include/llvm/Object/OffloadBinary.h b/llvm/include/llvm/Object/OffloadBinary.h
index b5c845fa8eb70..9d137db834f08 100644
--- a/llvm/include/llvm/Object/OffloadBinary.h
+++ b/llvm/include/llvm/Object/OffloadBinary.h
@@ -48,6 +48,7 @@ enum ImageKind : uint16_t {
IMG_Cubin,
IMG_Fatbinary,
IMG_PTX,
+ IMG_SPIRV,
IMG_LAST,
};
@@ -70,9 +71,9 @@ class OffloadBinary : public Binary {
/// The offloading metadata that will be serialized to a memory buffer.
struct OffloadingImage {
- ImageKind TheImageKind;
- OffloadKind TheOffloadKind;
- uint32_t Flags;
+ ImageKind TheImageKind = ImageKind::IMG_None;
+ OffloadKind TheOffloadKind = OffloadKind::OFK_None;
+ uint32_t Flags = 0;
MapVector<StringRef, StringRef> StringData;
std::unique_ptr<MemoryBuffer> Image;
};
@@ -84,6 +85,8 @@ class OffloadBinary : public Binary {
/// Serialize the contents of \p File to a binary buffer to be read later.
LLVM_ABI static SmallString<0> write(const OffloadingImage &);
+ OffloadingImage getOffloadingImage() const;
+
static uint64_t getAlignment() { return 8; }
ImageKind getImageKind() const { return TheEntry->TheImageKind; }
diff --git a/llvm/lib/Frontend/CMakeLists.txt b/llvm/lib/Frontend/CMakeLists.txt
index 3b31e6f8dec96..6c4b8362c04fd 100644
--- a/llvm/lib/Frontend/CMakeLists.txt
+++ b/llvm/lib/Frontend/CMakeLists.txt
@@ -5,3 +5,4 @@ add_subdirectory(HLSL)
add_subdirectory(OpenACC)
add_subdirectory(OpenMP)
add_subdirectory(Offloading)
+add_subdirectory(SYCL)
diff --git a/llvm/lib/Frontend/SYCL/CMakeLists.txt b/llvm/lib/Frontend/SYCL/CMakeLists.txt
new file mode 100644
index 0000000000000..355ae5f7955a8
--- /dev/null
+++ b/llvm/lib/Frontend/SYCL/CMakeLists.txt
@@ -0,0 +1,14 @@
+add_llvm_component_library(LLVMFrontendSYCL
+ OffloadWrapper.cpp
+
+ ADDITIONAL_HEADER_DIRS
+ ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend
+ ${LLVM_MAIN_INCLUDE_DIR}/llvm/Frontend/SYCL
+
+ LINK_COMPONENTS
+ Core
+ FrontendOffloading
+ Object
+ Support
+ TransformUtils
+ )
diff --git a/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp b/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp
new file mode 100644
index 0000000000000..c0d160c39e93d
--- /dev/null
+++ b/llvm/lib/Frontend/SYCL/OffloadWrapper.cpp
@@ -0,0 +1,513 @@
+//===- SYCLOffloadWrapper.cpp -----------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Frontend/SYCL/OffloadWrapper.h"
+#include "llvm/ADT/ArrayRef.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/Twine.h"
+#include "llvm/Frontend/Offloading/Utility.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/IR/DerivedTypes.h"
+#include "llvm/IR/GlobalVariable.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/LLVMContext.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/Type.h"
+#include "llvm/Object/OffloadBinary.h"
+#include "llvm/Support/Error.h"
+#include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/LineIterator.h"
+#include "llvm/Support/MemoryBufferRef.h"
+#include "llvm/Transforms/Utils/ModuleUtils.h"
+
+#include <memory>
+#include <string>
+#include <utility>
+
+using namespace llvm;
+using namespace llvm::object;
+using namespace llvm::offloading;
+using namespace llvm::offloading::sycl;
+
+using OffloadingImage = OffloadBinary::OffloadingImage;
+
+namespace {
+
+/// Wrapper helper class that creates all LLVM IRs wrapping given images.
+struct Wrapper {
+ Module &M;
+ LLVMContext &C;
+ SYCLWrappingOptions Options;
+
+ StructType *EntryTy = nullptr;
+ StructType *SyclDeviceImageTy = nullptr;
+ StructType *SyclBinDescTy = nullptr;
+
+ Wrapper(Module &M, const SYCLWrappingOptions &Options)
+ : M(M), C(M.getContext()), Options(Options) {
+
+ EntryTy = offloading::getEntryTy(M);
+ SyclDeviceImageTy = getSyclDeviceImageTy();
+ SyclBinDescTy = getSyclBinDescTy();
+ }
+
+ IntegerType *getSizeTTy() {
+ switch (M.getDataLayout().getPointerSize()) {
+ case 4:
+ return Type::getInt32Ty(C);
+ case 8:
+ return Type::getInt64Ty(C);
+ }
+ llvm_unreachable("unsupported pointer type size");
+ }
+
+ SmallVector<Constant *, 2> getSizetConstPair(size_t First, size_t Second) {
+ IntegerType *SizeTTy = getSizeTTy();
+ return SmallVector<Constant *, 2>{ConstantInt::get(SizeTTy, First),
+ ConstantInt::get(SizeTTy, Second)};
+ }
+
+ /// Note: Properties aren't supported and the support is going
+ /// to be added later.
+ /// Creates a structure corresponding to:
+ /// SYCL specific image descriptor type.
+ /// \code
+ /// struct __sycl.tgt_device_image {
+ /// // version of this structure - for backward compatibility;
+ /// // all modifications which change order/type/offsets of existing fields
+ /// // should increment the version.
+ /// uint16_t Version;
+ /// // the kind of offload model the image employs.
+ /// uint8_t OffloadKind;
+ /// // format of the image data - SPIRV, LLVMIR bitcode, etc
+ /// uint8_t Format;
+ /// // null-terminated string representation of the device's target
+ /// // architecture
+ /// const char *Arch;
+ /// // a null-terminated string; target- and compiler-specific options
+ /// // which are suggested to use to "compile" program at runtime
+ /// const char *CompileOptions;
+ /// // a null-terminated string; target- and compiler-specific options
+ /// // which are suggested to use to "link" program at runtime
+ /// const char *LinkOptions;
+ /// // Pointer to the device binary image start
+ /// void *ImageStart;
+ /// // Pointer to the device binary image end
+ /// void *ImageEnd;
+ /// // the entry table
+ /// __tgt_offload_entry *EntriesBegin;
+ /// __tgt_offload_entry *EntriesEnd;
+ /// const char *PropertiesBegin;
+ /// const char *PropertiesEnd;
+ /// };
+ /// \endcode
+ StructType *getSyclDeviceImageTy() {
+ return StructType::create(
+ {
+ Type::getInt16Ty(C), // Version
+ Type::getInt8Ty(C), // OffloadKind
+ Type::getInt8Ty(C), // Format
+ PointerType::getUnqual(C), // Arch
+ PointerType::getUnqual(C), // CompileOptions
+ PointerType::getUnqual(C), // LinkOptions
+ PointerType::getUnqual(C), // ImageStart
+ PointerType::getUnqual(C), // ImageEnd
+ PointerType::getUnqual(C), // EntriesBegin
+ PointerType::getUnqual(C), // EntriesEnd
+ PointerType::getUnqual(C), // PropertiesBegin
+ PointerType::getUnqual(C) // PropertiesEnd
+ },
+ "__sycl.tgt_device_image");
+ }
+
+ /// Creates a structure for SYCL specific binary descriptor type. Corresponds
+ /// to:
+ ///
+ /// \code
+ /// struct __sycl.tgt_bin_desc {
+ /// // version of this structure - for backward compatibility;
+ /// // all modifications which change order/type/offsets of existing fields
+ /// // should increment the version.
+ /// uint16_t Version;
+ /// uint16_t NumDeviceImages;
+ /// __sycl.tgt_device_image *DeviceImages;
+ /// // the offload entry table
+ /// __tgt_offload_entry *HostEntriesBegin;
+ /// __tgt_offload_entry *HostEntriesEnd;
+ /// };
+ /// \endcode
+ StructType *getSyclBinDescTy() {
+ return StructType::create(
+ {Type::getInt16Ty(C), Type::getInt16Ty(C), PointerType::getUnqual(C),
+ PointerType::getUnqual(C), PointerType::getUnqual(C)},
+ "__sycl.tgt_bin_desc");
+ }
+
+ /// Adds a global readonly variable that is initialized by given
+ /// \p Initializer to the module.
+ GlobalVariable *addGlobalArrayVariable(const Twine &Name,
+ ArrayRef<char> Initializer,
+ const Twine &Section = "") {
+ auto *Arr = ConstantDataArray::get(M.getContext(), Initializer);
+ auto *Var = new GlobalVariable(M, Arr->getType(), /*isConstant*/ true,
+ GlobalVariable::InternalLinkage, Arr, Name);
+ Var->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+
+ SmallVector<char, 32> NameBuf;
+ auto SectionName = Section.toStringRef(NameBuf);
+ if (!SectionName.empty())
+ Var->setSection(SectionName);
+ return Var;
+ }
+
+ /// Adds given \p Buf as a global variable into the module.
+ /// \returns Pair of pointers that point at the beginning and the end of the
+ /// variable.
+ std::pair<Constant *, Constant *>
+ addArrayToModule(ArrayRef<char> Buf, const Twine &Name,
+ const Twine &Section = "") {
+ auto *Var = addGlobalArrayVariable(Name, Buf, Section);
+ auto *ImageB = ConstantExpr::getGetElementPtr(Var->getValueType(), Var,
+ getSizetConstPair(0, 0));
+ auto *ImageE = ConstantExpr::getGetElementPtr(
+ Var->getValueType(), Var, getSizetConstPair(0, Buf.size()));
+ return std::make_pair(ImageB, ImageE);
+ }
+
+ /// Adds given \p Data as constant byte array in the module.
+ /// \returns Constant pointer to the added data. The pointer type does not
+ /// carry size information.
+ Constant *addRawDataToModule(ArrayRef<char> Data, const Twine &Name) {
+ auto *Var = addGlobalArrayVariable(Name, Data);
+ auto *DataPtr = ConstantExpr::getGetElementPtr(Var->getValueType(), Var,
+ getSizetConstPair(0, 0));
+ return DataPtr;
+ }
+
+ /// Creates a global variable of const char* type and creates an
+ /// initializer that initializes it with \p Str.
+ ///
+ /// \returns Link-time constant pointer (constant expr) to that
+ /// variable.
+ Constant *addStringToModule(StringRef Str, const Twine &Name) {
+ auto *Arr = ConstantDataArray::getString(C, Str);
+ auto *Var = new GlobalVariable(M, Arr->getType(), /*isConstant*/ true,
+ GlobalVariable::InternalLinkage, Arr, Name);
+ Var->setUnnamedAddr(GlobalValue::UnnamedAddr::Global);
+ auto *Zero = ConstantInt::get(getSizeTTy(), 0);
+ Constant *ZeroZero[] = {Zero, Zero};
+ return ConstantExpr::getGetElementPtr(Var->getValueType(), Var, ZeroZero);
+ }
+
+ /// Creates a global variable of array of structs and initializes
+ /// it with the given values in \p ArrayData.
+ ///
+ /// \returns Pair of Constants that point at array content.
+ /// If \p ArrayData is empty then a returned pair contains nullptrs.
+ std::pair<Constant *, Constant *>
+ addStructArrayToModule(ArrayRef<Constant *> ArrayData, Type *ElemTy) {
+ if (ArrayData.empty()) {
+ auto *PtrTy = llvm::PointerType::getUnqual(ElemTy->getContext());
+ auto *NullPtr = Constant::getNullValue(PtrTy)...
[truncated]
|
Hi @jdoerfert @jhuber6 ! We are adding offload wrapping for SYCL in this PR. Do you have any suggestions for whom I should request for a code review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need a separate directory for the offload wrapper?
// SYCL-NEXT: @.sycl_offloading.0.info = internal local_unnamed_addr constant [2 x i64] [i64 ptrtoint (ptr @.sycl_offloading.0.data to i64), i64 0], section ".tgtimg", align 16 | ||
// SYCL-NEXT: @llvm.used = appending global [1 x ptr] [ptr @.sycl_offloading.0.info], section "llvm.metadata" | ||
// SYCL-NEXT: @.sycl_offloading.device_images = internal unnamed_addr constant [1 x %__sycl.tgt_device_image] [%__sycl.tgt_device_image { i16 3, i8 8, i8 0, ptr @.sycl_offloading.target.0, ptr @.sycl_offloading.opts.compile.0, ptr @.sycl_offloading.opts.link.0, ptr @.sycl_offloading.0.data, ptr @.sycl_offloading.0.data, ptr null, ptr null, ptr null, ptr null }] | ||
// SYCL-NEXT: @.sycl_offloading.descriptor = internal constant %__sycl.tgt_bin_desc { i16 1, i16 1, ptr @.sycl_offloading.device_images, ptr null, ptr null } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just copied from OpenMP?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RUN
lines are copied from OpenMP and adjusted to SYCL case. Output's checks are copied from the clang-linker-wrapper
's output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm just wondering why everything's called tgt_bin_desc
and similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, we customized old clang-offload-wrapper
and then we extracted SYCL logic in a separate place. That is how these names stuck with us.
@jhuber6 |
OpenMP and HIP don't look similar either but I figured it was easier to just put them all together because they're basically exporting the same interface. |
I've merged SYCL wrapper with all others. |
/// | ||
/// \returns Pair of Constants that point at entries content. | ||
std::pair<Constant *, Constant *> | ||
addOffloadEntriesToModule(StringRef Entries) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this stuff not common with the other targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a divergence with offloading::getOffloadEntryArray
. Other targets don't set char *SymbolName
for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate a bit more? Other targets definitely set the symbol name when they call emitOffloadingEntry
. We shouldn't be duplicating functionality here. Is there something preventing you from just updating the other function with whatever you need?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entries could be emitted by emitOffloadEntry
. Then I should create an array that refers to these entries.
I don't see the appropriate other function that does that. The most closest one is offloading::getOffloadEntryArray
but this function creates an empty array if I get it right. Therefore, I would need to modify that function in a strange way.
If you have other opinion regarding getOffloadEntryArray
please let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getOffloadEntryArray
definitely does not create an empty array. It creates a pair of symbols that will let you iterate the offloading entries emitted in each TU once the linker combines them into a single section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd greatly encourage SYCL to be consistent. The only roadblock I can think of is if you ship on Mac OS, because I never got around to making runtime arrays work there, but for Widnows / Linux it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, our compilation flow consists of steps:
- Link all modules together in Module
LM
. - Then split
LM
into several partsSM_1
, ...,SM_N
.
If we decide to stick with linker trick then it would complicate introduced splitting functionality in #131347. We would have to introduce some SYCL specific handling of emitted entry arrays. Also, the mentioned Mac OS case concerns me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, our compilation flow consists of steps:
Link all modules together in Module LM.
Then split LM into several parts SM_1, ..., SM_N.
Could you explain this a bit further? Trivially you could just put some metadata in the offloading entries and skip over them as-needed in your wrapping code. I fail to see how that's a blocker, though it might be slightly annoying if you split them out later. I don't have the full view of how you choose to do your splitting. Another option, and one I do for relocatable links, is to just rename the section.
Also, the mentioned Mac OS case concerns me.
There is a way to do it in Mac OS, I just haven't gotten around to writing it, there's no incentive since GPU offloading on Mac-OS is effectively dead as far as I'm aware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's consider the following example.
One TU has 2 kernels (A, B) and the user requests a split by kernels.
In unified approach we would have the output from the Codegen like the following:
kernel A() {...}
kernel B() {...}
@entry_A = %tgt_offload_entry {...} ; info about kernel A
@entry_B = %tgt_offload_entry {...} ; info about kernel B.
Then we have to split these kernels using splitModuleTransitiveFromEntryPoints
added in #131347. In order to correctly move these entries, we have to add such SYCL specific handling functionality in splitModuleTransitiveFromEntryPoints
, which is very discouraged according to my understanding. That is only my concern about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't fully understand the SYCL workflow unfortunately. If we are splitting the module (serveral IR blobs which may or may not contain the requested function) we could possibly just try each entry for each image. It's N^2 technically but I feel like these lists will be small enough that it won't blow up. I.e.
for image in images:
for entry in entries:
if (entry.name in image.symbols)
register(entry, image);
offloading::SYCLWrappingOptions WrappingOptions; | ||
if (Error Err = | ||
offloading::wrapSYCLBinaries(M, BuffersToWrap, WrappingOptions)) | ||
return Err; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
return Err; | |
return std::move(Err); |
No description provided.