[Flang][OpenMP] Defer descriptor mapping for assumed dummy argument types #154349
base: main
Conversation
@llvm/pr-subscribers-offload @llvm/pr-subscribers-flang-openmp Author: None (agozillon)
Changes: This PR defers descriptor maps until they are necessary for assumed dummy argument types. The intent is to avoid a problem where a user can inadvertently map a temporary local descriptor to the device without their knowledge and then never unmap it. This temporary local descriptor remains lodged in OpenMP device memory, and the next time another variable or descriptor residing at the same stack address is mapped, we incur a runtime OpenMP map error as we try to remap the same address. This fix was discussed with the OpenMP committee and applies to OpenMP 5.2 and below; future versions of OpenMP avoid this issue via the attach semantics added to the specification. Patch is 20.54 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154349.diff 4 Files Affected:
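The failure mode described above can be sketched with a toy model of the runtime's host-to-device mapping table (hypothetical names and structure; the real libomptarget bookkeeping is considerably more involved): a `target enter data` on a temporary local descriptor records its host stack address, and when a later, unrelated descriptor reuses that address, the remap trips the same-address check.

```python
# Toy model of an OpenMP "present" table keyed by host address.
# Hypothetical sketch only -- not the actual libomptarget data structures.

class MapError(Exception):
    pass

class PresentTable:
    def __init__(self):
        self.entries = {}  # host address -> mapped size

    def map_to(self, host_addr, size):
        if host_addr in self.entries:
            if self.entries[host_addr] != size:
                # Same host address, different object: the collision this
                # PR avoids by deferring the temporary descriptor's map.
                raise MapError(f"address {host_addr:#x} already mapped")
            return  # re-mapping the same object is fine (ref-counted)
        self.entries[host_addr] = size

    def unmap(self, host_addr):
        self.entries.pop(host_addr, None)

table = PresentTable()
STACK_SLOT = 0x7FFF0000  # both temporaries land at the same stack address

# First call: a temporary local descriptor is mapped and never unmapped.
table.map_to(STACK_SLOT, size=48)

# Second call: a new local descriptor reuses the stack slot, different size.
try:
    table.map_to(STACK_SLOT, size=72)
except MapError as e:
    print("runtime map error:", e)
```

An explicit `unmap` between the two calls would also avoid the error, but as the description notes, the user never sees the temporary descriptor and so cannot be expected to unmap it.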
diff --git a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
index 57be863cfa1b8..4940cbc8f668d 100644
--- a/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
+++ b/flang/lib/Optimizer/OpenMP/MapInfoFinalization.cpp
@@ -75,6 +75,38 @@ class MapInfoFinalizationPass
/// | |
std::map<mlir::Operation *, mlir::Value> localBoxAllocas;
+ // List of deferrable descriptors to process at the end of
+ // the pass.
+ llvm::SmallVector<mlir::Operation *> deferrableDesc;
+
+ // Check if the declaration operation we have refers to a dummy
+ // function argument.
+ bool isDummyArgument(mlir::Operation *op) {
+ if (auto declareOp = mlir::dyn_cast<hlfir::DeclareOp>(op))
+ if (auto dummyScope = declareOp.getDummyScope())
+ return true;
+ return false;
+ }
+
+ // Relevant for OpenMP < 5.2, where attach semantics and rules don't exist.
+ // As descriptors were an unspoken implementation detail in these versions,
+ // there are certain cases where the user (and the compiler implementation)
+ // can create data mapping errors by leaving temporary descriptors stuck
+ // in memory. To avoid this we can defer the descriptor mapping in these
+ // cases until target or target data regions, where we can be sure they
+ // have a clear, limited scope on device.
+ bool canDeferDescriptorMapping(mlir::Value descriptor) {
+ if (!fir::isAllocatableType(descriptor.getType()) &&
+ !fir::isPointerType(descriptor.getType())) {
+ if (isDummyArgument(descriptor.getDefiningOp()) &&
+ (fir::isAssumedType(descriptor.getType()) ||
+ fir::isAssumedShape(descriptor.getType()))) {
+ return true;
+ }
+ }
+ return false;
+ }
+
/// getMemberUserList gathers all users of a particular MapInfoOp that are
/// other MapInfoOp's and places them into the mapMemberUsers list, which
/// records the map that the current argument MapInfoOp "op" is part of
@@ -126,13 +158,16 @@ class MapInfoFinalizationPass
/// fir::BoxOffsetOp we utilise to access the descriptor data's
/// base address can be utilised.
mlir::Value getDescriptorFromBoxMap(mlir::omp::MapInfoOp boxMap,
- fir::FirOpBuilder &builder) {
+ fir::FirOpBuilder &builder,
+ bool &canDescBeDeferred) {
mlir::Value descriptor = boxMap.getVarPtr();
if (!fir::isTypeWithDescriptor(boxMap.getVarType()))
if (auto addrOp = mlir::dyn_cast_if_present<fir::BoxAddrOp>(
boxMap.getVarPtr().getDefiningOp()))
descriptor = addrOp.getVal();
+ canDescBeDeferred = canDeferDescriptorMapping(descriptor);
+
if (!mlir::isa<fir::BaseBoxType>(descriptor.getType()) &&
!fir::factory::isOptionalArgument(descriptor.getDefiningOp()))
return descriptor;
@@ -294,6 +329,28 @@ class MapInfoFinalizationPass
return false;
}
+ bool isUseDeviceAddr(mlir::omp::MapInfoOp mapOp, mlir::Operation *userOp) {
+ assert(userOp && "Expecting non-null argument");
+ if (auto targetDataOp = llvm::dyn_cast<mlir::omp::TargetDataOp>(userOp)) {
+ for (mlir::Value uda : targetDataOp.getUseDeviceAddrVars()) {
+ if (uda.getDefiningOp() == mapOp)
+ return true;
+ }
+ }
+ return false;
+ }
+
+ bool isUseDevicePtr(mlir::omp::MapInfoOp mapOp, mlir::Operation *userOp) {
+ assert(userOp && "Expecting non-null argument");
+ if (auto targetDataOp = llvm::dyn_cast<mlir::omp::TargetDataOp>(userOp)) {
+ for (mlir::Value udp : targetDataOp.getUseDevicePtrVars()) {
+ if (udp.getDefiningOp() == mapOp)
+ return true;
+ }
+ }
+ return false;
+ }
+
mlir::omp::MapInfoOp genBoxcharMemberMap(mlir::omp::MapInfoOp op,
fir::FirOpBuilder &builder) {
if (!op.getMembers().empty())
@@ -358,7 +415,9 @@ class MapInfoFinalizationPass
// TODO: map the addendum segment of the descriptor, similarly to the
// base address/data pointer member.
- mlir::Value descriptor = getDescriptorFromBoxMap(op, builder);
+ bool descCanBeDeferred = false;
+ mlir::Value descriptor =
+ getDescriptorFromBoxMap(op, builder, descCanBeDeferred);
mlir::ArrayAttr newMembersAttr;
mlir::SmallVector<mlir::Value> newMembers;
@@ -445,6 +504,10 @@ class MapInfoFinalizationPass
/*partial_map=*/builder.getBoolAttr(false));
op.replaceAllUsesWith(newDescParentMapOp.getResult());
op->erase();
+
+ if (descCanBeDeferred)
+ deferrableDesc.push_back(newDescParentMapOp);
+
return newDescParentMapOp;
}
@@ -593,6 +656,125 @@ class MapInfoFinalizationPass
return nullptr;
}
+ void addImplictDescriptorMapToTargetDataOp(mlir::omp::MapInfoOp op,
+ fir::FirOpBuilder &builder,
+ mlir::Operation *target) {
+ // Checks if the map is present as an explicit map already on the target
+ // data directive, and not just present on a use_device_addr/ptr, as if
+ // that's the case, we should not need to add an implicit map for the
+ // descriptor.
+ auto explicitMappingPresent = [](mlir::omp::MapInfoOp op,
+ mlir::omp::TargetDataOp tarData) {
+ // Verify that a top-level descriptor mapping with the same varPtr is
+ // already present. The map type should always be To for a descriptor,
+ // which is all we really care about here, as we aim to make sure the
+ // descriptor is present on device whenever we expect to access the
+ // underlying data.
+ if (tarData.getMapVars().empty())
+ return false;
+
+ for (mlir::Value mapVar : tarData.getMapVars()) {
+ auto mapOp =
+ llvm::dyn_cast<mlir::omp::MapInfoOp>(mapVar.getDefiningOp());
+ if (mapOp.getVarPtr() == op.getVarPtr() &&
+ mapOp.getVarPtrPtr() == op.getVarPtrPtr()) {
+ return true;
+ }
+ }
+
+ return false;
+ };
+
+ // If this is not a top-level descriptor with members (e.g. it is a
+ // member of a derived type), we do not want to perform this step.
+ if (!llvm::isa<mlir::omp::TargetDataOp>(target) || op.getMembers().empty())
+ return;
+
+ if (!isUseDeviceAddr(op, target) && !isUseDevicePtr(op, target))
+ return;
+
+ auto targetDataOp = llvm::dyn_cast<mlir::omp::TargetDataOp>(target);
+ if (explicitMappingPresent(op, targetDataOp))
+ return;
+
+ mlir::omp::MapInfoOp newDescParentMapOp =
+ builder.create<mlir::omp::MapInfoOp>(
+ op->getLoc(), op.getResult().getType(), op.getVarPtr(),
+ op.getVarTypeAttr(),
+ builder.getIntegerAttr(
+ builder.getIntegerType(64, false),
+ llvm::to_underlying(
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_TO |
+ llvm::omp::OpenMPOffloadMappingFlags::OMP_MAP_ALWAYS)),
+ op.getMapCaptureTypeAttr(), /*varPtrPtr=*/mlir::Value{},
+ mlir::SmallVector<mlir::Value>{}, mlir::ArrayAttr{},
+ /*bounds=*/mlir::SmallVector<mlir::Value>{},
+ /*mapperId*/ mlir::FlatSymbolRefAttr(), op.getNameAttr(),
+ /*partial_map=*/builder.getBoolAttr(false));
+
+ targetDataOp.getMapVarsMutable().append({newDescParentMapOp});
+ }
+
+ void removeTopLevelDescriptor(mlir::omp::MapInfoOp op,
+ fir::FirOpBuilder &builder,
+ mlir::Operation *target) {
+ if (llvm::isa<mlir::omp::TargetOp, mlir::omp::TargetDataOp,
+ mlir::omp::DeclareMapperInfoOp>(target))
+ return;
+
+ // If this is not a top-level descriptor with members (e.g. it is a
+ // member of a derived type), we do not want to perform this step.
+ if (op.getMembers().empty())
+ return;
+
+ mlir::SmallVector<mlir::Value> newMembers = op.getMembers();
+ mlir::omp::MapInfoOp baseAddr =
+ mlir::dyn_cast_or_null<mlir::omp::MapInfoOp>(
+ newMembers.front().getDefiningOp());
+ assert(baseAddr && "Expected member to be MapInfoOp");
+ newMembers.erase(newMembers.begin());
+
+ llvm::SmallVector<llvm::SmallVector<int64_t>> memberIndices;
+ getMemberIndicesAsVectors(op, memberIndices);
+
+ // We can skip the extra processing if there's only one member, as it
+ // would be the base address, which we're promoting to the parent.
+ mlir::ArrayAttr newMembersAttr;
+ if (memberIndices.size() > 1) {
+ memberIndices.erase(memberIndices.begin());
+ newMembersAttr = builder.create2DI64ArrayAttr(memberIndices);
+ }
+
+ // VarPtrPtr is currently tied to detecting whether something is a
+ // pointer in the later lowering, which at the moment also forces
+ // OMP_MAP_PTR_AND_OBJ to be applied. That defeats what we are trying to
+ // solve here, by emitting an 8-byte mapping tied to the descriptor
+ // address (even if we only emit a single map). We circumvent this by
+ // removing the varPtrPtr mapping; however, a side effect is that we
+ // lose the additional load the backend ties to it, which is required
+ // for correctness to obtain the correct address of the data we map. So
+ // we perform that load at this stage instead.
+ // TODO/FIXME: Untangle OMP_MAP_PTR_AND_OBJ and varPtrPtr from pointer
+ // detection to tidy up the implementation a bit. This is an unfortunate
+ // complexity from push-back upstream. We could also emit a load at this
+ // level for all base addresses, which would in turn simplify the later
+ // lowering a bit as well, but first we need to see how well this
+ // alteration works.
+ auto loadBaseAddr =
+ builder.loadIfRef(op->getLoc(), baseAddr.getVarPtrPtr());
+ mlir::omp::MapInfoOp newBaseAddrMapOp =
+ builder.create<mlir::omp::MapInfoOp>(
+ op->getLoc(), loadBaseAddr.getType(), loadBaseAddr,
+ baseAddr.getVarTypeAttr(), baseAddr.getMapTypeAttr(),
+ baseAddr.getMapCaptureTypeAttr(), mlir::Value{}, newMembers,
+ newMembersAttr, baseAddr.getBounds(),
+ /*mapperId*/ mlir::FlatSymbolRefAttr(), op.getNameAttr(),
+ /*partial_map=*/builder.getBoolAttr(false));
+ op.replaceAllUsesWith(newBaseAddrMapOp.getResult());
+ op->erase();
+ baseAddr.erase();
+ }
+
// This pass executes on omp::MapInfoOp's containing descriptor-based types
// (allocatables, pointers, assumed shape etc.), expanding them into
// multiple omp::MapInfoOp's for each pointer member contained within the
@@ -622,6 +804,7 @@ class MapInfoFinalizationPass
// clear all local allocations we made for any boxes in any prior
// iterations from previous function scopes.
localBoxAllocas.clear();
+ deferrableDesc.clear();
// First, walk `omp.map.info` ops to see if any of them have varPtrs
// with an underlying type of fir.char<k, ?>, i.e a character
@@ -864,6 +1047,36 @@ class MapInfoFinalizationPass
}
});
+ // Now that we've expanded all of our boxes into a descriptor and base
+ // address map where necessary, we check if the map owner is an
+ // enter/exit/target data directive, and if they are we drop the initial
+ // descriptor (top-level parent) and replace it with the
+ // base_address/data.
+ //
+ // This circumvents collisions between stack-allocated descriptors
+ // bound to the device, which in Flang are rather trivial for a user to
+ // cause by accident due to the pervasive local intermediate descriptor
+ // generation that occurs whenever boxes are passed across scopes.
+ // In OpenMP 6+ mapping these would be a user error as the tools required
+ // to circumvent these issues are provided by the spec (ref_ptr/ptee map
+ // types), but in prior specifications these tools are not available and
+ // it becomes an implementation issue for us to solve.
+ //
+ // We do this by dropping the top-level descriptor (which will be the
+ // stack descriptor) when we perform enter/exit maps, as we don't want
+ // it to be bound until necessary, i.e. when the descriptor type is
+ // used within a target region. At that point we map the relevant
+ // descriptor data, and the runtime should correctly associate the data
+ // with the descriptor, binding them together and allowing clean
+ // mapping and execution.
+ for (auto *op : deferrableDesc) {
+ auto mapOp = llvm::dyn_cast<mlir::omp::MapInfoOp>(op);
+ mlir::Operation *targetUser = getFirstTargetUser(mapOp);
+ assert(targetUser && "expected user of map operation was not found");
+ builder.setInsertionPoint(mapOp);
+ removeTopLevelDescriptor(mapOp, builder, targetUser);
+ addImplictDescriptorMapToTargetDataOp(mapOp, builder, targetUser);
+ }
+
// Wait until after we have generated all of our maps to add them onto
// the target's block arguments, simplifying the process as there would be
// no need to avoid accidental duplicate additions.
diff --git a/flang/test/Lower/OpenMP/map-descriptor-deferral.f90 b/flang/test/Lower/OpenMP/map-descriptor-deferral.f90
new file mode 100644
index 0000000000000..a7ce1edd94b12
--- /dev/null
+++ b/flang/test/Lower/OpenMP/map-descriptor-deferral.f90
@@ -0,0 +1,44 @@
+!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
+
+subroutine assume_map_target_enter_exit(assumed_arr)
+ integer :: assumed_arr(:)
+ !$omp target enter data map(to: assumed_arr)
+ !$omp target
+ assumed_arr(1) = 10
+ !$omp end target
+ !$omp target exit data map(from: assumed_arr)
+end subroutine
+
+!CHECK-LABEL: func.func @_QPassume_map_target_enter_exit(
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[LOAD_BOX:.*]] = fir.load %[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%[[LOAD_BOX]] : !fir.ref<!fir.array<?xi32>>, i32) map_clauses(to) capture(ByRef) bounds(%{{.*}}) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target_enter_data map_entries(%[[MAP_ADDR]] : !fir.ref<!fir.array<?xi32>>)
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, i32) map_clauses(implicit, tofrom) capture(ByRef) var_ptr_ptr(%[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
+!CHECK: %[[MAP_BOX:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.box<!fir.array<?xi32>>) map_clauses(implicit, to) capture(ByRef) members(%{{.*}} : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target map_entries(%[[MAP_BOX]] -> %{{.*}}, %[[MAP_ADDR]] -> %{{.*}} : !fir.ref<!fir.array<?xi32>>, !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) {
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[LOAD_BOX:.*]] = fir.load %[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%[[LOAD_BOX]] : !fir.ref<!fir.array<?xi32>>, i32) map_clauses(from) capture(ByRef) bounds(%{{.*}}) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target_exit_data map_entries(%[[MAP_ADDR]] : !fir.ref<!fir.array<?xi32>>)
+
+subroutine assume_map_target_data(assumed_arr)
+ integer :: assumed_arr(:)
+ !$omp target data map(to: assumed_arr)
+ !$omp target
+ assumed_arr(1) = 10
+ !$omp end target
+ !$omp end target data
+end subroutine
+
+!CHECK-LABEL: func.func @_QPassume_map_target_data(
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, i32) map_clauses(to) capture(ByRef) var_ptr_ptr(%[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
+!CHECK: %[[MAP_BOX:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.box<!fir.array<?xi32>>) map_clauses(to) capture(ByRef) members(%[[MAP_ADDR]] : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target_data map_entries(%[[MAP_BOX]], %[[MAP_ADDR]] : !fir.ref<!fir.array<?xi32>>, !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) {
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, i32) map_clauses(implicit, tofrom) capture(ByRef) var_ptr_ptr(%[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
+!CHECK: %[[MAP_BOX:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.box<!fir.array<?xi32>>) map_clauses(implicit, to) capture(ByRef) members(%[[MAP_ADDR]] : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target map_entries(%[[MAP_BOX]] -> %{{.*}}, %[[MAP_ADDR]] -> %{{.*}} : !fir.ref<!fir.array<?xi32>>, !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) {
+
diff --git a/flang/test/Transforms/omp-map-info-finalization.fir b/flang/test/Transforms/omp-map-info-finalization.fir
index ed814cdcfc728..7bc0ae4a72b05 100644
--- a/flang/test/Transforms/omp-map-info-finalization.fir
+++ b/flang/test/Transforms/omp-map-info-finalization.fir
@@ -326,15 +326,15 @@ func.func @_QPreuse_alloca(%arg0: !fir.box<!fir.array<?xf64>> {fir.bindc_name =
// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
// CHECK: omp.target_data map_entries
-// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
-// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
+// CHECK: %[[BOX_OFFSET:.*]] = fir.box_offset %[[ALLOCA]]
+// CHECK: %[[LOAD_OFFSET:.*]] = fir.load %[[BOX_OFFSET]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xf64>>>
+// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[LOAD_OFFSET]]
// CHECK: omp.target_update map_entries
// CHECK: omp.terminator
// CHECK: }
// CHECK: return
-
omp.private {type = firstprivate} @boxchar.privatizer : !fir.boxchar<1> copy {
^bb0(%arg0: !fir.boxchar<1>, %arg1: !fir.boxchar<1>):
omp.yield(%arg0 : !fir.boxchar<1>)
diff --git a/offload/test/offloading/fortran/descriptor-stack-jam-regression.f90 b/offload/test/offloading/fortran/descriptor-stack-jam-regression.f90
new file mode 100644
index 0000000000000..bb077613503d1
--- /dev/null
+++ b/offload/test/offloading/fortran/descriptor-stack-jam-regression.f90
@@ -0,0 +1,52 @@
+! This test doesn't check any results; the pass condition is running to
+! completion without any device memory access errors or mapping issues from
+! descriptor collisions, i.e. local descriptors being placed on device and not
+! being unmapped before a subsequent local descriptor residing at the same
+! address is mapped to device.
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-run-and-check-generic
+module test
+contains
+ subroutine kernel_1d(array)
+ implicit none
+ real, dimension(:) :: array
+ integer :: i
+
+ print *, 'kernel_1d'
+ !$omp target enter data map(alloc:array)
+ !$omp target teams distribute parallel do
+ do i=1, ubound(array, 1)
+ array(i) = 42.0
+ end do
+ end subroutine
+
+ subroutine kernel_2d(array)
+ implicit none
+ real, dimension(:,...
[truncated]
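The deferral rule added by `canDeferDescriptorMapping` can be paraphrased with stand-in boolean flags (a sketch of the intended rule only; the actual check queries FIR types and the `hlfir.declare` dummy scope rather than these hypothetical flags): defer only for dummy arguments of assumed shape or assumed type that are neither allocatable nor pointer.

```python
# Sketch of the descriptor-deferral decision using stand-in booleans
# (hypothetical simplification of the FIR type queries in the pass).

def can_defer_descriptor_mapping(is_allocatable: bool, is_pointer: bool,
                                 is_dummy_arg: bool, is_assumed: bool) -> bool:
    # Allocatables and pointers keep their eager descriptor map; only
    # assumed-shape/assumed-type dummy arguments have it deferred.
    if is_allocatable or is_pointer:
        return False
    return is_dummy_arg and is_assumed

# Assumed-shape dummy argument: deferred.
print(can_defer_descriptor_mapping(False, False, True, True))   # True
# Allocatable: mapped eagerly, as before this patch.
print(can_defer_descriptor_mapping(True, False, True, False))   # False
```

This matches the test coverage in the patch: both Fortran tests pass an assumed-shape dummy (`assumed_arr(:)`), which is exactly the case whose descriptor map is deferred.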
@llvm/pr-subscribers-flang-fir-hlfir Author: None (agozillon)
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, i32) map_clauses(implicit, tofrom) capture(ByRef) var_ptr_ptr(%[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
+!CHECK: %[[MAP_BOX:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.box<!fir.array<?xi32>>) map_clauses(implicit, to) capture(ByRef) members(%{{.*}} : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target map_entries(%[[MAP_BOX]] -> %{{.*}}, %[[MAP_ADDR]] -> %{{.*}} : !fir.ref<!fir.array<?xi32>>, !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) {
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[LOAD_BOX:.*]] = fir.load %[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%[[LOAD_BOX]] : !fir.ref<!fir.array<?xi32>>, i32) map_clauses(from) capture(ByRef) bounds(%{{.*}}) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target_exit_data map_entries(%[[MAP_ADDR]] : !fir.ref<!fir.array<?xi32>>)
+
+subroutine assume_map_target_data(assumed_arr)
+ integer :: assumed_arr(:)
+ !$omp target data map(to: assumed_arr)
+ !$omp target
+ assumed_arr(1) = 10
+ !$omp end target
+ !$omp end target data
+end subroutine
+
+!CHECK-LABEL: func.func @_QPassume_map_target_data(
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, i32) map_clauses(to) capture(ByRef) var_ptr_ptr(%[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
+!CHECK: %[[MAP_BOX:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.box<!fir.array<?xi32>>) map_clauses(to) capture(ByRef) members(%[[MAP_ADDR]] : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target_data map_entries(%[[MAP_BOX]], %[[MAP_ADDR]] : !fir.ref<!fir.array<?xi32>>, !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) {
+!CHECK: %[[BOX_ADDR:.*]] = fir.box_offset %{{.*}} base_addr : (!fir.ref<!fir.box<!fir.array<?xi32>>>) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>
+!CHECK: %[[MAP_ADDR:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, i32) map_clauses(implicit, tofrom) capture(ByRef) var_ptr_ptr(%[[BOX_ADDR]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) bounds(%{{.*}}) -> !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>> {name = ""}
+!CHECK: %[[MAP_BOX:.*]] = omp.map.info var_ptr(%{{.*}} : !fir.ref<!fir.box<!fir.array<?xi32>>>, !fir.box<!fir.array<?xi32>>) map_clauses(implicit, to) capture(ByRef) members(%[[MAP_ADDR]] : [0] : !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) -> !fir.ref<!fir.array<?xi32>> {name = "assumed_arr"}
+!CHECK: omp.target map_entries(%[[MAP_BOX]] -> %{{.*}}, %[[MAP_ADDR]] -> %{{.*}} : !fir.ref<!fir.array<?xi32>>, !fir.llvm_ptr<!fir.ref<!fir.array<?xi32>>>) {
+
diff --git a/flang/test/Transforms/omp-map-info-finalization.fir b/flang/test/Transforms/omp-map-info-finalization.fir
index ed814cdcfc728..7bc0ae4a72b05 100644
--- a/flang/test/Transforms/omp-map-info-finalization.fir
+++ b/flang/test/Transforms/omp-map-info-finalization.fir
@@ -326,15 +326,15 @@ func.func @_QPreuse_alloca(%arg0: !fir.box<!fir.array<?xf64>> {fir.bindc_name =
// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
// CHECK: omp.target_data map_entries
-// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
-// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[ALLOCA]]
+// CHECK: %[[BOX_OFFSET:.*]] = fir.box_offset %[[ALLOCA]]
+// CHECK: %[[LOAD_OFFSET:.*]] = fir.load %[[BOX_OFFSET]] : !fir.llvm_ptr<!fir.ref<!fir.array<?xf64>>>
+// CHECK: %{{[0-9]+}} = omp.map.info var_ptr(%[[LOAD_OFFSET]]
// CHECK: omp.target_update map_entries
// CHECK: omp.terminator
// CHECK: }
// CHECK: return
-
omp.private {type = firstprivate} @boxchar.privatizer : !fir.boxchar<1> copy {
^bb0(%arg0: !fir.boxchar<1>, %arg1: !fir.boxchar<1>):
omp.yield(%arg0 : !fir.boxchar<1>)
diff --git a/offload/test/offloading/fortran/descriptor-stack-jam-regression.f90 b/offload/test/offloading/fortran/descriptor-stack-jam-regression.f90
new file mode 100644
index 0000000000000..bb077613503d1
--- /dev/null
+++ b/offload/test/offloading/fortran/descriptor-stack-jam-regression.f90
@@ -0,0 +1,52 @@
+! This test doesn't expect any results, the pass condition is running to completion
+! without any memory access errors on device or mapping issues from descriptor
! collisions due to local descriptors being placed on device and not being unmapped
+! before a subsequent local descriptor residing at the same address is mapped to
+! device.
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-run-and-check-generic
+module test
+contains
+ subroutine kernel_1d(array)
+ implicit none
+ real, dimension(:) :: array
+ integer :: i
+
+ print *, 'kernel_1d'
+ !$omp target enter data map(alloc:array)
+ !$omp target teams distribute parallel do
+ do i=1, ubound(array, 1)
+ array(i) = 42.0
+ end do
+ end subroutine
+
+ subroutine kernel_2d(array)
+ implicit none
+ real, dimension(:,...
[truncated]
|
Thanks Andrew. I have a few questions and comments.
@@ -0,0 +1,44 @@
!RUN: %flang_fc1 -emit-hlfir -fopenmp %s -o - | FileCheck %s
nit: Can you add a test description?
|
for (mlir::Value mapVar : tarData.getMapVars()) {
  auto mapOp =
      llvm::dyn_cast<mlir::omp::MapInfoOp>(mapVar.getDefiningOp());
llvm::dyn_cast<mlir::omp::MapInfoOp>(mapVar.getDefiningOp());
llvm::cast<mlir::omp::MapInfoOp>(mapVar.getDefiningOp());
auto mapOp =
    llvm::dyn_cast<mlir::omp::MapInfoOp>(mapVar.getDefiningOp());
if (mapOp.getVarPtr() == op.getVarPtr() &&
    mapOp.getVarPtrPtr() == mapOp.getVarPtrPtr()) {
typo?
mapOp.getVarPtrPtr() == mapOp.getVarPtrPtr()) {
mapOp.getVarPtrPtr() == op.getVarPtrPtr()) { |
Excellent catch, thank you very much!
// cases until target or target data regions, when we can be sure they
// have a clear limited scope on device.
bool canDeferDescriptorMapping(mlir::Value descriptor) {
  if (!fir::isAllocatableType(descriptor.getType()) ||
Should that be `&&`?
if (!fir::isAllocatableType(descriptor.getType()) ||
if (!fir::isAllocatableType(descriptor.getType()) &&
Also, I think this condition is not covered by a test.
added a couple of tests for it!
// cases until target or target data regions, when we can be sure they
// have a clear limited scope on device.
bool canDeferDescriptorMapping(mlir::Value descriptor) {
  if (!fir::isAllocatableType(descriptor.getType()) ||
Can we use early returns to improve readability?
if (!isUseDeviceAddr(op, target) && !isUseDevicePtr(op, target))
  return;

auto targetDataOp = llvm::dyn_cast<mlir::omp::TargetDataOp>(target);
`llvm::cast`?
@@ -0,0 +1,52 @@
! This test doesn't expect any results, the pass condition is running to completion
I might be missing something, but I ran this test without the changes in the PR (just using `main`) and it still passes.
Not at all, my fault for trying to reduce the original reproducer a bit! Extended it again a bit closer to the original size, enough so that it should reproduce.
if (op.getMembers().empty())
  return;

mlir::SmallVector<mlir::Value> newMembers = op.getMembers();
nit: why `new` in `newMembers`?
old habits I suppose, it's a new members list for the operation so newMembers! :-) happy to remove it of course.
!$omp end target data
end subroutine

!CHECK-LABEL: func.func @_QPassume_map_target_data(
Is there any deferral happening here? I expected to see a `load` from the stack allocated box that corresponds to `assumed_arr`'s descriptor.
There is not, it's intentional to make sure we aren't deferring here and alterations to the pass do not cause deferral without the modifier noticing :-)
// within a target region. At which point we map the relevant descriptor
// data and the runtime should correctly associate the data with the
// descriptor and bind together and allow clean mapping and execution.
for (auto *op : deferrableDesc) {
Just out of curiosity, is it possible to avoid creating the "wrong" mappings in the first place? Instead of creating them and then editing the mappings as we do in the loop? So instead of deferring the mapping, we do it properly for stack allocated desciptors when we first create them.
It is likely possible, but I opted for this approach as the logic for the initial generation of the maps (which would have to be done earlier in this pass, as it can't be done in the initial lowering with the current optimisation pass direction) is already fairly complicated and I wasn't too keen on adding more complexity to it, and we already have a precedent for composing things into a phased/modular modification of the maps in this pass! At the time I didn't think it'd be as simple as a single if check, so I opted for this approach initially!
This will also have to be wrapped in an OpenMP versioning check eventually as well, as we'll likely want to switch this off when attach map semantics are implemented in the later versioning, as it shouldn't be needed as the tools will be in the users hands to avoid this issue (in theory at least).
But if we're wanting to save the cycles and don't mind adding more complexity to the initial pass, I could look into how feasible it would be to implement it earlier if you wish! Let me know :-)
Had a little think about it over the weekend and remembered another reason I opted not to do it: the map we are removing in this case is the one already generated by the lowering, which we use to generate the map we keep. So we'd still basically be doing the same thing, just earlier in the pass and with more complexity, as it'd make the logic for the initial map expansion a fair bit more complicated. So, this is the lesser of two evils I think, and has the side-effect of also working for the char arrays, which still need to be tidied up and merged into the rest of the flow in the pass rather than being their own separate segment.
@@ -593,6 +656,125 @@ class MapInfoFinalizationPass
      return nullptr;
    }

  void addImplictDescriptorMapToTargetDataOp(mlir::omp::MapInfoOp op,
void addImplictDescriptorMapToTargetDataOp(mlir::omp::MapInfoOp op,
void addImplicitDescriptorMapToTargetDataOp(mlir::omp::MapInfoOp op, |
nice catch thank you very much!
call kernel_1d(array1)
call kernel_2d(array2)

print *, "PASS"
Can you please verify the array contents before printing the `PASS`.
Can do, but the point of this test is to see if it even reaches the PASS statement, and if it does it's a success. Prior to this PR this will abort in kernel_2d when it tries to map data and launch the kernel.
…ypes This PR adds deferral of descriptor maps until they are necessary for assumed dummy argument types. The intent is to avoid a problem where a user can inadvertently map a temporary local descriptor to device without their knowledge and proceed to never unmap it. This temporary local descriptor remains lodged in OpenMP device memory, and the next time another variable or descriptor residing at the same stack address is mapped we incur a runtime OpenMP map error as we try to remap the same address. This fix was discussed with the OpenMP committee and applies to OpenMP 5.2 and below; future versions of OpenMP can avoid this issue via the attach semantics added to the specification.
bb0f8c5 to d3d6cb1
Updated the PR with reviewer comments as best I could, let me know if I missed anything!
// Relevant for OpenMP < 5.2, where attach semantics and rules don't exist.
// As descriptors were an unspoken implementation detail in these versions
// there's certain cases where the user (and the compiler implementation)
I'll try to describe it :-)
Small ping for some reviewer attention on this if at all possible please :-)
This PR adds deferral of descriptor maps until they are necessary for assumed dummy argument types. The intent is to avoid a problem where a user can inadvertently map a temporary local descriptor to device without their knowledge and proceed to never unmap it. This temporary local descriptor remains lodged in OpenMP device memory and the next time another variable or descriptor residing in the same stack address is mapped we incur a runtime OpenMP map error as we try to remap the same address.
This fix was discussed with the OpenMP committee and applies to OpenMP 5.2 and below, future versions of OpenMP can avoid this issue via the attach semantics added to the specification.
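For readers unfamiliar with the failure mode, a minimal sketch of the colliding-descriptor pattern follows, condensed from the same shape as the regression test in this PR (`descriptor-stack-jam-regression.f90`); the subroutine and variable names here are hypothetical, and whether the two temporary descriptors actually land at the same stack address is compiler- and ABI-dependent:

```fortran
module collide
contains
  subroutine map_one(a)
    ! Assumed-shape dummy: Flang materialises a temporary local (stack)
    ! descriptor for 'a' in this scope.
    real, dimension(:) :: a
    integer :: i
    ! Pre-PR: this also mapped the stack descriptor and never unmapped it,
    ! leaving a stale device entry keyed on this stack address.
    !$omp target enter data map(alloc: a)
    !$omp target teams distribute parallel do
    do i = 1, ubound(a, 1)
      a(i) = 1.0
    end do
    !$omp target exit data map(delete: a)
  end subroutine

  subroutine map_two(b)
    ! A fresh local descriptor for 'b', plausibly at the SAME stack address
    ! as the one created in map_one.
    real, dimension(:) :: b
    integer :: i
    ! Pre-PR: runtime map error here (attempt to remap a live device entry).
    ! With this PR the descriptor map is deferred, so this succeeds.
    !$omp target enter data map(alloc: b)
    !$omp target teams distribute parallel do
    do i = 1, ubound(b, 1)
      b(i) = 2.0
    end do
    !$omp target exit data map(delete: b)
  end subroutine
end module

program main
  use collide
  real :: x(16), y(16)
  call map_one(x)
  call map_two(y)
  print *, "PASS"
end program
```

As in the regression test, the pass condition is simply reaching the final print without a runtime mapping error, not any particular computed values.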