Skip to content

Conversation

abhinavgaba
Copy link
Contributor

@abhinavgaba abhinavgaba commented Aug 14, 2025

This adds support for using ATTACH map-type for proper pointer-attachment when mapping list-items that have base-pointers.

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.


With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

Why a large PR

While it's unfortunate that this PR has gotten large and difficult to review, the issue is that all the functional changes have to be made together, to prevent regressions from partially implemented changes.

For example, the changes to capturing were previously done separately (#145454), but they would still cause stability issues in absence of full attach-mapping. And attach-mapping needs those changes to be able to launch kernels.

We can extract the utilities and functions, like those for finding attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't call those functions, just adds them (#155625). And maybe the change that adds a new error message for use_device_addr on array-sections with non-var base-pointers can be extracted out too (but that will have to be a follow-up change in that case, and we may get comp-fails with this PR when the erroneous case is not caught/diagnosed).

Grouping of maps based on attach base-pointers

We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                 // attach-ptr: spp
  map(spp),                    // attach-ptr: N/A

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers. This resolves the issues of us always mapping everything from the beginning of the symbol spp. Each group is mapped independently, and at the same level, like spp[0][0] and its member spp[0][0].a, we still get map them together as part of the same contiguous struct spp[0][0]. This resolves issue #141042.

use_device_ptr/addr fixes

The handling of use_device_ptr/addr was updated to use the attach-ptr information, and works for many cases that were failing before. It has to be done as part of this series because otherwise, the switch from ptr_to_obj to attach-style mapping would have caused regressions in existing use_device_ptr/addr tests.

Handling of attach-pointers that are members of implicitly mapped structs:

  • When a struct member-pointer, like p below, is a base-pointer in a map clause on a target construct (like map(p[0:1]), and the base of that struct is either the this pointer (implicitly or explicitly), or a struct that is implicitly mapped on that construct, we add an implicit map(p) so that we don't implicitly map the full struct.
 struct S { int *p;
 void f1() {
   #pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure
                                  // that the implicit map of `this[:]` does
                                  // not map the full struct
      printf("%p %p\n", &p, p);
 }

Scope for improvement:

  • We may be able to compute attach-ptr expr while collecting component-lists in Sema.
    • But we cache the computation results already, and findAttachPtrExpr is fairly simple, and fast.
  • There may be a better way to implement semantic expr comparison.

Needs future work:

  • Attach-style maps not yet emitted for declare mappers.
  • Mapping of class member references: We are still using PTR_AND_OBJ maps for them. We will likely need to change that to handle ref_ptr/ref_ptee, and attach map-type-modifier on them.

For the following:

```c
int *p;
  \#pragma omp target map(p[0]) //    (A)
  (void)p;

  \#pragma omp target map(p) //       (B)
  (void)p;

  \#pragma omp target map(p, p[0]) // (C)
  (void)p;

  \#pragma omp target map(p[0], p) // (D)
  (void)p;
```

For (A), the pointer `p` is predetermined `firstprivate`, so it
should be (and is) captured by-copy. However, for (B), (C), and
(D), since `p` is already listed in a `map` clause, it's not
predetermined `firstprivate`, and hence, should be captured
by-reference, like any other mapped variable.

To ensure the correct handling of (C) and (D), the following
changes were made:

1. In SemaOpenMP, we now ensure that `p` is marked to be
captured by-reference in these cases.

2. We no longer ignore `map(p)` during codegen of `target`
constructs, even if there's another map like `map(p[0])` that
would have been mapped using a PTR_AND_OBJ map.

3. For cases like (D), we now handle `map(p)` before `map(p[0])`,
so the former gets the TARGET_PARAM flag and sets the kernel
argument.
The output of the compile-and-run tests is incorrect. These will be
used for reference in future commits that resolve the issues.

Also updated the existing clang LIT test,
target_map_both_pointer_pointee_codegen.cpp, with more regions and
more narrowed-down update_cc_test_checks filters.
This patch introduces libomptarget support for the ATTACH map-type,
which can be used to implement OpenMP conditional compliant pointer attachment,
based on whether the pointer/pointee is newly mapped on a given construct.

For example, for the following:

```c
  int *p;
  #pragma omp target enter data map(p[1:10])
```

The following maps can be emitted by clang:
```
  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH
```

Without this map-type, the two possible maps emitted by clang:
```
  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, which are both incorrect.

In terms of implementation, maps with the ATTACH map-type are handled after
all other maps have been processed, as it requires knowledge of which new
allocations happened as part of the construct. As per OpenMP 5.0, an
attachment should happen only when either the pointer or the pointee was
newly mapped while handling the construct.

Maps with ATTACH map-type-bit do not increase/decrease the ref-count.

With OpenMP 6.1, `attach(always/never)` can be used to force/prevent
attachment. For `attach(always)`, the compiler will insert the ALWAYS
map-type, which would let libomptarget bypass the check about one of the
pointer/pointee being new. With `attach(never)`, the ATTACH map will not
be emitted at all.

The size argument of the ATTACH map-type can specify values greater than
`sizeof(void*)` which can be used to support pointer attachment on Fortran
descriptors. Note that this also requires shadow-pointer tracking to also
support them. That has not been implemented in this patch.

This was worked upon in coordination with Ravi Narayanaswamy, who has
since retired. Happy retirement, Ravi!
This patch introduces libomptarget support for the ATTACH map-type,
which can be used to implement OpenMP conditional compliant pointer attachment,
based on whether the pointer/pointee is newly mapped on a given construct.

For example, for the following:

```c
  int *p;
  #pragma omp target enter data map(p[1:10])
```

The following maps can be emitted by clang:
```
  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH
```

Without this map-type, the two possible maps emitted by clang:
```
  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ
````

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, which are both incorrect.

In terms of implementation, maps with the ATTACH map-type are handled after
all other maps have been processed, as it requires knowledge of which new
allocations happened as part of the construct. As per OpenMP 5.0, an
attachment should happen only when either the pointer or the pointee was
newly mapped while handling the construct.

Maps with ATTACH map-type-bit do not increase/decrease the ref-count.

With OpenMP 6.1, `attach(always/never)` can be used to force/prevent
attachment. For `attach(always)`, the compiler will insert the ALWAYS
map-type, which would let libomptarget bypass the check about one of the
pointer/pointee being new. With `attach(never)`, the ATTACH map will not
be emitted at all.

The size argument of the ATTACH map-type can specify values greater than
`sizeof(void*)` which can be used to support pointer attachment on Fortran
descriptors. Note that this also requires shadow-pointer tracking to also
support them. That has not been implemented in this patch.

This was worked upon in coordination with Ravi Narayanaswamy, who has
since retired. Happy retirement, Ravi!
abhinavgaba added a commit that referenced this pull request Aug 25, 2025
Most tests are either compfailing or runfailing.

They should start passing once we start using ATTACH map-type based
codegen. (#153683)

Even after they start passing, there are a few places where the EXPECTED
and actual CHECKs are different, due to two main issues:
* use_device_ptr translation on `&p[0]` is not succeeding in looking-up
a previously mapped `&p[1]`
* privatization of byref use_device_addr operands is not happening
correctly.

The above should be fixed as separate standalone changes.
@abhinavgaba abhinavgaba changed the title [OpenMP][WIP] Use ATTACH maps for array-sections/subscripts on pointers. [OpenMP] Use ATTACH map-type for mapping expressions with base-pointers. Aug 26, 2025
@abhinavgaba abhinavgaba changed the title [OpenMP] Use ATTACH map-type for mapping expressions with base-pointers. [OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. Aug 27, 2025
@abhinavgaba abhinavgaba changed the title [OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. [OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. Aug 27, 2025
abhinavgaba added a commit to abhinavgaba/llvm-project that referenced this pull request Aug 27, 2025
…base-ptrs.

These have been pulled out of the codegen PR llvm#153683, to reduce the size
of that PR.
@jtb20
Copy link
Contributor

jtb20 commented Aug 28, 2025

I've spent some time looking through these patches. First off, I see that the test case from #141042 is partially fixed -- it works as written in the issue, but not if the commented-out #pragma omp target lines is exchanged for the active one, i.e.:

//#pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b)
#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)
  {
    v->s1->b += 3;
    v->s1->c += 5;
    v->s2->b += 7;
  }

This gives mappings like this:

omptarget --> Entry  0: Base=0x00006019ea405760, Begin=0x00006019ea405760, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffd26480870, Begin=0x00006019ea405760, Size=8, Type=0x4000, Name=v[:1]
omptarget --> Entry  2: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=1, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=12, Type=0x3000000000001, Name=v->s1[0]
omptarget --> Entry  4: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1[0]
omptarget --> Entry  5: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a4, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  6: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a8, Size=4, Type=0x3000000000003, Name=v->s1->c
omptarget --> Entry  7: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1->c
omptarget --> Entry  8: Base=0x00006019ea4057c0, Begin=0x00006019ea4057c4, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  9: Base=0x00006019ea405770, Begin=0x00006019ea4057c4, Size=8, Type=0x4000, Name=v->s2->b

I think entry 2 is bogus, particularly the "size 1" bit.

At a high level, this patch series tackles the problem of parsing complex expressions by keying off the attach-pointer component, whereas my (admittedly less developed!) patch series adjusts the partial-struct (overlap) handling code to group parts of the expression by "containing struct" base. That's similar but not quite the same, and I'm not sure how to reconcile the two approaches, or even if that's necessary -- but if it's not, then I'm not sure how to handle cases such as the above.

(This one, I mean: #153672)

When I worked on this problem for GCC, I ended up creating a kind of tiny DSL just to parse expressions for mapping, described briefly in this patch submission:

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627897.html

For LLVM, I don't think we should use exactly the same approach, but I was thinking of something conceptually similar, probably along the lines of an iterator that walks over each "chunk" of an expression, with members to query things like attachment points, containing struct bases, innermost and outermost component accesses, or the "leaf"-type of the chunk (array section expr, array shaping expr, and so on). Incrementing the iterator would then (I think) move to the next indirection level, or the next containing-struct base, with details of course TBD.

That would allow the rather complex and hard-to-follow (existing) logic in generateInfoForComponentList to be somewhat simplified, I think.

(That's not the same as the ComponentListRefPtrPteeIterator in my posted patch, but could be done similarly, perhaps.)

Also improves one debug dump regarding pointer-attachment.
@abhinavgaba
Copy link
Contributor Author

First off, I see that the test case from #141042 is partially fixed -- it works as written in the issue, but not if the commented-out #pragma omp target lines is exchanged for the active one, i.e.:

//#pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b)
#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)
  {
    v->s1->b += 3;
    v->s1->c += 5;
    v->s2->b += 7;
  }

Thanks for trying the patch. The commented out code from #141042 is expected to work, but as discussed here, the original is not expected to work as per OpenMP. Since there is no map with v as either the list-item, or the base-pointer of a list-item, the implicit data-mapping rule for v, which is map(v[0:0]) will kick-in. This just does a lookup, and the lookup will fail because v[0] does not have corresponding storage on the device. So, as per OpenMP, v[0:0] will result in v containing its original host value inside the kernel. So, v->s1/v->s2/v->s3 inside the target region will be doing a dereference of the host pointer v.

This gives mappings like this:

omptarget --> Entry  0: Base=0x00006019ea405760, Begin=0x00006019ea405760, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffd26480870, Begin=0x00006019ea405760, Size=8, Type=0x4000, Name=v[:1]
omptarget --> Entry  2: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=1, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=12, Type=0x3000000000001, Name=v->s1[0]
omptarget --> Entry  4: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1[0]
omptarget --> Entry  5: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a4, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  6: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a8, Size=4, Type=0x3000000000003, Name=v->s1->c
omptarget --> Entry  7: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1->c
omptarget --> Entry  8: Base=0x00006019ea4057c0, Begin=0x00006019ea4057c4, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  9: Base=0x00006019ea405770, Begin=0x00006019ea4057c4, Size=8, Type=0x4000, Name=v->s2->b

I think entry 2 is bogus, particularly the "size 1" bit.

Did you try an older version of the patch? With the current version, I get the following:

❯ rg "pragma omp target" llorg-issue141042.cpp                                        
44:/* #pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b) */       
45:#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)

❯ clang -O0 -fopenmp -fopenmp-targets=x86_64 llorg-issue141042.cpp -g && LIBOMPTARGET_DEBUG=1 ./a.out 2>&1 | rg "entry *\d" -i

// For v[:1]
omptarget --> Entry  0: Base=0x00005ecf70c87ad0, Begin=0x00005ecf70c87ad0, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffcba0be7d8, Begin=0x00005ecf70c87ad0, Size=8, Type=0x4000, Name=v[:1] // ATTACH entry for conditional attachment between v and v[0]

// For v->s1->b and v->s1->c, including the containing structure
omptarget --> Entry  2: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=8, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  4: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b18, Size=4, Type=0x3000000000003, Name=v->s1->c
// A single ATTACH entry (instead of creating one for both v->s1->b and v->s1->c).
// Attaches v->s1 to the contigous pointee that contains v->s1->b and v->s1->c.
omptarget --> Entry  5: Base=0x00005ecf70c87ad8, 
Begin=0x00005ecf70c87b14, Size=8, Type=0x4000, Name=v->s1->c


// For v->s2->b. No containing struct allocation needed as no other member of v->s2[0] is mapped.
omptarget --> Entry  6: Base=0x00005ecf70c87b30, Begin=0x00005ecf70c87b34, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  7: Base=0x00005ecf70c87ae0, Begin=0x00005ecf70c87b34, Size=8, Type=0x4000, Name=v->s2->b // ATTACH

// Attach specific debug logs. 
omptarget --> Processing ATTACH entry 0: HstPtr=0x00007ffcba0be7d8, HstPteeBegin=0x00005ecf70c87ad0, Size=8, Type=0x4000
omptarget --> Skipping ATTACH entry 0: pointer not present on device
omptarget --> Processing ATTACH entry 1: HstPtr=0x00005ecf70c87ad8, HstPteeBegin=0x00005ecf70c87b14, Size=8, Type=0x4000
omptarget --> ATTACH entry 1 processed successfully
omptarget --> Processing ATTACH entry 2: HstPtr=0x00005ecf70c87ae0, HstPteeBegin=0x00005ecf70c87b34, Size=8, Type=0x4000
omptarget --> ATTACH entry 2 processed successfully
omptarget --> Ignoring ATTACH entry 7 in targetDataEnd
omptarget --> Ignoring ATTACH entry 5 in targetDataEnd
omptarget --> Ignoring ATTACH entry 1 in targetDataEnd

At a high level, this patch series tackles the problem of parsing complex expressions by keying off the attach-pointer component, whereas my (admittedly less developed!) patch series adjusts the partial-struct (overlap) handling code to group parts of the expression by "containing struct" base. That's similar but not quite the same, and I'm not sure how to reconcile the two approaches, or even if that's necessary -- but if it's not, then I'm not sure how to handle cases such as the above.

Aside from grouping containing structures, another reason for grouping component-lists starting with the same variable, by the complexity of their attach-pointers, is that the kernel argument is determined based on whether the pointer is mapped, or its pointee (see #145454, or line clang/lib/CodeGen/CGOpenMPRuntime.cpp:9749 in this PR).

Do you think there are any cases that we won't be able to handle with this approach?

With this PR, we are able to reuse the existing logic for determining the partial struct, since the grouping of component-lists happens at a higher level. It might still be desirable to update the partial-struct handling if it makes things easier to read/follow. At the same time, it can continue to assume that it's only working with a set of component-lists that will either have no overlap, or belong to the same containing struct.

@jtb20
Copy link
Contributor

jtb20 commented Aug 28, 2025

First off, I see that the test case from #141042 is partially fixed -- it works as written in the issue, but not if the commented-out #pragma omp target lines is exchanged for the active one, i.e.:

//#pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b)
#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)
  {
    v->s1->b += 3;
    v->s1->c += 5;
    v->s2->b += 7;
  }

Thanks for trying the patch. The commented out code from #141042 is expected to work, but as discussed here, the original is not expected to work as per OpenMP. Since there is no map with v as either the list-item, or the base-pointer of a list-item, the implicit data-mapping rule for v, which is map(v[0:0]) will kick-in. This just does a lookup, and the lookup will fail because v[0] does not have corresponding storage on the device. So, as per OpenMP, v[0:0] will result in v containing its original host value inside the kernel. So, v->s1/v->s2/v->s3 inside the target region will be doing a dereference of the host pointer v.

This gives mappings like this:

omptarget --> Entry  0: Base=0x00006019ea405760, Begin=0x00006019ea405760, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffd26480870, Begin=0x00006019ea405760, Size=8, Type=0x4000, Name=v[:1]
omptarget --> Entry  2: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=1, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=12, Type=0x3000000000001, Name=v->s1[0]
omptarget --> Entry  4: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1[0]
omptarget --> Entry  5: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a4, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  6: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a8, Size=4, Type=0x3000000000003, Name=v->s1->c
omptarget --> Entry  7: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1->c
omptarget --> Entry  8: Base=0x00006019ea4057c0, Begin=0x00006019ea4057c4, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  9: Base=0x00006019ea405770, Begin=0x00006019ea4057c4, Size=8, Type=0x4000, Name=v->s2->b

I think entry 2 is bogus, particularly the "size 1" bit.

Did you try an older version of the patch? With the current version, I get the following:

It's not impossible... I think that was with hash 087945a (map-ptr-array-section-using-attach-maptype branch). (Edit: yes, user-error on my part here, apologies!)

❯ rg "pragma omp target" llorg-issue141042.cpp                                        
44:/* #pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b) */       
45:#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)

❯ clang -O0 -fopenmp -fopenmp-targets=x86_64 llorg-issue141042.cpp -g && LIBOMPTARGET_DEBUG=1 ./a.out 2>&1 | rg "entry *\d" -i

// For v[:1]
omptarget --> Entry  0: Base=0x00005ecf70c87ad0, Begin=0x00005ecf70c87ad0, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffcba0be7d8, Begin=0x00005ecf70c87ad0, Size=8, Type=0x4000, Name=v[:1] // ATTACH entry for conditional attachment between v and v[0]

// For v->s1->b and v->s1->c, including the containing structure
omptarget --> Entry  2: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=8, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  4: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b18, Size=4, Type=0x3000000000003, Name=v->s1->c
// A single ATTACH entry (instead of creating one for both v->s1->b and v->s1->c).
// Attaches v->s1 to the contigous pointee that contains v->s1->b and v->s1->c.
omptarget --> Entry  5: Base=0x00005ecf70c87ad8, 
Begin=0x00005ecf70c87b14, Size=8, Type=0x4000, Name=v->s1->c


// For v->s2->b. No containing struct allocation needed as no other member of v->s2[0] is mapped.
omptarget --> Entry  6: Base=0x00005ecf70c87b30, Begin=0x00005ecf70c87b34, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  7: Base=0x00005ecf70c87ae0, Begin=0x00005ecf70c87b34, Size=8, Type=0x4000, Name=v->s2->b // ATTACH

// Attach specific debug logs. 
omptarget --> Processing ATTACH entry 0: HstPtr=0x00007ffcba0be7d8, HstPteeBegin=0x00005ecf70c87ad0, Size=8, Type=0x4000
omptarget --> Skipping ATTACH entry 0: pointer not present on device
omptarget --> Processing ATTACH entry 1: HstPtr=0x00005ecf70c87ad8, HstPteeBegin=0x00005ecf70c87b14, Size=8, Type=0x4000
omptarget --> ATTACH entry 1 processed successfully
omptarget --> Processing ATTACH entry 2: HstPtr=0x00005ecf70c87ae0, HstPteeBegin=0x00005ecf70c87b34, Size=8, Type=0x4000
omptarget --> ATTACH entry 2 processed successfully
omptarget --> Ignoring ATTACH entry 7 in targetDataEnd
omptarget --> Ignoring ATTACH entry 5 in targetDataEnd
omptarget --> Ignoring ATTACH entry 1 in targetDataEnd

I'll go back at look at that in more detail, thank you.

At a high level, this patch series tackles the problem of parsing complex expressions by keying off the attach-pointer component, whereas my (admittedly less developed!) patch series adjusts the partial-struct (overlap) handling code to group parts of the expression by "containing struct" base. That's similar but not quite the same, and I'm not sure how to reconcile the two approaches, or even if that's necessary -- but if it's not, then I'm not sure how to handle cases such as the above.

Aside from grouping containing structures, another reason for grouping component-lists starting with the same variable, by the complexity of their attach-pointers, is that the kernel argument is determined based on whether the pointer is mapped, or its pointee (see #145454, or line clang/lib/CodeGen/CGOpenMPRuntime.cpp:9749 in this PR).

Do you think there are any cases that we won't be able to handle with this approach?

I'm not sure yet.

With this PR, we are able to reuse the existing logic for determining the partial struct, since the grouping of component-lists happens at a higher level. It might still be desirable to update the partial-struct handling if it makes things easier to read/follow. At the same time, it can continue to assume that it's only working with a set of component-lists that will either have no overlap, or belong to the same containing struct.

I'm still working to better understand the original code, and your patch set, so please bear with me!

General questions:

  • Do you expect each map clause/list item to generate a maximum of one attach operation? (IIRC things get complicated regarding references to pointers.)

  • For reasons I admittedly haven't entirely understood, generateInfoForComponentList is called three times from two different functions -- from generateAllInfoForClauses (via generateAllInfo, genMapInfo), or twice via generateInfoForCaptureFromComponentLists (via generateInfoForCaptureFromClauseInfo, genMapInfoForCaptures, genMapInfo). The new ATTACH handling bits, sorting by complexity and so on, are mostly only present in the former case (non-capture dependent), but overlapped mapping handling is only present in the latter case (capture-dependent). What's the rationale for that? (I guess the attach operations should take place whether or not the variable is actually used in the target region, for correctness?)

  • What stops a given clause being processed twice (in non-capture/capture cases)? (Edit: maybe "IsMapInfoExist"?)

  • Does attachment handling is (at present) inhibit overlapped-member handling? (Edit: or generally, are there unintended interactions between capturing of vars, overlapped members and attachment handling?)

@jtb20
Copy link
Contributor

jtb20 commented Aug 28, 2025

Here's a test case that might need more complex handling for overlapped mappings:

#include <cstdlib>
#include <cstdio>
#include <cassert>

struct R {
  int d;
  int e;
  int f;
};

struct S {
  R *r0;
  R *r1;
  R *r2;
};

struct T {
  S *s0;
  S *s1;
  S *s2;
};

void print_r(const char *pfx, int num, R *r) {
  fprintf (stderr, "%sr%d->d=%d, %sr%d->e=%d, %sr%d->f=%d\n", pfx, num, r->d, pfx, num, r->e, pfx, num, r->f);
}

void print_s(const char *pfx0, int num, S *s) {
  char pfx[20];
  sprintf (pfx, "%ss%d->", pfx0, num);
  print_r(pfx, 0, s->r0);
  print_r(pfx, 1, s->r1);
  print_r(pfx, 2, s->r2);
  fputc('\n', stderr);
}

//#define NO_OVERLAPS

int main() {
  T *v = (T *) malloc (sizeof(T));
  T w, x, z;
  v->s0 = (S *) malloc (sizeof(S));
  v->s1 = (S *) malloc (sizeof(S));
  v->s2 = (S *) malloc (sizeof(S));

  v->s0->r0 = (R *) calloc (1, sizeof(R));
  v->s0->r1 = (R *) calloc (1, sizeof(R));
  v->s0->r2 = (R *) calloc (1, sizeof(R));

  v->s1->r0 = (R *) calloc (1, sizeof(R));
  v->s1->r1 = (R *) calloc (1, sizeof(R));
  v->s1->r2 = (R *) calloc (1, sizeof(R));

  v->s2->r0 = (R *) calloc (1, sizeof(R));
  v->s2->r1 = (R *) calloc (1, sizeof(R));
  v->s2->r2 = (R *) calloc (1, sizeof(R));

#ifdef NO_OVERLAPS
  #pragma omp target map(to: v->s1, v->s2, v->s1->r1, v->s1->r2, v->s2, v->s2->r0) \
                     map(tofrom: v->s1->r1->d, v->s1->r1->e, v->s1->r2->d, v->s1->r2->f, v->s2->r0->e)
  {
    v->s1->r1->d += 3;
    v->s1->r1->e += 5;
    v->s1->r2->d += 7;
    v->s1->r2->f += 9;
    v->s2->r0->e += 11;
  }
#else
  #pragma omp target map(to: v->s1, v->s2, *v->s1, v->s1->r1, *v->s2, v->s2->r0) \
                     map(tofrom: v->s1->r1->d, v->s1->r1->e, v->s1->r2->d, v->s1->r2->f, v->s2->r0->e)
  {
    v->s1->r1->d += 3;
    v->s1->r1->e += 5;
    v->s1->r2->d += 7;
    v->s1->r2->f += 9;
    v->s2->r0->e += 11;
  }
#endif

  print_s("v->", 0, v->s0);
  print_s("v->", 1, v->s1);
  print_s("v->", 2, v->s2);

  free(v->s0->r0);
  free(v->s0->r1);
  free(v->s0->r2);
  free(v->s1->r0);
  free(v->s1->r1);
  free(v->s1->r2);
  free(v->s2->r0);
  free(v->s2->r1);
  free(v->s2->r2);
  free(v->s0);
  free(v->s1);
  free(v->s2);
  free(v);

  return 0;
}

It works (with this patchset) with NO_OVERLAPS defined, but not with it commented out as shown.

@abhinavgaba
Copy link
Contributor Author

Here's a test case that might need more complex handling for overlapped mappings

Thanks for the test! It helped identify the need to use the appropriate pointer-type rather than void* in the load from attach-ptr-addr, to get the pointee-base-addr. With void*, the size-computation done for the combined-entry creation was resulting in 1, instead of the size of the struct. Please try with the latest commit. It also removes some redundant attachments, as now ATTACH is only emitted once per attach-pointer group. Could you add your tests to offload/test/mapping as end-to-end tests and mark them as XFAIL? Once they are merged, the XFAIL can be removed with this PR.

* Do you expect each map clause/list item to generate a maximum of one attach operation? (IIRC things get complicated regarding references to pointers.)

From OpenMP's perspective, there can only be one ATTACH operation per list-item. If a list-item is a reference variable that's a member of a struct, like:

int x;
int p;
struct S {
  int &xr  = x;
  int &pr = p;
};

map(s.xr) involves an attachment between the ref_ptr(s.xr) and ref_ptee(s.xr), while if we have a map like map(s.pr[1]), the attachment should happen between ref_ptee(s.pr) and &s.pr[1].

A user can map both together, like map(s.pr, s.pr[1]), which would involve two attachments, but in that case, the attach-ptrs for the two would be ref_ptr(s.pr) and ref_ptee(s.pr). One way to enable that would be to have that distinction between ref_ptr/ref_ptee explicit in the component-lists populated in SemaOpenMP, and then update the findAttachPtr to get the expr from the proper component. But that's out of scope for this PR.

* For reasons I admittedly haven't entirely understood, `generateInfoForComponentList` is called three times from two different functions -- from `generateAllInfoForClauses` (via `generateAllInfo`, `genMapInfo`), or twice via `generateInfoForCaptureFromComponentLists` (via `generateInfoForCaptureFromClauseInfo`, `genMapInfoForCaptures`, `genMapInfo`). The new `ATTACH` handling bits, sorting by complexity and so on, are mostly only present in the former case (non-capture dependent), but overlapped mapping handling is only present in the latter case (capture-dependent). What's the rationale for that? (I guess the attach operations should take place whether or not the 
* What stops a given clause being processed twice (in non-capture/capture cases)? (Edit: maybe "IsMapInfoExist"?)

variable is actually used in the target region, for correctness?)

For target constructs, first mapping of "Captured" vars is done. These are the variables referenced inside the body of the construct. This is done via genMapInfoForCaptures, which internally calls genDefaultMapInfo or generateInfoForCaptureFromClauseInfo. genDefaultMapInfo emits a single map, that should happen first, as it needs to decide the kernel argument. So no grouping is required. The grouping of maps is done for generateInfoForCaptureFromClauseInfo as it's working with non-default clauses with list-items that use the captured variable in the map clause expression.

After the captured vars have been handled, the remaining map clauses are handled in genMapInfo, which internally calls generateAllInfoForClauses. This again first groups the list-items by the VarDecl from the component-list, and then by the attach-ptr exprs.

generateAllInfoForClauses also covers codegen for target data/enter-data/exit-data.

* Does attachment handling is (at present) inhibit overlapped-member handling? (Edit: or generally, are there unintended interactions between capturing of vars, overlapped members and attachment handling?)

The attach handling should not affect handling of overlapping members etc, which is handled by the PartialStruct processing. Each attach-ptr-group (list of component-lists with the same attach-ptr) has its own PartialStruct, so the handling of PartialStruct works transparently. Each of the individual components in the list have the correct base-address, which is the address of a load from attach-ptr, which acts as the base address of the containing struct in case of an overlap. See clang/test/OpenMP/target_map_pointer_defalut_mapper_codegen.cpp for some pseudo-code and IR for reference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants