[OpenMP][Clang] Use `ATTACH` map-type for list-items with base-pointers. #153683

abhinavgaba · 2025-08-14T21:02:30Z

This adds support for using ATTACH map-type for proper pointer-attachment when mapping list-items that have base-pointers.

For example, for the following:

  int *p;
  #pragma omp target enter data map(p[1:10])

The following maps are now emitted by clang:

  (A)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM
  &p, &p[1], sizeof(p), ATTACH

Previously, the two possible maps emitted by clang were:

  (B)
  &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM

  (C)
  &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ

(B) does not perform any pointer attachment, while (C) also maps the
pointer p, both of which are incorrect.

With this change, we are using ATTACH-style maps, like (A), for cases where the expression has a base-pointer. For example:

  int *p, **pp;
  S *ps, **pps;
  ... map(p[0])
  ... map(p[10:20])
  ... map(*p)
  ... map(([20])p)
  ... map(ps->a)
  ... map(pps->p->a)
  ... map(pp[0][0])
  ... map(*(pp + 10)[0])

Why a large PR

While it's unfortunate that this PR has gotten large and difficult to review, the issue is that all the functional changes have to be made together, to prevent regressions from partially implemented changes.

For example, the changes to capturing were previously done separately (#145454), but they would still cause stability issues in absence of full attach-mapping. And attach-mapping needs those changes to be able to launch kernels.

We can extract the utilities and functions, like those for finding attach-ptrs, or comparing exprs, out as a separate NFC PR that doesn't call those functions, just adds them (#155625). And maybe the change that adds a new error message for use_device_addr on array-sections with non-var base-pointers can be extracted out too (but that will have to be a follow-up change in that case, and we may get comp-fails with this PR when the erroneous case is not caught/diagnosed).

Grouping of maps based on attach base-pointers

We also group mapping of clauses with the same base decl in the order of the increasing complexity of their base-pointers, e.g. for something like:

  S **spp;
  map(spp[0][0], spp[0][0].a), // attach-ptr: spp[0]
  map(spp[0]),                 // attach-ptr: spp
  map(spp),                    // attach-ptr: N/A

We first map spp, then spp[0] then spp[0][0] and spp[0][0].a.

This allows us to also group "struct" allocation based on their attach pointers. This resolves the issues of us always mapping everything from the beginning of the symbol spp. Each group is mapped independently, and at the same level, like spp[0][0] and its member spp[0][0].a, we still get map them together as part of the same contiguous struct spp[0][0]. This resolves issue #141042.

use_device_ptr/addr fixes

The handling of use_device_ptr/addr was updated to use the attach-ptr information, and works for many cases that were failing before. It has to be done as part of this series because otherwise, the switch from ptr_to_obj to attach-style mapping would have caused regressions in existing use_device_ptr/addr tests.

Handling of attach-pointers that are members of implicitly mapped structs:

When a struct member-pointer, like p below, is a base-pointer in a map clause on a target construct (like map(p[0:1]), and the base of that struct is either the this pointer (implicitly or explicitly), or a struct that is implicitly mapped on that construct, we add an implicit map(p) so that we don't implicitly map the full struct.

 struct S { int *p;
 void f1() {
   #pragma omp target map(p[0:1]) // Implicitly map this->p, to ensure
                                  // that the implicit map of `this[:]` does
                                  // not map the full struct
      printf("%p %p\n", &p, p);
 }

Scope for improvement:

We may be able to compute attach-ptr expr while collecting component-lists in Sema.
- But we cache the computation results already, and findAttachPtrExpr is fairly simple, and fast.
There may be a better way to implement semantic expr comparison.

Needs future work:

Attach-style maps not yet emitted for declare mappers.
Mapping of class member references: We are still using PTR_AND_OBJ maps for them. We will likely need to change that to handle ref_ptr/ref_ptee, and attach map-type-modifier on them.

For the following: ```c int *p; \#pragma omp target map(p[0]) // (A) (void)p; \#pragma omp target map(p) // (B) (void)p; \#pragma omp target map(p, p[0]) // (C) (void)p; \#pragma omp target map(p[0], p) // (D) (void)p; ``` For (A), the pointer `p` is predetermined `firstprivate`, so it should be (and is) captured by-copy. However, for (B), (C), and (D), since `p` is already listed in a `map` clause, it's not predetermined `firstprivate`, and hence, should be captured by-reference, like any other mapped variable. To ensure the correct handling of (C) and (D), the following changes were made: 1. In SemaOpenMP, we now ensure that `p` is marked to be captured by-reference in these cases. 2. We no longer ignore `map(p)` during codegen of `target` constructs, even if there's another map like `map(p[0])` that would have been mapped using a PTR_AND_OBJ map. 3. For cases like (D), we now handle `map(p)` before `map(p[0])`, so the former gets the TARGET_PARAM flag and sets the kernel argument.

… before their derefs.

The output of the compile-and-run tests is incorrect. These will be used for reference in future commits that resolve the issues. Also updated the existing clang LIT test, target_map_both_pointer_pointee_codegen.cpp, with more regions and more narrowed-down update_cc_test_checks filters.

…mapped-ptrs-by-ref

…ion-using-attach-maptype

…rs-by-ref

…ion-using-attach-maptype

This patch introduces libomptarget support for the ATTACH map-type, which can be used to implement OpenMP conditional compliant pointer attachment, based on whether the pointer/pointee is newly mapped on a given construct. For example, for the following: ```c int *p; #pragma omp target enter data map(p[1:10]) ``` The following maps can be emitted by clang: ``` (A) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM &p, &p[1], sizeof(p), ATTACH ``` Without this map-type, the two possible maps emitted by clang: ``` (B) &p[0], &p[1], 10 * sizeof(p[1]), TO | FROM (C) &p, &p[1], 10 * sizeof(p[1]), TO | FROM | PTR_AND_OBJ ```` (B) does not perform any pointer attachment, while (C) also maps the pointer p, which are both incorrect. In terms of implementation, maps with the ATTACH map-type are handled after all other maps have been processed, as it requires knowledge of which new allocations happened as part of the construct. As per OpenMP 5.0, an attachment should happen only when either the pointer or the pointee was newly mapped while handling the construct. Maps with ATTACH map-type-bit do not increase/decrease the ref-count. With OpenMP 6.1, `attach(always/never)` can be used to force/prevent attachment. For `attach(always)`, the compiler will insert the ALWAYS map-type, which would let libomptarget bypass the check about one of the pointer/pointee being new. With `attach(never)`, the ATTACH map will not be emitted at all. The size argument of the ATTACH map-type can specify values greater than `sizeof(void*)` which can be used to support pointer attachment on Fortran descriptors. Note that this also requires shadow-pointer tracking to also support them. That has not been implemented in this patch. This was worked upon in coordination with Ravi Narayanaswamy, who has since retired. Happy retirement, Ravi!

…addr-tests

…on-using-attach-maptype

Most tests are either compfailing or runfailing. They should start passing once we start using ATTACH map-type based codegen. (#153683) Even after they start passing, there are a few places where the EXPECTED and actual CHECKs are different, due to two main issues: * use_device_ptr translation on `&p[0]` is not succeeding in looking-up a previously mapped `&p[1]` * privatization of byref use_device_addr operands is not happening correctly. The above should be fixed as separate standalone changes.

…on-using-attach-maptype

…1]'.

…base-ptrs. These have been pulled out of the codegen PR llvm#153683, to reduce the size of that PR.

jtb20 · 2025-08-28T13:22:07Z

I've spent some time looking through these patches. First off, I see that the test case from #141042 is partially fixed -- it works as written in the issue, but not if the commented-out #pragma omp target lines is exchanged for the active one, i.e.:

//#pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b)
#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)
  {
    v->s1->b += 3;
    v->s1->c += 5;
    v->s2->b += 7;
  }

This gives mappings like this:

omptarget --> Entry  0: Base=0x00006019ea405760, Begin=0x00006019ea405760, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffd26480870, Begin=0x00006019ea405760, Size=8, Type=0x4000, Name=v[:1]
omptarget --> Entry  2: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=1, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=12, Type=0x3000000000001, Name=v->s1[0]
omptarget --> Entry  4: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1[0]
omptarget --> Entry  5: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a4, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  6: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a8, Size=4, Type=0x3000000000003, Name=v->s1->c
omptarget --> Entry  7: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1->c
omptarget --> Entry  8: Base=0x00006019ea4057c0, Begin=0x00006019ea4057c4, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  9: Base=0x00006019ea405770, Begin=0x00006019ea4057c4, Size=8, Type=0x4000, Name=v->s2->b

I think entry 2 is bogus, particularly the "size 1" bit.

At a high level, this patch series tackles the problem of parsing complex expressions by keying off the attach-pointer component, whereas my (admittedly less developed!) patch series adjusts the partial-struct (overlap) handling code to group parts of the expression by "containing struct" base. That's similar but not quite the same, and I'm not sure how to reconcile the two approaches, or even if that's necessary -- but if it's not, then I'm not sure how to handle cases such as the above.

(This one, I mean: #153672)

When I worked on this problem for GCC, I ended up creating a kind of tiny DSL just to parse expressions for mapping, described briefly in this patch submission:

https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627897.html

For LLVM, I don't think we should use exactly the same approach, but I was thinking of something conceptually similar, probably along the lines of an iterator that walks over each "chunk" of an expression, with members to query things like attachment points, containing struct bases, innermost and outermost component accesses, or the "leaf"-type of the chunk (array section expr, array shaping expr, and so on). Incrementing the iterator would then (I think) move to the next indirection level, or the next containing-struct base, with details of course TBD.

That would allow the rather complex and hard-to-follow (existing) logic in generateInfoForComponentList to be somewhat simplified, I think.

(That's not the same as the ComponentListRefPtrPteeIterator in my posted patch, but could be done similarly, perhaps.)

Also improves one debug dump regarding pointer-attachment.

abhinavgaba · 2025-08-28T15:40:11Z

First off, I see that the test case from #141042 is partially fixed -- it works as written in the issue, but not if the commented-out #pragma omp target lines is exchanged for the active one, i.e.:
//#pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b)
#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)
  {
    v->s1->b += 3;
    v->s1->c += 5;
    v->s2->b += 7;
  }

Thanks for trying the patch. The commented out code from #141042 is expected to work, but as discussed here, the original is not expected to work as per OpenMP. Since there is no map with v as either the list-item, or the base-pointer of a list-item, the implicit data-mapping rule for v, which is map(v[0:0]) will kick-in. This just does a lookup, and the lookup will fail because v[0] does not have corresponding storage on the device. So, as per OpenMP, v[0:0] will result in v containing its original host value inside the kernel. So, v->s1/v->s2/v->s3 inside the target region will be doing a dereference of the host pointer v.

This gives mappings like this:

omptarget --> Entry  0: Base=0x00006019ea405760, Begin=0x00006019ea405760, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffd26480870, Begin=0x00006019ea405760, Size=8, Type=0x4000, Name=v[:1]
omptarget --> Entry  2: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=1, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=12, Type=0x3000000000001, Name=v->s1[0]
omptarget --> Entry  4: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1[0]
omptarget --> Entry  5: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a4, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  6: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a8, Size=4, Type=0x3000000000003, Name=v->s1->c
omptarget --> Entry  7: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1->c
omptarget --> Entry  8: Base=0x00006019ea4057c0, Begin=0x00006019ea4057c4, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  9: Base=0x00006019ea405770, Begin=0x00006019ea4057c4, Size=8, Type=0x4000, Name=v->s2->b

I think entry 2 is bogus, particularly the "size 1" bit.

Did you try an older version of the patch? With the current version, I get the following:

❯ rg "pragma omp target" llorg-issue141042.cpp                                        
44:/* #pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b) */       
45:#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)

❯ clang -O0 -fopenmp -fopenmp-targets=x86_64 llorg-issue141042.cpp -g && LIBOMPTARGET_DEBUG=1 ./a.out 2>&1 | rg "entry *\d" -i

// For v[:1]
omptarget --> Entry  0: Base=0x00005ecf70c87ad0, Begin=0x00005ecf70c87ad0, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffcba0be7d8, Begin=0x00005ecf70c87ad0, Size=8, Type=0x4000, Name=v[:1] // ATTACH entry for conditional attachment between v and v[0]

// For v->s1->b and v->s1->c, including the containing structure
omptarget --> Entry  2: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=8, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  4: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b18, Size=4, Type=0x3000000000003, Name=v->s1->c
// A single ATTACH entry (instead of creating one for both v->s1->b and v->s1->c).
// Attaches v->s1 to the contigous pointee that contains v->s1->b and v->s1->c.
omptarget --> Entry  5: Base=0x00005ecf70c87ad8, 
Begin=0x00005ecf70c87b14, Size=8, Type=0x4000, Name=v->s1->c


// For v->s2->b. No containing struct allocation needed as no other member of v->s2[0] is mapped.
omptarget --> Entry  6: Base=0x00005ecf70c87b30, Begin=0x00005ecf70c87b34, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  7: Base=0x00005ecf70c87ae0, Begin=0x00005ecf70c87b34, Size=8, Type=0x4000, Name=v->s2->b // ATTACH

// Attach specific debug logs. 
omptarget --> Processing ATTACH entry 0: HstPtr=0x00007ffcba0be7d8, HstPteeBegin=0x00005ecf70c87ad0, Size=8, Type=0x4000
omptarget --> Skipping ATTACH entry 0: pointer not present on device
omptarget --> Processing ATTACH entry 1: HstPtr=0x00005ecf70c87ad8, HstPteeBegin=0x00005ecf70c87b14, Size=8, Type=0x4000
omptarget --> ATTACH entry 1 processed successfully
omptarget --> Processing ATTACH entry 2: HstPtr=0x00005ecf70c87ae0, HstPteeBegin=0x00005ecf70c87b34, Size=8, Type=0x4000
omptarget --> ATTACH entry 2 processed successfully
omptarget --> Ignoring ATTACH entry 7 in targetDataEnd
omptarget --> Ignoring ATTACH entry 5 in targetDataEnd
omptarget --> Ignoring ATTACH entry 1 in targetDataEnd

At a high level, this patch series tackles the problem of parsing complex expressions by keying off the attach-pointer component, whereas my (admittedly less developed!) patch series adjusts the partial-struct (overlap) handling code to group parts of the expression by "containing struct" base. That's similar but not quite the same, and I'm not sure how to reconcile the two approaches, or even if that's necessary -- but if it's not, then I'm not sure how to handle cases such as the above.

Aside from grouping containing structures, another reason for grouping component-lists starting with the same variable, by the complexity of their attach-pointers, is that the kernel argument is determined based on whether the pointer is mapped, or its pointee (see #145454, or line clang/lib/CodeGen/CGOpenMPRuntime.cpp:9749 in this PR).

Do you think there are any cases that we won't be able to handle with this approach?

With this PR, we are able to reuse the existing logic for determining the partial struct, since the grouping of component-lists happens at a higher level. It might still be desirable to update the partial-struct handling if it makes things easier to read/follow. At the same time, it can continue to assume that it's only working with a set of component-lists that will either have no overlap, or belong to the same containing struct.

jtb20 · 2025-08-28T16:52:48Z

First off, I see that the test case from #141042 is partially fixed -- it works as written in the issue, but not if the commented-out #pragma omp target lines is exchanged for the active one, i.e.:
//#pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b)
#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)
  {
    v->s1->b += 3;
    v->s1->c += 5;
    v->s2->b += 7;
  }
Thanks for trying the patch. The commented out code from #141042 is expected to work, but as discussed here, the original is not expected to work as per OpenMP. Since there is no map with v as either the list-item, or the base-pointer of a list-item, the implicit data-mapping rule for v, which is map(v[0:0]) will kick-in. This just does a lookup, and the lookup will fail because v[0] does not have corresponding storage on the device. So, as per OpenMP, v[0:0] will result in v containing its original host value inside the kernel. So, v->s1/v->s2/v->s3 inside the target region will be doing a dereference of the host pointer v.
This gives mappings like this:
omptarget --> Entry  0: Base=0x00006019ea405760, Begin=0x00006019ea405760, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffd26480870, Begin=0x00006019ea405760, Size=8, Type=0x4000, Name=v[:1]
omptarget --> Entry  2: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=1, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a0, Size=12, Type=0x3000000000001, Name=v->s1[0]
omptarget --> Entry  4: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1[0]
omptarget --> Entry  5: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a4, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  6: Base=0x00006019ea4057a0, Begin=0x00006019ea4057a8, Size=4, Type=0x3000000000003, Name=v->s1->c
omptarget --> Entry  7: Base=0x00006019ea405768, Begin=0x00006019ea4057a0, Size=8, Type=0x4000, Name=v->s1->c
omptarget --> Entry  8: Base=0x00006019ea4057c0, Begin=0x00006019ea4057c4, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  9: Base=0x00006019ea405770, Begin=0x00006019ea4057c4, Size=8, Type=0x4000, Name=v->s2->b
I think entry 2 is bogus, particularly the "size 1" bit.
Did you try an older version of the patch? With the current version, I get the following:

It's not impossible... I think that was with hash 087945a (map-ptr-array-section-using-attach-maptype branch). (Edit: yes, user-error on my part here, apologies!)

❯ rg "pragma omp target" llorg-issue141042.cpp                                        
44:/* #pragma omp target map(tofrom: v->s1->b, v->s1->c, v->s2->b) */       
45:#pragma omp target map(to: v[:1]) map(tofrom: v->s1->b, v->s1->c, v->s2->b)

❯ clang -O0 -fopenmp -fopenmp-targets=x86_64 llorg-issue141042.cpp -g && LIBOMPTARGET_DEBUG=1 ./a.out 2>&1 | rg "entry *\d" -i

// For v[:1]
omptarget --> Entry  0: Base=0x00005ecf70c87ad0, Begin=0x00005ecf70c87ad0, Size=24, Type=0x21, Name=v[:1]
omptarget --> Entry  1: Base=0x00007ffcba0be7d8, Begin=0x00005ecf70c87ad0, Size=8, Type=0x4000, Name=v[:1] // ATTACH entry for conditional attachment between v and v[0]

// For v->s1->b and v->s1->c, including the containing structure
omptarget --> Entry  2: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=8, Type=0x0, Name=unknown
omptarget --> Entry  3: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b14, Size=4, Type=0x3000000000003, Name=v->s1->b
omptarget --> Entry  4: Base=0x00005ecf70c87b10, Begin=0x00005ecf70c87b18, Size=4, Type=0x3000000000003, Name=v->s1->c
// A single ATTACH entry (instead of creating one for both v->s1->b and v->s1->c).
// Attaches v->s1 to the contigous pointee that contains v->s1->b and v->s1->c.
omptarget --> Entry  5: Base=0x00005ecf70c87ad8, 
Begin=0x00005ecf70c87b14, Size=8, Type=0x4000, Name=v->s1->c


// For v->s2->b. No containing struct allocation needed as no other member of v->s2[0] is mapped.
omptarget --> Entry  6: Base=0x00005ecf70c87b30, Begin=0x00005ecf70c87b34, Size=4, Type=0x3, Name=v->s2->b
omptarget --> Entry  7: Base=0x00005ecf70c87ae0, Begin=0x00005ecf70c87b34, Size=8, Type=0x4000, Name=v->s2->b // ATTACH

// Attach specific debug logs. 
omptarget --> Processing ATTACH entry 0: HstPtr=0x00007ffcba0be7d8, HstPteeBegin=0x00005ecf70c87ad0, Size=8, Type=0x4000
omptarget --> Skipping ATTACH entry 0: pointer not present on device
omptarget --> Processing ATTACH entry 1: HstPtr=0x00005ecf70c87ad8, HstPteeBegin=0x00005ecf70c87b14, Size=8, Type=0x4000
omptarget --> ATTACH entry 1 processed successfully
omptarget --> Processing ATTACH entry 2: HstPtr=0x00005ecf70c87ae0, HstPteeBegin=0x00005ecf70c87b34, Size=8, Type=0x4000
omptarget --> ATTACH entry 2 processed successfully
omptarget --> Ignoring ATTACH entry 7 in targetDataEnd
omptarget --> Ignoring ATTACH entry 5 in targetDataEnd
omptarget --> Ignoring ATTACH entry 1 in targetDataEnd

I'll go back at look at that in more detail, thank you.

At a high level, this patch series tackles the problem of parsing complex expressions by keying off the attach-pointer component, whereas my (admittedly less developed!) patch series adjusts the partial-struct (overlap) handling code to group parts of the expression by "containing struct" base. That's similar but not quite the same, and I'm not sure how to reconcile the two approaches, or even if that's necessary -- but if it's not, then I'm not sure how to handle cases such as the above.

Aside from grouping containing structures, another reason for grouping component-lists starting with the same variable, by the complexity of their attach-pointers, is that the kernel argument is determined based on whether the pointer is mapped, or its pointee (see #145454, or line clang/lib/CodeGen/CGOpenMPRuntime.cpp:9749 in this PR).

Do you think there are any cases that we won't be able to handle with this approach?

I'm not sure yet.

With this PR, we are able to reuse the existing logic for determining the partial struct, since the grouping of component-lists happens at a higher level. It might still be desirable to update the partial-struct handling if it makes things easier to read/follow. At the same time, it can continue to assume that it's only working with a set of component-lists that will either have no overlap, or belong to the same containing struct.

I'm still working to better understand the original code, and your patch set, so please bear with me!

General questions:

Do you expect each map clause/list item to generate a maximum of one attach operation? (IIRC things get complicated regarding references to pointers.)
For reasons I admittedly haven't entirely understood, generateInfoForComponentList is called three times from two different functions -- from generateAllInfoForClauses (via generateAllInfo, genMapInfo), or twice via generateInfoForCaptureFromComponentLists (via generateInfoForCaptureFromClauseInfo, genMapInfoForCaptures, genMapInfo). The new ATTACH handling bits, sorting by complexity and so on, are mostly only present in the former case (non-capture dependent), but overlapped mapping handling is only present in the latter case (capture-dependent). What's the rationale for that? (I guess the attach operations should take place whether or not the variable is actually used in the target region, for correctness?)
What stops a given clause being processed twice (in non-capture/capture cases)? (Edit: maybe "IsMapInfoExist"?)
Does attachment handling is (at present) inhibit overlapped-member handling? (Edit: or generally, are there unintended interactions between capturing of vars, overlapped members and attachment handling?)

jtb20 · 2025-08-28T17:58:36Z

Here's a test case that might need more complex handling for overlapped mappings:

#include <cstdlib>
#include <cstdio>
#include <cassert>

struct R {
  int d;
  int e;
  int f;
};

struct S {
  R *r0;
  R *r1;
  R *r2;
};

struct T {
  S *s0;
  S *s1;
  S *s2;
};

void print_r(const char *pfx, int num, R *r) {
  fprintf (stderr, "%sr%d->d=%d, %sr%d->e=%d, %sr%d->f=%d\n", pfx, num, r->d, pfx, num, r->e, pfx, num, r->f);
}

void print_s(const char *pfx0, int num, S *s) {
  char pfx[20];
  sprintf (pfx, "%ss%d->", pfx0, num);
  print_r(pfx, 0, s->r0);
  print_r(pfx, 1, s->r1);
  print_r(pfx, 2, s->r2);
  fputc('\n', stderr);
}

//#define NO_OVERLAPS

int main() {
  T *v = (T *) malloc (sizeof(T));
  T w, x, z;
  v->s0 = (S *) malloc (sizeof(S));
  v->s1 = (S *) malloc (sizeof(S));
  v->s2 = (S *) malloc (sizeof(S));

  v->s0->r0 = (R *) calloc (1, sizeof(R));
  v->s0->r1 = (R *) calloc (1, sizeof(R));
  v->s0->r2 = (R *) calloc (1, sizeof(R));

  v->s1->r0 = (R *) calloc (1, sizeof(R));
  v->s1->r1 = (R *) calloc (1, sizeof(R));
  v->s1->r2 = (R *) calloc (1, sizeof(R));

  v->s2->r0 = (R *) calloc (1, sizeof(R));
  v->s2->r1 = (R *) calloc (1, sizeof(R));
  v->s2->r2 = (R *) calloc (1, sizeof(R));

#ifdef NO_OVERLAPS
  #pragma omp target map(to: v->s1, v->s2, v->s1->r1, v->s1->r2, v->s2, v->s2->r0) \
                     map(tofrom: v->s1->r1->d, v->s1->r1->e, v->s1->r2->d, v->s1->r2->f, v->s2->r0->e)
  {
    v->s1->r1->d += 3;
    v->s1->r1->e += 5;
    v->s1->r2->d += 7;
    v->s1->r2->f += 9;
    v->s2->r0->e += 11;
  }
#else
  #pragma omp target map(to: v->s1, v->s2, *v->s1, v->s1->r1, *v->s2, v->s2->r0) \
                     map(tofrom: v->s1->r1->d, v->s1->r1->e, v->s1->r2->d, v->s1->r2->f, v->s2->r0->e)
  {
    v->s1->r1->d += 3;
    v->s1->r1->e += 5;
    v->s1->r2->d += 7;
    v->s1->r2->f += 9;
    v->s2->r0->e += 11;
  }
#endif

  print_s("v->", 0, v->s0);
  print_s("v->", 1, v->s1);
  print_s("v->", 2, v->s2);

  free(v->s0->r0);
  free(v->s0->r1);
  free(v->s0->r2);
  free(v->s1->r0);
  free(v->s1->r1);
  free(v->s1->r2);
  free(v->s2->r0);
  free(v->s2->r1);
  free(v->s2->r2);
  free(v->s0);
  free(v->s1);
  free(v->s2);
  free(v);

  return 0;
}

It works (with this patchset) with NO_OVERLAPS defined, but not with it commented out as shown.

abhinavgaba · 2025-08-29T07:16:00Z

Here's a test case that might need more complex handling for overlapped mappings

Thanks for the test! It helped identify the need to use the appropriate pointer-type rather than void* in the load from attach-ptr-addr, to get the pointee-base-addr. With void*, the size-computation done for the combined-entry creation was resulting in 1, instead of the size of the struct. Please try with the latest commit. It also removes some redundant attachments, as now ATTACH is only emitted once per attach-pointer group. Could you add your tests to offload/test/mapping as end-to-end tests and mark them as XFAIL? Once they are merged, the XFAIL can be removed with this PR.

* Do you expect each map clause/list item to generate a maximum of one attach operation? (IIRC things get complicated regarding references to pointers.)

From OpenMP's perspective, there can only be one ATTACH operation per list-item. If a list-item is a reference variable that's a member of a struct, like:

int x;
int p;
struct S {
  int &xr  = x;
  int &pr = p;
};

map(s.xr) involves an attachment between the ref_ptr(s.xr) and ref_ptee(s.xr), while if we have a map like map(s.pr[1]), the attachment should happen between ref_ptee(s.pr) and &s.pr[1].

A user can map both together, like map(s.pr, s.pr[1]), which would involve two attachments, but in that case, the attach-ptrs for the two would be ref_ptr(s.pr) and ref_ptee(s.pr). One way to enable that would be to have that distinction between ref_ptr/ref_ptee explicit in the component-lists populated in SemaOpenMP, and then update the findAttachPtr to get the expr from the proper component. But that's out of scope for this PR.

* For reasons I admittedly haven't entirely understood, `generateInfoForComponentList` is called three times from two different functions -- from `generateAllInfoForClauses` (via `generateAllInfo`, `genMapInfo`), or twice via `generateInfoForCaptureFromComponentLists` (via `generateInfoForCaptureFromClauseInfo`, `genMapInfoForCaptures`, `genMapInfo`). The new `ATTACH` handling bits, sorting by complexity and so on, are mostly only present in the former case (non-capture dependent), but overlapped mapping handling is only present in the latter case (capture-dependent). What's the rationale for that? (I guess the attach operations should take place whether or not the 
* What stops a given clause being processed twice (in non-capture/capture cases)? (Edit: maybe "IsMapInfoExist"?)

variable is actually used in the target region, for correctness?)

For target constructs, first mapping of "Captured" vars is done. These are the variables referenced inside the body of the construct. This is done via genMapInfoForCaptures, which internally calls genDefaultMapInfo or generateInfoForCaptureFromClauseInfo. genDefaultMapInfo emits a single map, that should happen first, as it needs to decide the kernel argument. So no grouping is required. The grouping of maps is done for generateInfoForCaptureFromClauseInfo as it's working with non-default clauses with list-items that use the captured variable in the map clause expression.

After the captured vars have been handled, the remaining map clauses are handled in genMapInfo, which internally calls generateAllInfoForClauses. This again first groups the list-items by the VarDecl from the component-list, and then by the attach-ptr exprs.

generateAllInfoForClauses also covers codegen for target data/enter-data/exit-data.

* Does attachment handling is (at present) inhibit overlapped-member handling? (Edit: or generally, are there unintended interactions between capturing of vars, overlapped members and attachment handling?)

The attach handling should not affect handling of overlapping members etc, which is handled by the PartialStruct processing. Each attach-ptr-group (list of component-lists with the same attach-ptr) has its own PartialStruct, so the handling of PartialStruct works transparently. Each of the individual components in the list have the correct base-address, which is the address of a load from attach-ptr, which acts as the base address of the containing struct in case of an overlap. See clang/test/OpenMP/target_map_pointer_defalut_mapper_codegen.cpp for some pseudo-code and IR for reference.

abhinavgaba added 30 commits June 23, 2025 20:56

Add another test, minor clang-format fix.

97b2dff

Update the comparator for sort, plus minor additions to offload tests.

a31ea2a

[WIP] Use ATTACH maps for array-sections/subscripts on pointers.

cc90643

[WIP] Commit some more non-debug print changes.

6ba75e1

[NFC][Clang] Refactor mapinfo generation for captured vars on target.

001da09

Handle mapping of captured pointers before any dereference of them.

1ebb7a9

No need to change the sorting since we now handle mapping of pointers…

7799d0a

… before their derefs.

[NFC][Clang] Refactor mapinfo generation for captured vars on target.

9f61909

Minor clang-format fix.

5261578

Minor clang-format fix.

15e7d50

Merge branch 'capture-info-refactor' into tgt-capture-mapped-ptrs-by-ref

9a2fabb

Merge branch 'nfc-lit-test-updates-for-incoming-pr' into tgt-capture-…

78fd5e5

…mapped-ptrs-by-ref

Minor NFC update to a test.

b1540b3

Update tests with the current output.

f1746b6

Merge branch 'tgt-capture-mapped-ptrs-by-ref' into map-ptr-array-sect…

9408873

…ion-using-attach-maptype

Fix findAttachComponent

3ce181f

Fix build issues, keep debug prints

50c8d0e

Fix handling of array-sections in findAttachComponent

c23e01f

Merge remote-tracking branch 'origin/main' into tgt-capture-mapped-pt…

6c2df5c

…rs-by-ref

Fix one test.

9adf9a8

Merge branch 'tgt-capture-mapped-ptrs-by-ref' into map-ptr-array-sect…

ca0c381

…ion-using-attach-maptype

Fix star case. Update some tests.

36f6d9b

Refactor AddAttachEntry

8349a49

Delay addition of attach when we populate PartialStruct.

eaf332f

Update some more tests.

da55bf8

More test updates

fcdc13a

abhinavgaba added 7 commits August 22, 2025 06:06

Clang-format fixes

398a3a6

Add two empty lines.

3e4d99b

Clang-format renamed files as well.

a49143a

Merge remote-tracking branch 'upstream/main' into add-use-device-ptr-…

609a53c

…addr-tests

Merge branch 'add-use-device-ptr-addr-tests' into map-ptr-array-secti…

f7ef799

…on-using-attach-maptype

Remove XFAIL from new uda/udp tests.

b0170f7

Add an implicit map of attach-ptr on a target construct.

63d7c8f

abhinavgaba added 4 commits August 25, 2025 14:31

Merge remote-tracking branch 'upstream/main' into map-ptr-array-secti…

ba9b90c

…on-using-attach-maptype

Update header comment showing various maps in CGOpenMPRuntime.cpp

60c0ac2

Minor NFC, comment changes.

38bf1a0

Fix another obsolete test.

8b18577

abhinavgaba changed the title ~~[OpenMP][WIP] Use ATTACH maps for array-sections/subscripts on pointers.~~ [OpenMP] Use ATTACH map-type for mapping expressions with base-pointers. Aug 26, 2025

abhinavgaba added 4 commits August 26, 2025 17:36

Remove some obsolete code, update some comments.

84f1173

Only add implicit maps for base-ptr for 's.p[0:1]', not for 'ps->p[0:…

6c0f62e

…1]'.

Update comment showing maps in CGOpenMPRuntime.

c5a4be6

Update test for addr error messages for array-section operands.

087945a

abhinavgaba changed the title ~~[OpenMP] Use ATTACH map-type for mapping expressions with base-pointers.~~ [OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. Aug 27, 2025

abhinavgaba changed the title ~~[OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers.~~ [OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. Aug 27, 2025

abhinavgaba added a commit to abhinavgaba/llvm-project that referenced this pull request Aug 27, 2025

[NFC][Clang] Add helper functions/utils for finding/comparing attach …

8256e93

…base-ptrs. These have been pulled out of the codegen PR llvm#153683, to reduce the size of that PR.

abhinavgaba mentioned this pull request Aug 27, 2025

[NFC][Clang][OpenMP] Add helper functions/utils for finding/comparing attach base-ptrs. #155625

Open

[Offload] Update LIBOMPTARGET_INFO text for "attach" map-type.

707dfce

Also improves one debug dump regarding pointer-attachment.

abhinavgaba added 2 commits August 28, 2025 17:19

Fix pointer-type used to load from attach-ptr-addr.

2e9dde2

Emit a single attach entry per attach-ptr group.

1cc6abf

Apply clang-format

a361e99

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[OpenMP][Clang] Use `ATTACH` map-type for list-items with base-pointers. #153683

[OpenMP][Clang] Use `ATTACH` map-type for list-items with base-pointers. #153683

Uh oh!

abhinavgaba commented Aug 14, 2025 •

edited

Loading

Uh oh!

jtb20 commented Aug 28, 2025

Uh oh!

abhinavgaba commented Aug 28, 2025

Uh oh!

jtb20 commented Aug 28, 2025 •

edited

Loading

Uh oh!

jtb20 commented Aug 28, 2025

Uh oh!

abhinavgaba commented Aug 29, 2025

Uh oh!

Uh oh!

[OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. #153683

Are you sure you want to change the base?

[OpenMP][Clang] Use ATTACH map-type for list-items with base-pointers. #153683

Uh oh!

Conversation

abhinavgaba commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why a large PR

Grouping of maps based on attach base-pointers

use_device_ptr/addr fixes

Handling of attach-pointers that are members of implicitly mapped structs:

Scope for improvement:

Needs future work:

Uh oh!

jtb20 commented Aug 28, 2025

Uh oh!

abhinavgaba commented Aug 28, 2025

Uh oh!

jtb20 commented Aug 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jtb20 commented Aug 28, 2025

Uh oh!

abhinavgaba commented Aug 29, 2025

Uh oh!

Uh oh!

[OpenMP][Clang] Use `ATTACH` map-type for list-items with base-pointers. #153683

[OpenMP][Clang] Use `ATTACH` map-type for list-items with base-pointers. #153683

abhinavgaba commented Aug 14, 2025 •

edited

Loading

jtb20 commented Aug 28, 2025 •

edited

Loading