Skip to content

Conversation

yxsamliu
Copy link
Collaborator

@yxsamliu yxsamliu commented Sep 4, 2025

This change lets non-const lvalue refs bind directly to a vector element (like v[0]) to match GCC. vector remains rejected. Writes through the reference update the correct lane.

Motivation: Make builtin vector types behave more like normal C++ vectors so they work in more use cases. In HIP, vector types are structs to stay compatible with CUDA, but users often cast to builtin vector types for better performance. Allowing references to vector elements helps these patterns and reduces friction when mixing struct-based HIP vectors with builtin vectors.

Sema: In TryReferenceInitializationCore, treat a vector element as referenceable when the initializer is a vector element and the element type is not bool. This enables binding without creating a temporary and leaves vector unchanged.

CodeGen: When a reference is initialized from a vector element, record the VectorElt LValue at init time and reuse it for later uses. Loads and stores then lower to extractelement and insertelement on the base vector. Non-vector cases keep the old path to preserve existing IR.

Note: Taking the address of such a reference is still not supported.

Fixes: SWDEV-548106

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Sep 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 4, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Yaxun (Sam) Liu (yxsamliu)

Changes

This change lets non-const lvalue refs bind directly to a vector element (like v[0]) to match GCC. vector<bool> remains rejected. Writes through the reference update the correct lane.

Motivation: Make builtin vector types behave more like normal C++ vectors so they work in more use cases. In HIP, vector types are structs to stay compatible with CUDA, but users often cast to builtin vector types for better performance. Allowing references to vector elements helps these patterns and reduces friction when mixing struct-based HIP vectors with builtin vectors.

Sema: In TryReferenceInitializationCore, treat a vector element as referenceable when the initializer is a vector element and the element type is not bool. This enables binding without creating a temporary and leaves vector<bool> unchanged.

CodeGen: When a reference is initialized from a vector element, record the VectorElt LValue at init time and reuse it for later uses. Loads and stores then lower to extractelement and insertelement on the base vector. Non-vector cases keep the old path to preserve existing IR.

Note: Taking the address of such a reference is still not supported.

Fixes: SWDEV-548106


Full diff: https://github.com/llvm/llvm-project/pull/156891.diff

8 Files Affected:

  • (modified) clang/lib/CodeGen/CGDecl.cpp (+10)
  • (modified) clang/lib/CodeGen/CGExpr.cpp (+5)
  • (modified) clang/lib/CodeGen/CodeGenFunction.h (+4)
  • (modified) clang/lib/Sema/SemaInit.cpp (+7-1)
  • (added) clang/test/CodeGenCUDA/non-const-lval-ref-vector-elem.cu (+18)
  • (added) clang/test/CodeGenCXX/non-const-lval-ref-vector-elem.cpp (+58)
  • (added) clang/test/SemaCXX/non-const-lval-ref-vector-elem.cpp (+23)
  • (modified) clang/test/SemaCXX/references.cpp (+13-4)
diff --git a/clang/lib/CodeGen/CGDecl.cpp b/clang/lib/CodeGen/CGDecl.cpp
index 29193e0c541b9..475b0ce11b633 100644
--- a/clang/lib/CodeGen/CGDecl.cpp
+++ b/clang/lib/CodeGen/CGDecl.cpp
@@ -2095,6 +2095,16 @@ void CodeGenFunction::EmitExprAsInit(const Expr *init, const ValueDecl *D,
   QualType type = D->getType();
 
   if (type->isReferenceType()) {
+    // Only special-case when actually binding to a vector element.
+    if (init->refersToVectorElement()) {
+      LValue SrcLV = EmitLValue(init);
+      if (SrcLV.isVectorElt()) {
+        if (const auto *VD = dyn_cast<VarDecl>(D)) {
+          VectorEltRefBindings[VD] = SrcLV;
+          return; // Uses of the reference will reload this LV.
+        }
+      }
+    }
     RValue rvalue = EmitReferenceBindingToExpr(init);
     if (capturedByInit)
       drillIntoBlockVariable(*this, lvalue, cast<VarDecl>(D));
diff --git a/clang/lib/CodeGen/CGExpr.cpp b/clang/lib/CodeGen/CGExpr.cpp
index 26fba751e6f9d..7e74da5de7cd1 100644
--- a/clang/lib/CodeGen/CGExpr.cpp
+++ b/clang/lib/CodeGen/CGExpr.cpp
@@ -3203,6 +3203,11 @@ static bool canEmitSpuriousReferenceToVariable(CodeGenFunction &CGF,
 }
 
 LValue CodeGenFunction::EmitDeclRefLValue(const DeclRefExpr *E) {
+  if (const auto *VD = dyn_cast<VarDecl>(E->getDecl())) {
+    auto It = VectorEltRefBindings.find(VD);
+    if (It != VectorEltRefBindings.end())
+      return It->second;
+  }
   const NamedDecl *ND = E->getDecl();
   QualType T = E->getType();
 
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 123cb4f51f828..b09f2bb1a9ecd 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -1750,6 +1750,10 @@ class CodeGenFunction : public CodeGenTypeCache {
   /// The last regular (non-return) debug location (breakpoint) in the function.
   SourceLocation LastStopPoint;
 
+  /// Remember bindings for references initialized from vector elements.
+  llvm::DenseMap<const clang::VarDecl *, clang::CodeGen::LValue>
+      VectorEltRefBindings;
+
 public:
   /// Source location information about the default argument or member
   /// initializer expression we're evaluating, if any.
diff --git a/clang/lib/Sema/SemaInit.cpp b/clang/lib/Sema/SemaInit.cpp
index c97129336736b..70b6e174df95a 100644
--- a/clang/lib/Sema/SemaInit.cpp
+++ b/clang/lib/Sema/SemaInit.cpp
@@ -5379,7 +5379,13 @@ static void TryReferenceInitializationCore(Sema &S,
   OverloadingResult ConvOvlResult = OR_Success;
   bool T1Function = T1->isFunctionType();
   if (isLValueRef || T1Function) {
-    if (InitCategory.isLValue() && !isNonReferenceableGLValue(Initializer) &&
+    // Allow direct binding to vector elements (except for vector<bool>
+    // elements) to match GCC. Treat vector elements as referenceable lvalues
+    // for non-bool element types.
+    bool AllowVectorElementRef =
+        Initializer->refersToVectorElement() && !T2->isBooleanType();
+    if (InitCategory.isLValue() &&
+        (!isNonReferenceableGLValue(Initializer) || AllowVectorElementRef) &&
         (RefRelationship == Sema::Ref_Compatible ||
          (Kind.isCStyleOrFunctionalCast() &&
           RefRelationship == Sema::Ref_Related))) {
diff --git a/clang/test/CodeGenCUDA/non-const-lval-ref-vector-elem.cu b/clang/test/CodeGenCUDA/non-const-lval-ref-vector-elem.cu
new file mode 100644
index 0000000000000..2d03eaa269ade
--- /dev/null
+++ b/clang/test/CodeGenCUDA/non-const-lval-ref-vector-elem.cu
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -emit-llvm %s -o - -fcuda-is-device -triple nvptx-unknown-unknown | FileCheck %s
+// RUN: %clang_cc1 -emit-llvm %s -o - -fcuda-is-device -triple amdgcn | FileCheck %s
+
+#include "Inputs/cuda.h"
+typedef double __attribute__((vector_size(32))) native_double4;
+
+struct alignas(32) double4_struct {
+    double x,y,z,w;
+    __device__ operator native_double4& () { return (native_double4&)(*this); }
+};
+
+__device__ void test_write(double4_struct& x, int i) {
+  x[i] = 1;
+}
+
+__device__ void test_read(double& y, double4_struct& x, int i) {
+  y = x[i];
+}
diff --git a/clang/test/CodeGenCXX/non-const-lval-ref-vector-elem.cpp b/clang/test/CodeGenCXX/non-const-lval-ref-vector-elem.cpp
new file mode 100644
index 0000000000000..f52397ef051ea
--- /dev/null
+++ b/clang/test/CodeGenCXX/non-const-lval-ref-vector-elem.cpp
@@ -0,0 +1,58 @@
+// RUN: %clang_cc1 -triple x86_64-pc-linux-gnu \
+// RUN:   -emit-llvm %s -o - | FileCheck %s
+
+using v4i = int __attribute__((ext_vector_type(4)));
+
+struct v4i_s { int x, y, z, w; };
+
+// CHECK-LABEL: define dso_local noundef i32 @_Z7f_basicv()
+// CHECK: entry:
+// CHECK:   %v = alloca <4 x i32>, align 16
+// CHECK:   %r = alloca ptr, align 8
+// CHECK:   store <4 x i32> <i32 1, i32 2, i32 3, i32 4>, ptr %v, align 16
+// CHECK:   %0 = load <4 x i32>, ptr %v, align 16
+// CHECK:   %vecins = insertelement <4 x i32> %0, i32 7, i32 0
+// CHECK:   store <4 x i32> %vecins, ptr %v, align 16
+// CHECK:   %1 = load <4 x i32>, ptr %v, align 16
+// CHECK:   %vecext = extractelement <4 x i32> %1, i32 0
+// CHECK:   ret i32 %vecext
+int f_basic() {
+  v4i v = {1, 2, 3, 4};
+  int &r = v[0];
+  r = 7;
+  return v[0];
+}
+
+// CHECK-LABEL: define dso_local noundef i32 @_Z8f_varidxRDv4_ii(
+// CHECK: entry:
+// CHECK:   %v.addr = alloca ptr, align 8
+// CHECK:   %i.addr = alloca i32, align 4
+// CHECK:   %r = alloca ptr, align 8
+// CHECK:   store ptr %v, ptr %v.addr, align 8
+// CHECK:   store i32 %i, ptr %i.addr, align 4
+// CHECK:   %0 = load ptr, ptr %v.addr, align 8
+// CHECK:   %1 = load i32, ptr %i.addr, align 4
+// CHECK:   %2 = load <4 x i32>, ptr %0, align 16
+// CHECK:   %vecext = extractelement <4 x i32> %2, i32 %1
+// CHECK:   %add = add nsw i32 %vecext, 1
+// CHECK:   %3 = load <4 x i32>, ptr %0, align 16
+// CHECK:   %vecins = insertelement <4 x i32> %3, i32 %add, i32 %1
+// CHECK:   store <4 x i32> %vecins, ptr %0, align 16
+// CHECK:   %4 = load ptr, ptr %v.addr, align 8
+// CHECK:   %5 = load <4 x i32>, ptr %4, align 16
+// CHECK:   %6 = load i32, ptr %i.addr, align 4
+// CHECK:   %vecext1 = extractelement <4 x i32> %5, i32 %6
+// CHECK:   ret i32 %vecext1
+int f_varidx(v4i &v, int i) {
+  int &r = v[i];
+  r = r + 1;
+  return v[i];
+}
+
+int cast_ref_read(v4i_s &v, int i) {
+  return ((v4i&)v)[i];
+}
+
+int cast_ptr_read(v4i_s *v, int i) {
+  return (*((v4i*)v))[i];
+}
\ No newline at end of file
diff --git a/clang/test/SemaCXX/non-const-lval-ref-vector-elem.cpp b/clang/test/SemaCXX/non-const-lval-ref-vector-elem.cpp
new file mode 100644
index 0000000000000..f8eab2a73dd70
--- /dev/null
+++ b/clang/test/SemaCXX/non-const-lval-ref-vector-elem.cpp
@@ -0,0 +1,23 @@
+// RUN: %clang_cc1 -triple x86_64-pc-linux-gnu -fsyntax-only \
+// RUN:   -verify %s
+
+using v4i = int __attribute__((ext_vector_type(4)));
+using v4b = bool __attribute__((ext_vector_type(4)));
+
+void ok_int_ref() {
+  v4i v = {1, 2, 3, 4};
+  int &r0 = v[0]; // ok
+  int &r3 = v[3]; // ok
+  (void)r0;
+  (void)r3;
+}
+
+void bad_bool_ref(v4b vb) {
+  bool &br = vb[1]; // expected-error {{non-const reference cannot bind to vector element}}
+  (void)br;
+}
+
+void ok_const_bool_ref(v4b vb) {
+  const bool &cr = vb[2]; // ok: binds to a temporary
+  (void)cr;
+}
diff --git a/clang/test/SemaCXX/references.cpp b/clang/test/SemaCXX/references.cpp
index 7ef3f43ff55ae..96366f91c0aa4 100644
--- a/clang/test/SemaCXX/references.cpp
+++ b/clang/test/SemaCXX/references.cpp
@@ -146,12 +146,21 @@ void test10() {
   typedef __attribute__(( ext_vector_type(4) )) int ext_vec4;
   
   vec4 v;
-  int &a = v[0]; // expected-error{{non-const reference cannot bind to vector element}}
-  const int &b = v[0];
+  int &a = v[0]; // ok
+  const int &b = v[0]; // ok
   
   ext_vec4 ev;
-  int &c = ev.x; // expected-error{{non-const reference cannot bind to vector element}}
-  const int &d = ev.x;
+  int &c = ev.x; // ok
+  const int &d = ev.x; // ok
+}
+
+void test11() {
+  __attribute((vector_size(16))) typedef bool bvec4; // expected-error {{invalid vector element type 'bool'}}
+  typedef __attribute__(( ext_vector_type(4) )) bool ext_bvec4;
+
+  ext_bvec4 ev;
+  bool &c = ev.x; // expected-error {{illegal vector component name 'x'}}
+  const bool &d = ev.x; // expected-error {{illegal vector component name 'x'}}
 }
 
 namespace PR7149 {

…<bool>)

This change lets int& bind directly to a vector element (like v[0]) to match
GCC. vector<bool> remains rejected. Writes through the reference update the
correct lane.

Motivation: Make builtin vector types behave more like normal C++ vectors so
they work in more use cases. In HIP, vector types are structs to stay
compatible with CUDA, but users often cast to builtin vector types for better
performance. Allowing references to vector elements helps these patterns and
reduces friction when mixing struct-based HIP vectors with builtin vectors.

Sema: In TryReferenceInitializationCore, treat a vector element as
referenceable when the initializer is a vector element and the element type is
not bool. This enables binding without creating a temporary and leaves
vector<bool> unchanged.

CodeGen: When a reference is initialized from a vector element, record the
VectorElt LValue at init time and reuse it for later uses. Loads and stores
then lower to extractelement and insertelement on the base vector. Non-vector
cases keep the old path to preserve existing IR.

Note: Taking the address of such a reference is still not supported.

Fixes: SWDEV-548106
Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that GCC allows this by just binding the reference to the address of the element in the vector. The way you're implementing this, we have to introduce the concept of a non-addressable reference, which would be a major change.

But in general, I'm fine with this; I don't think we really have a good reason to not allow references to bind to vector elements. We can still prefer to keep vectors as aggregates in CodeGen whenever it's reasonably possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants