Skip to content

Conversation

balazs-benics-sonarsource
Copy link
Contributor

@balazs-benics-sonarsource balazs-benics-sonarsource commented Sep 3, 2025

When calculating the offset of a FieldRegion, we need to find out which field index the given field refers to.
Previously, if for some reason the field was not found, then the Idx passed to Layout.getFieldOffset was out of bounds and caused undefined behavior when dereferenced an out of bounds element in ASTVector::FieldOffsets::operator[], which asserts this in debug builds, but exposes the undefined behavior in release builds.

In this patch, I refactored how we enumerate the fields, and gracefully handle the scenario where the field is not found.
That case is still bad, but at least it should not expose the undefined behavior in release builds, and should assert earlier in debug builds than before.

The motivational case was transformed into a regression test, that would fail if no canonicalization would happen when creating a FieldRegion. This was reduced from a production crash.
In the test case, due to how modules work, there would be multiple copies of the same template specialization in the AST. This could lead into inconsistent state when the FieldRegion's Decl was different to the RecordDecl's field - because one referred to the first and the other to the second. This made calculateOffset fail to compute the field index, triggering the undefined behavior in production.

While this inconsistency gets fixed now, I think the assertion is still warranted to avoid potential undefined behavior in release builds.

CPP-6691,CPP-6849

When calculating the offset of a FieldRegion, we need to find out which
field index the given field refers to.
Previously, if for some reason the field was not found, then the `Idx`
passed to `Layout.getFieldOffset` was out of bounds and caused undefined
behavior when dereferenced an out of bounds element in
`ASTVector::FieldOffsets::operator[]`, which asserts this in debug
builds, but exposes the undefined behavior in release builds.

In this patch, I refactored how we enumerate the fields, and gracefully
handle the scenario where the field is not found.
That case is still bad, but at least it should not expose the undefined
behavior in release builds, and should assert earlier in debug builds
than before.

The motivational case was transformed into a regression test, that would
fail if no canonicalization would happen when creating a FieldRegion.
This was reduced from a production crash.
In the test case, due to how modules work, there would be multiple
copies of the same template specialization in the AST.
This could lead into inconsistent state when the FieldRegion's Decl was
different to the RecordDecl's field - because one referred to the first
and the other to the second. This made `calculateOffset` fail to compute
the field index, triggering the undefined behavior in production.

While this inconsistency gets fixed now, I think the assertion is still
warranted to avoid potential undefined behavior in release builds.

CPP-6691,CPP-6849

Co-authored-by: Marco Borgeaud <marco.borgeaud@sonarsource.com>
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Sep 3, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 3, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balázs Benics (balazs-benics-sonarsource)

Changes

When calculating the offset of a FieldRegion, we need to find out which field index the given field refers to.
Previously, if for some reason the field was not found, then the Idx passed to Layout.getFieldOffset was out of bounds and caused undefined behavior when dereferenced an out of bounds element in ASTVector::FieldOffsets::operator[], which asserts this in debug builds, but exposes the undefined behavior in release builds.

In this patch, I refactored how we enumerate the fields, and gracefully handle the scenario where the field is not found.
That case is still bad, but at least it should not expose the undefined behavior in release builds, and should assert earlier in debug builds than before.

The motivational case was transformed into a regression test, that would fail if no canonicalization would happen when creating a FieldRegion. This was reduced from a production crash.
In the test case, due to how modules work, there would be multiple copies of the same template specialization in the AST. This could lead into inconsistent state when the FieldRegion's Decl was different to the RecordDecl's field - because one referred to the first and the other to the second. This made calculateOffset fail to compute the field index, triggering the undefined behavior in production.

While this inconsistency gets fixed now, I think the assertion is still warranted to avoid potential undefined behavior in release builds.

CPP-6691,CPP-6849


Full diff: https://github.com/llvm/llvm-project/pull/156668.diff

2 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Core/MemRegion.cpp (+15-8)
  • (added) clang/test/Analysis/modules/explicit-templ-inst-crash-in-modules.cppm (+35)
diff --git a/clang/lib/StaticAnalyzer/Core/MemRegion.cpp b/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
index 5f271963e3d09..f68fb1e6df759 100644
--- a/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
+++ b/clang/lib/StaticAnalyzer/Core/MemRegion.cpp
@@ -1271,7 +1271,7 @@ const SymbolicRegion *MemRegionManager::getSymbolicHeapRegion(SymbolRef Sym) {
 const FieldRegion*
 MemRegionManager::getFieldRegion(const FieldDecl *d,
                                  const SubRegion* superRegion){
-  return getSubRegion<FieldRegion>(d, superRegion);
+  return getSubRegion<FieldRegion>(d->getCanonicalDecl(), superRegion);
 }
 
 const ObjCIvarRegion*
@@ -1704,16 +1704,23 @@ static RegionOffset calculateOffset(const MemRegion *R) {
       if (SymbolicOffsetBase)
         continue;
 
-      // Get the field number.
-      unsigned idx = 0;
-      for (RecordDecl::field_iterator FI = RD->field_begin(),
-             FE = RD->field_end(); FI != FE; ++FI, ++idx) {
-        if (FR->getDecl() == *FI)
-          break;
+      auto MaybeFieldIdx = [FR, RD]() -> std::optional<unsigned> {
+        assert(FR->getDecl()->getCanonicalDecl() == FR->getDecl());
+        for (auto [Idx, Field] : llvm::enumerate(RD->fields())) {
+          if (FR->getDecl() == Field->getCanonicalDecl())
+            return Idx;
+        }
+        return std::nullopt;
+      }();
+
+      if (!MaybeFieldIdx.has_value()) {
+        assert(false && "Field not found");
+        goto Finish; // Invalid offset.
       }
+
       const ASTRecordLayout &Layout = R->getContext().getASTRecordLayout(RD);
       // This is offset in bits.
-      Offset += Layout.getFieldOffset(idx);
+      Offset += Layout.getFieldOffset(MaybeFieldIdx.value());
       break;
     }
     }
diff --git a/clang/test/Analysis/modules/explicit-templ-inst-crash-in-modules.cppm b/clang/test/Analysis/modules/explicit-templ-inst-crash-in-modules.cppm
new file mode 100644
index 0000000000000..6eec29c7187ba
--- /dev/null
+++ b/clang/test/Analysis/modules/explicit-templ-inst-crash-in-modules.cppm
@@ -0,0 +1,35 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+//
+// DEFINE: %{common-flags}= -std=c++20 -I %t -fprebuilt-module-path=%t
+//
+// RUN: %clang_cc1 %{common-flags} %t/other.cppm -emit-module-interface -o %t/other.pcm
+// RUN: %clang_analyze_cc1 -analyzer-checker=core %{common-flags} %t/entry.cppm -verify
+
+
+//--- MyStruct.h
+template <typename T> struct MyStruct {
+  T data = 0;
+};
+template struct MyStruct<int>; // Explicit template instantiation.
+
+//--- other.cppm
+module;
+#include "MyStruct.h"
+export module other;
+static void implicit_instantiate_MyStruct() {
+  MyStruct<int> var;
+  (void)var;
+}
+
+//--- entry.cppm
+// expected-no-diagnostics
+module;
+#include "MyStruct.h"
+module other;
+
+void entry_point() {
+  MyStruct<int> var; // no-crash
+  (void)var;
+}

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The patch LGTM, it is always good to see a fix for a crash.

I added two very minor stylistic suggestions in inline comments, but I think the change is acceptable even without those changes.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates!

@balazs-benics-sonarsource balazs-benics-sonarsource merged commit 9971844 into llvm:main Sep 4, 2025
9 checks passed
@balazs-benics-sonarsource balazs-benics-sonarsource deleted the bb/upstream-cpp-6849-calc-offset-oob-ub branch September 4, 2025 06:20
@llvm-ci
Copy link
Collaborator

llvm-ci commented Sep 4, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-4 while building clang at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/26685

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'lld :: MinGW/driver.test' FAILED ********************
Exit Code: 127

Command Output (stdout):
--
# RUN: at line 1
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld -### foo.o -m i386pe 2>&1 | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=X86 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld '-###' foo.o -m i386pe
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=X86 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# note: command had no output on stdout or stderr
# RUN: at line 7
echo "-### foo.o -m i386pe" > /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/tools/lld/test/MinGW/Output/driver.test.tmp.rsp
# executed command: echo '-### foo.o -m i386pe'
# note: command had no output on stdout or stderr
# RUN: at line 8
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld @/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/tools/lld/test/MinGW/Output/driver.test.tmp.rsp 2>&1 | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=X86 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld @/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/tools/lld/test/MinGW/Output/driver.test.tmp.rsp
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=X86 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# note: command had no output on stdout or stderr
# RUN: at line 10
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld -### foo.o -m i386pep 2>&1 | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=X64 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld '-###' foo.o -m i386pep
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=X64 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# note: command had no output on stdout or stderr
# RUN: at line 16
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld -### foo.o -m thumb2pe 2>&1 | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld '-###' foo.o -m thumb2pe
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# note: command had no output on stdout or stderr
# RUN: at line 22
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld -### foo.o -m arm64pe 2>&1 | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM64 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld '-###' foo.o -m arm64pe
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM64 /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# note: command had no output on stdout or stderr
# RUN: at line 28
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld -### foo.o -m arm64ecpe 2>&1 | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM64EC /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld '-###' foo.o -m arm64ecpe
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM64EC /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# note: command had no output on stdout or stderr
# RUN: at line 34
/Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld -### foo.o -m arm64xpe 2>&1 | /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM64X /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/ld.lld '-###' foo.o -m arm64xpe
# note: command had no output on stdout or stderr
# executed command: /Volumes/RAMDisk/buildbot-root/aarch64-darwin/build/bin/FileCheck -check-prefix=ARM64X /Users/buildbot/buildbot-root/llvm-project/lld/test/MinGW/driver.test
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants