Skip to content

Conversation

mizvekov
Copy link
Contributor

@mizvekov mizvekov commented Jul 9, 2025

This is a major change on how we represent nested name qualifications in the AST.

  • The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy.
  • The ElaboratedType node is removed, all type nodes in which it could previously apply to can now store the elaborated keyword and name qualifier, tail allocating when present.
  • TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size.

This patch offers a great performance benefit.

It greatly improves compilation time for stdexec. For one datapoint, for test_on2.cpp in that project, which is the slowest compiling test, this patch improves -c compilation time by about 7.2%, with the -fsyntax-only improvement being at ~12%.

This has great results on compile-time-tracker as well:
image

This patch also further enables other optimziations in the future, and will reduce the performance impact of template specialization resugaring when that lands.

It has some other miscelaneous drive-by fixes.

About the review: Yes the patch is huge, sorry about that. Part of the reason is that I started by the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact.

There is also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to the users. Also, the nested name specifier having a different API avoids missing changes related to how prefixes work now, which could make existing code compile but not work.

How to review: The important changes are all in clang/include/clang/AST and clang/lib/AST, with also important changes in clang/lib/Sema/TreeTransform.h.

The rest and bulk of the changes are mostly consequences of the changes in API.

PS: TagType::getDecl is renamed to getOriginalDecl in this patch, just for easier to rebasing. I plan to rename it back after this lands.

Fixes #136624
Fixes #43179
Fixes #68670
Fixes #92757

mizvekov added a commit that referenced this pull request Aug 26, 2025
This makes sure NestedNameSpecifierLocs don't apply to the replacement
TemplateName of SubstTemplate* nodes.

Also removes improper name qualification over these Subst Nodes, causing
some canonical TemplateNames to not be fully qualified when printed.

Since this is a regression introduced in #147835, which was never
released, there are no release notes.

Fixes #155281
mizvekov added a commit that referenced this pull request Aug 26, 2025
… name (#155347)

This fixes a bug in the fast path for the creation of TagTypes from
injected class names.

The creation of TagTypes has a fast path which, when there is no
elaboration, uses storage in the declaration itself for memoizing the
resuling type node, instead of using the folding set.

This memoizing would fail when the type was created from the injected
class name, as we would look for the node in the injected declaration
but store it in the non-injected one, so a different type would be
created each time.

This regression was reported here:
#147835 (comment)

Since this regression was never released, there are no release notes.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 26, 2025
…ected class name (#155347)

This fixes a bug in the fast path for the creation of TagTypes from
injected class names.

The creation of TagTypes has a fast path which, when there is no
elaboration, uses storage in the declaration itself for memoizing the
resuling type node, instead of using the folding set.

This memoizing would fail when the type was created from the injected
class name, as we would look for the node in the injected declaration
but store it in the non-injected one, so a different type would be
created each time.

This regression was reported here:
llvm/llvm-project#147835 (comment)

Since this regression was never released, there are no release notes.
@zmodem
Copy link
Collaborator

zmodem commented Aug 26, 2025

Heads up, it seems Rust's bindgen (which uses libclang) is broken after this change (https://crbug.com/440975178).

@aeubanks
Copy link
Contributor

aeubanks commented Aug 26, 2025

$ cat /tmp/a.cc
class A;
struct A;
void f(A*) {}
$ clang-old++ -S -o - /tmp/a.cc -O2 -target x86_64-msvc-windows-pc
...
"?f@@YAXPEAVA@@@Z"
...
$ clang-new++ -S -o - /tmp/a.cc -O2 -target x86_64-msvc-windows-pc
...
"?f@@YAXPEAUA@@@Z"
...

notice the V -> U

edit: we apparently had suppressed -Wmismatched-tags which would have caught this...

@mizvekov
Copy link
Contributor Author

@aeubanks

I see, so that's unfortunate MSVC mangles the struct / class difference.
This is a different problem than I had in mind.

So MSVC mangles according to the tag kind of the first declaration, clang after this changes now mangles it according to the declaration found by lookup at the point of use.

Sounds like if we want strict MSVC bug-for-bug conformance, we should change that to use the first declaration then.

@nico
Copy link
Contributor

nico commented Aug 27, 2025

Sounds like if we want strict MSVC bug-for-bug conformance, we should change that to use the first declaration then.

We don't want bug-for-bug conformance, but we do want to be ABI compatible. So we do have to undo the regression in behavior here. (I think this was covered by tests even, right?) So we'd appreciate a fix 🙂

zmodem added a commit to zmodem/rust-bindgen that referenced this pull request Aug 27, 2025
In llvm/llvm-project#147835, Clang's AST changed
so that a TagType's decl will not necessarily refer to the type's
definition, but to the exact declaration which produced the type.

For example, in

  typedef struct S T;
  struct S {
    int x;
  };

the 'struct S' type would refer to the incomplete type decl in the
typedef, causing CompInfo to fail to see the type's definition.

This patch inserts a call to use the definition when available. It fixes
the original test case in
rust-lang#3275 and most of the
test failures caused by the Clang change but not all.
emilio pushed a commit to rust-lang/rust-bindgen that referenced this pull request Aug 27, 2025
In llvm/llvm-project#147835, Clang's AST changed
so that a TagType's decl will not necessarily refer to the type's
definition, but to the exact declaration which produced the type.

For example, in

  typedef struct S T;
  struct S {
    int x;
  };

the 'struct S' type would refer to the incomplete type decl in the
typedef, causing CompInfo to fail to see the type's definition.

This patch inserts a call to use the definition when available. It fixes
the original test case in
#3275 and most of the
test failures caused by the Clang change but not all.
github-merge-queue bot pushed a commit to rust-lang/rust-bindgen that referenced this pull request Aug 27, 2025
In llvm/llvm-project#147835, Clang's AST changed
so that a TagType's decl will not necessarily refer to the type's
definition, but to the exact declaration which produced the type.

For example, in

  typedef struct S T;
  struct S {
    int x;
  };

the 'struct S' type would refer to the incomplete type decl in the
typedef, causing CompInfo to fail to see the type's definition.

This patch inserts a call to use the definition when available. It fixes
the original test case in
#3275 and most of the
test failures caused by the Clang change but not all.
@mizvekov
Copy link
Contributor Author

mizvekov commented Aug 27, 2025

(I think this was covered by tests even, right?)

No, this patch didn't change any existing mangling tests at all, the behavior in question here was untested.

mizvekov added a commit that referenced this pull request Aug 27, 2025
This fixes a regression reported here: #147835 (comment)

Since this regression was never released, there are no release notes.
mizvekov added a commit that referenced this pull request Aug 27, 2025
…155662)

This fixes a regression reported here:
#147835 (comment)

Since this regression was never released, there are no release notes.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 27, 2025
… tagkind (#155662)

This fixes a regression reported here:
llvm/llvm-project#147835 (comment)

Since this regression was never released, there are no release notes.
@bolshakov-a
Copy link
Contributor

We don't want bug-for-bug conformance, but we do want to be ABI compatible.

It really looks like an MSVC bug. If its mangling depends on whether the first (forward-)declaration of a class in a translation unit uses class or struct class-key, the TUs cannot be linked together in general.

@mizvekov
Copy link
Contributor Author

The MSVC mangling issue is resolved here: #155662

mizvekov added a commit that referenced this pull request Aug 27, 2025
This makes the type printer not qualify constructor and destructor names.
These are represented as canonical types and the type printer is used,
but unlike canonical types which we normally print as fully qualified,
the expected behaviour for declaration names is for them to be unqualified.

Note that this restores the behaviour pre #147835, but that
is still broken for the constructor names of class templates,
since in that case the injected class name type is used, but
here the type printer is configured to also print the implicit
template arguments.

No release notes since this regression was never released.

Fixes #155537
@kimgr
Copy link
Contributor

kimgr commented Aug 27, 2025

We don't want bug-for-bug conformance, but we do want to be ABI compatible.

It really looks like an MSVC bug. If its mangling depends on whether the first (forward-)declaration of a class in a translation unit uses class or struct class-key, the TUs cannot be linked together in general.

I guess this is why MSVC has this warning: https://stackoverflow.com/q/468486

mizvekov added a commit that referenced this pull request Aug 27, 2025
This makes the type printer not qualify constructor and destructor
names. These are represented as canonical types and the type printer is
used, but unlike canonical types which we normally print as fully
qualified, the expected behaviour for declaration names is for them to
be unqualified.

Note that this restores the behaviour pre #147835, but that is still
broken for the constructor names of class templates, since in that case
the injected class name type is used, but here the type printer is
configured to also print the implicit template arguments.

No release notes since this regression was never released.

Fixes #155537
@rjmccall
Copy link
Contributor

I mean, it's an MSVC bug in the sense that it's non-conformant behavior, but it's non-conformant behavior that has long since become a cornerstone of the MSVC ABI, so the bugginess is no longer relevant. We are obliged to maintain compatibility with it if we want to say we implement the ABI correctly.

@bolshakov-a
Copy link
Contributor

bolshakov-a commented Aug 28, 2025

Yeah, the not-to-break-ABI consideration is obviously the only reason why they haven't fixed it so far. But this makes dangerous to mix declarations with different tag kinds despite it is allowed by the standard, and hence MSVC has a warning on that as @kimgr has noticed. So probably, there isn't much sense for strict correspondence in that case because it might be said that this is user's code problem... Anyway, it is fixed now.

@nico
Copy link
Contributor

nico commented Aug 28, 2025

FWIW, clang has a warning like that too, it's -Wmismatched-tags (part of -Wmost, which is part of -Wall).

emilio pushed a commit to rust-lang/rust-bindgen that referenced this pull request Aug 31, 2025
In llvm/llvm-project#147835, Clang's AST changed
so that a TagType's decl will not necessarily refer to the type's
definition, but to the exact declaration which produced the type.

For example, in

  typedef struct S T;
  struct S {
    int x;
  };

the 'struct S' type would refer to the incomplete type decl in the
typedef, causing CompInfo to fail to see the type's definition.

This patch inserts a call to use the definition when available. It fixes
the original test case in
#3275 and most of the
test failures caused by the Clang change but not all.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AArch64 backend:AMDGPU backend:ARC backend:ARM backend:CSKY backend:Hexagon backend:Lanai backend:loongarch backend:MIPS backend:PowerPC backend:RISC-V backend:Sparc backend:SystemZ backend:WebAssembly backend:X86 clang:analysis clang:as-a-library libclang and C++ API clang:bytecode Issues for the clang bytecode constexpr interpreter clang:codegen IR generation bugs: mangling, exceptions, etc. clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:openmp OpenMP related changes to Clang clang:static analyzer clang Clang issues not falling into any other category clang-tidy clang-tools-extra clangd coroutines C++20 coroutines debuginfo HLSL HLSL Language Support libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. lldb
Projects
None yet