aemerson
Contributor

[NFC] [clangd] [Modules] remove dot in log

The dot in the log makes it hard to copy and execute the commands from
the log. Remove it.

[clangd] [C++20 Modules] Add --debug-modules-builder to not remove built module files on exit

In practice, I found the option very helpful for understanding what
happens when clangd's C++20 modules support fails. With '--log=verbose',
I can rerun the command issued by clangd to understand what is actually
going wrong.

Documenting the option or adding it to the '--help' list can be done
separately.

Fix test added in #155148 to work with Windows style path separators. (#155354)

Should fix Windows build bot failures such as
https://lab.llvm.org/buildbot/#/builders/46/builds/22281.

The test (and the followup fix in #155303) did not properly account for
Windows style path separators.

[libc++] Add a release note about multi{map,set}::find not returning the first element anymore (#155252)

We've modified the algorithm of __tree::find in #152370, which can
change the return value. Since we always returned the lower bound
before, some users started relying on it. This patch adds a release note
so users are aware that this might break their code.
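
For code that relied on the old behavior, a minimal sketch of a portable alternative follows. It assumes a std::multimap<int, std::string> and a hypothetical helper name, first_value_for; neither comes from the patch. lower_bound is specified to return the first element whose key is not less than the argument, while find may return any element with an equivalent key.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: returns the value of the first element with the
// given key, or `fallback` if no such element exists.
inline std::string first_value_for(const std::multimap<int, std::string>& m,
                                   int key, const std::string& fallback) {
  // lower_bound() yields the first element whose key is >= `key`.
  auto it = m.lower_bound(key);
  if (it == m.end() || it->first != key)
    return fallback;  // the key is absent
  return it->second;
}
```

Elements with equal keys keep their insertion order in a multimap, so the helper returns the value that was inserted first for that key.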

[libc++][C++03] Split libc++-specific tests for the frozen headers (#144093)

The C++03 headers are essentially a separate implementation, so it
doesn't make a ton of sense to try to test two implementations with a
single set of implementation-specific tests.

This patch doesn't copy over any tests that will not be run in C++03
mode. The most notable changes are that lit.local.cfg files are
touched to change the path from libcxx/test/libcxx to
libcxx/test/libcxx-03 in a few places.

This also modifies lit.local.cfg files to run libcxx/test/libcxx-03
only when using the frozen headers and libcxx/test/libcxx tests only
when not using the frozen headers.

This is part of
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc.

[libc++] Remove a few incorrect _LIBCPP_EXPORTED_FROM_ABI annotations (#132602)

This has two benefits:

  • It is less likely that the macro will be copy-pasted around when
    unnecessary
  • We can drop _LIBCPP_HIDE_FROM_ABI from any member functions once we
    are able to make _LIBCPP_HIDE_FROM_ABI the default within libc++

[lldb] Fix a warning

This patch fixes:

lldb/unittests/Protocol/ProtocolMCPServerTest.cpp:285:14: error:
unused variable 'mutex' [-Werror,-Wunused-variable]

[libc++][NFC] Wrap lines in ReleaseNotes/22.rst (#155359)

Some of the lines in ReleaseNotes/22.rst are (significantly) longer
than our usual 120 column limit. This wraps all lines in the file so
they are never more than our usual limit.

[flang] Disable loop interchange by default (#155279)

Disable loop interchange by default, while keeping the ability to
explicitly enable using -floop-interchange. This matches Clang.

See discussion on #140182.

[X86] Fix spill issue for fr16 (#155225)

When avx512fp16 is not available, we use MOVSS to spill fr16/fr16x
registers. However, MOVSSmr requires the fr32 register class and MOVSSrm
requires the vr128 register class, which causes the machine verifier to
report a bad instruction. To fix the issue, this patch creates a pseudo
instruction, MOVSHP, for fr16 register spilling. MOVSHP is expanded to
MOVSS or VMOVSSZ depending on the register number.


Co-authored-by: Yuanke Luo ykluo@birentech.com

[mlir][emitc] Fix bug in ApplyOp translation (#155171)

The translator emits emitc.apply incorrectly when the op is part of an
expression, as it prints the name of the operand instead of calling
emitOperand() which takes into account the expression being emitted,
leaving out the part of the expression feeding this op, e.g.

func.func @foo(%a: i32, %p: !emitc.ptr<i32>) -> i32 {
  %c = emitc.expression : i32 {
    %e = emitc.sub %p, %a : (!emitc.ptr<i32>, i32) -> !emitc.ptr<i32>
    %d = emitc.apply "*"(%e) : (!emitc.ptr<i32>) -> i32
    emitc.yield %d : i32
  }
  return %c : i32
}

translates to:

int32_t foo(int32_t v1, int32_t* v2) {
  int32_t v3 = *v4;
  return v3;
}

instead of:

int32_t foo(int32_t v1, int32_t* v2) {
  int32_t v3 = *(v2 - v1);
  return v3;
}

[clang][test] Add a RUN line for the bytecode interpreter (#155363)

This test works with the bytecode interpreter, so add some additional
testing.

[mlir][scf] Expose isPerfectlyNestedForLoops (#152115)

The function isPerfectlyNestedForLoops is useful on its own and so I'm
exposing it for downstream use.

[NFC] Remove outdated comment for clear-ast-before-backend

The comment is outdated since d0a5f61

[clang][DebugInfo][test] Move debug-info tests from CodeGenObjCXX to DebugInfo directory (#154912)

This patch works towards consolidating all Clang debug-info tests into the
clang/test/DebugInfo directory
(https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958).

Here we move only the clang/test/CodeGenObjCXX tests.

The list of files I came up with is:

  1. searched for anything with *debug-info* in the filename
  2. searched for occurrences of debug-info-kind in the tests

[LV] Remove use of llc from vectoriser tests (#154759)

There were 5 X86 loop vectoriser tests that were piping the output from
opt into llc. I think in the directory test/Transforms/LoopVectorize we
should only be testing the output from the loop vectoriser pass. Any
codegen tests should live in test/CodeGen/X86 instead.

  • avx512.ll: it looks like we were really just testing that we generate
    the right vector length.
  • fp32_to_uint32-cost-model.ll/fp64_to_uint32-cost-model.ll: the tests
    only seem to care that we're not scalarising the fptoui, so I've
    modified the test to check for vector ops. I've assumed there are
    already codegen tests for fptoui vector operations.
  • vectorization-remarks-loopid-dbg.ll: I've copied this test to
    CodeGen/X86/vectorization-remarks-loopid-dbg.ll for the llc RUN line
    variant.
  • vectorization-remarks.ll: seems to test the same thing as
    vectorization-remarks-loopid-dbg.ll.

[MLIR][TOSA] Add missing SameOperandsAndResultShape Trait to tosa.cast (#153826)

According to the TOSA spec, tosa.cast only changes the element type,
and not the shape, of the input tensor.

Signed-off-by: Rickert, Jonas jonas.rickert@amd.com

[ComplexDeinterleaving] Use LLVM ADTs (NFC) (#154754)

This swaps out STL types for their LLVM equivalents. This is recommended
in the LLVM coding standards: https://llvm.org/docs/CodingStandards.html#c-standard-library

[LV] Stop using the legacy cost model for udiv + friends (#152707)

In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and
srem we fall back on the legacy cost unnecessarily. At this point we
know that the vplan must be functionally correct, i.e. if the
divide/remainder is not safe to speculatively execute then we must have
either:

  1. Scalarised the operation, in which case we wouldn't be using a
    VPWidenRecipe, or
  2. We've inserted a select for the second operand to ensure we don't
    fault through divide-by-zero.

For 2) it's necessary to add the select operation to
VPInstruction::computeCost so that we mirror the cost of the legacy cost
model. The only problem with this is that we also generate selects in
vplan for predicated loops with reductions, which aren't accounted for
in the legacy cost model. In order to prevent asserts firing I've also
added the selects to precomputeCosts to ensure the legacy costs match
the vplan costs for reductions.
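
Case 2) can be pictured with a scalar model of a masked vector division. This is a sketch under assumptions: the lane count kLanes, the names Vec, Mask and safe_udiv, and the choice of 1 as the replacement divisor are illustrative and not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kLanes = 4;  // hypothetical vector width
using Vec = std::array<std::uint32_t, kLanes>;
using Mask = std::array<bool, kLanes>;

// Models "select(mask, divisor, 1)" followed by a full-width udiv.
// Inactive lanes divide by 1, so they cannot fault on a zero divisor.
inline Vec safe_udiv(const Vec& num, const Vec& den, const Mask& active) {
  Vec out{};
  for (std::size_t i = 0; i < kLanes; ++i) {
    const std::uint32_t safe_den = active[i] ? den[i] : 1u;  // the select
    out[i] = num[i] / safe_den;                              // the udiv
  }
  return out;
}
```

The select is an extra operation per vector division, which is why its cost has to appear in both cost models for them to agree.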

[libc++][C++03] Remove XFAILs from the non-frozen libc++-specific tests (#144101)

The tests in libcxx/test/libcxx aren't run against the frozen headers
anymore, so we can remove any XFAILs in them.

This is part of
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc.

[clang][bytecode][NFC] Check InitializingBlocks in _within_lifetime (#155378)

This kind of check is exactly why InterpState::InitializingBlocks
exists.

[libc++][C++03] Fix tests which only fail due to incorrect includes (#144110)

Quite a few of the frozen header tests only fail because the include
path is incorrect due to copying the headers. This patch fixes the tests
where that's the only problem.

This is part of
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc.

[LV] Return Invalid from getLegacyCost when instruction cost forced. (#154543)

LoopVectorizationCostModel::expectedCost will only override the cost
returned by getInstructionCost when valid. This patch ensures we do
the same in VPCostContext::getLegacyCost, avoiding the "VPlan cost
model and legacy cost model disagreed" assert in the included test.

[libc++][C++03] Fix a bunch of random tests (#144117)

This fixes/removes a bunch of random tests. They all failed in
relatively simple to fix ways.

Specifically (all inside libcxx/test/libcxx-03):

  • utilities/template.bitset/includes.pass.cpp: the header guards have
    different names now (guard names fixed)
  • utilities/meta/is_referenceable.compile.pass.cpp: The name changed
    from __libcpp_is_referenceable (reverted name)
  • utilities/function.objects/refwrap/desugars_to.compile.pass.cpp:
    Optimization has been added after the header split (test removed)
  • type_traits/is_replaceable.compile.pass.cpp: __is_replacable_v has
    been added after the header split (test removed)
  • type_traits/is_constant_evaluated.pass.cpp: Ran C++11 code
    accidentally (C++11 test parts removed)
  • type_traits/desugars_to.compile.pass.cpp: Optimization has been
    added after the header split (test removed)
  • numerics/bit.ops.pass.cpp: Tried to include header which doesn't
    exist (removed include and related code which wasn't executed in C++03)
  • experimental/fexperimental-library.compile.pass.cpp: This test is
    irrelevant for C++03, since there are no C++03 experimental features
    (test removed)
  • containers/container_traits.compile.pass.cpp: container_traits
    have been introduced after the header split (test removed)

[OpenMPIRBuilder] Fix tripcount not a multiple of tile size (#154999)

The emitted code tests whether the current tile should execute the
remainder iterations by checking whether the logical iteration number is
the one after the floor iterations that execute the non-remainder
iterations. There are two counts of how many iterations there are: those
of non-remainder iterations (simply the rounded-down division of tripcount
by tile size), and those including an additional floor iteration for the
remainder iterations. The code used the wrong one, which caused the
condition to never match.
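
The two counts can be illustrated with plain integer arithmetic. This is a sketch, not the OpenMPIRBuilder code; TileCounts, count_tiles and tile_trip_count are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

struct TileCounts {
  std::uint64_t full_tiles;   // tripcount / tile_size, rounded down
  std::uint64_t total_tiles;  // full_tiles, plus one if there is a remainder
  std::uint64_t remainder;    // iterations executed by the partial tile
};

inline TileCounts count_tiles(std::uint64_t tripcount,
                              std::uint64_t tile_size) {
  assert(tile_size > 0);
  const std::uint64_t full = tripcount / tile_size;
  const std::uint64_t rem = tripcount % tile_size;
  return {full, full + (rem != 0 ? 1 : 0), rem};
}

// Number of iterations executed by the tile with floor index `floor_iv`,
// where 0 <= floor_iv < total_tiles.
inline std::uint64_t tile_trip_count(const TileCounts& c,
                                     std::uint64_t floor_iv,
                                     std::uint64_t tile_size) {
  // The remainder tile is the one right after the full tiles, so the
  // comparison must use full_tiles. Comparing against total_tiles would
  // never match, because floor_iv is always smaller than total_tiles.
  const bool is_remainder_tile = (floor_iv == c.full_tiles);
  return is_remainder_tile ? c.remainder : tile_size;
}
```

With a tripcount of 10 and a tile size of 4, there are 2 full tiles and 3 tiles in total; the tile with floor index 2 runs the remaining 2 iterations.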

[VPlan][RISC-V] Add test case for #154103

This has now been fixed by #152707

[clang-repl] Delegate CodeGen related operations for PTU to IncrementalParser (#137458)

See the discussion at #136404 (comment) and the following comments for
context.

Motivation

  1. IncrementalAction is designed to keep Frontend state alive across
    inputs. As per the docstring: “IncrementalAction ensures it keeps its
    underlying action's objects alive as long as the IncrementalParser needs
    them.”
  2. To align responsibilities with that contract, the parser layer (host:
    IncrementalParser, device: IncrementalCUDADeviceParser) should
    manage PTU registration and module generation, while the interpreter
    orchestrates at a higher level.

What this PR does

  1. Moves CodeGen surfaces behind IncrementalAction:
    GenModule(), getCodeGen(), and the cached “first CodeGen module” now
    live in IncrementalAction.

  2. Moves PTU ownership to the parser layer:
    Adds IncrementalParser::RegisterPTU(…) (and device counterpart)

  3. Adds device-side registration in IncrementalCUDADeviceParser.

  4. Removes Interpreter::{getCodeGen, GenModule, RegisterPTU}.

[TableGen][DecoderEmitter] Remove no longer needed MaxFilterWidth (NFC) (#155382)

11c6158 made the variable redundant.
Also remove Target, which is apparently unused.

[mlir][SCFToOpenMP] Use walk pattern driver (#155242)

The lowering pattern uses various APIs that are not supported in a
dialect conversion such as Block::eraseArguments and
RewriterBase::replaceAllUsesWith. Switch to the more efficient and
simpler walk pattern driver.

[LV] Add early-exit test where the inner loop IV depends on outer loop.

[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in DialectTransform.cpp (NFC)

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in DialectTransform.cpp (NFC)

[gn build] Port 2ab4c28

[flang][OpenMP] move omp end directive validation to semantics (#154739)

The old parse tree errors quickly exploded to thousands of unhelpful
lines when there were multiple missing end directives (e.g. #90452).

Instead I've added a flag to the parse tree indicating when a missing
end directive needs to be diagnosed, and moved the error messages to
semantics (where they are a lot easier to control).

This has the disadvantage of not displaying the error if there were
other parse errors, but there is a precedent for this approach (e.g.
parsing atomic constructs).

[mlir][MemRef] Address TODO to use early_inc to simplify elimination of uses (NFC) (#155123)

[MLIR][EmitC] Bugfix in emitc.call_opaque operand emission (#153980)

The operand emission needed the operand to be in scope, which led to
failure when the emitc.call_opaque is in an emitc.expression's body.

[lldb] Fix spacing in "process plugin packet monitor" help

[SCEVExp] Check if getPtrToIntExpr resulted in CouldNotCompute.

This fixes a crash trying to use SCEVCouldNotCompute, if getPtrToIntExpr
failed.

Fixes #155287

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in ExecutionEngineModule.cpp (NFC)

[flang][OpenMP] move omp end sections validation to semantics (#154740)

See #90452. The old parse tree errors exploded to thousands of unhelpful
lines when there were multiple missing end directives.

Instead, allow a missing end directive in the parse tree then validate
that it is present during semantics (where the error messages are a lot
easier to control).

[clang][bytecode][NFC] Use Pointer::initializeAllElements() in Program (#155391)

We just initialized the entire string, so use this function instead.

[Offload] Full AMD support for olMemFill (#154958)

[mlir][vector] Fix crashes in from_elements folder + broadcast verifier (#155393)

This PR fixes two crashes / failures.

  1. The vector.broadcast verifier did not take into account
    VectorElementTypeInterface and was looking for int/index/float types.
  2. The vector.from_elements folder attempted to create an invalid
    DenseElementsAttr. Only int/float/index/complex types are supported.

[clang][bytecode][NFC] Check hasTrivialDtor() in RunDestructors (#155381)

We do this when calling Free() on dynamically allocated memory.

AMDGPU: Stop checking if registers are reserved in adjustAllocatableRegClass (#155125)

This function is used to implement TargetInstrInfo::getRegClass and
conceptually should not depend on the dynamic state of the function.

[RISCV][NFC] Fix typo v32 -> v31 in document (#155389)

[VPlan] Replace EVL branch condition with (branch-on-count AVLNext, 0) (#152167)

This changes the branch condition to use the AVL's backedge value
instead of the EVL-based IV.

This allows us to emit bnez on RISC-V and removes a use of the trip
count, which should reduce register pressure.

To match phis with VPlanPatternMatch I've had to relax the assert that
the number of operands must exactly match the pattern for the Phi
opcode, and I've copied over m_ZExtOrSelf from the LLVM IR
PatternMatch.h.

Fixes #151459
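
The resulting loop shape can be sketched in scalar C++. This models the control flow only and is not the VPlan recipes; kVLMax and add_one_evl are hypothetical names, and the inner loop stands in for one vector operation over EVL lanes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kVLMax = 4;  // hypothetical maximum vector length

// Adds 1 to every element, processing up to kVLMax elements per
// iteration. Returns the number of loop iterations executed.
inline std::size_t add_one_evl(std::vector<int>& data) {
  std::size_t avl = data.size();  // remaining elements (AVL)
  if (avl == 0)
    return 0;
  std::size_t index = 0;  // EVL-based induction variable
  std::size_t iterations = 0;
  do {
    const std::size_t evl = std::min(avl, kVLMax);  // lanes this iteration
    for (std::size_t i = 0; i < evl; ++i)
      data[index + i] += 1;
    index += evl;
    avl -= evl;  // AVLNext
    ++iterations;
  } while (avl != 0);  // exit once AVLNext reaches 0; no trip count needed
  return iterations;
}
```

The exit test compares a single value against zero, which is the form that maps onto a branch-if-not-zero instruction.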

[MLIR] Apply clang-tidy fixes for llvm-include-order in IRAffine.cpp (NFC)

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in IRAffine.cpp (NFC)

[MLIR] Apply clang-tidy fixes for llvm-else-after-return in IRAttributes.cpp (NFC)

[clang][bytecode] Support remaining add_sat like X86 builtins (#155358)

[RelLookupTableConverter] Generate test checks (NFC)

This was using a mix of generated check lines and manual edits,
which makes future changes hard. Regenerate with a newer version
and --check-globals.

[clang][bytecode] Try to avoid dtor functions in Record descriptors (#155396)

We don't need to call the dtor fn of a record where all bases, fields
and virtual bases have no dtor fn either.

[AArch64] Expand MI->getOperand(1).getImm() with 0 literal (#154598)

MI->getOperand(1).getImm() has already been verified to be 0 entering
the block.

[VPlan] Compute cost of replicating calls in VPlan. (NFCI) (#154291)

Implement computing the scalarization overhead for replicating calls in
VPlan, matching the legacy cost model.

Depends on #154126.

PR: #154291

[X86] Show failure to fold freeze(gfni()) -> gfni(freeze(),freeze()) for all gfni instructions

[InstCombine] Generate test checks (NFC)

[llvm-exegesis] Implement the loop repetition mode for AArch64 (#154751)

Subject says it all: implement the loop iterator decrement and jump
functions, and reserve X19 for the loop counter.

[GWP-ASan] Include <unistd.h> for sysconf(_SC_PAGESIZE) (#155261)

This fixes build failures on Fuchsia that started with #153860

[VPlan] Improve style around container-inserts (NFC) (#155174)

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in IRAttributes.cpp (NFC)

[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in IRCore.cpp (NFC)

[flang][OpenMP] Delete no longer used Omp[End]CriticalDirective, NFC (#155099)

[Headers][X86] Allow AVX512VLBW integer reduction intrinsics to be used in constexpr (#155199)

Fixes #154284

Add constexpr support for the following:

_mm_reduce_add_epi8 _mm_reduce_add_epi16 _mm256_reduce_add_epi8
_mm256_reduce_add_epi16 _mm_reduce_mul_epi8 _mm_reduce_mul_epi16
_mm256_reduce_mul_epi8 _mm256_reduce_mul_epi16 _mm_reduce_and_epi8
_mm_reduce_and_epi16 _mm256_reduce_and_epi8 _mm256_reduce_and_epi16
_mm_reduce_or_epi8 _mm_reduce_or_epi16 _mm256_reduce_or_epi8
_mm256_reduce_or_epi16

_mm_mask_reduce_add_epi8 _mm_mask_reduce_add_epi16
_mm256_mask_reduce_add_epi8 _mm256_mask_reduce_add_epi16
_mm_mask_reduce_mul_epi8 _mm_mask_reduce_mul_epi16
_mm256_mask_reduce_mul_epi8 _mm256_mask_reduce_mul_epi16
_mm_mask_reduce_and_epi8 _mm_mask_reduce_and_epi16
_mm256_mask_reduce_and_epi8 _mm256_mask_reduce_and_epi16
_mm_mask_reduce_or_epi8 _mm_mask_reduce_or_epi16
_mm256_mask_reduce_or_epi8 _mm256_mask_reduce_or_epi16

_mm_reduce_max_epi8 _mm_reduce_max_epi16 _mm256_reduce_max_epi8
_mm256_reduce_max_epi16 _mm_reduce_min_epi8 _mm_reduce_min_epi16
_mm256_reduce_min_epi8 _mm256_reduce_min_epi16 _mm_reduce_max_epu8
_mm_reduce_max_epu16 _mm256_reduce_max_epu8 _mm256_reduce_max_epu16
_mm_reduce_min_epu8 _mm_reduce_min_epu16 _mm256_reduce_min_epu8
_mm256_reduce_min_epu16

_mm_mask_reduce_max_epi8 _mm_mask_reduce_max_epi16
_mm256_mask_reduce_max_epi8 _mm256_mask_reduce_max_epi16
_mm_mask_reduce_min_epi8 _mm_mask_reduce_min_epi16
_mm256_mask_reduce_min_epi8 _mm256_mask_reduce_min_epi16
_mm_mask_reduce_max_epu8 _mm_mask_reduce_max_epu16
_mm256_mask_reduce_max_epu8 _mm256_mask_reduce_max_epu16
_mm_mask_reduce_min_epu8 _mm_mask_reduce_min_epu16
_mm256_mask_reduce_min_epu8 _mm256_mask_reduce_min_epu16

[Clang] Generate test checks (NFC)

This test was already using generated test checks, but with minor
manual adjustments. Make it fully generated, as check lines for
metadata are supported nowadays.

[Offload][Conformance] Add README file (#155190)

This patch introduces a README.md file for the GPU math conformance
test suite located in offload/unittests/Conformance.

The goal of this document is to provide clear and thorough instructions
for new users and future contributors. It covers the project's purpose,
system requirements, build and execution steps, testing methodology, and
overall architecture.

[Clang] Support generic bit counting builtins on fixed boolean vectors (#154203)

Summary:
Boolean vectors as implemented in clang can be bit-casted to an integer
that is rounded up to the next primitive sized integer. Users can do
this themselves, but since the counting bits are very likely to be used
with bitmasks like this and the generic forms are expected to be
generic it seems reasonable that we handle this case directly.

[X86] canCreateUndefOrPoisonForTargetNode - add GF2P8AFFINEINVQB / GF2P8AFFINEQB / GF2P8MULB handling (#155409)

All 3 instructions are well defined bit twiddling operations - they do
not introduce undef/poison with well defined inputs.

Fixes regressions in #152107

[NFC][SimplifyCFG] Simplify operators for the combined predicate in mergeConditionalStoreToAddress (#155058)

This is about code readability. The operands in the disjunction forming the combined predicate in mergeConditionalStoreToAddress could sometimes be negated twice. This patch addresses that.

2 tests needed updating because they exposed the double negation and now they don’t.

[libc++][C++03][NFC] Remove XFAILS from libcxx/test/libcxx (#155384)

We've split the implementation-specific tests into
libcxx/test/libcxx-03, so we don't need the annotations in
libcxx/test/libcxx anymore.

[lldb][lldb-dap] parse pathFormat as an optional (#155238)

pathFormat is an optional field in initializeArguments.

[libc++][C++03] Fix test/libcxx-03/system_reserved_names.gen.py (#155385)

This test only fails because it includes <__config>. Switch to using
<__cxx03/__config> instead to fix the issue.

[libc++] Refactor key extraction for __hash_table and __tree (#154512)

This patch replaces __can_extract_key with an overload set to try to
extract the key. This simplifies the code, since we don't need to have
separate overload sets for the unordered and associative containers. It
also allows extending the set of extraction cases more easily, since we
have a single place to define how the key is extracted.
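
The idea of a single overload set can be sketched as follows. This is an illustration and not the libc++ implementation; the name extract_key and the two cases shown are assumptions.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Case 1: the argument is the key itself (set-like containers).
template <class Key>
const Key& extract_key(const Key& key) {
  return key;
}

// Case 2: the argument is a key/value pair (map-like containers).
// This overload is more specialized, so it wins for std::pair arguments.
template <class Key, class Mapped>
const Key& extract_key(const std::pair<Key, Mapped>& kv) {
  return kv.first;
}
```

Supporting another argument shape means adding one more overload here, without touching the containers that call extract_key.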

Revert "[llvm-exegesis] Implement the loop repetition mode for AArch64" (#155423)

I see some build bot failures:

Revert #154751 while I investigate this.

[gn build] Port af1f06e

[LLDB] Re-land 'Update DIL handling of array subscripting' (#154269)

This attempts to fix the issues with the original PR (#151605), updating
the DIL code for handling array subscripting to more closely match and
handle all the cases from the original 'frame var' implementation. The
first PR did not include special-case code for objc pointers, which
apparently caused a test failure on the green-dragon buildbot. Hopefully
this PR, which includes the objc pointer special code, fixes that issue.

AMDGPU: Replace copy-to-mov-imm folding logic with class compat checks (#154501)

This strengthens the check to ensure the new mov's source class
is compatible with the source register. This avoids using the register
sized based checks in getMovOpcode, which don't quite understand
AV superclasses correctly. As a side effect it also enables more folds
into true16 movs.

getMovOpcode should probably be deleted, or at least replaced
with class check based logic. In this particular case other
legality checks need to be mixed in with attempted IR changes,
so I didn't try to push all of that into the opcode selection.

[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)

This patch adds a new VPlan-based addMinimumIterationCheck, which
replaces the ILV version for the non-epilogue case.

The VPlan-based version constructs a SCEV expression to compute the
minimum iterations and uses it to determine whether the check is known
true or false. Otherwise, it creates a VPExpandSCEV recipe and emits a
compare-and-branch.

When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.

PR: #153643

X86: Remove LOW32_ADDR_ACCESS_RBPRegClass (#155127)

[lldb] Underline short option letters as mnemonics (#153695)

Whenever an option would use something other than the first letter of
the long option as the short option, Jim would capitalize the letter we
picked as a mnemonic. This has often been mistaken for a typo and Jim
wondered if we should stop doing this.

During the discussion, David mentioned how this reminds him of the
underline in menu bars when holding down alt. I suggested we do
something similar in LLDB by underlining the letter in the description.

https://discourse.llvm.org/t/should-we-remove-the-capital-letter-in-option-helps/87816

s390x: pattern match saturated truncation (#155377)

Simplify min/max instruction matching by making the related
SelectionDAG operations legal.

Add patterns to match (signed and unsigned) saturated
truncation based on open-coded min/max patterns.

Fixes #153655
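
For reference, an open-coded saturated truncation of the kind these patterns target looks like the sketch below. The 32-bit to 16-bit widths and the function names are illustrative assumptions; the patch itself operates on SelectionDAG nodes, not on C++ source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Signed: clamp with max, then min, then truncate.
inline std::int16_t sat_trunc_s32_to_s16(std::int32_t x) {
  const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
  const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
  return static_cast<std::int16_t>(std::min(std::max(x, lo), hi));
}

// Unsigned: only the upper bound needs clamping before truncation.
inline std::uint16_t sat_trunc_u32_to_u16(std::uint32_t x) {
  const std::uint32_t hi = std::numeric_limits<std::uint16_t>::max();
  return static_cast<std::uint16_t>(std::min(x, hi));
}
```

Out-of-range inputs clamp to the nearest representable 16-bit value; in-range inputs pass through unchanged.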

[clang][bytecode] Cleanup primitive descriptor ctor/dtor handling (#155401)

Use switches instead of if statements and COMPOSITE_TYPE_SWITCH and
remove some leftover move functions.

Revert "[AMDGPU] gfx1250 trans instructions bf16 codegen tests update. NFC (#155310)"

This reverts commit 43a9b66. Was causing
ninja check-llvm failures on x86 host.

Reapply "[RISCV] Add test coverage for upcoming change to zicond select lowering""

This was reverted because a previous version had check lines which didn't
match tip of tree. Looking back through my terminal history, I'm 99% sure
this was a failure to update after a pull, but the diff itself looks
suspiciously like other user error. I've run ninja check-llvm on this one
multiple times. :)

[clang-format] Fix a bug in SkipMacroDefinitionBody (#155346)

All comments before the macro definition body should be skipped.

[RISCV] Add tied source operand to Zvqdotq MC instructions. (#155286)

This is consistent with what we do for integer and FP multiply
accumulate instructions.

We need new classes because normal multiply accumulate have the operands
in a different order.

[X86] Use array instead of SmallVector. NFC (#155321)

[TableGen][DecoderEmitter] Optimize single-case OPC_ExtractField (#155414)

OPC_ExtractField followed by a single OPC_FilterValue is equivalent to
OPC_CheckField. Optimize this relatively common case.

Revert "[libc++] Refactor key extraction for __hash_table and __tree (#154512)"

This reverts commit af1f06e.

This is causing some build failures in premerge as some of the LLDB
tests fail.

[gn build] Port 72c04bb

[OpenACC] Add C tests for recipe generation, fix NYI

I realized while messing with other things that I'd written all of the
recipe tests for C++, so this patch adds a bunch of tests for C mode.
The assert wasn't quite accurate (as C default init doesn't really do
anything/have an AST node), so that is corrected. Also, the lack of
cir.copy causes some of the firstprivate tests to be incomplete, so
added TODOs for that as well.

[clang-format] Use proper flags for git diff-tree (#155247)

From local testing, git diff-tree does not support three dot diffs
correctly, instead expecting the --merge-base flag to be passed along
with two commits. From my reading, the documentation
(https://git-scm.com/docs/git-diff-tree) also confirms this. This patch
updates the git-clang-format script to be correct.

I don't think we ever ran into this issue before because we never ended
up using it. For the PR code format job I believe we would just
explicitly pass the merge base, completely bypassing the problem.

[HLSL][DirectX] Add the Qdx-rootsignature-strip driver option (#154454)

This PR adds the Qstrip-rootsignature as a DXC driver option.

To do so, this PR introduces the BinaryModifyJobClass as an Action
to modify a produced object file before its final output.

Further, it registers llvm-objcopy as the tool to modify a produced
DXContainer on the HLSL toolchain.

This allows us to specify the Qstrip-rootsignature option to
clang-dxc which will invoke llvm-objcopy with a
--remove-section=RTS0 argument to implement its functionality.

Resolves: #150275.

[DirectX] Fix the writing of ConstantExpr GEPs to DXIL bitcode (#154446)

Fixes #153304

Changes:

  • When writing ConstantExpr GEPs to DXIL bitcode, the bitcode writer
    will use the old Constant Code CST_CODE_CE_GEP_OLD = 12 instead of the
    newer CST_CODE_CE_GEP = 32 which is interpreted as an undef in DXIL.
    Additional context: CST_CODE_CE_GEP = 12 in DXC, while the same
    constant code is labeled CST_CODE_CE_GEP_OLD in LLVM.
  • Modifies the PointerTypeAnalysis to be able to analyze pointer-typed
    constants that appear in the operands of instructions so that the
    correct type of the ConstantExpr GEP is determined and written into
    the DXIL bitcode.
  • Adds a PointerTypeAnalysis test and dxil-dis test to ensure that the
    pointer types of ConstantExpr GEPs are resolved and ConstantExpr GEPs
    are written to DXIL bitcode correctly.

In addition, this PR also adds a missing call to
GV.removeDeadConstantUsers() in the DXILFinalizeLinkage pass, and
removes an unnecessary manual removal of a ConstantExpr in the
DXILFlattenArrays pass.

[clang-tidy][test] Make check_clang_tidy.py work with very long file paths (#155318)

http://github.com/llvm/llvm-project/pull/95220 added a test with a very
long file path, which can fail if run on Windows with a long directory
path.

On Windows, there are file path length limits, which can be worked
around by prefixing the (absolute) path with '\\?\':
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
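
A minimal sketch of the prefixing step is shown below. It is written in C++ for illustration, whereas check_clang_tidy.py is a Python script. The function name with_long_path_prefix is hypothetical, and the sketch assumes the input is already an absolute drive-letter path; UNC paths need a different prefix and are not handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: prepends the Windows extended-length prefix to an
// absolute path, unless the path already carries it.
inline std::string with_long_path_prefix(const std::string& abs_path) {
  const std::string prefix = R"(\\?\)";
  if (abs_path.compare(0, prefix.size(), prefix) == 0)
    return abs_path;  // already prefixed
  return prefix + abs_path;
}
```

Applying the helper twice gives the same result as applying it once, so callers do not need to track whether a path was already converted.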


Co-authored-by: Reid Kleckner rnk@google.com

[AArch64] AArch64TargetLowering::computeKnownBitsForTargetNode - add support for AArch64ISD::MOV/MVN constants (#154039)

Add support for the following constant nodes in
AArch64TargetLowering::computeKnownBitsForTargetNode:

  case AArch64ISD::MOVIedit:
  case AArch64ISD::MOVImsl:
  case AArch64ISD::MVNIshift:
  case AArch64ISD::MVNImsl:

Also add AArch64TargetLowering::computeKnownBitsForTargetNode tests
for all the MOVI constant nodes in
llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp

Fixes: #153159


Co-authored-by: Simon Pilgrim llvm-dev@redking.me.uk

[Interpreter] Fix a warning

This patch fixes:

clang/lib/Interpreter/IncrementalAction.h:37:21: error: private
field 'CI' is not used [-Werror,-Wunused-private-field]

[NFC][DirectX] Fix variable set but not used warning (#155445)

[compiler-rt] Fix a warning

This patch fixes:

compiler-rt/lib/asan/tests/asan_test.cpp:398:27: error: allocation
of insufficient size '0' for type 'int' with size '4'
[-Werror,-Walloc-size]

[NFC][MC][XCore] Rearrange decoder functions for XCore disassembler (#155009)

Rearrange decode functions to be before including the generated
disassembler code and eliminate forward declarations for most of them.
This is possible because fieldFromInstruction is now in MCDecoder.h
and not in the generated disassembler code.

[flang] optimize sind precision (#155429)

Part of #150452.

[NFC][MC][ARM] Rearrange decode functions in ARM disassembler (#154988)

Move tryAddingSymbolicOperand and tryAddingPcLoadReferenceComment to
before including the generated disassembler code. This is in preparation
for rearranging the decoder functions to eliminate forward declarations.

[LV] Remove unused ILV::VectorTripCount (NFC).

The field is no longer used, remove it.

[NFC][Asan] Fix warning in test (#155447)

After #150028.

Warning:

asan_test.cpp:398:27: error: allocation of insufficient size '0' for type 'int' with size '4'

[CI] Save sccache logs (#155444)

This patch saves the sccache logs to the artifacts. If sccache dies and
the server prints logs, we currently do not collect them anywhere and
they do not get dumped to STDOUT/STDERR. If the process is directly
getting killed (SIGTERM), it seems like it doesn't dump anything, but in
most other cases we should be able to see something.

Related to #155442.

[NFC][DirectX] Fix build failure (#155441)

Add BinaryFormat to LINK_COMPONENTS to fix the following linker
error:

ld.lld: error: undefined symbol: llvm::dxbc::getRootParameterTypes()
>>> referenced by DXILRootSignature.cpp
>>>               lib/Target/DirectX/CMakeFiles/LLVMDirectXCodeGen.dir/DXILRootSignature.cpp.o:(llvm::dxil::RootSignatureAnalysisPrinter::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&))

ld.lld: error: undefined symbol: llvm::dxbc::getShaderVisibility()
>>> referenced by DXILRootSignature.cpp
>>>               lib/Target/DirectX/CMakeFiles/LLVMDirectXCodeGen.dir/DXILRootSignature.cpp.o:(llvm::dxil::RootSignatureAnalysisPrinter::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&))
>>> referenced by DXILRootSignature.cpp
>>>               lib/Target/DirectX/CMakeFiles/LLVMDirectXCodeGen.dir/DXILRootSignature.cpp.o:(llvm::dxil::RootSignatureAnalysisPrinter::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&))

Root cause: #154249 changed a
header-only dependency to a real dependency without noticing that the
dependency was missing in CMakeLists.txt

Bitcode: Stop combining function alignments into MaxAlignment.

MaxAlignment is used to produce the abbreviation for MODULE_CODE_GLOBALVAR
and is not used for anything related to function alignments, so stop
combining function alignments and rename it to make its purpose clearer.

Reviewers: teresajohnson

Reviewed By: teresajohnson

Pull Request: #155341

[SCEV] Try to push op into ZExt: C * zext (A + B) -> zext (AC + BC) (#155300)

Try to push constant multiply operand into a ZExt containing an add, if
possible. In general we are trying to push down ops through ZExt if
possible. This is similar to
#151227 which did the same for
additions.

For now this is restricted to adds with a constant operand, which is
similar to some of the logic above.

This enables some additional simplifications.

Alive2 Proof: https://alive2.llvm.org/ce/z/97pbSL

PR: #155300

[CIR] Add VTTAddrPointOp (#155048)

This adds the definition, verification, and lowering for CIR's
VTTAddrPointOp. This is a bit ahead of the current codegen
implementation, which doesn't yet have support for emitting VTT
definitions, but since this doesn't depend on any of the other work in
progress, it is being upstreamed in advance.

[RISCV][VLOPT] Update vl-opt-op-info.mir test with extra COPYs. NFC

[Transforms] Allow non-regex Source in SymbolRewriter in case of using ExplicitRewriteDescriptor (#154319)

Do not check that Source is a valid regex in case of Target (explicit)
transformation. Source may contain special symbols that may cause an
incorrect invalid regex error.

Note that source and exactly one of [Target, Transform] must be
provided.

Target (explicit transformation): In this kind of rule Source is
treated as a symbol name and is matched in its entirety. Target field
will denote the symbol name to transform to.

Transform (pattern transformation): This rule treats Source as a
regex that should match the complete symbol name. Transform is a regex
specifying the name to transform to.
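
The difference between the two rule kinds can be sketched as follows. This is only an illustration, not the SymbolRewriter code: it uses std::regex instead of LLVM's own regex class, so the $1 replacement syntax is an assumption of the example, and the names rewriteExplicit and rewritePattern are hypothetical.

```cpp
#include <regex>
#include <string>

// Explicit rule: Source is a plain symbol name compared in its entirety.
// No regex is compiled, so names with special characters are harmless.
std::string rewriteExplicit(const std::string &Name, const std::string &Source,
                            const std::string &Target) {
  return Name == Source ? Target : Name;
}

// Pattern rule: Source is a regex that must match the complete symbol name.
std::string rewritePattern(const std::string &Name, const std::string &Source,
                           const std::string &Transform) {
  std::regex Re(Source);
  if (!std::regex_match(Name, Re))
    return Name;
  return std::regex_replace(Name, Re, Transform);
}
```

A Source such as a+(b is not a valid regex, yet it works as an explicit rule because it is never compiled.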

[MLIR] Apply clang-tidy fixes for modernize-use-using in IRCore.cpp (NFC)

[MLIR] Apply clang-tidy fixes for performance-move-const-arg in IRCore.cpp (NFC)

[MLIR] Apply clang-tidy fixes for readability-identifier-naming in IRCore.cpp (NFC)

[clang] NFC: introduce Type::getAsEnumDecl, and cast variants for all TagDecls (#155463)

And make use of those.

These changes are split from prior PR #155028, in order to decrease the
size of that PR and facilitate review.

[Flang] Fix BUILD_SHARED_LIBS build (#155422)

In contrast to linking a static library, when linking a shared library
all referenced symbols must be available in either the object files,
static libraries, or shared libraries passed to the linker command line
and cannot be deferred to when building the executable.

Fixes #150027

Same fix as included in #152223, but with only the changes necessary to
fix #150027 (which is unrelated to GCC 15)

[NFC][WPD] code style fixes (#155454)

Revert "[CI] Save sccache logs (#155444)"

This reverts commit c81cc9f.

This is causing premerge failures and needs more testing.

[ARM] Update a number of MVE tests to use -cost-kind=all. NFC

[gn build] Disable objc rewriter (#155479)

This is off by default in the CMake build:

option(CLANG_ENABLE_OBJC_REWRITER "Build the Objective-C rewriter tool" OFF)

[AMDGCN] Add missing gfx1250 clang tests. NFC. (#155478)

Remove trailing whitespace in DiagnosticSemaKinds.td. NFC (#155482)

[clang][PAC] Fix builtins that claim address discriminated types are bitwise compatible (#154490)

A number of builtins report some variation of "this type is compatible
with some bitwise equivalent operation", but this is not true for
address discriminated values. We had addressed a number of cases, but not
all of them. This PR corrects the remaining builtins.

Fixes #154394

[mlir][acc] Add destroy region to reduction recipes (#155480)

Reduction recipes capture how a private copy is created. In some
languages (for example, C++ class variables with destructors), that
private copy must also be properly destroyed. Thus update the reduction
recipe to contain a destroy region, similar to the private recipes.

[hwasan] Add hwasan-static-linking option (#154529)

Discarding the .note.hwasan.globals section in ldscript causes a
linker error, since hwasan_globals refers to the discarded section.
The issue comes from hwasan.dummy.global being associated via metadata
with .note.hwasan.globals.

Add a new -hwasan-static-linking option to skip inserting
.note.hwasan.globals for static binaries, as it is only needed for
instrumenting globals from dynamic libraries. In static binaries, the
global variables section can be accessed directly via the
__start_hwasan_globals and __stop_hwasan_globals symbols inserted by
the linker.

[lldb] Do not use LC_FUNCTION_STARTS data to determine symbol size as symbols are created (#155282)

Note: This is a resubmission of #106791. I had to revert this a year ago
for a failing test that I could not understand. I have time now to try
and get this in again.

Summary:
This improves the performance of ObjectFileMachO::ParseSymtab by
removing eager and expensive work in favor of doing it later in a
less-expensive fashion.

Experiment:
My goal was to understand LLDB's startup time.
First, I produced a Debug build of LLDB (no dSYM) and a
Release+NoAsserts build of LLDB. The Release build debugged the Debug
build as it debugged a small C++ program. I found that
ObjectFileMachO::ParseSymtab accounted for somewhere between 1.2 and 1.3
seconds consistently. After applying this change, I consistently
measured a reduction of approximately 100ms, putting the time closer to
1.1s and 1.2s on average.

Background:
ObjectFileMachO::ParseSymtab will incrementally create symbols by
parsing nlist entries from the symtab section of a MachO binary. As it
does this, it eagerly tries to determine the size of symbols (e.g. how
long a function is) using LC_FUNCTION_STARTS data (or eh_frame if
LC_FUNCTION_STARTS is unavailable). Concretely, this is done by
performing a binary search on the function starts array and calculating
the distance to the next function or the end of the section (whichever
is smaller).

However, this work is unnecessary for 2 reasons:

  1. If you have debug symbol entries (i.e. STABs), the size of a function
    is usually stored right after the function's entry. Performing this work
    right before parsing the next entry is unnecessary work.
  2. Calculating symbol sizes for symbols of size 0 is already performed
    in Symtab::InitAddressIndexes after all the symbols are added to the
    Symtab. It also does this more efficiently by walking over a list of
    symbols sorted by address, so the work to calculate the size per symbol
    is constant instead of O(log n).
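
The second point can be illustrated with a small sketch. This is not LLDB's code: the Sym struct, the fillSizes name, and the assumption that all symbols live in one section are hypothetical, and the sketch only shows why a single pass over address-sorted symbols needs constant work per symbol.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Sym {
  uint64_t Addr;
  uint64_t Size; // 0 means "not known yet"
};

// Sort once, then give each zero-sized symbol the distance to the next
// symbol or to the end of the section, whichever is smaller.
void fillSizes(std::vector<Sym> &Syms, uint64_t SectionEnd) {
  std::sort(Syms.begin(), Syms.end(),
            [](const Sym &A, const Sym &B) { return A.Addr < B.Addr; });
  for (size_t I = 0; I < Syms.size(); ++I) {
    if (Syms[I].Size != 0)
      continue; // size already known, e.g. from debug symbol entries
    uint64_t Next = (I + 1 < Syms.size()) ? Syms[I + 1].Addr : SectionEnd;
    Syms[I].Size = std::min(Next, SectionEnd) - Syms[I].Addr;
  }
}
```

After the one-time sort, each symbol only looks at its immediate neighbour, whereas the eager approach repeats a binary search for every symbol as it is created.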

[IA][RISCV] Recognize interleaving stores that could lower to strided segmented stores (#154647)

This is a sibling patch to #151612: passing gap masks to the renewed TLI
hooks for lowering interleaved stores that use shufflevector to do the
interleaving.

NFC: remove some instances of deprecated capture (#154884)

 warning: implicit capture of 'this' with a capture default of '=' is deprecated [-Wdeprecated-this-capture]

Co-authored-by: Jeremy Kun j2kun@users.noreply.github.com

[LV] Remove unneeded ILV::LoopScalarPreHeader (NFC).

Follow-up suggested in #153643.
Remove some more global state by directly returning the scalar
preheader from createScalarPreheader.

[AArch64] Add another switch clustering test with power-of-2 constants.

Adds more test coverage for
#139736.

[AMDGPU] wmma_scale* IR verification (#155493)

[DAG] ComputeNumSignBits - ISD::EXTRACT_ELEMENT needs to return at least 1 (#155455)

When going through the ISD::EXTRACT_ELEMENT case, KnownSign - rIndex * BitWidth
could produce a negative value. When a negative value is produced, the lower
bound of the std::clamp is returned. Change that lower bound to one to avoid
potential underflows, because the expectation is that ComputeNumSignBits
should always return at least 1.

Fixes #155452.
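
A minimal sketch of the clamp described above, assuming the computation has roughly this shape. The function name and parameter types are hypothetical and this is not the SelectionDAG code itself.

```cpp
#include <algorithm>

// Sign bits known for one BitWidth-wide part of a wider value, where the
// part is selected by PartIndex. The subtraction can go negative, so the
// result is clamped to the range [1, BitWidth].
unsigned signBitsOfExtractedPart(unsigned KnownSign, unsigned PartIndex,
                                 unsigned BitWidth) {
  int Remaining =
      static_cast<int>(KnownSign) - static_cast<int>(PartIndex * BitWidth);
  return static_cast<unsigned>(
      std::clamp(Remaining, 1, static_cast<int>(BitWidth)));
}
```

With a lower bound of 0, the second case in a call like signBitsOfExtractedPart(3, 1, 32) would report zero sign bits, which callers do not expect.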

[AMDGPU] Do not assert on non-zero COMPUTE_PGM_RSRC3 on gfx1250. NFCI (#155498)

COMPUTE_PGM_RSRC3 does exist on gfx1250, we are just not using it yet.

[NFC][Asan] Remove volatile from test

After #155447.
It's not needed, but does not compile on PowerPC.

Reapply "[compiler-rt] Remove %T from shared object substitutions (#155302)"

This reverts commit 1d3c302.

There were three test failures:
odr-violation.cpp - Attempted to fix by keeping everything in the same
folder.
interception-in-shared-lib-test.cpp - Tried folding comments to preserve
line numberings. Almost seems like a debug info issue on PPC.
odr_c_test.c - Attempted to fix by keeping everything in the same
folder.

[lldb] Correctly parse Wasm segments (#154727)

My original implementation for parsing Wasm segments was wrong in two
related ways. I had a bug in calculating the file vm address and I
didn't fully understand the difference between active and passive
segments and how that impacted their file vm address.

With this PR, we now support parsing init expressions for active
segments, rather than just skipping over them. This is necessary to
determine where they get loaded.

Similar to llvm-objdump, we currently only support simple opcodes (i.e.
constants). We also currently do not support active segments that use a
non-zero memory index. However this covers all segments for a
non-trivial Swift binary compiled to Wasm.
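
A minimal sketch of evaluating a constant init expression, assuming the standard Wasm encoding (i32.const is 0x41, i64.const is 0x42, end is 0x0b, immediates are signed LEB128). The helper names are hypothetical and this is not the LLDB implementation; any other opcode is rejected, mirroring the "simple opcodes only" limitation described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Decode a signed LEB128 value starting at Pos and advance Pos past it.
std::optional<int64_t> decodeSLEB128(const std::vector<uint8_t> &Bytes,
                                     size_t &Pos) {
  uint64_t Result = 0;
  unsigned Shift = 0;
  while (Pos < Bytes.size() && Shift < 64) {
    uint8_t Byte = Bytes[Pos++];
    Result |= static_cast<uint64_t>(Byte & 0x7f) << Shift;
    Shift += 7;
    if ((Byte & 0x80) == 0) {
      // Sign-extend if the sign bit of the last byte is set.
      if (Shift < 64 && (Byte & 0x40))
        Result |= ~uint64_t{0} << Shift;
      return static_cast<int64_t>(Result);
    }
  }
  return std::nullopt; // truncated or over-long encoding
}

// Evaluate an init expression of the form "<const opcode> <value> end".
std::optional<int64_t> parseConstInitExpr(const std::vector<uint8_t> &Bytes) {
  constexpr uint8_t I32Const = 0x41, I64Const = 0x42, End = 0x0b;
  if (Bytes.empty())
    return std::nullopt;
  size_t Pos = 0;
  uint8_t Opcode = Bytes[Pos++];
  if (Opcode != I32Const && Opcode != I64Const)
    return std::nullopt; // only constants are handled in this sketch
  std::optional<int64_t> Value = decodeSLEB128(Bytes, Pos);
  if (!Value || Pos >= Bytes.size() || Bytes[Pos] != End)
    return std::nullopt;
  return Value;
}
```

For an active segment, the resulting constant is the offset at which the segment is loaded.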

[flang][openacc] Only generate acc.terminator in compute construct (#155504)

When the end of a block is inside a data region (not a compute region),
generating an acc.terminator will lead to a missing terminator when
translating to LLVM.

Only generate acc.terminator instead of fir.unreachable when nested in
acc compute region.

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in IRTypes.cpp (NFC)

Revert "[AArch64] AArch64TargetLowering::computeKnownBitsForTargetNode - add support for AArch64ISD::MOV/MVN constants" (#155503)

Reverts #154039, as it breaks bots.

[lldb] Adding structured types for existing MCP calls. (#155460)

This adds or renames existing types to match the names of the types on
https://modelcontextprotocol.io/specification/2025-06-18/schema for the
existing calls.

The new types are used in the unit tests and server implementation to
remove the need for crafting various llvm::json::Object values by
hand.

[ProfCheck] Exclude new LoopVectorize Test (#155502)

[MLIR][LLVMIR][DLTI] Pass to update #llvm.target's features per relevant backend (#154938)

Modifies #llvm.target<..., features = $FEATURES> so that $FEATURES
is now an #llvm.target_features<[...]> attribute (rather than a
StringAttr). This enables the attribute to respond to DLTI queries for
the different target features.

The pass updates the $FEATURES attribute of the target attr at name
llvm.target in accordance with the (Sub)Target's features that the
relevant LLVM backend knows about.


DEMO:

module attributes {llvm.target = #llvm.target<triple = "x86_64-unknown-linux",
                                              chip = "skylake"> } {
}

by way of -llvm-target-to-target-features turns into:

module attributes {llvm.target = #llvm.target<triple = "x86_64-unknown-linux",
                                              chip = "skylake", 
                                              features = <["+64bit", "+64bit-mode", "+adx", "+aes", "+allow-light-256-bit", "+avx", "+avx2", "+bmi", "+bmi2", "+clflushopt", "+cmov", "+crc32", "+cx16", "+cx8", "+ermsb", "+f16c", "+false-deps-popcnt", "+fast-15bytenop", "+fast-gather", "+fast-scalar-fsqrt", "+fast-shld-rotate", "+fast-variable-crosslane-shuffle", "+fast-variable-perlane-shuffle", "+fast-vector-fsqrt", "+fma", "+fsgsbase", "+fxsr", "+idivq-to-divl", "+invpcid", "+lzcnt", "+macrofusion", "+mmx", "+movbe", "+no-bypass-delay-blend", "+no-bypass-delay-mov", "+no-bypass-delay-shuffle", "+nopl", "+pclmul", "+popcnt", "+prfchw", "+rdrnd", "+rdseed", "+sahf", "+slow-3ops-lea", "+sse", "+sse2", "+sse3", "+sse4.1", "+sse4.2", "+ssse3", "+vzeroupper", "+x87", "+xsave", "+xsavec", "+xsaveopt", "+xsaves"]>>} {
}

[flang] Consolidate copy-in/copy-out determination in evaluate framework (#151408)

New implementation of MayNeedCopy() is used to consolidate
copy-in/copy-out checks.

IsAssumedShape() and IsAssumedRank() were simplified and are both
now in the Fortran::semantics namespace.

preparePresentUserCallActualArgument() in lowering was modified to use
MayNeedCopyInOut()

Fixes #138471

[clang] Fix clang module build by declaring new textual header (#155510)

Add clang/Basic/ABIVersions.def, introduced in #151995, as a textual
header to fix the clang module build.

[fuzzer][Fuchsia] Forward fix for undefined StartRssThread (#155514)

The declaration was static when it shouldn't be since it can be defined
in FuzzerUtilFuchsia.cpp

Support: Add proxies for raw_ostream and raw_pwrite_stream (#113362)

Add proxy classes for raw_ostream and raw_pwrite_stream called
raw_ostream_proxy and raw_pwrite_stream_proxy. Add adaptor classes,
raw_ostream_proxy_adaptor<> and raw_pwrite_stream_proxy_adaptor<>,
to allow subclasses to use a different parent class than raw_ostream
or raw_pwrite_stream.

The adaptors are used by a future patch to help a subclass of
llvm::vfs::OutputFile, an abstract subclass of raw_pwrite_stream, to
proxy a raw_fd_ostream.

Patched by dexonsmith.

[gn build] Port 90670b5

[libc][NFC] Clean up utimes and setsid (#155495)

Simplify utimes a bit and add proper error handling to setsid as
described in the standard.

[NFC][MC][XCore] Eliminate forward decls by rearranging functions (#155456)

Revert "Reapply "[compiler-rt] Remove %T from shared object substitutions (#155302)""

This reverts commit 7624197.

This is causing more buildbot failures that probably need some offline
investigation:

  1. https://lab.llvm.org/buildbot/#/builders/186/builds/11923

[NFC][WPD] Pass the module analysis manager instead of lambdas (#155338)

Easier to evolve - if we need more analyses, it becomes clumsy to keep passing around lambdas.

Reapply "[AMDGPU] gfx1250 trans instructions bf16 codegen tests update. NFC (#155310)" (#155515)

[CIR] Add support for initializing classes with multiple vtables (#155275)

This adds support for initializing the vptr members in a class that
requires multiple vtables because of multiple inheritance. This still
does not handle virtual bases.

[CI] Strip strings from filenames in compute_projects.py (#155519)

This can otherwise mess up some of the path detection logic,
particularly around ensuring the premerge checks are run when the
workflow YAML file is changed.

[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. (#149955)

This patch implements getAddressComputationCost() in the RISCV TTI,
which makes gather/scatter with address calculation more expensive than
the stride cost.

Note that the only user of getAddressComputationCost() with a vector
type is VPWidenMemoryRecipe::computeCost(), so this patch changes some
LV tests.

I've checked the test changes in LV, and they seem to fall into two
groups:

  • gather/scatter with a uniform vector ptr, which seems like it can be
    optimized to masked.load.
  • cases that can be optimized to a strided load/store.

[lldb-dap] Improving lldbdap_testcase.py error diagnosability (#155352)

Improved response message handling in lldbdap_testcase.py to handle
various formats. This allows for more descriptive error messaging and
provides useful info even when error details are malformed.


Co-authored-by: Piyush Jaiswal piyushjais@meta.com

[orc-rt] Fix comment typos in unit tests. NFC.

[lld][WebAssembly] -r: force -Bstatic (#108264)

This is a port of a recent ELF linker change: 8cc6a24.

[AMDGPU] Set GRANULATED_WAVEFRONT_SGPR_COUNT of compute_pgm_rsrc1 to 0 for gfx10+ (#154666)

According to llvm-project/llvm/docs/AMDGPUUsage.rst::L5212,
GRANULATED_WAVEFRONT_SGPR_COUNT, which is compute_pgm_rsrc1[6:9], has
to be 0 for gfx10+ architectures.


Co-authored-by: Matt Arsenault Matthew.Arsenault@amd.com

Revert "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI." (#155535)

Reverts #149955

Reapply "[CI] Save sccache logs (#155444)" (#155520)

This reverts commit b90f4ff.

Relands the change after making the relevant fixes (not missing the
artifacts directory).

AMDGPU: Fold mov imm to copy to av_32 class (#155428)

Previously we had special case folding into copies to AGPR_32,
ignoring AV_32. Try folding into the pseudos.

Not sure why the true16 case regressed.

[NFC] [clangd] [C++20 Modules] Add a warning if clangd detects multiple
source files declaring the same module

Currently clangd assumes that no module is declared by more than one
source file in a single project. But in practice, this may not be the case.

Although we can't fix it now, emitting a warning is helpful for users to
understand what's going on.

[DAGCombiner] Avoid double deletion when replacing multiple frozen/unfrozen uses (#155427)

Closes #155345.
In the original case, we have one frozen use and two unfrozen uses:

t73: i8 = select t81, Constant:i8<0>, t18
t75: i8 = select t10, t18, t73
t59: i8 = freeze t18 (combining)

t80: i8 = freeze t59 (another user of t59)

In DAGCombiner::visitFREEZE, we replace all uses of t18 with t59.
After updating the uses, t59: i8 = freeze t18 will be updated to t59: i8 = freeze t59 (AddModifiedNodeToCSEMaps) and CSEed into t80: i8 = freeze t59 (ReplaceAllUsesWith). As the previous call to
AddModifiedNodeToCSEMaps already removed t59 from the CSE map,
ReplaceAllUsesWith cannot remove t59 again.

For clarity, see the following call graph:

ReplaceAllUsesOfValueWith(t18, t59)
  ReplaceAllUsesWith(t18, t59)
    RemoveNodeFromCSEMaps(t73)
    update t73
    AddModifiedNodeToCSEMaps(t73)
    RemoveNodeFromCSEMaps(t75)
    update t75
    AddModifiedNodeToCSEMaps(t75)
    RemoveNodeFromCSEMaps(t59) <- first deletion
    update t59
    AddModifiedNodeToCSEMaps(t59)
        ReplaceAllUsesWith(t59, t80)
            RemoveNodeFromCSEMaps(t59) <- second deletion
                Boom!

This patch unfreezes all the uses first to avoid triggering CSE when
introducing cycles.

[clang][HeuristicResolver] Resolve explicit object parameter to enclosing record type (#155143)

Heuristically resolve the type of a this auto parameter to the record type
in the declaration.

struct Foo {
  int member {};
  auto&& getter1(this auto&& self) { // assume `self` is `Foo`
    return self.member;
  }
};

Fixes clangd/clangd#2323

[flang][acc] Fix the indexing of the reduction combiner for multidimensional static arrays (#155536)

In the following example of reducing a static 2D array, we have
incorrect coordinates for array access in the reduction combiner. This
PR reverses the order of the induction variables used for such array
indexing. For other cases of static arrays, we reverse the loop order as
well so that the innermost loop can handle the innermost dimension.

program main
  implicit none
  integer, parameter :: m = 2
  integer, parameter :: n = 10
  integer :: r(n,m), i

  r = 0

  !$acc parallel loop reduction(+:r(:n,:m))
  do i = 1, n
     r(i, 1) = i
  enddo

  print *, r
end program main

Currently, we have:

fir.do_loop %arg2 = %c0 to %c1 step %c1 {
  fir.do_loop %arg3 = %c0 to %c9 step %c1 {
    %0 = fir.coordinate_of %arg0, %arg2, %arg3 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>
    %1 = fir.coordinate_of %arg1, %arg2, %arg3 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>

We'll obtain:

fir.do_loop %arg2 = %c0 to %c1 step %c1 {
  fir.do_loop %arg3 = %c0 to %c9 step %c1 {
    %0 = fir.coordinate_of %arg0, %arg3, %arg2 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>
    %1 = fir.coordinate_of %arg1, %arg3, %arg2 : (!fir.ref<!fir.array<10x2xi32>>, index, index) -> !fir.ref<i32>

[Github][CI] Install the correct binary of sccache on aarch64 (#155328)

[LoongArch][NFC] Pre-commit for BR_CC and SELECT_CC optimization (#151788)

AMDGPU: Remove unused argument from adjustAllocatableRegClass (#155554)

[RISCV] Lower (setugt X, 2047) as (setne (srl X, 11), 0) (#155541)

This matches 4095 and other pow2-1 constants larger than simm12. We normally
do this through a DAGCombine controlled by isLegalICmpImmediate. 2047 is
considered a legal immediate because we have a setult instruction. In
this case we have setugt which isn't natively supported.

I added tests for 4095 for comparison.
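
The equivalence behind this lowering can be checked with a small sketch. The function names are hypothetical and 64-bit unsigned values are assumed for illustration: X is greater than 2047 exactly when some bit at position 11 or above is set.

```cpp
#include <cstdint>

// Original comparison: unsigned X > 2047.
bool ugt2047(uint64_t X) { return X > 2047; }

// Rewritten form: shift out the low 11 bits and test for a non-zero rest.
bool srl11IsNonZero(uint64_t X) { return (X >> 11) != 0; }
```

The same argument applies to any constant of the form 2^k - 1, with the shift amount k.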

[orc-rt] Add bind_front, a pre-c++-20 std::bind_front substitute. (#155557)

This can be used until the ORC runtime is able to move to c++-20.

Also adds a CommonTestUtils header with a utility class, OpCounter, that
counts the number of default constructions, copy constructions and
assignments, move constructions and assignments, and destructions. This
is used to test that orc_rt::bind_front doesn't introduce unnecessary
copies / moves.
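
For readers unfamiliar with std::bind_front, a C++17 substitute can be sketched roughly as below. This is an illustrative assumption, not the orc_rt::bind_front implementation: bindFrontSketch and add3 are hypothetical names, bound arguments are always stored by value, and the callable is invoked through a const closure.

```cpp
#include <functional>
#include <tuple>
#include <utility>

// Store the callable and the leading arguments, then append the remaining
// arguments at call time.
template <typename Fn, typename... BoundArgs>
auto bindFrontSketch(Fn &&F, BoundArgs &&...Args) {
  return [Callee = std::forward<Fn>(F),
          BoundTuple = std::make_tuple(std::forward<BoundArgs>(Args)...)](
             auto &&...Rest) -> decltype(auto) {
    return std::apply(
        [&](const auto &...B) -> decltype(auto) {
          return std::invoke(Callee, B...,
                             std::forward<decltype(Rest)>(Rest)...);
        },
        BoundTuple);
  };
}

// Small helper used to exercise the sketch.
int add3(int A, int B, int C) { return A + B + C; }
```

Calling bindFrontSketch(add3, 1, 2)(3) forwards to add3(1, 2, 3). A production version would also need to care about how often the bound arguments are copied or moved, which is what the OpCounter utility mentioned above is meant to test.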

AMDGPU: Remove special case of SGPR_LO class in imm folding (#155518)

The previous change accidentally broke this, which shows it's not
doing anything.

[libc][math][c++23] Add {modf,remainder,remquo}bf16 math functions (#154652)

This PR adds the following basic math functions for BFloat16 type along
with the tests:

  • modfbf16
  • remainderbf16
  • remquobf16

Signed-off-by: Krishna Pandey kpandey81930@gmail.com
Co-authored-by: OverMighty its.overmighty@gmail.com

[RISCV] Group Zcf and Zcd instructions and CompressPats together. NFC (#155555)

Instead of repeatedly changing Predicates for each instruction.

[CodeGen] Optimize/simplify finalizeBundle. NFC (#155448)

When tracking defs in finalizeBundle two sets are used. LocalDefs is
used to track defined virtual and physical registers, while LocalDefsP
is used to track defined register units for the physical registers.

This patch moves the updates of LocalDefsP to only iterate over regunits
when a new physical register is added to LocalDefs. When the physical
register already is present in LocalDefs, then the corresponding
register units are present in LocalDefsP. So it was a waste of time to
add them to the set again.
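
The idea can be sketched with two standard sets. This is a simplified illustration rather than the finalizeBundle code: BundleDefs and addPhysDef are hypothetical names, and the register-unit list is passed in by the caller instead of being queried from target register info.

```cpp
#include <cstddef>
#include <set>
#include <vector>

struct BundleDefs {
  std::set<unsigned> LocalDefs;  // defined virtual and physical registers
  std::set<unsigned> LocalDefsP; // defined register units

  // Record a physical register def. Returns how many register units were
  // visited, which is zero when the register was already tracked.
  size_t addPhysDef(unsigned Reg, const std::vector<unsigned> &RegUnits) {
    if (!LocalDefs.insert(Reg).second)
      return 0; // already present, so its units are already in LocalDefsP
    for (unsigned Unit : RegUnits)
      LocalDefsP.insert(Unit);
    return RegUnits.size();
  }
};
```

The second def of the same register skips the regunit loop entirely, which is the saving described above.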

[mlir][amx] Direct AMX data transfers (#154114)

Extends Vector to AMX conversion to attempt populating AMX tiles
directly from memory.

When possible, contraction producers and consumers are replaced by AMX
tile data transfer operations. This shortens data path by skipping
intermediate register loads and stores.

Add tools needed by build_symbolizer.sh to runtime deps when internal symbolizer enabled. (#153723)

[mlir][Transforms] Dialect conversion: Context-aware type conversions (#140434)

This commit adds support for context-aware type conversions: type
conversion rules that can return different types depending on the IR.

There is no change for existing (context-unaware) type conversion rules:

// Example: Convert any integer type to f32.
converter.addConversion([](IntegerType t) {
  return Float32Type::get(t.getContext());
});

There is now an additional overload to register context-aware type
conversion rules:

// Example: Type conversion rule for integers, depending on the context:
// Get the defining op of `v`, read its "increment" attribute and return an
// integer with a bitwidth that is increased by "increment".
converter.addConversion([](Value v) -> std::optional<Type> {
  auto intType = dyn_cast<IntegerType>(v.getType());
  if (!intType)
    return std::nullopt;
  Operation *op = v.getDefiningOp();
  if (!op)
    return std::nullopt;
  auto incrementAttr = op->getAttrOfType<IntegerAttr>("increment");
  if (!incrementAttr)
    return std::nullopt;
  return IntegerType::get(v.getContext(),
                          intType.getWidth() + incrementAttr.getInt());
});

For performance reasons, the type converter caches the result of type
conversions. This is no longer possible when there are context-aware type
conversions because each conversion could compute a different type
depending on the context. There is no performance degradation when there
are only context-unaware type conversions.

Note: This commit just adds context-aware type conversions to the
dialect conversion framework. There are many existing patterns that
still call converter.convertType(someValue.getType()). These should be
gradually updated in subsequent commits to call
converter.convertType(someValue).

Co-authored-by: Markus Böck markus.boeck02@gmail.com

[BOLT][AArch64] Fix another cause of extra entry point misidentification (#155055)

[PowerPC] ppc64-P9-vabsd.ll - update v16i8 abdu test now that it vectorizes in the middle-end (#154712)

The scalarized IR was written before improvements to SLP / cost models
ensured that the abs intrinsic was easily vectorizable.

opt -O3 : https://zig.godbolt.org/z/39T65vh8M

Now that it is, we need a more useful llc test.

[AMDGPU] Refactor insertWaveSizeFeature (#154850)

If wavefrontsize32 or wavefrontsize64 is the only possible value,
insert it into the feature list by default and use that value as an
indication that the other wavefront size is not legal.

s390x: optimize 128-bit fshl and fshr by high values (#154919)

Turn a funnel shift by N in the range 121..128 into a funnel shift in
the opposite direction by 128 - N. Because there are dedicated
instructions for funnel shifts by values smaller than 8, this emits
fewer instructions.

This additional rule is useful because LLVM appears to canonicalize
fshr into fshl, meaning that the rules for fshr on values less
than 8 would not match on organic input.
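
The identity behind the rewrite can be sketched on a scaled-down width. The example below uses 64-bit halves instead of 128-bit ones purely so it stays in standard C++. The function names are hypothetical and the shift amount is assumed to be strictly between 0 and the width.

```cpp
#include <cassert>
#include <cstdint>

// fshl: concatenate Hi:Lo, shift left by N, keep the high 64 bits.
uint64_t fshl64(uint64_t Hi, uint64_t Lo, unsigned N) {
  assert(N > 0 && N < 64 && "sketch only handles 0 < N < 64");
  return (Hi << N) | (Lo >> (64 - N));
}

// fshr: concatenate Hi:Lo, shift right by N, keep the low 64 bits.
uint64_t fshr64(uint64_t Hi, uint64_t Lo, unsigned N) {
  assert(N > 0 && N < 64 && "sketch only handles 0 < N < 64");
  return (Hi << (64 - N)) | (Lo >> N);
}
```

A left funnel shift by 60 and a right funnel shift by 4 produce the same bits, which is the 64-bit analogue of turning a shift by N into the opposite shift by 128 - N.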

[clang] Post-commit review for #150028 (#155351)

  1. Return std::nullopt instead of {}.
  2. Rename the new function to evaluate*, it's not a simple getter.

[ASan] Prevent assert from scalable vectors in FunctionStackPoisoner. (#155357)

This has recently started causing 'Invalid size request on a scalable
vector.'

[LV] Add test for vectorisation of SAXPY unrolled by 5 (NFC). (#153039)

This test contains a vectorisation example of a loop based on SAXPY
manually unrolled by five, as discussed in #148808.

[Flang-RT][OpenMP] Define _GLIBCXX_NO_ASSERTIONS (#155440)

Since GCC 15.1, libstdc++ enables assertions/hardening by default in
non-optimized (-O0) builds [1]. That is, _GLIBCXX_ASSERTIONS is defined
in the libstdc++ headers themselves, so defining/undefining it on the
compiler command line no longer has an effect in non-optimized builds.
As the commit message[2] suggests, define _GLIBCXX_NO_ASSERTIONS
instead.

For libstdc++ headers before 15.1, -U_GLIBCXX_ASSERTIONS still has to be
on the command line as well.

Defining _GLIBCXX_NO_ASSERTIONS was previously proposed in #152223

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112808
[2] gcc-mirror/gcc@361d230

[AArch64][SME] Simplify initialization of the TPIDR2 block (#141049)

This patch updates the definition of AArch64ISD::INIT_TPIDR2OBJ to
take the number of save slices (which is currently always all ZA
slices). Using this, we can initialize the TPIDR2 block with a single
STP of the save buffer pointer and the number of save slices. The
reserved bytes (10-15) will be implicitly zeroed as the result of RDSVL
will always be <= 16-bits.
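
Why a single two-register store is enough can be sketched as follows, assuming the little-endian TPIDR2 block layout in which the save buffer pointer occupies bytes 0-7 and the slice count bytes 8-9. The struct and function names are hypothetical, and SVLBytes stands in for the RDSVL result.

```cpp
#include <cstdint>

// Two 64-bit words, as they would be written by one STP.
struct TPIDR2Block {
  uint64_t SaveBuffer; // bytes 0-7: pointer to the ZA save buffer
  uint64_t NumSlices;  // bytes 8-9: slice count, bytes 10-15: reserved
};

// Because the slice count fits in 16 bits, widening it to 64 bits leaves
// the upper 48 bits (the reserved bytes) zero.
TPIDR2Block initTPIDR2(uint64_t SaveBufferPtr, uint16_t SVLBytes) {
  return TPIDR2Block{SaveBufferPtr, static_cast<uint64_t>(SVLBytes)};
}
```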

Note: We used to write the number of save slices to the TPIDR2 block
before every call with a lazy save; however, based on 6.6.9 "Changes to
the TPIDR2 block" in the aapcs64
(https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#changes-to-the-tpidr2-block),
it seems we can rely on callers preserving the contents of the TPIDR2 block.

[AMDGPU] More radical feature initialization refactoring (#155222)

Factoring in flang, just have a single fillAMDGPUFeatureMap
function that does it all as the external interface and returns
an error.

[mlir] Consistently add TableGen generated files as deps to mlir-headers/mlir-generic-headers CMake targets (#155474)

Tool targets like mlir-opt rely on the mlir-headers or
mlir-generic-headers targets to run first to generate headers.
However, many of the IncGen targets are not specified as dependencies
of the header targets in CMake, which causes spurious build failures when
using a high number of parallel build jobs.

Thus, this commit introduces a pair of new CMake macros,
add_mlir_dialect_tablegen_target and add_mlir_generic_tablegen_target, to
AddMLIR.cmake, which can be used in place of add_public_tablegen_target to
ensure (by convention) that IncGen targets are added to the mlir-headers
(resp. mlir-generic-headers) target dependencies.

Most uses of add_public_tablegen_target in the dialects have been
refactored to use the new macros.

[MLIR] Adopt LDBG() in EliminateBarriers.cpp (NFC) (#155092)

Also add an extra optional TYPE argument to the LDBG() macro to make it
easier to override DEBUG_TYPE locally.

[KeyInstr] Enable -gkey-instructions by default if optimisations are enabled (#149509)

This enables Clang's -gkey-instructions; cc1's -gkey-instructions
remains off by default.

Key Instructions improves the optimized-code debug-stepping experience
in debuggers that use DWARF's is_stmt line table register to determine
stepping behaviour.

The feature can be disabled with -gno-key-instructions (note that the
positive and negative flag both imply -g).

RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668

[Bazel] Add missing Support dep to VectorToAMX (#155576)

[MLIR] Migrate Transform/IR/TransformOps.cpp to LDBG() debugging macro (NFC) (#155098)

[clang] AST: fix getAs canonicalization of leaf types (#155028)

[GlobalISel] Add support for scalarizing vector insert and extract elements (#153274)

This adds scalarization handling for fewer vector elements of insert and
extract, so that i128 and fp128 types can be handled if they make it
past combines. Inserts are unmerged with the inserted element added to
the remerged vector; extracts are unmerged, then the correct element is
copied into the destination. With a non-constant vector the usual stack
lowering is used.

[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in Pass.cpp (NFC)

[Bazel] Add missing SCFTransforms dep to TestDialect (#155581)

[MLIR] Apply clang-tidy fixes for llvm-include-order in RegisterEverything.cpp (NFC)

[mlir][linalg] Produce canonical linalg.generic for im2col (#134675)

Before this patch, the Img2Col transform produced a non-canonical
linalg.generic whose input tensor was not reported in the inputs of the
operation: instead, it was accessed manually from inside the op body,
after an internal calculation of the access offsets. This patch modifies
the Im2Col rewrite to produce a canonical linalg.generic whose input is
correctly reported in its 'ins()', whose access offsets are computed
through an indexing map, and whose body contains only a 'linalg.yield'
op.

Signed-off-by: Fabrizio Indirli Fabrizio.Indirli@arm.com
Co-authored-by: Georgios Pinitas georgios.pinitas@arm.com

[clang][bytecode] Handle vector assignments (#155573)

[clang-repl] Put CompilerInstance fr

When using no-movt we don't use the pcrel version of the literal load.
This change also unifies logic with the ARM version of this function as well,
which has:
```
  if (!Subtarget.useMovt() || ForceELFGOTPIC) {
    // For ELF non-PIC, use GOT PIC code sequence as well because R_ARM_GOT_ABS
    // does not have assembler support.
    if (TM.isPositionIndependent() || ForceELFGOTPIC)
      expandLoadStackGuardBase(MI, ARM::LDRLIT_ga_pcrel, ARM::LDRi12);
    else
      expandLoadStackGuardBase(MI, ARM::LDRLIT_ga_abs, ARM::LDRi12);
    return;
  }
```

rdar://138334512


@aemerson aemerson closed this Aug 30, 2025
@aemerson aemerson deleted the users/aemerson/movt-pic branch August 30, 2025 21:30