[NFC] [clangd] [Modules] remove dot in log #156207
Closed
+63
−23
[NFC] [clangd] [Modules] remove dot in log
The dot in the log makes it hard to copy and execute the commands from
the log. Remove it.
[clangd] [C++20 Modules] Add --debug-modules-builder to not remove built module files on exit
In practice I found the option very helpful for understanding what
happens when clangd's C++20 modules support fails. With '--log=verbose',
I can rerun the commands logged by clangd to understand what is actually
going wrong.
The documentation or adding the option to '--help' list can be done
separately.
Fix test added in #155148 work with Windows style path separators. (#155354)
Should fix Windows build bot failures such as
https://lab.llvm.org/buildbot/#/builders/46/builds/22281.
The test (and the followup fix in #155303) did not properly account for
Windows style path separators.
[libc++] Add a release note about multi{map,set}::find not returning the first element anymore (#155252)
We've modified the algorithm of __tree::find in #152370, which can
change the return value. Since we always returned the lower bound
before, some users started relying on it. This patch adds a release note
so users are aware that this might break their code.
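Code that relied on find() returning the first of several equivalent elements can ask for the lower bound explicitly instead. The following is a minimal sketch of that idea; find_first and make_example are hypothetical names introduced only for illustration and are not part of libc++.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: returns an iterator to the first element whose key is
// equivalent to `key`, or end() if there is none. find() may return any
// element with an equivalent key, whereas lower_bound() is specified to
// return the first one.
template <class Multimap, class Key>
typename Multimap::const_iterator find_first(const Multimap& m, const Key& key) {
  typename Multimap::const_iterator it = m.lower_bound(key);
  if (it == m.end() || m.key_comp()(key, it->first))
    return m.end(); // no element with an equivalent key
  return it;
}

// Example data: three values stored under the same key, in insertion order.
inline std::multimap<int, std::string> make_example() {
  std::multimap<int, std::string> m;
  m.insert({1, "first"});
  m.insert({1, "second"});
  m.insert({1, "third"});
  m.insert({2, "other"});
  return m;
}
```

With this helper, find_first(make_example(), 1) refers to the element inserted first under key 1, independent of which element find() happens to return.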
[libc++][C++03] Split libc++-specific tests for the frozen headers (#144093)
The C++03 headers are essentially a separate implementation, so it
doesn't make a ton of sense to try to test two implementations with a
single set of implementation-specific tests.
This patch doesn't copy over any tests that will not be run in C++03
mode. The most notable changes are that lit.local.cfg files are touched
to change the path from libcxx/test/libcxx to libcxx/test/libcxx-03 in
a few places.
This also modifies lit.local.cfg files to run libcxx/test/libcxx-03
only when using the frozen headers and libcxx/test/libcxx tests only
when not using the frozen headers.
This is part of
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc.
[libc++] Remove a few incorrect _LIBCPP_EXPORTED_FROM_ABI annotations (#132602)
This has two benefits:
unnecessary _LIBCPP_HIDE_FROM_ABI from any member functions once we are
able to make _LIBCPP_HIDE_FROM_ABI the default within libc++
[lldb] Fix a warning
This patch fixes:
lldb/unittests/Protocol/ProtocolMCPServerTest.cpp:285:14: error:
unused variable 'mutex' [-Werror,-Wunused-variable]
[libc++][NFC] Wrap lines in ReleaseNotes/22.rst (#155359)
Some of the lines in ReleaseNotes/22.rst are (significantly) longer
than our usual 120 column limit. This wraps all lines in the file so
they are never more than our usual limit.
[flang] Disable loop interchange by default (#155279)
Disable loop interchange by default, while keeping the ability to
explicitly enable using -floop-interchange. This matches Clang.
See discussion on #140182.
[X86] Fix spill issue for fr16 (#155225)
When avx512fp16 is not available, we use MOVSS to spill fr16/fr16x
registers.
However, MOVSSmr requires the fr32 register class and MOVSSrm requires
the vr128 register class, which causes the machine verifier to report a
bad instruction.
To fix the issue, this patch creates a pseudo instruction MOVSHP for
fr16 register spilling. MOVSHP is expanded to MOVSS or VMOVSSZ depending
on the register number.
Co-authored-by: Yuanke Luo ykluo@birentech.com
[mlir][emitc] Fix bug in ApplyOp translation (#155171)
The translator emits emitc.apply incorrectly when the op is part of an
expression, as it prints the name of the operand instead of calling
emitOperand() which takes into account the expression being emitted,
leaving out the part of the expression feeding this op, e.g.
translates to:
instead of:
[clang][test] Add a RUN line for the bytecode interpreter (#155363)
This test works with the bytecode interpreter, so add some additional
testing.
[mlir][scf] Expose isPerfectlyNestedForLoops (#152115)
The function isPerfectlyNestedForLoops is useful on its own and so I'm
exposing it for downstream use.
[NFC] Remove out dated comment for clear-ast-before-backend
The comment is outdated since d0a5f61
[clang][DebugInfo][test] Move debug-info tests from CodeGenObjCXX to DebugInfo directory (#154912)
This patch works towards consolidating all Clang debug-info into the
clang/test/DebugInfo directory
(https://discourse.llvm.org/t/clang-test-location-of-clang-debug-info-tests/87958).
Here we move only the clang/test/CodeGenObjCXX tests.
The list of files I came up with is:
*debug-info* in the filename
debug-info-kind in the tests
[LV] Remove use of llc from vectoriser tests (#154759)
There were 5 X86 loop vectoriser tests that were piping the output from
opt into llc. I think in the directory test/Transforms/LoopVectorize we
should only be testing the output from the loop vectoriser pass. Any
codegen tests should live in test/CodeGen/X86 instead.
avx512.ll: it looks like we were really just testing that we generate
the right vector length.
fp32_to_uint32-cost-model.ll/fp64_to_uint32-cost-model.ll: the tests
only seem to care that we're not scalarising the fptoui, so I've
modified the test to check for vector ops. I've assumed there are
already codegen tests for fptoui vector operations.
vectorization-remarks-loopid-dbg.ll: I've copied this test to
CodeGen/X86/vectorization-remarks-loopid-dbg.ll for the llc RUN line
variant
vectorization-remarks.ll: seems to test the same thing as
vectorization-remarks-loopid-dbg.ll
[MLIR][TOSA] Add missing SameOperandsAndResultShape Trait to tosa.cast (#153826)
According to the TOSA spec, tosa.cast only changes the element type,
and not the shape of the input tensor.
Signed-off-by: Rickert, Jonas jonas.rickert@amd.com
[ComplexDeinterleaving] Use LLVM ADTs (NFC) (#154754)
This swaps out STL types for their LLVM equivalents. This is recommended
in the LLVM coding standards: https://llvm.org/docs/CodingStandards.html#c-standard-library
[LV] Stop using the legacy cost model for udiv + friends (#152707)
In VPWidenRecipe::computeCost for the instructions udiv, sdiv, urem and
srem we fall back on the legacy cost unnecessarily. At this point we
know that the vplan must be functionally correct, i.e. if the
divide/remainder is not safe to speculatively execute then we must have
either:
VPWidenRecipe, or
fault through divide-by-zero.
For 2) it's necessary to add the select operation to
VPInstruction::computeCost so that we mirror the cost of the legacy cost
model. The only problem with this is that we also generate selects in
vplan for predicated loops with reductions, which aren't accounted for
in the legacy cost model. In order to prevent asserts firing I've also
added the selects to precomputeCosts to ensure the legacy costs match
the vplan costs for reductions.
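The role of the select can be pictured with a scalar model of a widened, predicated udiv. This is only a sketch under assumptions: maskedUDiv is a hypothetical name, the divisor of an inactive lane is replaced by 1, and active lanes are assumed to have a non-zero divisor. It does not reproduce the VPlan recipes themselves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative scalar model of a widened, predicated udiv. For lanes whose
// mask bit is false, the divisor is replaced by 1 through a select, so the
// division executed on every lane cannot fault on a zero divisor. Results of
// inactive lanes are unused and set to 0 here to keep the output determinate.
template <std::size_t N>
std::array<std::uint32_t, N> maskedUDiv(const std::array<std::uint32_t, N>& num,
                                        const std::array<std::uint32_t, N>& den,
                                        const std::array<bool, N>& mask) {
  std::array<std::uint32_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    std::uint32_t safeDen = mask[i] ? den[i] : 1u; // the extra select
    std::uint32_t q = num[i] / safeDen;            // runs on every lane
    out[i] = mask[i] ? q : 0u;
  }
  return out;
}
```

The select is an extra operation per vector iteration, which is why a cost model that counts the divide alone would disagree with one that also counts the select.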
[libc++][C++03] Remove XFAILs from the non-frozen libc++-specific tests (#144101)
The tests in libcxx/test/libcxx aren't run against the frozen headers
anymore, so we can remove any XFAILs in them.
This is part of
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc.
[clang][bytecode][NFC] Check InitializingBlocks in _within_lifetime (#155378)
This kind of check is exactly why InterpState::InitializingBlocks
exists.
[libc++][C++03] Fix tests which only fail due to incorrect includes (#144110)
Quite a few of the frozen header tests only fail because the include
path is incorrect due to copying the headers. This patch fixes the tests
where that's the only problem.
This is part of
https://discourse.llvm.org/t/rfc-freezing-c-03-headers-in-libc.
[LV] Return Invalid from getLegacyCost when instruction cost forced. (#154543)
LoopVectorizationCostModel::expectedCost will only override the cost
returned by getInstructionCost when valid. This patch ensures we do
the same in VPCostContext::getLegacyCost, avoiding the "VPlan cost
model and legacy cost model disagreed" assert in the included test.
[libc++][C++03] Fix a bunch of random tests (#144117)
This fixes/removes a bunch of random tests. They all failed in
relatively simple to fix ways.
Specifically (all inside libcxx/test/libcxx-03):
utilities/template.bitset/includes.pass.cpp: the header guards have
different names now (guard names fixed)
utilities/meta/is_referenceable.compile.pass.cpp: The name changed
from __libcpp_is_referenceable (reverted name)
utilities/function.objects/refwrap/desugars_to.compile.pass.cpp:
Optimization has been added after the header split (test removed)
type_traits/is_replaceable.compile.pass.cpp: __is_replacable_v has
been added after the header split (test removed)
type_traits/is_constant_evaluated.pass.cpp: Ran C++11 code
accidentally (C++11 test parts removed)
type_traits/desugars_to.compile.pass.cpp: Optimization has been
added after the header split (test removed)
numerics/bit.ops.pass.cpp: Tried to include header which doesn't
exist (removed include and related code which wasn't executed in C++03)
experimental/fexperimental-library.compile.pass.cpp: This test is
irrelevant for C++03, since there are no C++03 experimental features
(test removed)
containers/container_traits.compile.pass.cpp: container_traits
have been introduced after the header split (test removed)
[OpenMPIRBuilder] Fix tripcount not a multiple of tile size (#154999)
The emitted code tests whether the current tile should execute the
remainder iterations by checking whether the logical iteration number is
the one after the floor iterations that execute the non-remainder
iterations. There are two counts of how many iterations there are: those
of non-remainder iterations (simply the rounded-down division of
tripcount by tile size), and those including an additional floor
iteration for the remainder iterations. The code used the wrong one,
which caused the condition to never match.
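The two counts and the remainder check can be sketched in plain C++. This is an illustrative model only; TileCounts, computeTileCounts, tileTripCount and countExecuted are hypothetical names and do not correspond to OpenMPIRBuilder functions.

```cpp
#include <cassert>
#include <cstdint>

struct TileCounts {
  std::uint64_t fullTiles;  // floor iterations that run a complete tile
  std::uint64_t floorIters; // fullTiles plus one extra iteration if a remainder exists
  std::uint64_t remainder;  // iterations left over for the partial tile
};

inline TileCounts computeTileCounts(std::uint64_t tripCount, std::uint64_t tileSize) {
  assert(tileSize > 0 && "tile size must be positive");
  TileCounts c;
  c.fullTiles = tripCount / tileSize; // rounded-down division
  c.remainder = tripCount % tileSize;
  c.floorIters = c.fullTiles + (c.remainder != 0 ? 1 : 0);
  return c;
}

// Number of inner (tile) iterations for a given floor iteration. The partial
// tile is the floor iteration right after the complete ones, so the index is
// compared against fullTiles. Comparing against floorIters instead can never
// match, because floor indices only range over [0, floorIters).
inline std::uint64_t tileTripCount(const TileCounts& c, std::uint64_t floorIdx,
                                   std::uint64_t tileSize) {
  bool isRemainderTile = (floorIdx == c.fullTiles);
  return isRemainderTile ? c.remainder : tileSize;
}

// Total number of logical iterations executed by the tiled loop nest.
inline std::uint64_t countExecuted(std::uint64_t tripCount, std::uint64_t tileSize) {
  TileCounts c = computeTileCounts(tripCount, tileSize);
  std::uint64_t total = 0;
  for (std::uint64_t f = 0; f < c.floorIters; ++f)
    total += tileTripCount(c, f, tileSize);
  return total;
}
```

For a trip count of 10 and a tile size of 4 there are two full tiles and one remainder tile of two iterations, so all 10 logical iterations run.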
[VPlan][RISC-V] Add test case for #154103
This has now been fixed by #152707
[clang-repl] Delegate CodeGen related operations for PTU to IncrementalParser (#137458)
Read discussion : #136404 (comment)
and the following comments for context
Motivation
IncrementalAction is designed to keep Frontend state alive across
inputs. As per the docstring: “IncrementalAction ensures it keeps its
underlying action's objects alive as long as the IncrementalParser needs
them.”
IncrementalParser, device: IncrementalCUDADeviceParser) should manage
PTU registration and module generation, while the interpreter
orchestrates at a higher level.
What this PR does
Moves CodeGen surfaces behind IncrementalAction:
GenModule(), getCodeGen(), and the cached “first CodeGen module” now
live in IncrementalAction.
Moves PTU ownership to the parser layer:
Adds IncrementalParser::RegisterPTU(…) (and device counterpart)
Add device-side registration in IncrementalCUDADeviceParser.
Remove Interpreter::{getCodeGen, GenModule, RegisterPTU}.
[TableGen][DecoderEmitter] Remove no longer needed MaxFilterWidth (NFC) (#155382)
11c6158 made the variable redundant.
Also remove Target, which is apparently unused.
[mlir][SCFToOpenMP] Use walk pattern driver (#155242)
The lowering pattern uses various APIs that are not supported in a
dialect conversion such as Block::eraseArguments and
RewriterBase::replaceAllUsesWith. Switch to the more efficient and
simpler walk pattern driver.
[LV] Add early-exit test where the inner loop IV depends on outer loop.
[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in DialectTransform.cpp (NFC)
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in DialectTransform.cpp (NFC)
[gn build] Port 2ab4c28
[flang][OpenMP] move omp end directive validation to semantics (#154739)
The old parse tree errors quickly exploded to thousands of unhelpful
lines when there were multiple missing end directives (e.g. #90452).
Instead I've added a flag to the parse tree indicating when a missing
end directive needs to be diagnosed, and moved the error messages to
semantics (where they are a lot easier to control).
This has the disadvantage of not displaying the error if there were
other parse errors, but there is a precedent for this approach (e.g.
parsing atomic constructs).
[mlir][MemRef] Address TODO to use early_inc to simplify elimination of uses (NFC) (#155123)
[MLIR][EmitC] Bugfix in emitc.call_opaque operand emission (#153980)
The operand emission needed the operand to be in scope, which led to a
failure when the emitc.call_opaque is in an emitc.expression's body.
[lldb] Fix spacing in "proccess plugin packet monitor" help
[SCEVExp] Check if getPtrToIntExpr resulted in CouldNotCompute.
This fixes a crash trying to use SCEVCouldNotCompute, if getPtrToIntExpr
failed.
Fixes #155287
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in ExecutionEngineModule.cpp (NFC)
[flang][OpenMP] move omp end sections validation to semantics (#154740)
See #90452. The old parse tree errors exploded to thousands of unhelpful
lines when there were multiple missing end directives.
Instead, allow a missing end directive in the parse tree then validate
that it is present during semantics (where the error messages are a lot
easier to control).
[clang][bytecode][NFC] Use Pointer::initializeAllElements() in Program (#155391)
We just initialized the entire string, so use this function instead.
[Offload] Full AMD support for olMemFill (#154958)
[mlir][vector] Fix crashes in from_elements folder + broadcast verifier (#155393)
This PR fixes two crashes / failures.
vector.broadcast verifier did not take into account
VectorElementTypeInterface and was looking for int/index/float types.
vector.from_elements folder attempted to create an invalid
DenseElementsAttr. Only int/float/index/complex types are supported.
[clang][bytecode][NFC] Check hasTrivialDtor() in RunDestructors (#155381)
We do this when calling Free() on dynamically allocated memory.
AMDGPU: Stop checking if registers are reserved in adjustAllocatableRegClass (#155125)
This function is used to implement TargetInstrInfo::getRegClass and
conceptually should not depend on the dynamic state of the function.
[RISCV][NFC] Fix typo v32 -> v31 in document (#155389)
[VPlan] Replace EVL branch condition with (branch-on-count AVLNext, 0) (#152167)
This changes the branch condition to use the AVL's backedge value
instead of the EVL-based IV.
This allows us to emit bnez on RISC-V and removes a use of the trip
count, which should reduce register pressure.
To match phis with VPlanPatternMatch I've had to relax the assert that
the number of operands must exactly match the pattern for the Phi
opcode, and I've copied over m_ZExtOrSelf from the LLVM IR
PatternMatch.h.
Fixes #151459
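The shape of the resulting loop can be modelled in scalar C++. This is a sketch under assumptions: sumWithEVL and EVLResult are hypothetical names, and the explicit vector length is modelled as min(AVL, VF), which the target may choose differently in practice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct EVLResult {
  long long sum;          // sum of all processed elements
  std::size_t iterations; // number of vector iterations executed
};

// Illustrative model of an EVL-style tail-folded loop. `avl` is the number of
// elements still to process; each iteration handles evl = min(avl, vf) of
// them. The loop exits when the updated AVL reaches zero, so the exit test
// needs neither the induction variable nor the trip count.
inline EVLResult sumWithEVL(const std::vector<int>& data, std::size_t vf) {
  assert(vf > 0 && "vectorization factor must be positive");
  EVLResult r{0, 0};
  std::size_t avl = data.size();
  std::size_t index = 0;
  if (avl == 0)
    return r; // guard: the loop below is do-while shaped
  do {
    std::size_t evl = std::min(avl, vf);
    for (std::size_t lane = 0; lane < evl; ++lane)
      r.sum += data[index + lane];
    index += evl;
    avl -= evl; // AVLNext
    ++r.iterations;
  } while (avl != 0); // branch on AVLNext != 0
  return r;
}
```

The comparison against zero is what allows a single branch-if-not-zero instruction on the backedge.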
[MLIR] Apply clang-tidy fixes for llvm-include-order in IRAffine.cpp (NFC)
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in IRAffine.cpp (NFC)
[MLIR] Apply clang-tidy fixes for llvm-else-after-return in IRAttributes.cpp (NFC)
[clang][bytecode] Support remaining add_sat like X86 builtins (#155358)
[RelLookupTableConverter] Generate test checks (NFC)
This was using a mix of generated check lines and manual edits,
which makes future changes hard. Regenerate with a newer version
and --check-globals.
[clang][bytecode] Try to avoid dtor functions in Record descriptors (#155396)
We don't need to call the dtor fn of a record where all bases, fields
and virtual bases have no dtor fn either.
[AArch64] Expand MI->getOperand(1).getImm() with 0 literal (#154598)
MI->getOperand(1).getImm() has already been verified to be 0 entering
the block.
[VPlan] Compute cost of replicating calls in VPlan. (NFCI) (#154291)
Implement computing the scalarization overhead for replicating calls in
VPlan, matching the legacy cost model.
Depends on #154126.
PR: #154291
[X86] Show failure to fold freeze(gfni()) -> gfni(freeze(),freeze()) for all gfni instructions
[InstCombine] Generate test checks (NFC)
[llvm-exegesis] Implement the loop repetition mode for AArch64 (#154751)
Subject says it all: implement the loop iterator decrement and jump
functions, and reserve X19 for the loop counter.
[GWP-ASan] Include <unistd.h> for sysconf(_SC_PAGESIZE) (#155261)
This fixes build failures on Fuchsia that started with #153860
[VPlan] Improve style around container-inserts (NFC) (#155174)
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in IRAttributes.cpp (NFC)
[MLIR] Apply clang-tidy fixes for misc-use-internal-linkage in IRCore.cpp (NFC)
[flang][OpenMP] Delete no longer used Omp[End]CriticalDirective, NFC (#155099)
[Headers][X86] Allow AVX512VLBW integer reduction intrinsics to be used in constexpr (#155199)
Fixes #154284
Add constexpr support for the following:
_mm_reduce_add_epi8 _mm_reduce_add_epi16 _mm256_reduce_add_epi8
_mm256_reduce_add_epi16 _mm_reduce_mul_epi8 _mm_reduce_mul_epi16
_mm256_reduce_mul_epi8 _mm256_reduce_mul_epi16 _mm_reduce_and_epi8
_mm_reduce_and_epi16 _mm256_reduce_and_epi8 _mm256_reduce_and_epi16
_mm_reduce_or_epi8 _mm_reduce_or_epi16 _mm256_reduce_or_epi8
_mm256_reduce_or_epi16
_mm_mask_reduce_add_epi8 _mm_mask_reduce_add_epi16
_mm256_mask_reduce_add_epi8 _mm256_mask_reduce_add_epi16
_mm_mask_reduce_mul_epi8 _mm_mask_reduce_mul_epi16
_mm256_mask_reduce_mul_epi8 _mm256_mask_reduce_mul_epi16
_mm_mask_reduce_and_epi8 _mm_mask_reduce_and_epi16
_mm256_mask_reduce_and_epi8 _mm256_mask_reduce_and_epi16
_mm_mask_reduce_or_epi8 _mm_mask_reduce_or_epi16
_mm256_mask_reduce_or_epi8 _mm256_mask_reduce_or_epi16
_mm_reduce_max_epi8 _mm_reduce_max_epi16 _mm256_reduce_max_epi8
_mm256_reduce_max_epi16 _mm_reduce_min_epi8 _mm_reduce_min_epi16
_mm256_reduce_min_epi8 _mm256_reduce_min_epi16 _mm_reduce_max_epu8
_mm_reduce_max_epu16 _mm256_reduce_max_epu8 _mm256_reduce_max_epu16
_mm_reduce_min_epu8 _mm_reduce_min_epu16 _mm256_reduce_min_epu8
_mm256_reduce_min_epu16
_mm_mask_reduce_max_epi8 _mm_mask_reduce_max_epi16
_mm256_mask_reduce_max_epi8 _mm256_mask_reduce_max_epi16
_mm_mask_reduce_min_epi8 _mm_mask_reduce_min_epi16
_mm256_mask_reduce_min_epi8 _mm256_mask_reduce_min_epi16
_mm_mask_reduce_max_epu8 _mm_mask_reduce_max_epu16
_mm256_mask_reduce_max_epu8 _mm256_mask_reduce_max_epu16
_mm_mask_reduce_min_epu8 _mm_mask_reduce_min_epu16
_mm256_mask_reduce_min_epu8 _mm256_mask_reduce_min_epu16
[Clang] Generate test checks (NFC)
This test was already using generated test checks, but with minor
manual adjustments. Make it fully generated, as check lines for
metadata are supported nowadays.
[Offload][Conformance] Add README file (#155190)
This patch introduces a README.md file for the GPU math conformance
test suite located in offload/unittests/Conformance.
The goal of this document is to provide clear and thorough instructions
for new users and future contributors. It covers the project's purpose,
system requirements, build and execution steps, testing methodology, and
overall architecture.
[Clang] Support generic bit counting builtins on fixed boolean vectors (#154203)
Summary:
Boolean vectors as implemented in clang can be bit-casted to an integer
that is rounded up to the next primitive sized integer. Users can do
this themselves, but since the bit-counting builtins are very likely to
be used with bitmasks like this, and the generic forms are expected to
be generic, it seems reasonable that we handle this case directly.
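What users would otherwise do by hand can be modelled portably. The sketch below does not use Clang's boolean vector extension or its builtins; packMask, popcount64 and countActiveLanes are hypothetical names, and the mapping of lane i to bit i is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packs N booleans into an integer, lane i going to bit i (an assumed
// ordering). Bits above N stay zero, mirroring the idea of rounding the mask
// up to a wider integer type.
template <std::size_t N>
std::uint64_t packMask(const std::array<bool, N>& lanes) {
  static_assert(N <= 64, "this sketch packs into a single 64-bit integer");
  std::uint64_t bits = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (lanes[i])
      bits |= (std::uint64_t{1} << i);
  return bits;
}

// Portable population count: clears the lowest set bit until none remain.
inline unsigned popcount64(std::uint64_t x) {
  unsigned n = 0;
  for (; x != 0; x &= x - 1)
    ++n;
  return n;
}

// Counts the active lanes of a boolean mask by packing and counting bits.
template <std::size_t N>
unsigned countActiveLanes(const std::array<bool, N>& lanes) {
  return popcount64(packMask(lanes));
}
```

Handling the boolean vector directly in the builtin removes the need for the explicit packing step shown here.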
[X86] canCreateUndefOrPoisonForTargetNode - add GF2P8AFFINEINVQB / GF2P8AFFINEQB / GF2P8MULB handling (#155409)
All 3 instructions are well defined bit twiddling operations - they do
not introduce undef/poison with well defined inputs.
Fixes regressions in #152107
[NFC][SimplifyCFG] Simplify operators for the combined predicate in mergeConditionalStoreToAddress (#155058)
This is about code readability. The operands in the disjunction forming
the combined predicate in mergeConditionalStoreToAddress could sometimes
be negated twice. This patch addresses that.
2 tests needed updating because they exposed the double negation and now they don’t.
[libc++][C++03][NFC] Remove XFAILS from libcxx/test/libcxx (#155384)
We've split the implementation-specific tests into
libcxx/test/libcxx-03, so we don't need the annotations in
libcxx/test/libcxx anymore.
[lldb][lldb-dap] parse pathFormat as an optional (#155238)
pathFormat is an optional field in initializeAruguments.
[libc++][C++03] Fix test/libcxx-03/system_reserved_names.gen.py (#155385)
This test only fails because it includes <__config>. Switch to using
<__cxx03/__config> instead to fix the issue.
[libc++] Refactor key extraction for __hash_table and __tree (#154512)
This patch replaces __can_extract_key with an overload set to try to
extract the key. This simplifies the code, since we don't need to have
separate overload sets for the unordered and associative containers. It
also allows extending the set of extraction cases more easily, since we
have a single place to define how the key is extracted.
Revert "[llvm-exegesis] Implement the loop repetition mode for AArch64" (#155423)
I see some build bot failures:
Revert #154751 while I investigate this.
[gn build] Port af1f06e
[LLDB] Re-land 'Update DIL handling of array subscripting' (#154269)
This attempts to fix the issues with the original PR (#151605), updating
the DIL code for handling array subscripting to more closely match and
handle all the cases from the original 'frame var' implementation. The
first PR did not include special-case code for objc pointers, which
apparently caused a test failure on the green-dragon buildbot. Hopefully
this PR, which includes the objc pointer special code, fixes that issue.
AMDGPU: Replace copy-to-mov-imm folding logic with class compat checks (#154501)
This strengthens the check to ensure the new mov's source class
is compatible with the source register. This avoids using the register
sized based checks in getMovOpcode, which don't quite understand
AV superclasses correctly. As a side effect it also enables more folds
into true16 movs.
getMovOpcode should probably be deleted, or at least replaced
with class check based logic. In this particular case other
legality checks need to be mixed in with attempted IR changes,
so I didn't try to push all of that into the opcode selection.
[VPlan] Add VPlan-based addMinIterCheck, replace ILV for non-epilogue. (#153643)
This patch adds a new VPlan-based addMinimumIterationCheck, which
replaces the ILV version for the non-epilogue case.
The VPlan-based version constructs a SCEV expression to compute the
minimum iterations and uses it to determine whether the check is known
to be true or false. Otherwise it creates a VPExpandSCEV recipe and
emits a compare-and-branch.
When using epilogue vectorization, we still need to create the minimum
trip-count-check during the legacy skeleton creation. The patch moves
the definitions out of ILV.
PR: #153643
X86: Remove LOW32_ADDR_ACCESS_RBPRegClass (#155127)
[lldb] Underline short option letters as mnemonics (#153695)
Whenever an option would use something other than the first letter of
the long option as the short option, Jim would capitalize the letter we
picked as a mnemonic. This has often been mistaken for a typo and Jim
wondered if we should stop doing this.
During the discussion, David mentioned how this reminds him of the
underline in menu bars when holding down alt. I suggested we do
something similar in LLDB by underlining the letter in the description.
https://discourse.llvm.org/t/should-we-remove-the-capital-letter-in-option-helps/87816
s390x: pattern match saturated truncation (#155377)
Simplify min/max instruction matching by making the related
SelectionDAG operations legal.
Add patterns to match (signed and unsigned) saturated
truncation based on open-coded min/max patterns.
Fixes #153655
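An open-coded saturated truncation of the kind such patterns are meant to recognise can be written as a clamp followed by a plain truncation. The sketch below shows the source-level shape only; the function names and the 32-to-16-bit widths are chosen for illustration and are not taken from the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Signed saturating truncation from 32 to 16 bits, written as an open-coded
// max/min clamp followed by a plain truncation.
inline std::int16_t truncSatS32ToS16(std::int32_t x) {
  std::int32_t clamped =
      std::min<std::int32_t>(std::max<std::int32_t>(x, INT16_MIN), INT16_MAX);
  return static_cast<std::int16_t>(clamped);
}

// Unsigned saturating truncation: only the upper bound needs a clamp.
inline std::uint16_t truncSatU32ToU16(std::uint32_t x) {
  std::uint32_t clamped = std::min<std::uint32_t>(x, UINT16_MAX);
  return static_cast<std::uint16_t>(clamped);
}
```

Values inside the destination range pass through unchanged, while values outside it are pinned to the nearest representable bound.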
[clang][bytecode] Cleanup primitive descriptor ctor/dtor handling (#155401)
Use switches instead of if statements and COMPOSITE_TYPE_SWITCH and
remove some leftover move functions.
Revert "[AMDGPU] gfx1250 trans instructions bf16 codegen tests update. NFC (#155310)"
This reverts commit 43a9b66. Was causing
ninja check-llvm failures on x86 host.
Reapply "[RISCV] Add test coverage for upcoming change to zicond select lowering""
This was reverted because a previous version had check lines which didn't
match tip of tree. Looking back through my terminal history, I'm 99% sure
this was a failure to update after a pull, but the diff itself looks
suspiciously like some other user error. I've run ninja check-llvm on
this one multiple times. :)
[clang-format] Fix a bug in SkipMacroDefinitionBody (#155346)
All comments before the macro definition body should be skipped.
[RISCV] Add tied source operand to Zvqdotq MC instructions. (#155286)
This is consistent with what we do for integer and FP multiply
accumulate instructions.
We need new classes because normal multiply accumulate have the operands
in a different order.
[X86] Use array instead of SmallVector. NFC (#155321)
[TableGen][DecoderEmitter] Optimize single-case OPC_ExtractField (#155414)
OPC_ExtractField followed by a single OPC_FilterValue is equivalent to
OPC_CheckField. Optimize this relatively common case.
Revert "[libc++] Refactor key extraction for __hash_table and __tree (#154512)"
This reverts commit af1f06e.
This is causing some build failures in premerge as some of the LLDB
tests fail.
[gn build] Port 72c04bb
[OpenACC] Add C tests for recipe generation, fix NYI
I realized while messing with other things that I'd written all of the
recipe tests for C++, so this patch adds a bunch of tests for C mode.
The assert wasn't quite accurate (as C default init doesn't really do
anything/have an AST node), so that is corrected. Also, the lack of
cir.copy causes some of the firstprivate tests to be incomplete, so
added TODOs for that as well.
[clang-format] Use proper flags for git diff-tree (#155247)
From local testing, git diff-tree does not support three dot diffs
correctly, instead expecting the --merge-base flag to be passed along
with two commits. From my reading, the documentation
(https://git-scm.com/docs/git-diff-tree) also confirms this. This patch
updates the git-clang-format script to be correct.
I don't think we ever ran into this issue before because we never ended
up using it. For the PR code format job I believe we would just
explicitly pass the merge base, completely bypassing the problem.
[HLSL][DirectX] Add the Qdx-rootsignature-strip driver option (#154454)
This pr adds the Qstrip-rootsignature as a DXC driver option.
To do so, this pr introduces the BinaryModifyJobClass as an Action to
modify a produced object file before its final output.
Further, it registers llvm-objcopy as the tool to modify a produced
DXContainer on the HLSL toolchain.
This allows us to specify the Qstrip-rootsignature option to clang-dxc
which will invoke llvm-objcopy with a --remove-section=RTS0 argument to
implement its functionality.
Resolves: #150275.
[DirectX] Fix the writing of ConstantExpr GEPs to DXIL bitcode (#154446)
Fixes #153304
Changes:
ConstantExpr GEPs to DXIL bitcode, the bitcode writer will use the old
Constant Code CST_CODE_CE_GEP_OLD = 12 instead of the newer
CST_CODE_CE_GEP = 32 which is interpreted as an undef in DXIL.
Additional context: CST_CODE_CE_GEP = 12 in DXC while the same constant
code is labeled CST_CODE_CE_GEP_OLD in LLVM
PointerTypeAnalysis to be able to analyze pointer-typed constants that
appear in the operands of instructions so that the correct type of the
ConstantExpr GEP is determined and written into the DXIL bitcode.
PointerTypeAnalysis test and dxil-dis test to ensure that the pointer
type of ConstantExpr GEPs are resolved and ConstantExpr GEPs are written
to DXIL bitcode correctly
In addition, this PR also adds a missing call to
GV.removeDeadConstantUsers() in the DXILFinalizeLinkage pass, and
removes an unnecessary manual removal of a ConstantExpr in the
DXILFlattenArrays pass.
[clang-tidy][test] Make check_clang_tidy.py work with very long file paths (#155318)
http://github.com/llvm/llvm-project/pull/95220 added a test with a very
long file path, which can fail if run on Windows with a long directory
path.
On Windows, there are file path length limits, which can be worked
around by prefixing the (absolute) path with '\\?\':
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation
Co-authored-by: Reid Kleckner rnk@google.com
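The prefixing step can be sketched as a small string helper. This is an illustration only, not the code from check_clang_tidy.py; addLongPathPrefix is a hypothetical name, and the sketch only recognises drive-letter paths that already use backslashes.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: adds the Windows extended-length prefix to an absolute
// drive-letter path such as C:\dir\file. Paths that already carry the prefix,
// and paths that are not of that form, are returned unchanged.
inline std::string addLongPathPrefix(const std::string& path) {
  const std::string prefix = "\\\\?\\"; // two backslashes, '?', one backslash
  if (path.compare(0, prefix.size(), prefix) == 0)
    return path; // already prefixed
  bool startsWithDriveLetter =
      path.size() >= 3 &&
      ((path[0] >= 'A' && path[0] <= 'Z') || (path[0] >= 'a' && path[0] <= 'z')) &&
      path[1] == ':' && path[2] == '\\';
  if (!startsWithDriveLetter)
    return path; // relative or otherwise unsupported form
  return prefix + path;
}
```

Relative paths are deliberately left alone here, since the prefix is only meaningful for absolute paths.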
[AArch64] AArch64TargetLowering::computeKnownBitsForTargetNode - add support for AArch64ISD::MOV/MVN constants (#154039)
Add support for the following constant nodes in
AArch64TargetLowering::computeKnownBitsForTargetNode:
Also add AArch64TargetLowering::computeKnownBitsForTargetNode tests
for all the MOVI constant nodes in
llvm/unittests/Target/AArch64/AArch64SelectionDAGTest.cpp
Fixes: #153159
Co-authored-by: Simon Pilgrim llvm-dev@redking.me.uk
[Interpreter] Fix a warning
This patch fixes:
clang/lib/Interpreter/IncrementalAction.h:37:21: error: private
field 'CI' is not used [-Werror,-Wunused-private-field]
[NFC][DirectX] Fix variable set but not used warning (#155445)
[compiler-rt] Fix a warning
This patch fixes:
compiler-rt/lib/asan/tests/asan_test.cpp:398:27: error: allocation
of insufficient size '0' for type 'int' with size '4'
[-Werror,-Walloc-size]
[NFC][MC][XCore] Rearrange decoder functions for XCore disassembler (#155009)
Rearrange decode functions to be before including the generated
disassembler code and eliminate forward declarations for most of them.
This is possible because fieldFromInstruction is now in MCDecoder.h
and not in the generated disassembler code.
[flang] optimize sind precision (#155429)
Part of #150452.
[NFC][MC][ARM] Rearrange decode functions in ARM disassembler (#154988)
Move tryAddingSymbolicOperand and tryAddingPcLoadReferenceComment to
before including the generated disassembler code. This is in preparation
for rearranging the decoder functions to eliminate forward declarations.
[LV] Remove unused ILV::VectorTripCount (NFC).
The field is no longer used, remove it.
[NFC][Asan] Fix warning in test (#155447)
After #150028.
Warning:
[CI] Save sccache logs (#155444)
This patch saves the sccache logs to the artifacts. If sccache dies and
the server prints logs, we currently do not collect them anywhere and
they do not get dumped to STDOUT/STDERR. If the process is directly
getting killed (SIGTERM), it seems like it doesn't dump anything, but in
most other cases we should be able to see something.
Related to #155442.
[NFC][DirectX] Fix build failure (#155441)
Add BinaryFormat to LINK_COMPONENTS to fix the following linker
error:
Root cause: #154249 changed a
header-only dependency to a real dependency without noticing that the
dependency was missing in CMakeLists.txt
Bitcode: Stop combining function alignments into MaxAlignment.
MaxAlignment is used to produce the abbreviation for MODULE_CODE_GLOBALVAR
and is not used for anything related to function alignments, so stop
combining function alignments and rename it to make its purpose clearer.
Reviewers: teresajohnson
Reviewed By: teresajohnson
Pull Request: #155341
[SCEV] Try to push op into ZExt: C * zext (A + B) -> zext (AC + BC) (#155300)
Try to push constant multiply operand into a ZExt containing an add, if
possible. In general we are trying to push down ops through ZExt if
possible. This is similar to
#151227 which did the same for
additions.
For now this is restricted to adds with a constant operand, which is
similar to some of the logic above.
This enables some additional simplifications.
Alive2 Proof: https://alive2.llvm.org/ce/z/97pbSL
PR: #155300
[CIR] Add VTTAddrPointOp (#155048)
This adds the definition, verification, and lowering for CIR's
VTTAddrPointOp. This is a bit ahead of the current codegen
implementation, which doesn't yet have support for emitting VTT
definitions, but since this doesn't depend on any of the other work in
progress, it is being upstreamed in advance.
[RISCV][VLOPT] Update vl-opt-op-info.mir test with extra COPYs. NFC
[Transforms] Allow non-regex Source in SymbolRewriter in case of using ExplicitRewriteDescriptor (#154319)
Do not check that Source is a valid regex in case of Target (explicit)
transformation. Source may contain special symbols that may cause an
incorrect "invalid regex" error.
Note that source and exactly one of [Target, Transform] must be
provided.
Target (explicit transformation): In this kind of rule Source is
treated as a symbol name and is matched in its entirety. Target field
will denote the symbol name to transform to.
Transform (pattern transformation): This rule treats Source as a
regex that should match the complete symbol name. Transform is a regex
specifying the name to transform to.
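The difference between the two rule kinds can be illustrated with a small standalone sketch. This is not the SymbolRewriter code: it uses std::regex, and the function names, the example symbol names, and the `$1` replacement syntax are assumptions made for illustration only.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Explicit rule (hypothetical helper): Source is a literal symbol name and is
// compared in its entirety, so it is never compiled as a regex.
inline std::optional<std::string> rewriteExplicit(const std::string &Symbol,
                                                  const std::string &Source,
                                                  const std::string &Target) {
  if (Symbol != Source)
    return std::nullopt;
  return Target;
}

// Pattern rule (hypothetical helper): Source is a regex that must match the
// complete symbol name; Transform describes the replacement name.
inline std::optional<std::string> rewritePattern(const std::string &Symbol,
                                                 const std::string &Source,
                                                 const std::string &Transform) {
  std::regex Pattern(Source); // Only here does Source have to be a valid regex.
  if (!std::regex_match(Symbol, Pattern))
    return std::nullopt;
  return std::regex_replace(Symbol, Pattern, Transform);
}
```

With the explicit helper, a name full of regex metacharacters such as the made-up `?foo@@YAXXZ` is simply compared as text, which is why validating it as a regex is unnecessary.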
[MLIR] Apply clang-tidy fixes for modernize-use-using in IRCore.cpp (NFC)
[MLIR] Apply clang-tidy fixes for performance-move-const-arg in IRCore.cpp (NFC)
[MLIR] Apply clang-tidy fixes for readability-identifier-naming in IRCore.cpp (NFC)
[clang] NFC: introduce Type::getAsEnumDecl, and cast variants for all TagDecls (#155463)
And make use of those.
These changes are split from prior PR #155028, in order to decrease the
size of that PR and facilitate review.
[Flang] Fix BUILD_SHARED_LIBS build (#155422)
In contrast to linking a static library, when linking a shared library
all referenced symbols must be available in either the objects files,
static libraries, or shared libraries passed to the linker command line
and cannot be deferred to when building the executable.
Fixes #150027
Same fix as included in #152223, but with only the changes necessary to
fix #150027 (which is unrelated to GCC 15)
[NFC][WPD] code style fixes (#155454)
Revert "[CI] Save sccache logs (#155444)"
This reverts commit c81cc9f.
This is causing premerge failures and needs more testing.
[ARM] Update a number of MVE tests to use -cost-kind=all. NFC
[gn build] Disable objc rewriter (#155479)
This is off by default in the CMake build:
llvm-project/clang/CMakeLists.txt
Line 441 in b90f4ff
[AMDGCN] Add missing gfx1250 clang tests. NFC. (#155478)
Remove trailing whitespace in DiagnosticSemaKinds.td. NFC (#155482)
[clang][PAC] Fix builtins that claim address discriminated types are bitwise compatible (#154490)
A number of builtins report some variation of "this type is compatible
with some bitwise equivalent operation", but this is not true for
address discriminated values. We had addressed a number of cases, but not
all of them. This PR corrects the remaining builtins.
Fixes #154394
[mlir][acc] Add destroy region to reduction recipes (#155480)
Reduction recipes capture how a private copy is created. In some
languages, such as C++ for class variables with destructors, that
private copy must also be properly destroyed. Thus update the reduction
recipe to contain a destroy region similarly to the private recipes.
[hwasan] Add hwasan-static-linking option (#154529)
Discarding the .note.hwasan.globals section in ldscript causes a
linker error, since hwasan_globals refers to the discarded section.
The issue comes from hwasan.dummy.global being associated via metadata
with .note.hwasan.globals.
Add a new -hwasan-static-linking option to skip inserting
.note.hwasan.globals for static binaries, as it is only needed for
instrumenting globals from dynamic libraries. In static binaries, the
global variables section can be accessed directly via the
__start_hwasan_globals and __stop_hwasan_globals symbols inserted by
the linker.
[lldb] Do not use LC_FUNCTION_STARTS data to determine symbol size as symbols are created (#155282)
Note: This is a resubmission of #106791. I had to revert this a year ago
for a failing test that I could not understand. I have time now to try
and get this in again.
Summary:
This improves the performance of ObjectFileMacho::ParseSymtab by
removing eager and expensive work in favor of doing it later in a
less-expensive fashion.
Experiment:
My goal was to understand LLDB's startup time.
First, I produced a Debug build of LLDB (no dSYM) and a
Release+NoAsserts build of LLDB. The Release build debugged the Debug
build as it debugged a small C++ program. I found that
ObjectFileMachO::ParseSymtab accounted for somewhere between 1.2 and 1.3
seconds consistently. After applying this change, I consistently
measured a reduction of approximately 100ms, putting the time closer to
1.1s and 1.2s on average.
Background:
ObjectFileMachO::ParseSymtab will incrementally create symbols by
parsing nlist entries from the symtab section of a MachO binary. As it
does this, it eagerly tries to determine the size of symbols (e.g. how
long a function is) using LC_FUNCTION_STARTS data (or eh_frame if
LC_FUNCTION_STARTS is unavailable). Concretely, this is done by
performing a binary search on the function starts array and calculating
the distance to the next function or the end of the section (whichever
is smaller).
However, this work is unnecessary for 2 reasons:
is usually stored right after the function's entry. Performing this work
right before parsing the next entry is unnecessary work.
in Symtab::InitAddressIndexes after all the symbols are added to the
Symtab. It also does this more efficiently by walking over a list of
symbols sorted by address, so the work to calculate the size per symbol
is constant instead of O(log n).
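The deferred approach described above can be sketched in isolation. This is not LLDB's implementation; the struct, the function name, and the single `SectionEnd` bound are assumptions made to keep the example small.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a symbol; Size == 0 means "not yet known".
struct SymbolSketch {
  uint64_t Address;
  uint64_t Size;
};

// Sort once by address, then give each unsized symbol the distance to the
// next symbol (or to the end of the section for the last one). After the
// sort, the work per symbol is constant.
inline void fillMissingSizes(std::vector<SymbolSketch> &Symbols,
                             uint64_t SectionEnd) {
  std::sort(Symbols.begin(), Symbols.end(),
            [](const SymbolSketch &A, const SymbolSketch &B) {
              return A.Address < B.Address;
            });
  for (std::size_t I = 0; I < Symbols.size(); ++I) {
    if (Symbols[I].Size != 0)
      continue; // Size already known, e.g. from debug symbol entries.
    uint64_t Next =
        I + 1 < Symbols.size() ? Symbols[I + 1].Address : SectionEnd;
    assert(Next >= Symbols[I].Address && "section end precedes a symbol");
    Symbols[I].Size = Next - Symbols[I].Address;
  }
}
```

Compared with a binary search per symbol at creation time, the single pass only looks at the immediate neighbour in the sorted list.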
[IA][RISCV] Recognize interleaving stores that could lower to strided segmented stores (#154647)
This is a sibling patch to #151612: passing gap masks to the renewal TLI
hooks for lowering interleaved stores that use shufflevector to do the
interleaving.
NFC: remove some instances of deprecated capture (#154884)
Co-authored-by: Jeremy Kun j2kun@users.noreply.github.com
[LV] Remove unneeded ILV::LoopScalarPreHeader (NFC).
Follow-up suggested in #153643.
Remove some more global state by directly returning the scalar
preheader from createScalarPreheader.
[AArch64] Add another switch clustering test with power-of-2 constants.
Adds more test coverage for #139736.
[AMDGPU] wmma_scale* IR verification (#155493)
[DAG] ComputeNumSignBits - ISD::EXTRACT_ELEMENT needs to return at least 1 (#155455)
When going through the ISD::EXTRACT_ELEMENT case,
KnownSign - rIndex * BitWidth could produce a negative. When a negative
is produced, the lower bound of the std::clamp is returned. Change that
lower bound to one to avoid potential underflows, because the
expectation is that ComputeNumSignBits should always return at least 1.
Fixes #155452.
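A minimal sketch of the clamping described above, written as a free function rather than the actual SelectionDAG code. The function name and parameter list are hypothetical; only the expression and the lower bound of 1 come from the commit message.

```cpp
#include <algorithm>
#include <cassert>

// KnownSign: sign bits known for the wide source value.
// RIndex: which BitWidth-sized piece is being extracted.
inline int extractElementSignBits(int KnownSign, int RIndex, int BitWidth) {
  assert(BitWidth > 0 && KnownSign >= 1 && RIndex >= 0);
  // The raw difference can go negative for higher pieces, so clamp it to
  // [1, BitWidth]: a value always has at least one sign bit.
  return std::clamp(KnownSign - RIndex * BitWidth, 1, BitWidth);
}
```

For example, with 3 known sign bits and the piece at index 1 of width 32, the raw difference is -29 and the clamp returns 1.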
[AMDGPU] Do not assert on non-zero COMPUTE_PGM_RSRC3 on gfx1250. NFCI (#155498)
COMPUTE_PGM_RSRC3 does exist on gfx1250, we are just not using it yet.
[NFC][Asan] Remove volatile from test
After #155447.
It's not needed, but does not compile on PowerPC.
Reapply "[compiler-rt] Remove %T from shared object substitutions (#155302)"
This reverts commit 1d3c302.
There were three test failures:
odr-violation.cpp - Attempted to fix by keeping everything in the same
folder.
interception-in-shared-lib-test.cpp - Tried folding comments to preserve
line numberings. Almost seems like a debug info issue on PPC.
odr_c_test.c - Attempted to fix by keeping everything in the same
folder.
[lldb] Correctly parse Wasm segments (#154727)
My original implementation for parsing Wasm segments was wrong in two
related ways. I had a bug in calculating the file vm address and I
didn't fully understand the difference between active and passive
segments and how that impacted their file vm address.
With this PR, we now support parsing init expressions for active
segments, rather than just skipping over them. This is necessary to
determine where they get loaded.
Similar to llvm-objdump, we currently only support simple opcodes (i.e.
constants). We also currently do not support active segments that use a
non-zero memory index. However this covers all segments for a
non-trivial Swift binary compiled to Wasm.
[flang][openacc] Only generate acc.terminator in compute construct (#155504)
When the end of a block is inside a data region (not a compute region),
generating an acc.terminator will lead to a missing terminator when
translating to LLVM.
Only generate acc.terminator instead of fir.unreachable when nested in
acc compute region.
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in IRTypes.cpp (NFC)
Revert "[AArch64] AArch64TargetLowering::computeKnownBitsForTargetNode - add support for AArch64ISD::MOV/MVN constants" (#155503)
Reverts #154039, as it breaks bots.
[lldb] Adding structured types for existing MCP calls. (#155460)
This adds or renames existing types to match the names of the types on
https://modelcontextprotocol.io/specification/2025-06-18/schema for the
existing calls.
The new types are used in the unit tests and server implementation to
remove the need for crafting various llvm::json::Object values by
hand.
[ProfCheck] Exclude new LoopVectorize Test (#155502)
[MLIR][LLVMIR][DLTI] Pass to update #llvm.target's features per relevant backend (#154938)
Modifies #llvm.target<..., features = $FEATURES> so that $FEATURES
is now an #llvm.target_features<[...]> attribute (rather than a
StringAttr). This enables the attribute to respond to DLTI queries for
the different target features.
The pass updates the $FEATURES attribute of the target attr at name
llvm.target in accordance with the (Sub)Target's features that the
relevant LLVM backend knows about.
DEMO:
by way of -llvm-target-to-target-features turns into:
[flang] Consolidate copy-in/copy-out determination in evaluate framework (#151408)
New implementation of MayNeedCopy() is used to consolidate
copy-in/copy-out checks.
IsAssumedShape() and IsAssumedRank() were simplified and are both
now in Fortran::semantics workspace.
preparePresentUserCallActualArgument() in lowering was modified to use
MayNeedCopyInOut()
Fixes #138471
[clang] Fix clang module build by declaring new textual header (#155510)
Add clang/Basic/ABIVersions.def introduced in #151995 to textual
header to fix clang module build.
[fuzzer][Fuchsia] Forward fix for undefined StartRssThread (#155514)
The declaration was static when it shouldn't be since it can be defined
in FuzzerUtilFuchsia.cpp
Support: Add proxies for raw_ostream and raw_pwrite_stream (#113362)
Add proxies classes for raw_ostream and raw_pwrite_stream called
raw_ostream_proxy and raw_pwrite_stream_proxy. Add adaptor classes,
raw_ostream_proxy_adaptor<> and raw_pwrite_stream_proxy_adaptor<>,
to allow subclasses to use a different parent class than raw_ostream
or raw_pwrite_stream.
The adaptors are used by a future patch to help a subclass of
llvm::vfs::OutputFile, an abstract subclass of raw_pwrite_stream, to
proxy a raw_fd_ostream.
Patched by dexonsmith.
[gn build] Port 90670b5
[libc][NFC] Clean up utimes and setsid (#155495)
Simplify utimes a bit and add proper error handling to setsid as
described in the standard.
[NFC][MC][XCore] Eliminate forward decls by rearranging functions (#155456)
Revert "Reapply "[compiler-rt] Remove %T from shared object substitutions (#155302)""
This reverts commit 7624197.
This is causing more buildbot failures that probably need some offline
investigation:
[NFC][WPD] Pass the module analysis manager instead of lambdas (#155338)
Easier to evolve - if we need more analyses, it becomes clumsy to keep passing around lambdas.
Reapply "[AMDGPU] gfx1250 trans instructions bf16 codegen tests update. NFC (#155310)" (#155515)
[CIR] Add support for initializing classes with multiple vtables (#155275)
This adds support for initializing the vptr members in a class that
requires multiple vtables because of multiple inheritance. This still
does not handle virtual bases.
[CI] Strip strings from filenames in compute_projects.py (#155519)
This can otherwise mess up some of the path detection logic,
particularly around ensuring the premerge checks are run when the
workflow YAML file is changed.
[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI. (#149955)
This patch implements getAddressComputationCost() in RISCV TTI, which
makes gather/scatter with address calculation more expensive than the
stride cost.
Note that the only user of getAddressComputationCost() with vector
type is in VPWidenMemoryRecipe::computeCost(), so this patch makes some
LV test changes.
I've checked the test changes in LV and it seems they can be
divided into two groups.
masked.load.
[lldb-dap] Improving lldbdap_testcase.py error diagnosability (#155352)
Improved response Message handling in lldbdap_testcase.py to handle
various formats. Allows for more descriptive error messaging (Provides
useful info even when error details are malformed)
Co-authored-by: Piyush Jaiswal piyushjais@meta.com
[orc-rt] Fix comment typos in unit tests. NFC.
[lld][WebAssembly] -r: force -Bstatic (#108264)
This is a port of a recent ELF linker change: 8cc6a24.
[AMDGPU] Set GRANULATED_WAVEFRONT_SGPR_COUNT of compute_pgm_rsrc1 to 0 for gfx10+ (#154666)
According to llvm-project/llvm/docs/AMDGPUUsage.rst::L5212 the
GRANULATED_WAVEFRONT_SGPR_COUNT, which is compute_pgm_rsrc1[6:9], has
to be 0 for gfx10+ arch
Co-authored-by: Matt Arsenault Matthew.Arsenault@amd.com
Revert "[RISCV][TTI] Implement getAddressComputationCost() in RISCV TTI." (#155535)
Reverts #149955
Reapply "[CI] Save sccache logs (#155444)" (#155520)
This reverts commit b90f4ff.
Relands the change after making the relevant fixes (not missing the
artifacts directory).
AMDGPU: Fold mov imm to copy to av_32 class (#155428)
Previously we had special case folding into copies to AGPR_32,
ignoring AV_32. Try folding into the pseudos.
Not sure why the true16 case regressed.
[NFC] [clangd] [C++20 Modules] Add a warning if clangd detected multiple
source declares the same module
Now clangd assumes that no module is declared by more than one source
file in a single project. But in practice, it may not be the case.
Although we can't fix it now, emitting a warning is helpful for users to
understand what's going on.
[DAGCombiner] Avoid double deletion when replacing multiple frozen/unfrozen uses (#155427)
Closes #155345.
In the original case, we have one frozen use and two unfrozen uses:
In DAGCombiner::visitFREEZE, we replace all uses of t18 with t59.
After updating the uses, t59: i8 = freeze t18 will be updated to
t59: i8 = freeze t59 (AddModifiedNodeToCSEMaps) and CSEed into
t80: i8 = freeze t59 (ReplaceAllUsesWith). As the previous call to
AddModifiedNodeToCSEMaps already removed t59 from the CSE map,
ReplaceAllUsesWith cannot remove t59 again.
For clarity, see the following call graph:
This patch unfreezes all the uses first to avoid triggering CSE when
introducing cycles.
[clang][HeuristicResolver] Resolve explicit object parameter to enclosing record type (#155143)
Heuristically resolve the type of a this auto parameter to the record
type in the declaration.
Fixes clangd/clangd#2323
[flang][acc] Fix the indexing of the reduction combiner for multidimensional static arrays (#155536)
In the following example of reducing a static 2D array, we have
incorrect coordinates for array access in the reduction combiner. This
PR reverses the order of the induction variables used for such array
indexing. For other cases of static arrays, we reverse the loop order as
well so that the innermost loop can handle the innermost dimension.
Currently, we have:
We'll obtain:
[Github][CI] Install the correct binary of sccache on aarch64 (#155328)
[LoongArch][NFC] Pre-commit for BR_CC and SELECT_CC optimization (#151788)
AMDGPU: Remove unused argument from adjustAllocatableRegClass (#155554)
[RISCV] Lower (setugt X, 2047) as (setne (srl X, 11), 0) (#155541)
This matches 4095 and other pow2-1 constants larger than simm12. We normally
do this through a DAGCombine controlled by isLegalICmpImmediate. 2047 is
considered a legal immediate because we have a setult instruction. In
this case we have setugt which isn't natively supported.
I added tests for 4095 for comparison.
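The equivalence behind this lowering can be checked with a small standalone sketch. The helper names are hypothetical and this is plain C++ rather than DAG code; `K` generalizes the shift amount so that 2047 corresponds to `K == 11` and 4095 to `K == 12`.

```cpp
#include <cassert>
#include <cstdint>

// Direct form: unsigned X > 2^K - 1.
inline bool ugtPow2Minus1Direct(uint64_t X, unsigned K) {
  assert(K < 64 && "shift amount must be smaller than the bit width");
  return X > ((uint64_t{1} << K) - 1);
}

// Rewritten form: any bit set at position K or above means X >= 2^K.
inline bool ugtPow2Minus1Shifted(uint64_t X, unsigned K) {
  assert(K < 64 && "shift amount must be smaller than the bit width");
  return (X >> K) != 0;
}
```

Both helpers return false for 2047 and true for 2048 when `K` is 11, which is the boundary the commit title refers to.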
[orc-rt] Add bind_front, a pre-c++-20 std::bind_front substitute. (#155557)
This can be used until the ORC runtime is able to move to c++-20.
Also adds a CommonTestUtils header with a utility class, OpCounter, that
counts the number of default constructions, copy constructions and
assignments, move constructions and assignments, and destructions. This
is used to test that orc_rt::bind_front doesn't introduce unnecessary
copies / moves.
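For readers unfamiliar with the idea, a pre-C++20 bind_front substitute can be sketched in C++17 roughly as follows. This is not the orc_rt::bind_front implementation; the name `bindFrontSketch` is made up, and the sketch is simplified (bound arguments are always passed to the callee as lvalues).

```cpp
#include <cassert>
#include <functional>
#include <tuple>
#include <utility>

// Store the callable and the leading arguments, then prepend the stored
// arguments to whatever is supplied at call time.
template <typename Fn, typename... BoundArgs>
auto bindFrontSketch(Fn &&F, BoundArgs &&...Args) {
  return [Callee = std::forward<Fn>(F),
          Bound = std::make_tuple(std::forward<BoundArgs>(Args)...)](
             auto &&...Rest) mutable -> decltype(auto) {
    return std::apply(
        [&](auto &...BoundVals) -> decltype(auto) {
          return std::invoke(Callee, BoundVals...,
                             std::forward<decltype(Rest)>(Rest)...);
        },
        Bound);
  };
}
```

A call such as `bindFrontSketch(Add3, 1, 2)(3)` invokes `Add3(1, 2, 3)`. Counting constructions and assignments, as the OpCounter utility mentioned above does, is one way to check that a wrapper like this forwards rather than copies where it can.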
AMDGPU: Remove special case of SGPR_LO class in imm folding (#155518)
Previous change accidentally broke this which shows it's not
doing anything.
[libc][math][c++23] Add {modf,remainder,remquo}bf16 math functions (#154652)
This PR adds the following basic math functions for BFloat16 type along
with the tests:
Signed-off-by: Krishna Pandey kpandey81930@gmail.com
Co-authored-by: OverMighty its.overmighty@gmail.com
[RISCV] Group Zcf and Zcd instructions and CompressPats together. NFC (#155555)
Instead of repeatedly changing Predicates for each instruction.
[CodeGen] Optimize/simplify finalizeBundle. NFC (#155448)
When tracking defs in finalizeBundle two sets are used. LocalDefs is
used to track defined virtual and physical registers, while LocalDefsP
is used to track defined register units for the physical registers.
This patch moves the updates of LocalDefsP to only iterate over regunits
when a new physical register is added to LocalDefs. When the physical
register already is present in LocalDefs, then the corresponding
register units are present in LocalDefsP. So it was a waste of time to
add them to the set again.
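The bookkeeping change can be sketched with standard containers. This is not the finalizeBundle code: the `DefTracker` struct, the `UnitsOf` table, and the `UnitVisits` counter are hypothetical and exist only to show that register units are visited once per newly added physical register.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

using Reg = unsigned;
using RegUnit = unsigned;

struct DefTracker {
  std::set<Reg> LocalDefs;      // Defined registers.
  std::set<RegUnit> LocalDefsP; // Register units of defined physical registers.
  unsigned UnitVisits = 0;      // How many register units were iterated.

  // UnitsOf is a made-up table mapping a physical register to its units.
  void addPhysDef(Reg R, const std::map<Reg, std::vector<RegUnit>> &UnitsOf) {
    // If R was already in LocalDefs, its units are already in LocalDefsP,
    // so there is nothing left to do.
    if (!LocalDefs.insert(R).second)
      return;
    auto It = UnitsOf.find(R);
    assert(It != UnitsOf.end() && "table must describe every register");
    for (RegUnit U : It->second) {
      LocalDefsP.insert(U);
      ++UnitVisits;
    }
  }
};
```

Adding the same register twice leaves `UnitVisits` unchanged after the first call, which is the saving the commit message describes.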
[mlir][amx] Direct AMX data transfers (#154114)
Extends Vector to AMX conversion to attempt populating AMX tiles
directly from memory.
When possible, contraction producers and consumers are replaced by AMX
tile data transfer operations. This shortens data path by skipping
intermediate register loads and stores.
Add tools needed by build_symbolizer.sh to runtime deps when internal symbolizer enabled. (#153723)
[mlir][Transforms] Dialect conversion: Context-aware type conversions (#140434)
This commit adds support for context-aware type conversions: type
conversion rules that can return different types depending on the IR.
There is no change for existing (context-unaware) type conversion rules:
There is now an additional overload to register context-aware type
conversion rules:
For performance reasons, the type converter caches the result of type
conversions. This is no longer possible when there are context-aware type
conversions because each conversion could compute a different type
depending on the context. There is no performance degradation when there
are only context-unaware type conversions.
Note: This commit just adds context-aware type conversions to the
dialect conversion framework. There are many existing patterns that
still call converter.convertType(someValue.getType()). These should be
gradually updated in subsequent commits to call
converter.convertType(someValue).
Co-authored-by: Markus Böck markus.boeck02@gmail.com
[BOLT][AArch64] Fix another cause of extra entry point misidentification (#155055)
[PowerPC] ppc64-P9-vabsd.ll - update v16i8 abdu test now that it vectorizes in the middle-end (#154712)
The scalarized IR was written before improvements to SLP / cost models
ensured that the abs intrinsic was easily vectorizable
opt -O3 : https://zig.godbolt.org/z/39T65vh8M
Now that it is we need a more useful llc test
[AMDGPU] Refactor insertWaveSizeFeature (#154850)
If a wavefrontsize32 or wavefrontsize64 is the only possible value
insert it into feature list by default and use that value as an
indication that another wavefront size is not legal.
s390x: optimize 128-bit fshl and fshr by high values (#154919)
Turn a funnel shift by N in the range 121..128 into a funnel shift in
the opposite direction by 128 - N. Because there are dedicated
instructions for funnel shifts by values smaller than 8, this emits
fewer instructions.
This additional rule is useful because LLVM appears to canonicalize
fshr into fshl, meaning that the rules for fshr on values less
than 8 would not match on organic input.
[clang] Post-commit review for #150028 (#155351)
std::nullopt instead of {}.
[ASan] Prevent assert from scalable vectors in FunctionStackPoisoner. (#155357)
This has recently started causing 'Invalid size request on a scalable
vector.'
[LV] Add test for vectorisation of SAXPY unrolled by 5 (NFC). (#153039)
This test contains a vectorisation example of a loop based on SAXPY
manually unrolled by five, as discussed in #148808.
[Flang-RT][OpenMP] Define _GLIBCXX_NO_ASSERTIONS (#155440)
Since GCC 15.1, libstdc++ enabled assertions/hardening by default in
non-optimized (-O0) builds [1]. That is, _GLIBCXX_ASSERTIONS is defined
in the libstdc++ headers itself so defining/undefining it on the
compiler command line no longer has an effect in non-optimized builds.
As the commit message[2] suggests, define _GLIBCXX_NO_ASSERTIONS
instead.
For libstdc++ headers before 15.1, -U_GLIBCXX_ASSERTIONS still has to be
on the command line as well.
Defining _GLIBCXX_NO_ASSERTIONS was previously proposed in #152223
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112808
[2] gcc-mirror/gcc@361d230
[AArch64][SME] Simplify initialization of the TPIDR2 block (#141049)
This patch updates the definition of AArch64ISD::INIT_TPIDR2OBJ to
take the number of save slices (which is currently always all ZA
slices). Using this, we can initialize the TPIDR2 block with a single
STP of the save buffer pointer and the number of save slices. The
reserved bytes (10-15) will be implicitly zeroed as the result of RDSVL
will always be <= 16-bits.
Note: We used to write the number of save slices to the TPIDR2 block
before every call with a lazy save; however, based on 6.6.9 "Changes to
the TPIDR2 block" in the aapcs64
(https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst#changes-to-the-tpidr2-block),
it seems we can rely on callers preserving the contents of the TPIDR2 block.
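Why a single pair of 64-bit stores also zeroes the reserved bytes can be sketched on the host. This is an illustration, not the generated code: the struct and function names are made up, the 16-byte layout (pointer in bytes 0-7, slice count in bytes 8-9, reserved bytes 10-15) is inferred from the description above, and a little-endian byte order is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 16-byte image of the TPIDR2 block.
struct Tpidr2BlockSketch {
  unsigned char Bytes[16];
};

// Models the store pair: two 64-bit values written back to back.
inline Tpidr2BlockSketch initTpidr2Block(uint64_t SaveBuffer,
                                         uint64_t NumSaveSlices) {
  assert(NumSaveSlices <= 0xFFFF && "slice count must fit in 16 bits");
  Tpidr2BlockSketch Block;
  std::memcpy(Block.Bytes, &SaveBuffer, 8);        // Bytes 0-7.
  std::memcpy(Block.Bytes + 8, &NumSaveSlices, 8); // Bytes 8-15.
  return Block;
}

// On a little-endian host, a count that fits in 16 bits leaves bytes 10-15
// of the second 64-bit value as zero.
inline bool reservedBytesAreZero(const Tpidr2BlockSketch &Block) {
  for (int I = 10; I < 16; ++I)
    if (Block.Bytes[I] != 0)
      return false;
  return true;
}
```

Because the count fits in 16 bits, writing it as a full 64-bit value clears the reserved bytes as a side effect, so no separate zeroing store is needed.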
[AMDGPU] More radical feature initialization refactoring (#155222)
Factoring in flang, just have a single fillAMDGPUFeatureMap
function doing it all as an external interface and returning
an error.
[mlir] Consistently add TableGen generated files as deps to mlir-headers/mlir-generic-headers CMake targets (#155474)
Tool targets like mlir-opt rely on the mlir-headers or
mlir-generic-headers targets to run first to generate headers.
However, many of the IncGen targets are not specified as dependencies
of the header targets in CMake, which causes spurious build failures when
using a high number of parallel build jobs.
Thus, this commit introduces a pair of new CMake macros
add_mlir_dialect_tablegen_target and add_mlir_generic_tablegen_target
to AddMLIR.cmake, which can be used in place of
add_public_tablegen_target to ensure (by convention) that IncGen
targets are added to the mlir-headers (resp. mlir-generic-headers)
target dependencies.
Most uses of add_public_tablegen_target in the dialects have been
refactored to use the new macros.
[MLIR] Adopt LDBG() in EliminateBarriers.cpp (NFC) (#155092)
Also add an extra optional TYPE argument to the LDBG() macro to make it
easier to punctually override DEBUG_TYPE.
[KeyInstr] Enable -gkey-instructions by default if optimisations are enabled (#149509)
That's enabling Clang's -gkey-instructions, cc1's -gkey-instructions
remains off by default.
Key Instructions improves the optimized-code debug-stepping experience
in debuggers that use DWARF's is_stmt line table register to determine
stepping behaviour.
The feature can be disabled with -gno-key-instructions (note that the
positive and negative flag both imply -g).
RFC:
https://discourse.llvm.org/t/rfc-improving-is-stmt-placement-for-better-interactive-debugging/82668
[Bazel] Add missing Support dep to VectorToAMX (#155576)
[MLIR] Migrate Transform/IR/TransformOps.cpp to LDBG() debugging macro (NFC) (#155098)
[clang] AST: fix getAs canonicalization of leaf types (#155028)
[GlobalISel] Add support for scalarizing vector insert and extract elements (#153274)
This adds scalarization handling for fewer vector elements of insert and
extract, so that i128 and fp128 types can be handled if they make it
past combines. Inserts are unmerged with the inserted element added to
the remerged vector, extracts are unmerged then the correct element is
copied into the destination. With a non-constant vector the usual stack
lowering is used.
[MLIR] Apply clang-tidy fixes for performance-unnecessary-value-param in Pass.cpp (NFC)
[Bazel] Add missing SCFTransforms dep to TestDialect (#155581)
[MLIR] Apply clang-tidy fixes for llvm-include-order in RegisterEverything.cpp (NFC)
[mlir][linalg] Produce canonical linalg.generic for im2col (#134675)
Before this patch, the Img2Col transform produced a non-canonical
linalg.generic whose input tensor was not reported in the inputs of the
operation: instead, it was accessed manually from inside the op body,
after an internal calculation of the access offsets. This patch modifies
the Im2Col rewrite to produce a canonical linalg.generic whose input is
correctly reported in its 'ins()', whose access offsets are computed
through an indexing map, and whose body contains only a 'linalg.yield'
op.
Signed-off-by: Fabrizio Indirli Fabrizio.Indirli@arm.com
Co-authored-by: Georgios Pinitas georgios.pinitas@arm.com
[clang][bytecode] Handle vector assignments (#155573)
[clang-repl] Put CompilerInstance fr