[mlir][amx] Add write side effect to AMX tile creation ops #155403

arun-thmn · 2025-08-26T12:31:21Z

Adds MemWrite side effect to amx.tile_zero and amx.tile_load ops.

Memory write models hardware populating AMX tiles with specified values through tile zero and load ops.
Making the side effect explicit allows to use multiple op instances as a compilation hint to use different AMX tile registers. This can prevent less efficient lowering through tile store-load copies compared to directly populating tiles with values.

To illustrate the trade off:
Without explicit side effects, CSE optimizes two amx.tile_zero into a single op which lowers to a copy for the second tile:

  tilezero        %tmm0
  tilestored      %tmm0, -2032(%rbp,%rbx) # 1024-byte Folded Spill
  tileloadd       -2032(%rbp,%rbx), %tmm1 # 1024-byte Folded Reload

By keeping the two amx.tile_zero ops and, thus, lowering to two separate intrinsic invocations, the two tile registers are zeroed out directly without the additional round trip through memory:

  tilezero        %tmm0
  tilezero        %tmm1

The same principle applies to amx.tile_load ops.

llvmbot · 2025-08-26T12:45:55Z

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-amx

Author: Arun Thangamani (arun-thmn)

Changes

Updating amx.tile_zero with memory effect (MemWrite).

CSE optimizes multiple amx.tile_zero into one amx.tile_zero. It is okay for reads but for writes to multiple amx.tile_zero the CSE optimized code leads to a spill.

Below is the example IR, where four amx.tile_zero get CSE optimized into one amx.tile_zero but it lowers to assembly with three spills. Adding, memory effect (MemWrite) will avoid CSE if there is a write.

    tilezero        %tmm0
    tilestored      %tmm0, -2032(%rbp,%rbx) # 1024-byte Folded Spill
    tileloadd       -2032(%rbp,%rbx), %tmm3 # 1024-byte Folded Reload
    tilestored      %tmm0, -3056(%rbp,%rbx) # 1024-byte Folded Spill
    tileloadd       -3056(%rbp,%rbx), %tmm2 # 1024-byte Folded Reload
    tilestored      %tmm0, -4080(%rbp,%rbx) # 1024-byte Folded Spill
    tileloadd       -4080(%rbp,%rbx), %tmm1 # 1024-byte Folded Reload

Full diff: https://github.com/llvm/llvm-project/pull/155403.diff

1 Files Affected:

(modified) mlir/include/mlir/Dialect/AMX/AMX.td (+3-2)

diff --git a/mlir/include/mlir/Dialect/AMX/AMX.td b/mlir/include/mlir/Dialect/AMX/AMX.td
index 6bbde43e2d011..91fe1bfb5cd35 100644
--- a/mlir/include/mlir/Dialect/AMX/AMX.td
+++ b/mlir/include/mlir/Dialect/AMX/AMX.td
@@ -142,8 +142,9 @@ class AMX_Op<string mnemonic, list<Trait> traits = []> :
 // Tile reset.
 //
 
-def TileZeroOp : AMX_Op<"tile_zero", [Pure,
-    AMXIntrinsicOpInterface
+def TileZeroOp : AMX_Op<"tile_zero", [
+    AMXIntrinsicOpInterface,
+    MemoryEffects<[MemWrite]>
   ]> {
   let summary = "tile zero operation";
   let description = [{

arun-thmn · 2025-08-26T12:49:42Z

@adam-smnk @rengolin Please review this PR to avoid CSE on amx.tile_zero if there is a write.

rengolin · 2025-08-26T13:01:01Z

Can you add a test? Runs CSE and make sure the tile_zero doesn't get removed.

What happens if there isn't a write?

adam-smnk · 2025-08-26T13:04:23Z

Can you add a test? Runs CSE and make sure the tile_zero doesn't get removed.

Not sure if that in particular needs testing. It's a generic feature of memory effect traits.
To really test the desired behavior, we'd need to pipe it all the way to llc IMO.

arun-thmn · 2025-08-26T13:07:05Z

What happens if there isn't a write?

For no writes the CSE should optimize the amx.tile_zero as we added memory effect only for MemWrite

rengolin · 2025-08-26T13:15:20Z

Can you add a test? Runs CSE and make sure the tile_zero doesn't get removed.

Not sure if that in particular needs testing. It's a generic feature of memory effect traits. To really test the desired behavior, we'd need to pipe it all the way to llc IMO.

Just that IR on the description through cse and show that only three are left?

adam-smnk · 2025-08-26T14:26:03Z

Can you add a test? Runs CSE and make sure the tile_zero doesn't get removed.

Not sure if that in particular needs testing. It's a generic feature of memory effect traits. To really test the desired behavior, we'd need to pipe it all the way to llc IMO.

Just that IR on the description through cse and show that only three are left?

Fair enough, doesn't hurt to have extra coverage.

@arun-thmn Could you add a new test file under test/Dialect/AMX for this side effect testing?
For consistency, tile_load should also have MemWrite to model side effect of writing into an AMX tile.
And the op descriptions should mention the updated semantics.

rengolin · 2025-08-27T12:38:28Z

This is a variation of https://discourse.llvm.org/t/bug-in-operationequivalence-breaks-cse-on-linalg-index/85773/43.

Here, each tile_zero op is meant to reserve a new register that will be used in parallel on the main FMA loop. When CSE runs without side-effects, 4 calls get collapsed into 1 and all FMAs use the same "tile".

There are two main lowering strategies here, both bad:

(current) one tile_zero + 3 register copies. This is bad because AMX has no move instruction, so you need store+load for the other three.
(possible) one tile_zero and all 4 FMAs run on the same register, spilling every time. This is equivalent to the one above because of the memory semantics of AMX.

There's a potential "fix" in LLVM's back-end for AMX, to simplify zero+store+load into two zeros. (@phoebewang), but I'm not sure what else this could break.

Currently, MemWrite side-effect is "wrong" for tile_zero, but there's no other way to represent register side-effects. Ultimately, we want tile_zero to "reserve a new concurrent SSA value", so InitValue side-effect of some sort would apply, which currently doesn't exist.

My advice here is to use MemWrite for now on both tile_zero and tile_load as @adam-smnk mentions, and discuss the other side-effect in the forums, then update the ops with the result of that discussion.

phoebewang · 2025-08-27T13:04:39Z

There's a potential "fix" in LLVM's back-end for AMX, to simplify zero+store+load into two zeros. (@phoebewang), but I'm not sure what else this could break.

We have improved similar case before: 6753eb0

arun-thmn · 2025-08-28T04:50:23Z

With MemoryEffects<[MemWrite] the CSE is restricted even for reads - I guess. For the below example, %0 and %1 doesn't get CSE to one amx.tile_zero. Now, I'm not very sure that MemoryEffects<[MemWrite] can be a valid fix.

Example:
%alloca = memref.alloca() : memref<16x32xf32>
%0 = amx.tile_zero : !amx.tile<16x16xf32>
%1 = amx.tile_zero : !amx.tile<16x16xf32>
amx.tile_store %alloca[%c0, %c0], %0 : memref<16x32xf32>, !amx.tile<16x16xf32>
amx.tile_store %alloca[%c0, %c16], %1 : memref<16x32xf32>, !amx.tile<16x16xf32>

adam-smnk · 2025-08-28T10:50:36Z

With MemoryEffects<[MemWrite] the CSE is restricted even for reads - I guess.

Good 👍

For the below example, %0 and %1 doesn't get CSE to one amx.tile_zero. Now, I'm not very sure that MemoryEffects<[MemWrite] can be a valid fix.

Example: %alloca = memref.alloca() : memref<16x32xf32> %0 = amx.tile_zero : !amx.tile<16x16xf32> %1 = amx.tile_zero : !amx.tile<16x16xf32> amx.tile_store %alloca[%c0, %c0], %0 : memref<16x32xf32>, !amx.tile<16x16xf32> amx.tile_store %alloca[%c0, %c16], %1 : memref<16x32xf32>, !amx.tile<16x16xf32>

I think it is in line with the updated semantics we want to achieve. Each tile load/zero serves as a hint to use a separate tile register. User will have to be a bit more deliberate when creating AMX ops and not rely as much on an automatic cleanup.

I'd suggest to move forward with MemWrite to model side effects of tile zero and load.
Please also update op descriptions to explain the new intended behavior/usage.

arun-thmn · 2025-08-29T08:03:56Z

Updated MemEffects for tileload + added a test-case ensuring CSE doesn't fuse multiple tilezero and tileload

mlir/test/Dialect/AMX/memory-effects.mlir

mlir/include/mlir/Dialect/AMX/AMX.td

adam-smnk

@rengolin Any more comments?

mlir/test/Dialect/side-effects.mlir

rengolin · 2025-08-29T11:21:57Z

@rengolin Any more comments?

Nope, all good, thanks!

adding memeroy side-effects to amx.tile_zero

3138ca8

arun-thmn marked this pull request as ready for review August 26, 2025 12:45

arun-thmn requested review from aartbik and dcaballe as code owners August 26, 2025 12:45

llvmbot added mlir mlir:amx labels Aug 26, 2025

updating the description + adding a test-case

c1d4259

adam-smnk reviewed Aug 29, 2025

View reviewed changes

mlir/test/Dialect/AMX/memory-effects.mlir Outdated Show resolved Hide resolved

mlir/test/Dialect/AMX/memory-effects.mlir Outdated Show resolved Hide resolved

mlir/include/mlir/Dialect/AMX/AMX.td Outdated Show resolved Hide resolved

updating comments + name correction + description updates

c2d068d

adam-smnk reviewed Aug 29, 2025

View reviewed changes

mlir/test/Dialect/side-effects.mlir Outdated Show resolved Hide resolved

mlir/test/Dialect/side-effects.mlir Outdated Show resolved Hide resolved

moved tests to AMX folder

fbf578e

adam-smnk approved these changes Aug 29, 2025

View reviewed changes

adam-smnk changed the title ~~[mlir][AMX] Memory effects to amx.tile_zero~~ [mlir][amx] Add write side effect to AMX tile creation ops Aug 29, 2025

adam-smnk merged commit 448811d into llvm:main Aug 29, 2025
9 checks passed

[mlir][amx] Add write side effect to AMX tile creation ops #155403

[mlir][amx] Add write side effect to AMX tile creation ops #155403

Conversation

arun-thmn commented Aug 26, 2025 • edited by adam-smnk Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Aug 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

arun-thmn commented Aug 26, 2025

Uh oh!

rengolin commented Aug 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

adam-smnk commented Aug 26, 2025

Uh oh!

arun-thmn commented Aug 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rengolin commented Aug 26, 2025

Uh oh!

adam-smnk commented Aug 26, 2025

Uh oh!

rengolin commented Aug 27, 2025

Uh oh!

phoebewang commented Aug 27, 2025

Uh oh!

arun-thmn commented Aug 28, 2025

Uh oh!

adam-smnk commented Aug 28, 2025

Uh oh!

arun-thmn commented Aug 29, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

adam-smnk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

rengolin commented Aug 29, 2025

Uh oh!

Uh oh!

Uh oh!

arun-thmn commented Aug 26, 2025 •

edited by adam-smnk

Loading

llvmbot commented Aug 26, 2025 •

edited

Loading

rengolin commented Aug 26, 2025 •

edited

Loading

arun-thmn commented Aug 26, 2025 •

edited

Loading