[inductor] Improve GEMM logging to display batch size for batched operations #155544
Conversation
Changes:
- Extract batch size from input tensor shape in tuned_bmm()
- Include batch size in log messages: "batch=B, m=M, n=N, k=K"
- Update counter keys to include batch info for better tracking
- Apply same enhancement to tuned_baddbmm() for consistency
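As a rough illustration of the logging change this commit describes (the helper below and its names are made up for the example, not the actual `tuned_bmm` implementation), the batch size comes off the leading dimension of the first input:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

def log_bmm_dims(mat1_size, mat2_size):
    # For torch.bmm, mat1 is (B, M, K) and mat2 is (B, K, N),
    # so the batch size is the leading dimension of either input.
    batch, m, k = mat1_size
    n = mat2_size[-1]
    log.info("Tuned aten.bmm: batch=%d, m=%d, n=%d, k=%d", batch, m, n, k)
    return batch, m, n, k

log_bmm_dims((10, 1024, 1024), (10, 1024, 1024))
```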
[inductor] Improve GEMM logging to display batch size for batched operations

The GEMM overview table in inductor logs was missing batch size information for batched matrix operations like torch.bmm and torch.baddbmm. This made it difficult to distinguish between different batched operations with the same M, N, K dimensions but different batch sizes.

Changes:
- Updated counter key format in kernel files to use prefixed values (e.g., "aten.bmm_b10_m1024_n1024_k1024" instead of "aten.bmm_10_1024_1024_1024")
- Enhanced parsing logic in compile_fx.py to handle both the new prefixed format and the legacy format for backward compatibility
- Added batch size display in the overview table for batched operations (e.g., "aten.bmm (B=10)" instead of just "aten.bmm")
- Increased table width to accommodate the batch size information

Before:
```
Name      | M    | N    | K    | Count
aten.bmm  | 1024 | 1024 | 1024 | 1
```

After:
```
Name             | M    | N    | K    | Count
aten.bmm (B=10)  | 1024 | 1024 | 1024 | 1
```

This provides clearer visibility into batched GEMM operations while maintaining backward compatibility with existing counter formats.

Fixes pytorch#155307
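The commit doesn't show the parsing itself, so here is a minimal sketch of the dual-format parsing it describes; `parse_gemm_counter_key` is a hypothetical helper for illustration, not the actual compile_fx.py code:

```python
import re

def parse_gemm_counter_key(key):
    """Return (op_name, batch, m, n, k); batch is None for non-batched ops.

    Accepts both the prefixed format this commit introduces
    ("aten.bmm_b10_m1024_n1024_k1024") and the legacy positional
    format ("aten.bmm_10_1024_1024_1024").
    """
    match = re.fullmatch(r"(.+?)(?:_b(\d+))?_m(\d+)_n(\d+)_k(\d+)", key)
    if match:
        name, b, m, n, k = match.groups()
        return name, (int(b) if b else None), int(m), int(n), int(k)
    # Legacy format: the trailing underscore-separated parts are all digits.
    parts = key.split("_")
    dims = [int(p) for p in parts if p.isdigit()]
    name = "_".join(parts[: len(parts) - len(dims)])
    if name.endswith(("bmm", "baddbmm")) and len(dims) == 4:
        b, m, n, k = dims
        return name, b, m, n, k
    m, n, k = dims[-3:]
    return name, None, m, n, k

assert parse_gemm_counter_key("aten.bmm_b10_m1024_n1024_k1024") == ("aten.bmm", 10, 1024, 1024, 1024)
assert parse_gemm_counter_key("aten.bmm_10_1024_1024_1024") == ("aten.bmm", 10, 1024, 1024, 1024)
assert parse_gemm_counter_key("aten.mm_1024_1024_1024") == ("aten.mm", None, 1024, 1024, 1024)
```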
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155544
Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 1c3bfb3 with merge base b0fbbef:

BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
@penknife6153 can you merge with
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
[inductor] Improve GEMM logging to display batch size for batched operations (pytorch#155544)

Improves the GEMM overview logging in PyTorch Inductor to properly display batch size information for batched matrix operations like `torch.bmm` and `torch.baddbmm`.

**Fixes pytorch#155307**

## Problem

The current GEMM logging for `torch.bmm` shows:

```python
# Repro
import os
os.environ["TORCH_LOGS"] = "inductor"
import torch

M, N, K = 1024, 1024, 1024
dtype = torch.bfloat16
A = torch.randn(10, M, K, device="cuda", dtype=dtype)
B = torch.randn(10, K, N, device="cuda", dtype=dtype)

compiled_model = torch.compile(torch.bmm, fullgraph=True)
_ = compiled_model(A, B)
```

**Before:**
```
Name      | M    | N    | K    | Count
---------------------------------------
aten.bmm  | 1024 | 1024 | 1024 | 1
---------------------------------------
```

The batch size (10) is missing from the logs, making it unclear what the actual operation dimensions were.

## Solution

**After:**
```
Name      | B  | M    | N    | K    | Count
--------------------------------------------
aten.bmm  | 10 | 1024 | 1024 | 1024 | 1
aten.mm   | -  | 1024 | 1024 | 1024 | 2
--------------------------------------------
```

## Changes Made

### 1. Enhanced Parsing Logic in compile_fx.py
- Detects batched operations by checking whether the operation name ends with `'bmm'` or `'baddbmm'`
- For batched operations: takes the last 4 parts as `batch, m, n, k`
- For non-batched operations: takes the last 3 parts as `m, n, k`
- **Dedicated "B" column**: added a separate column for batch size instead of embedding it in the operation name
- Shows the batch size for batched operations and "-" for non-batched operations

### 2. Updated All MM Operations for Consistency
- **bmm.py**:
  - Extract batch size from `mat1.get_size()[0]` for both `tuned_bmm` and `tuned_baddbmm`
  - Use positional counter keys: `aten.bmm_{batch_size}_{m}_{n}_{k}`
  - Enhanced log messages to include batch size information
- **mm.py**: updated counter keys for consistency:
  - `aten.mm_{m}_{n}_{k}` (no batch dimension)
  - `aten.addmm_{m}_{n}_{k}` (no batch dimension)
  - `aten._int_mm_{m}_{n}_{k}` (no batch dimension)
  - `aten._scaled_mm.default_{m}_{n}_{k}` (no batch dimension)

Pull Request resolved: pytorch#155544
Approved by: https://github.com/jansel, https://github.com/BoyuanFeng
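To make the final key scheme concrete, here is a self-contained sketch of how the positional counter keys and the overview parsing described above fit together; `record_gemm`, `parse_key`, and the `counters` object are illustrative stand-ins, not the actual Inductor code:

```python
from collections import Counter
from typing import Optional, Tuple

# Stand-in for Inductor's GEMM counters; keys follow the positional format
# from this PR: aten.bmm_{batch}_{m}_{n}_{k} for batched ops, aten.mm_{m}_{n}_{k} otherwise.
counters: Counter = Counter()

def record_gemm(op: str, m: int, n: int, k: int, batch: Optional[int] = None) -> None:
    # Batched ops put the batch size first in the dimension list.
    dims = [batch, m, n, k] if batch is not None else [m, n, k]
    counters["_".join([op, *map(str, dims)])] += 1

def parse_key(key: str) -> Tuple[str, str, int, int, int]:
    """Return (name, batch-or-'-', m, n, k) for one counter key."""
    parts = key.split("_")
    dims = []
    while parts and parts[-1].isdigit():  # dimensions trail the op name
        dims.append(int(parts.pop()))
    dims.reverse()
    name = "_".join(parts)
    if name.endswith(("bmm", "baddbmm")):  # batched: last 4 parts are batch, m, n, k
        b, m, n, k = dims
        return name, str(b), m, n, k
    m, n, k = dims  # non-batched: last 3 parts are m, n, k
    return name, "-", m, n, k

record_gemm("aten.bmm", 1024, 1024, 1024, batch=10)
record_gemm("aten.mm", 1024, 1024, 1024)

print(f"{'Name':<12} | {'B':>3} | {'M':>5} | {'N':>5} | {'K':>5} | Count")
for key, count in counters.items():
    name, b, m, n, k = parse_key(key)
    print(f"{name:<12} | {b:>3} | {m:>5} | {n:>5} | {k:>5} | {count}")
```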
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @henrylhtsang