[inductor] Add TLParse artifact for logging runtime of collective and compute ops #159730

skarjala · 2025-08-03T18:39:47Z

Summary:

debug.py: Added a log_runtime_estimates() function that dumps runtime-estimation data as structured TLParse artifacts in JSON format
test_structured_trace.py: Added test coverage for both compute and collective ops
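To make the summary concrete, here is a minimal sketch of the kind of JSON payload such an artifact could carry. The `OpEstimate` record, the `runtime_estimates_json` helper, and the field names (`type`, `estimated_runtime_ns`, `totals_ns`) are hypothetical illustrations, not the schema this PR actually emits. The sketch only shows ops being tagged as compute or collective and serialized to JSON.

```python
import json
from dataclasses import dataclass


@dataclass
class OpEstimate:
    # Hypothetical record; real Inductor scheduler nodes carry far more state.
    name: str
    is_collective: bool
    estimated_runtime_ns: float


def runtime_estimates_json(ops: list[OpEstimate]) -> str:
    """Serialize per-op runtime estimates plus per-kind totals to JSON (hypothetical schema)."""
    entries = []
    totals = {"compute": 0.0, "collective": 0.0}
    for op in ops:
        kind = "collective" if op.is_collective else "compute"
        totals[kind] += op.estimated_runtime_ns
        entries.append(
            {"name": op.name, "type": kind, "estimated_runtime_ns": op.estimated_runtime_ns}
        )
    return json.dumps({"ops": entries, "totals_ns": totals}, indent=2)
```

Splitting totals by kind makes it easy to compare estimated communication time against estimated compute time at a glance.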

Stack from ghstack (oldest at bottom):

-> [inductor] Add TLParse artifact for logging runtime of collective and compute ops #159730

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @Lucaskabela @mlazos

[ghstack-poisoned]

pytorch-bot · 2025-08-03T18:39:50Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159730

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'

✅ You can merge normally! (3 Unrelated Failures)

As of commit 68a0c0c with merge base 2249284:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

Check Labels / Check labels (gh) (#159894)
RuntimeError: GraphQL query
Check mergeability of ghstack PR / ghstack-mergeability-check (gh) (#159899)
RuntimeError: GraphQL query
pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable) (gh) (#158876)
sccache: error: couldn't connect to server

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 28dc116 Pull-Request: #159730

yushangdi · 2025-08-04T16:30:47Z

torch/_inductor/debug.py

+            },
+            payload_fn=lambda: data,
+        )
+    except Exception:


You don't need a try-except here; you can call trace_structured directly in log_runtime_estimates instead of adding a separate _dump_tlparse_runtime function.
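A minimal sketch of the reviewer's suggestion, assuming the payload has already been built as a dict: `log_runtime_estimates` hands it straight to `torch._logging.trace_structured` as an `"artifact"` event with JSON encoding, with no wrapper function and no try/except. The artifact name `inductor_runtime_estimates` and the `trace_fn` parameter (added only so the sketch can run without PyTorch installed) are hypothetical; the PR's real signature may differ.

```python
import json
from typing import Any, Callable, Optional


def log_runtime_estimates(
    data: dict[str, Any],
    trace_fn: Optional[Callable[..., Any]] = None,
) -> None:
    # Default to PyTorch's structured-trace logger; trace_fn is a hypothetical
    # injection point so the sketch can be exercised without torch.
    if trace_fn is None:
        from torch._logging import trace_structured as trace_fn

    # Call trace_structured directly: no separate _dump_tlparse_runtime helper
    # and no try/except around the call.
    trace_fn(
        "artifact",
        metadata_fn=lambda: {
            "name": "inductor_runtime_estimates",  # hypothetical artifact name
            "encoding": "json",
        },
        # Serialize explicitly so the payload is always a JSON string.
        payload_fn=lambda: json.dumps(data),
    )
```

Passing callables rather than precomputed values is intended to defer building the metadata and payload until the structured-trace logger actually consumes them.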

yushangdi · 2025-08-04T16:35:00Z

torch/_inductor/scheduler.py

@@ -818,7 +817,7 @@ def get_estimated_runtime(self) -> float:
            return 0

        # Collective kernels
-        if is_collective(self.node):
+        if isinstance(self.node, ir._CollectiveKernel):


why do we need to change this here?

Earlier it wasn't picking up scheduled collectives, but after further review I found that running the code a different way does pick them up, so the original implementation works fine. Fixed.

[ghstack-poisoned]

ghstack-source-id: c9eb67f Pull-Request: #159730 fix pr feedback

skarjala · 2025-08-04T22:57:34Z

@pytorchbot merge -i

pytorchmergebot · 2025-08-04T22:59:53Z

Merge started

Your change will be merged while ignoring the following 3 checks: Check mergeability of ghstack PR / ghstack-mergeability-check, pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu), pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-08-04T23:05:42Z

Merge failed

Reason: 1 jobs have failed, first few of them are: Check Labels / Check labels

Details for Dev Infra team

Raised by workflow job

skarjala · 2025-08-04T23:09:37Z

@pytorchbot merge -i

pytorchmergebot · 2025-08-04T23:12:10Z

Merge started

Your change will be merged while ignoring the following 3 checks: Check mergeability of ghstack PR / ghstack-mergeability-check, pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu), pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable)

pytorchmergebot · 2025-08-04T23:17:56Z

Merge failed

Reason: 1 jobs have failed, first few of them are: Check Labels / Check labels

skarjala · 2025-08-05T04:35:33Z

@pytorchbot merge -i

pytorchmergebot · 2025-08-05T04:37:35Z

Merge started

Your change will be merged while ignoring the following 3 checks: Check mergeability of ghstack PR / ghstack-mergeability-check, pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks, 1, 1, linux.g4dn.metal.nvidia.gpu), pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable)

pytorchmergebot · 2025-08-05T04:43:31Z

Merge failed

Reason: 1 jobs have failed, first few of them are: Check Labels / Check labels

[ghstack-poisoned]

ghstack-source-id: ddb7f4f Pull-Request: #159730 fix pr feedback update to graph pass once

[ghstack-poisoned]

ghstack-source-id: c1e4a3a Pull-Request: #159730 fix pr feedback update to graph pass once new flag

[ghstack-poisoned]

ghstack-source-id: 3b5771f Pull-Request: #159730 fix pr feedback update to graph pass once new flag update test_structured_trace

skarjala · 2025-08-05T21:57:48Z

@pytorchbot merge -i

pytorchmergebot · 2025-08-05T22:00:47Z

Merge started

Your change will be merged while ignoring the following 3 checks: Check Labels / Check labels, Check mergeability of ghstack PR / ghstack-mergeability-check, pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, lf.linux.12xlarge, unstable)

Update

3c3688c

[ghstack-poisoned]

pytorch-bot added the ciflow/inductor, module: dynamo, and module: inductor labels Aug 3, 2025

skarjala added a commit that referenced this pull request Aug 3, 2025

Create TLParse artifact for logging runtime

61996be

ghstack-source-id: 28dc116 Pull-Request: #159730

skarjala changed the title ~~Create TLParse artifact for logging runtime~~ [inductor] Create TLParse artifact for logging runtime Aug 3, 2025

skarjala added the topic: not user facing label Aug 3, 2025

skarjala changed the title ~~[inductor] Create TLParse artifact for logging runtime~~ [inductor] Add Logging runtime TLParse artifact for collective and compute ops Aug 3, 2025

skarjala changed the title ~~[inductor] Add Logging runtime TLParse artifact for collective and compute ops~~ [inductor] Add TLParse artifact for logging runtime of collective and compute ops Aug 3, 2025

skarjala requested review from xmfan, yushangdi and exclamaforte August 3, 2025 18:59

yushangdi reviewed Aug 4, 2025

View reviewed changes

Update

f7245a9

[ghstack-poisoned]

skarjala added a commit that referenced this pull request Aug 4, 2025

Create TLParse artifact for logging runtime

801cada

ghstack-source-id: c9eb67f Pull-Request: #159730 fix pr feedback

skarjala added the topic: not user facing, release notes: inductor, and ciflow/inductor labels and removed the topic: not user facing and ciflow/inductor labels Aug 4, 2025

yushangdi approved these changes Aug 4, 2025

View reviewed changes

skarjala added the ciflow/inductor label and removed the ciflow/inductor and release notes: inductor labels Aug 4, 2025

pytorch-bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Aug 4, 2025

pytorchmergebot added the merging label Aug 4, 2025

pytorchmergebot removed the merging label Aug 4, 2025

pytorchmergebot added the merging label Aug 4, 2025

pytorchmergebot removed the merging label Aug 4, 2025

skarjala added the ci: sev label (critical failure affecting PyTorch CI) Aug 5, 2025

pytorchmergebot added the merging label Aug 5, 2025

skarjala removed the ci: sev label (critical failure affecting PyTorch CI) Aug 5, 2025

pytorchmergebot removed the merging label Aug 5, 2025

xmfan mentioned this pull request Aug 5, 2025

ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration' #159825

Closed

Update

151a9f4

[ghstack-poisoned]

skarjala added a commit that referenced this pull request Aug 5, 2025

Create TLParse artifact for logging runtime

b0dfaee

ghstack-source-id: ddb7f4f Pull-Request: #159730 fix pr feedback update to graph pass once

Update

c982546

[ghstack-poisoned]

skarjala added a commit that referenced this pull request Aug 5, 2025

Create TLParse artifact for logging runtime

9487942

ghstack-source-id: c1e4a3a Pull-Request: #159730 fix pr feedback update to graph pass once new flag

Update

68a0c0c

[ghstack-poisoned]

skarjala added a commit that referenced this pull request Aug 5, 2025

Create TLParse artifact for logging runtime

0e34c46

ghstack-source-id: 3b5771f Pull-Request: #159730 fix pr feedback update to graph pass once new flag update test_structured_trace

pytorchmergebot added the merging label Aug 5, 2025

pytorchmergebot added the Merged label Aug 5, 2025

pytorchmergebot closed this in 8034b2a Aug 5, 2025

pytorchmergebot removed the merging label Aug 5, 2025

AaronWang04 mentioned this pull request Aug 8, 2025

[Inductor] addmm + activation function fusion #158137

Open
