Skip to content

Add more logging #155219

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: gh/eellison/805/base
Choose a base branch
from
Open

Add more logging #155219

wants to merge 2 commits into from

Conversation

eellison
Copy link
Contributor

@eellison eellison commented Jun 5, 2025

[ghstack-poisoned]
Copy link

pytorch-bot bot commented Jun 5, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155219

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Merge Blocking SEVs

There is 1 active merge blocking SEVs. Please view them below:

If you must merge, use @pytorchbot merge -f.

❌ 1 New Failure, 3 Unrelated Failures

As of commit b1c9249 with merge base 4405dc1 (image):

NEW FAILURE - The following job has failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]
eellison added a commit that referenced this pull request Jun 5, 2025
ghstack-source-id: 5876af1
Pull Request resolved: #155219
@eellison eellison added the topic: not user facing topic category label Jun 5, 2025
if coalesce_tiling[0] != non_coalesce_tiling[0]:
for n in node_schedule:
if isinstance(n, SchedulerNode):
V.graph.scheduler.debug_diff_tilings[n] = (
Copy link
Contributor

@shunting314 shunting314 Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why collect the diffs and print later rather than print it right away here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't have the kernel name yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So the logging is a kernel level logging rather than SchedulerNode level, right? Should the node-schedule be a better key for debug_diff_tilings then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the node schedule is a list so we cant hash on that. we could potentially hash on the tuple of that list, but i'm not sure that's much better

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, having the ability to decide kernel name from node-schedule earlier will make this logging much simpler, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't know that without codegen..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just share one idea here.

You can know majority part of the kernel name (kernel category, fused op names) from the node schedule already. The tricky part is the numeric suffix. I think one solution here is to introduce a concept for node schedule id. Instead of logging kernel name, you can log node schedule id. This requires we optionally codegen one line for the node schedule id around the call site of the kernel though. Thus you can find the kernel name and call site based on the logged node schedule id.

But it is a bit complex then what I initially thought. So, either way is fine to me.

Copy link
Contributor

github-actions bot commented Aug 5, 2025

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Aug 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants