Add more logging #155219
base: gh/eellison/805/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155219
Note: Links to docs will display an error until the doc builds have completed.
❗ 1 Merge Blocking SEV: there is 1 active merge-blocking SEV. Please view it below. If you must merge, use …
❌ 1 New Failure, 3 Unrelated Failures as of commit b1c9249 with merge base 4405dc1:
NEW FAILURE: the following job has failed.
FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
UNSTABLE: the following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
if coalesce_tiling[0] != non_coalesce_tiling[0]:
    for n in node_schedule:
        if isinstance(n, SchedulerNode):
            V.graph.scheduler.debug_diff_tilings[n] = (
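For readers outside the Inductor codebase, here is a hedged, self-contained sketch of what the guarded block above appears to do (SchedulerNode and the field names are simplified stand-ins, not the real Inductor classes): when the coalesced and non-coalesced tilings disagree on their first entry, the disagreement is recorded per scheduler node so it can be printed later.

```python
# Simplified stand-in for Inductor's SchedulerNode; not the real class.
class SchedulerNode:
    def __init__(self, name):
        self.name = name

# node -> (coalesce tiling, non-coalesce tiling), collected for later printing
debug_diff_tilings = {}

def record_diff(node_schedule, coalesce_tiling, non_coalesce_tiling):
    # Mirrors the guard in the diff: only record when the tilings disagree.
    if coalesce_tiling[0] != non_coalesce_tiling[0]:
        for n in node_schedule:
            if isinstance(n, SchedulerNode):
                debug_diff_tilings[n] = (coalesce_tiling, non_coalesce_tiling)

nodes = [SchedulerNode("buf0")]
record_diff(nodes, [(64, 8)], [(512, 1)])
```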
Why collect the diffs and print them later rather than printing right away here?
We don't have the kernel name yet.
I see. So the logging is kernel-level rather than SchedulerNode-level, right? Would the node schedule be a better key for debug_diff_tilings then?
The node schedule is a list, so we can't hash on that. We could potentially hash on a tuple of that list, but I'm not sure that's much better.
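For illustration, a minimal Python sketch of the hashing constraint mentioned here (the node names are placeholders): a list cannot be a dict key, while a tuple of the same elements can, assuming the elements are themselves hashable.

```python
# A node schedule stored as a list cannot key a dict directly, because
# lists are unhashable; converting to a tuple works. Names are placeholders.
node_schedule = ["node_a", "node_b", "node_c"]

debug_diff_tilings = {}
list_key_failed = False
try:
    debug_diff_tilings[node_schedule] = "tiling diff"
except TypeError:
    list_key_failed = True  # lists are unhashable

debug_diff_tilings[tuple(node_schedule)] = "tiling diff"  # tuple key works
```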
Ideally, being able to determine the kernel name from the node schedule earlier would make this logging much simpler, right?
You can't know that without codegen.
Just sharing one idea here.
You can already know the majority of the kernel name (kernel category, fused op names) from the node schedule; the tricky part is the numeric suffix. One solution is to introduce the concept of a node schedule id. Instead of logging the kernel name, you log the node schedule id. This requires optionally codegen-ing one line for the node schedule id around the call site of the kernel, so that you can then find the kernel name and call site from the logged id.
But it is a bit more complex than what I initially thought, so either way is fine with me.
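As a rough illustration of the node-schedule-id idea (all names here, register_schedule, emit_call, and so on, are hypothetical, not PyTorch APIs): assign each node schedule an incrementing id, log the tiling diff under that id, and emit the id as a comment next to the generated kernel call so the two can be correlated after codegen.

```python
import itertools

_next_schedule_id = itertools.count()
schedule_ids = {}       # id(node_schedule) -> node schedule id
diff_tilings_log = {}   # node schedule id -> logged tiling diff

def register_schedule(node_schedule):
    # Key by object identity, since the schedule itself is an unhashable list.
    key = id(node_schedule)
    if key not in schedule_ids:
        schedule_ids[key] = next(_next_schedule_id)
    return schedule_ids[key]

def log_tiling_diff(node_schedule, diff):
    # Log under the id instead of the (not-yet-known) kernel name.
    diff_tilings_log[register_schedule(node_schedule)] = diff

def emit_call(node_schedule, kernel_name):
    # Optionally codegen one extra line carrying the id around the call site.
    sid = register_schedule(node_schedule)
    return f"# node_schedule_id={sid}\n{kernel_name}(...)"

sched = ["fused_node_a", "fused_node_b"]  # placeholder schedule
log_tiling_diff(sched, "coalesce vs non-coalesce tilings differ")
generated = emit_call(sched, "triton_fused_kernel_0")
```

The design trade-off is that the correlation step moves from logging time to a post-hoc grep of the generated code for the id.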
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark it as …
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben