gh-109329: Support for basic pystats for Tier 2 #109913
Conversation
This is so great to have, thanks! We can tweak the collection and output further if need be, but this is already going to be so much more insightful!
Just a couple of suggestions; otherwise I'll plan on merging tomorrow.
Include/internal/pycore_code.h
Outdated
    #define OPT_STAT_INC(name) do { if (_Py_stats) _Py_stats->optimization_stats.name++; } while (0)
    #define UOP_EXE_INC(opname) do { if (_Py_stats) _Py_stats->optimization_stats.opcode[opname].execution_count++; } while (0)
    #define OPT_UNSUPPORTED_OPCODE(opname) do { if (_Py_stats) _Py_stats->optimization_stats.unsupported_opcode[opname]++; } while (0)
    #define OPT_HIST(length, name) do { if (_Py_stats) _Py_stats->optimization_stats.name[_Py_bit_length((length - 1) >> 2)]++; } while (0)
Why `((length - 1) >> 2)`?
If I understand correctly, there is a minimum trace size of 2, so the first two power-of-two buckets (for 0 and 1) would always be empty, so we might as well not support them (that's the `>> 2`). I also think it's easier to reason about buckets whose inclusive maximum is a round number (that's the `- 1`).
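A rough Python sketch of the bucketing math may make this concrete (Python's `int.bit_length` behaves like the C `_Py_bit_length` here; the function name is just illustrative):

```python
def bucket_index(length: int) -> int:
    """Map a trace length to its power-of-two histogram bucket.

    Subtracting 1 gives each bucket a round inclusive maximum,
    and >> 2 drops the two always-empty buckets for lengths 0 and 1.
    """
    return ((length - 1) >> 2).bit_length()

# Bucket 0 holds lengths 1..4, bucket 1 holds 5..8, bucket 2 holds 9..16, ...
assert bucket_index(4) == 0
assert bucket_index(5) == 1
assert bucket_index(16) == 2
assert bucket_index(17) == 3
```

Note that the inclusive maximum of bucket `i` is `1 << (i + 2)`, which is what the printing code below uses as the label.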
Include/cpython/pystats.h
Outdated
        uint64_t execution_count;
    } UOpStats;

    #define _Py_UOP_HIST_SIZE 5
We'll probably want more buckets than this (since execution lengths for closed loops will get pretty big), but it's probably fine for now. We can just bump it whenever.
I thought the max trace length was 64 (`_Py_UOP_MAX_TRACE_LENGTH`). Can they be longer? (It's quite possible I don't understand the code.) In any event, I should probably add a comment here that this should be log2 of whatever the expected maximum length is.
Can they be longer? (It's quite possible I don't understand the code).
Not statically, but dynamically they can be quite long (imagine executing a trace for a closed loop hundreds of times).
So we should probably modify the code that adds to the histogram to use `Py_MIN(_Py_UOP_HIST_SIZE - 1, _Py_bit_length((length - 1) >> _Py_UOP_HIST_BIAS))`, so that anything overflowing the histogram ends up in the largest bucket instead of writing past the end of the array.
Yeah, that's a good idea to add the range-checking.
Also, this made me realize I was counting the wrong thing -- I was recording the "instruction pointer at exit" to be the execution length. But (after discussing in person) it seems what we want is the number of uops executed per execution of the trace. I've updated this to report that instead.
Python/specialize.c
Outdated
    print_histogram(FILE *out, const char *name, uint64_t hist[_Py_UOP_HIST_SIZE])
    {
        for (int i = 0; i < _Py_UOP_HIST_SIZE; i++) {
            fprintf(out, "%s[%d]: %" PRIu64 "\n", name, (1 << (i + 2)), hist[i]);
Why `i + 2`? Is it to correct for the `>> 2` when recording the stats?
Oh, I think I see. We're starting with the "third" bucket, since tiny traces are basically nonexistent. Maybe add another constant next to `_Py_UOP_HIST_SIZE` (like `#define _Py_UOP_HIST_BIAS 2` or something) so it's clearer why we're doing this and where.
Yes, that's right. Adding the `#define` for the bias makes sense.
Tools/scripts/summarize_stats.py
Outdated
        continue
    -name, _, rest = key[7:].partition("]")
    +name, _, rest = key[len(prefix) + 2:].partition("]")
Is this correct? The old value was `7`, but `len("opcode") + 2` is `8`.
Good catch. It seems to not matter, but might as well revert it just in case.
Tools/scripts/summarize_stats.py
Outdated
    def _emit_comparative_execution_counts(base_rows, head_rows):
        base_data = dict((x[0], x[1:]) for x in base_rows)
        head_data = dict((x[0], x[1:]) for x in head_rows)
        opcodes = set(base_data.keys()) | set(head_data.keys())
Dictionary keys support set operations:

    -opcodes = set(base_data.keys()) | set(head_data.keys())
    +opcodes = base_data.keys() | head_data.keys()
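For reference, `dict.keys()` returns a set-like view, so the union works directly without materializing intermediate sets (toy data for illustration):

```python
base_data = {"LOAD_FAST": (1,), "NOP": (2,)}
head_data = {"LOAD_FAST": (3,), "RESUME": (4,)}

# Key views support |, &, -, and ^ directly.
opcodes = base_data.keys() | head_data.keys()
assert opcodes == {"LOAD_FAST", "NOP", "RESUME"}
```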
TIL!
        rows = []
        default = [0, "0.0%", "0.0%", 0]
        for opcode in opcodes:
Maybe give these a deterministic (and somewhat meaningful) order?

    -for opcode in opcodes:
    +for opcode in sorted(opcodes):
We could, but we end up sorting by the counts a few lines below anyway...
Tools/scripts/summarize_stats.py
Outdated
                (opcode, base_entry[0], head_entry[0],
                 f"{100*change:0.1f}%"))

        rows.sort(key=lambda x: -abs(percentage_to_float(x[-1])))
    -rows.sort(key=lambda x: -abs(percentage_to_float(x[-1])))
    +rows.sort(key=lambda x: abs(percentage_to_float(x[-1])), reverse=True)

(Also, it looks like CI caught some trailing whitespace.)
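The two spellings sort equivalently (both are stable for ties); `reverse=True` just avoids negating inside the key. A quick check with made-up rows:

```python
def percentage_to_float(s: str) -> float:
    return float(s[:-1])  # e.g. "12.5%" -> 12.5

rows = [("ADD", "1.0%"), ("SUB", "-3.0%"), ("MUL", "2.0%")]

a = sorted(rows, key=lambda x: -abs(percentage_to_float(x[-1])))
b = sorted(rows, key=lambda x: abs(percentage_to_float(x[-1])), reverse=True)

# Both orderings put the largest absolute change first.
assert a == b == [("SUB", "-3.0%"), ("MUL", "2.0%"), ("ADD", "1.0%")]
```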
Looks great, thanks!
This just implements a set of tier 2 pystats mentioned in #109329.
This implements:
It also fixes a "bug" where specialization "hits" in the running Tier 2 interpreter would count against the Tier 1 stats.
Not implemented (since it will require reworking of DEOPT_IF calls):
Example output (for nbody benchmark):
Optimization (Tier 2) stats (statistics about the Tier 2 optimizer):
- Overall stats
- Trace length histogram
- Optimized trace length histogram
- Trace run length histogram
- Uop stats
- Unsupported opcodes