-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
Side exit temperature requires exponential backoff #116968
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I believe the original plan was to begin tracing both on |
IMO we should not be maintaining infrastructure like this (which can easily be copied) until we actually need it. |
@markshannon Mind if I take a look at this? I'd like to understand exactly how we do exponential backoff in other cases and maybe generalize that in the form of macros or inline functions. |
Lab notes: This description of the specialization counters probably belongs in a comment somewhere:
Somehow, for
My proposal is to unify the two algorithms.
For the I think I've talked myself into creating a draft PR. |
Oh, I missed something important. Optimizers may come and go and the counter in I'll have to figure a compromise. |
Not at all |
Once, a long time ago, the initial counter version was part of the on-disk format, so we couldn't use bit fields. struct counter {
int16_t value:12;
uint16_t backoff:4;
} |
@markshannon Before I try to implement the bitfield suggestion, can I get your opinion on my proposal above? |
Unifying the counters seems reasonable. They all do much the same thing. I'd like to keep the thresholds (at least for now) for two reasons:
|
We can get rid of the thresholds, if it makes things significantly simpler. We can support turning off tier 2 by changing |
Actually, there is no need for |
Although objectively the PR makes things worse (it is larger and perhaps slower), I still think factoring out the counter is a good idea. I think that the various optimizer thresholds are a blocker.
(1) limits our options to vary the thresholds at runtime, but I'm not seeing any value in being able to do that anyway. We use constants for tier 1 and that seems to work just fine. I've removed the thresholds as an experiment. I'm running the benchmarks now. |
Thanks! Looking forward to the results. I have a theory BTW about why my PR makes things slower -- it could be that the bitfields end up generating worse code for either the zero check or the decrement operation, and that would affect Tier 1 (especially cases where specializations fails). |
Strangely slow https://github.com/faster-cpython/benchmarking/tree/main/results/bm-20240328-3.13.0a5%2B-174fc24-JIT |
More discussion in the PR. I've removed the thresholds there as well and done some limited local benchmarking; my theory is that the tweaks cause more Tier 2 uops to be executed and that's still slower (even with the JIT, though less slower in that case). |
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical. The behavior of the Tier 2 counters is changed: - There are no longer dynamic thresholds (we never varied these). - All counters now use the same exponential backoff. - The counter for ``JUMP_BACKWARD`` starts counting down from 16. - The ``temperature`` in side exits starts counting down from 64.
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``), shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The API used for adaptive specialization counters is changed but the behavior is (supposed to be) identical. The behavior of the Tier 2 counters is changed: - There are no longer dynamic thresholds (we never varied these). - All counters now use the same exponential backoff. - The counter for ``JUMP_BACKWARD`` starts counting down from 16. - The ``temperature`` in side exits starts counting down from 64.
The
temperature
field used in_PyExitData
(akastruct _exit_data
) currently is a simple counter. When it reaches 15 we try to optimize the side exit's target; when this optimization fails we reset the counter to -16, meaning that after 32 times taking this side exit we will retry again.This is fine if something has changed that makes the side exit's target eventually optimizable; however, in cases where it deterministically fails (e.g.: when it encounters an unsupported opcode early on in the trace, or when the abstract interpreter gets a miss from the function version cache), we can waste a lot of time attempting to optimize the same piece of code over and over.
We should implement exponential backoff here just like we do for the backedge threshold.
PS: There's another threshold,
resume_threshold
, that doesn't appear to be used at all (though there is a failr bit of infrastructure around it that obscure this fact). My guess is that we intended to use this one for the side exit threshold but we changed the direction of the counter because of the idea to name it "temperature", and then introduced a new threshold variable for that.Linked PRs
The text was updated successfully, but these errors were encountered: