py/vm.c: Document the SELECTIVE_EXC_IP optimisation. #8870
Closed
I've often wondered what the backstory is behind `MARK_EXC_IP_SELECTIVE`, and in #8869 and #8845 the inlined pending exception handler uses it.

This PR just adds comments to explain what it does. I am fairly ambivalent about whether we should keep it, although if we do keep it, perhaps we should make it a proper mpconfig.h option and enable it by default on boards with more flash?
One argument might be to remove it and revisit the optimisation if NLR removal happens. As far as I understand, the value of `code_state->ip` only really matters outside this function for yield (i.e. re-entry to this function). So for the purpose of the exception handler, a regular variable storing the "current ip" would be fine, and it could be cheaply set to `ip` at dispatch (and could be optimised and potentially kept in a register by the compiler). However, with NLR a local variable won't work, because its value will be trashed by the NLR jump.

I also did some performance testing of this optimisation. On PYBV11 it adds 360 bytes and gives a 1-3% performance improvement:
On rp2, interestingly, it makes no size difference and gives a much smaller performance improvement.

I thought it would be interesting to see how it interacts with computed goto. On PYBV11, disabling computed goto saves 1000 bytes, but with arm-none-eabi-gcc 12.1 computed goto gives a negligible performance gain (interesting to compare with the last time I checked, #7680 (comment)).

On rp2040, though, computed goto gives a significant performance gain.
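
To make the NLR point above concrete, here is a minimal toy sketch. It is not the code in py/vm.c. `toy_state_t`, `toy_run` and `toy_div` are hypothetical names invented for illustration, and plain `setjmp`/`longjmp` stand in for NLR. It assumes the handler needs the bytecode position after a non-local jump, so that position is written into caller-owned state (like `code_state->ip`), either on every dispatch or only in opcodes that can raise.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical toy opcodes; only OP_DIV can raise.
enum { OP_PUSH, OP_ADD, OP_DIV, OP_HALT };

// Hypothetical stand-in for the VM code state. It is owned by the caller,
// so writes to it survive a longjmp back into toy_run.
typedef struct {
    const uint8_t *ip; // last position saved for the exception handler
    jmp_buf handler;   // stand-in for an NLR buffer
} toy_state_t;

// Stand-in for a runtime helper that raises via a non-local jump.
static int toy_div(toy_state_t *st, int a, int b) {
    if (b == 0) {
        longjmp(st->handler, 1);
    }
    return a / b;
}

// Returns -1 on normal completion, else the saved bytecode offset.
static ptrdiff_t toy_run(toy_state_t *st, const uint8_t *code, int selective) {
    int stack[8];
    int *sp = stack;
    const uint8_t *ip = code;
    st->ip = code;

    if (setjmp(st->handler) != 0) {
        // The local `ip` was modified after setjmp and is not volatile, so
        // its value is indeterminate here. Only st->ip can be trusted.
        return st->ip - code;
    }

    for (;;) {
        if (!selective) {
            st->ip = ip; // save on every dispatch, pointing at the opcode
        }
        switch (*ip++) {
            case OP_PUSH:
                assert(sp < stack + 8);
                *sp++ = *ip++;
                break;
            case OP_ADD:
                sp--;
                sp[-1] += sp[0];
                break;
            case OP_DIV:
                if (selective) {
                    st->ip = ip; // save only here, one byte past the opcode
                }
                sp--;
                sp[-1] = toy_div(st, sp[-1], sp[0]);
                break;
            default: // OP_HALT
                return -1;
        }
    }
}
```

With the program `{OP_PUSH, 6, OP_PUSH, 0, OP_DIV, OP_HALT}`, the failing `OP_DIV` sits at offset 4. In this sketch the save-on-every-dispatch variant reports 4, while the selective variant reports 5, so a handler would have to account for that one-byte difference. The selective variant avoids a store on opcodes that cannot raise, at the cost of repeating the store in each opcode that can.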