Skip to content

py/vm.c: Document the SELECTIVE_EXC_IP optimisation. #8870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented Jul 6, 2022

I've often wondered what the backstory is behind MARK_EXC_IP_SELECTIVE and in #8869 and #8845 the inlined pending exception handler uses it.

This PR just adds comments to explain what it does. I am fairly ambivalent about whether we should keep it, although if we do keep it perhaps we should properly make it an mpconfig.h option and enable it by default on boards with more flash?

One argument might be to remove it and re-visit the optimisation if NLR removal happens. As far as I understand, the value of code_state->ip only really matters outside this function for yield (i.e. re-entry to this function). So for the purpose of the exception handler, a regular variable storing the "current ip" would be fine which could be cheaply set to ip at dispatch (and could be optimised and potentially registered by the compiler). However with NLR, a local variable won't work and the value will be trashed by the nlr jump.

I also did some performance testing of this optimisation. On PYBV11 it adds +360 bytes and has a 1-3% performance improvement:

$ ./run-perfbench.py -s ~/exc-ip-sel-baseline ~/exc-ip-sel-enabled
diff of scores (higher is better)
N=100 M=100                /home/jimmo/exc-ip-sel-baseline -> /home/jimmo/exc-ip-sel-enabled         diff      diff% (error%)
bm_chaos.py                    359.45 ->     366.21 :      +6.76 =  +1.881% (+/-0.01%)
bm_fannkuch.py                  77.39 ->      78.12 :      +0.73 =  +0.943% (+/-0.00%)
bm_fft.py                     2517.50 ->    2594.00 :     +76.50 =  +3.039% (+/-0.00%)
bm_float.py                   5720.43 ->    5862.47 :    +142.04 =  +2.483% (+/-0.00%)
bm_hexiom.py                    46.21 ->      46.72 :      +0.51 =  +1.104% (+/-0.00%)
bm_nqueens.py                 4393.94 ->    4469.23 :     +75.29 =  +1.713% (+/-0.00%)
bm_pidigits.py                 633.47 ->     641.64 :      +8.17 =  +1.290% (+/-0.23%)
misc_aes.py                    406.41 ->     418.07 :     +11.66 =  +2.869% (+/-0.00%)
misc_mandel.py                3478.31 ->    3530.72 :     +52.41 =  +1.507% (+/-0.00%)
misc_pystone.py               2424.83 ->    2464.01 :     +39.18 =  +1.616% (+/-0.01%)
misc_raytrace.py               371.74 ->     378.69 :      +6.95 =  +1.870% (+/-0.01%)

On rp2, interestingly it makes no size difference but a much smaller performance improvement.

$ ./run-perfbench.py -s ~/exc-ip-sel-baseline-rp2 ~/exc-ip-sel-enabled-rp2 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/exc-ip-sel-baseline-rp2 -> /home/jimmo/exc-ip-sel-enabled-rp2         diff      diff% (error%)
bm_chaos.py                    200.78 ->     201.68 :      +0.90 =  +0.448% (+/-0.07%)
bm_fannkuch.py                  55.91 ->      56.66 :      +0.75 =  +1.341% (+/-0.01%)
bm_fft.py                     1596.88 ->    1617.56 :     +20.68 =  +1.295% (+/-0.01%)
bm_float.py                   3785.43 ->    3817.17 :     +31.74 =  +0.838% (+/-0.05%)
bm_hexiom.py                    26.38 ->      26.54 :      +0.16 =  +0.607% (+/-0.04%)
bm_nqueens.py                 2846.73 ->    2864.82 :     +18.09 =  +0.635% (+/-0.03%)
bm_pidigits.py                 379.92 ->     380.36 :      +0.44 =  +0.116% (+/-0.04%)
misc_aes.py                    299.53 ->     302.94 :      +3.41 =  +1.138% (+/-0.02%)
misc_mandel.py                1796.23 ->    1812.03 :     +15.80 =  +0.880% (+/-0.04%)
misc_pystone.py               1341.51 ->    1338.97 :      -2.54 =  -0.189% (+/-0.05%)
misc_raytrace.py               212.89 ->     213.50 :      +0.61 =  +0.287% (+/-0.08%)

I thought it would be interesting to see how it interacts with computed goto. On PYBV11, disabling computed goto saves -1000 bytes, but on arm-none-eabi-gcc 12.1 computed goto has a negligible performance gain (interesting to compare to last time I checked -- #7680 (comment))

$ ./run-perfbench.py -s ~/comp-goto-baseline-pybv11 ~/comp-goto-off-pybv11 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/comp-goto-baseline-pybv11 -> /home/jimmo/comp-goto-off-pybv11         diff      diff% (error%)
bm_chaos.py                    359.45 ->     359.42 :      -0.03 =  -0.008% (+/-0.00%)
bm_fannkuch.py                  77.39 ->      77.32 :      -0.07 =  -0.090% (+/-0.01%)
bm_fft.py                     2517.50 ->    2500.03 :     -17.47 =  -0.694% (+/-0.00%)
bm_float.py                   5720.43 ->    5735.13 :     +14.70 =  +0.257% (+/-0.00%)
bm_hexiom.py                    46.21 ->      46.40 :      +0.19 =  +0.411% (+/-0.00%)
bm_nqueens.py                 4393.94 ->    4381.79 :     -12.15 =  -0.277% (+/-0.00%)
bm_pidigits.py                 633.47 ->     634.79 :      +1.32 =  +0.208% (+/-0.35%)
misc_aes.py                    406.41 ->     415.39 :      +8.98 =  +2.210% (+/-0.00%)
misc_mandel.py                3478.31 ->    3420.89 :     -57.42 =  -1.651% (+/-0.01%)
misc_pystone.py               2424.83 ->    2433.96 :      +9.13 =  +0.377% (+/-0.01%)
misc_raytrace.py               371.74 ->     374.55 :      +2.81 =  +0.756% (+/-0.01%)

On rp2040 though, computed goto is a significant performance gain.

$ ./run-perfbench.py -s ~/comp-goto-baseline-rp2 ~/comp-goto-off-rp2 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/comp-goto-baseline-rp2 -> /home/jimmo/comp-goto-off-rp2         diff      diff% (error%)
bm_chaos.py                    200.78 ->     193.03 :      -7.75 =  -3.860% (+/-0.06%)
bm_fannkuch.py                  55.91 ->      51.74 :      -4.17 =  -7.458% (+/-0.01%)
bm_fft.py                     1596.88 ->    1448.57 :    -148.31 =  -9.287% (+/-0.00%)
bm_float.py                   3785.43 ->    3633.17 :    -152.26 =  -4.022% (+/-0.06%)
bm_hexiom.py                    26.38 ->      25.18 :      -1.20 =  -4.549% (+/-0.04%)
bm_nqueens.py                 2846.73 ->    2712.96 :    -133.77 =  -4.699% (+/-0.02%)
bm_pidigits.py                 379.92 ->     382.44 :      +2.52 =  +0.663% (+/-0.04%)
misc_aes.py                    299.53 ->     271.97 :     -27.56 =  -9.201% (+/-0.03%)
misc_mandel.py                1796.23 ->    1726.77 :     -69.46 =  -3.867% (+/-0.03%)
misc_pystone.py               1341.51 ->    1268.82 :     -72.69 =  -5.419% (+/-0.05%)
misc_raytrace.py               212.89 ->     207.63 :      -5.26 =  -2.471% (+/-0.07%)

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
@codecov-commenter
Copy link

codecov-commenter commented Jul 6, 2022

Codecov Report

Merging #8870 (d49654c) into master (07cae91) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #8870   +/-   ##
=======================================
  Coverage   98.31%   98.31%           
=======================================
  Files         157      157           
  Lines       20349    20349           
=======================================
  Hits        20006    20006           
  Misses        343      343           
Impacted Files Coverage Δ
py/vm.c 98.98% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07cae91...d49654c. Read the comment docs.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Jul 8, 2022
@dpgeorge
Copy link
Member

Thanks, this is a good improvement to at least document it. Merged in 158f179

perhaps we should properly make it an mpconfig.h option and enable it by default on boards with more flash?

That sounds like a good idea, especially given the performance increases measured above. It's a simple/cheap way to get a non-trivial gain.

Once nlr is removed, I agree, we can revisit this.

@dpgeorge dpgeorge closed this Jul 12, 2022
tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Feb 7, 2024
…speeds

Boost flash and RAM speeds on ESP boards
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
py-core Relates to py/ directory in source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants