py/vm.c: Document the SELECTIVE_EXC_IP optimisation. #8870

jimmo · 2022-07-06T06:32:07Z

I've often wondered what the backstory is behind MARK_EXC_IP_SELECTIVE and in #8869 and #8845 the inlined pending exception handler uses it.

This PR just adds comments to explain what it does. I am fairly ambivalent about whether we should keep it, although if we do keep it perhaps we should properly make it an mpconfig.h option and enable it by default on boards with more flash?

One argument might be to remove it and re-visit the optimisation if NLR removal happens. As far as I understand, the value of code_state->ip only really matters outside this function for yield (i.e. re-entry to this function). So for the purpose of the exception handler, a regular variable storing the "current ip" would be fine which could be cheaply set to ip at dispatch (and could be optimised and potentially registered by the compiler). However with NLR, a local variable won't work and the value will be trashed by the nlr jump.

I also did some performance testing of this optimisation. On PYBV11 it adds +360 bytes and has a 1-3% performance improvement:

$ ./run-perfbench.py -s ~/exc-ip-sel-baseline ~/exc-ip-sel-enabled
diff of scores (higher is better)
N=100 M=100                /home/jimmo/exc-ip-sel-baseline -> /home/jimmo/exc-ip-sel-enabled         diff      diff% (error%)
bm_chaos.py                    359.45 ->     366.21 :      +6.76 =  +1.881% (+/-0.01%)
bm_fannkuch.py                  77.39 ->      78.12 :      +0.73 =  +0.943% (+/-0.00%)
bm_fft.py                     2517.50 ->    2594.00 :     +76.50 =  +3.039% (+/-0.00%)
bm_float.py                   5720.43 ->    5862.47 :    +142.04 =  +2.483% (+/-0.00%)
bm_hexiom.py                    46.21 ->      46.72 :      +0.51 =  +1.104% (+/-0.00%)
bm_nqueens.py                 4393.94 ->    4469.23 :     +75.29 =  +1.713% (+/-0.00%)
bm_pidigits.py                 633.47 ->     641.64 :      +8.17 =  +1.290% (+/-0.23%)
misc_aes.py                    406.41 ->     418.07 :     +11.66 =  +2.869% (+/-0.00%)
misc_mandel.py                3478.31 ->    3530.72 :     +52.41 =  +1.507% (+/-0.00%)
misc_pystone.py               2424.83 ->    2464.01 :     +39.18 =  +1.616% (+/-0.01%)
misc_raytrace.py               371.74 ->     378.69 :      +6.95 =  +1.870% (+/-0.01%)

On rp2, interestingly it makes no size difference but a much smaller performance improvement.

$ ./run-perfbench.py -s ~/exc-ip-sel-baseline-rp2 ~/exc-ip-sel-enabled-rp2 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/exc-ip-sel-baseline-rp2 -> /home/jimmo/exc-ip-sel-enabled-rp2         diff      diff% (error%)
bm_chaos.py                    200.78 ->     201.68 :      +0.90 =  +0.448% (+/-0.07%)
bm_fannkuch.py                  55.91 ->      56.66 :      +0.75 =  +1.341% (+/-0.01%)
bm_fft.py                     1596.88 ->    1617.56 :     +20.68 =  +1.295% (+/-0.01%)
bm_float.py                   3785.43 ->    3817.17 :     +31.74 =  +0.838% (+/-0.05%)
bm_hexiom.py                    26.38 ->      26.54 :      +0.16 =  +0.607% (+/-0.04%)
bm_nqueens.py                 2846.73 ->    2864.82 :     +18.09 =  +0.635% (+/-0.03%)
bm_pidigits.py                 379.92 ->     380.36 :      +0.44 =  +0.116% (+/-0.04%)
misc_aes.py                    299.53 ->     302.94 :      +3.41 =  +1.138% (+/-0.02%)
misc_mandel.py                1796.23 ->    1812.03 :     +15.80 =  +0.880% (+/-0.04%)
misc_pystone.py               1341.51 ->    1338.97 :      -2.54 =  -0.189% (+/-0.05%)
misc_raytrace.py               212.89 ->     213.50 :      +0.61 =  +0.287% (+/-0.08%)

I thought it would be interesting to see how it interacts with computed goto. On PYBV11, disabling computed goto saves -1000 bytes, but on arm-none-eabi-gcc 12.1 computed goto has a negligible performance gain (interesting to compare to last time I checked -- #7680 (comment))

$ ./run-perfbench.py -s ~/comp-goto-baseline-pybv11 ~/comp-goto-off-pybv11 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/comp-goto-baseline-pybv11 -> /home/jimmo/comp-goto-off-pybv11         diff      diff% (error%)
bm_chaos.py                    359.45 ->     359.42 :      -0.03 =  -0.008% (+/-0.00%)
bm_fannkuch.py                  77.39 ->      77.32 :      -0.07 =  -0.090% (+/-0.01%)
bm_fft.py                     2517.50 ->    2500.03 :     -17.47 =  -0.694% (+/-0.00%)
bm_float.py                   5720.43 ->    5735.13 :     +14.70 =  +0.257% (+/-0.00%)
bm_hexiom.py                    46.21 ->      46.40 :      +0.19 =  +0.411% (+/-0.00%)
bm_nqueens.py                 4393.94 ->    4381.79 :     -12.15 =  -0.277% (+/-0.00%)
bm_pidigits.py                 633.47 ->     634.79 :      +1.32 =  +0.208% (+/-0.35%)
misc_aes.py                    406.41 ->     415.39 :      +8.98 =  +2.210% (+/-0.00%)
misc_mandel.py                3478.31 ->    3420.89 :     -57.42 =  -1.651% (+/-0.01%)
misc_pystone.py               2424.83 ->    2433.96 :      +9.13 =  +0.377% (+/-0.01%)
misc_raytrace.py               371.74 ->     374.55 :      +2.81 =  +0.756% (+/-0.01%)

On rp2040 though, computed goto is a significant performance gain.

$ ./run-perfbench.py -s ~/comp-goto-baseline-rp2 ~/comp-goto-off-rp2 
diff of scores (higher is better)
N=100 M=100                /home/jimmo/comp-goto-baseline-rp2 -> /home/jimmo/comp-goto-off-rp2         diff      diff% (error%)
bm_chaos.py                    200.78 ->     193.03 :      -7.75 =  -3.860% (+/-0.06%)
bm_fannkuch.py                  55.91 ->      51.74 :      -4.17 =  -7.458% (+/-0.01%)
bm_fft.py                     1596.88 ->    1448.57 :    -148.31 =  -9.287% (+/-0.00%)
bm_float.py                   3785.43 ->    3633.17 :    -152.26 =  -4.022% (+/-0.06%)
bm_hexiom.py                    26.38 ->      25.18 :      -1.20 =  -4.549% (+/-0.04%)
bm_nqueens.py                 2846.73 ->    2712.96 :    -133.77 =  -4.699% (+/-0.02%)
bm_pidigits.py                 379.92 ->     382.44 :      +2.52 =  +0.663% (+/-0.04%)
misc_aes.py                    299.53 ->     271.97 :     -27.56 =  -9.201% (+/-0.03%)
misc_mandel.py                1796.23 ->    1726.77 :     -69.46 =  -3.867% (+/-0.03%)
misc_pystone.py               1341.51 ->    1268.82 :     -72.69 =  -5.419% (+/-0.05%)
misc_raytrace.py               212.89 ->     207.63 :      -5.26 =  -2.471% (+/-0.07%)

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

codecov-commenter · 2022-07-06T06:50:52Z

Codecov Report

Merging #8870 (d49654c) into master (07cae91) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #8870   +/-   ##
=======================================
  Coverage   98.31%   98.31%           
=======================================
  Files         157      157           
  Lines       20349    20349           
=======================================
  Hits        20006    20006           
  Misses        343      343

Impacted Files	Coverage Δ
py/vm.c	`98.98% <ø> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 07cae91...d49654c. Read the comment docs.

dpgeorge · 2022-07-12T06:15:29Z

Thanks, this is a good improvement to at least document it. Merged in 158f179

perhaps we should properly make it an mpconfig.h option and enable it by default on boards with more flash?

That sounds like a good idea, especially given the performance increases measured above. It's a simple/cheap way to get a non-trivial gain.

Once nlr is removed, I agree, we can revisit this.

…speeds Boost flash and RAM speeds on ESP boards

py/vm.c: Document SELECTIVE_EXC_IP.

d49654c

Signed-off-by: Jim Mussared <jim.mussared@gmail.com>

jimmo force-pushed the vm-sel-exc-ip branch from e0432a3 to d49654c Compare July 6, 2022 06:43

dpgeorge added the py-core Relates to py/ directory in source label Jul 8, 2022

dpgeorge closed this Jul 12, 2022

tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Feb 7, 2024

Merge pull request micropython#8870 from tannewt/boost_esp_flash_ram_…

09fcc59

…speeds Boost flash and RAM speeds on ESP boards

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

py/vm.c: Document the SELECTIVE_EXC_IP optimisation. #8870

py/vm.c: Document the SELECTIVE_EXC_IP optimisation. #8870

Uh oh!

jimmo commented Jul 6, 2022

Uh oh!

codecov-commenter commented Jul 6, 2022 •

edited

Loading

Uh oh!

dpgeorge commented Jul 12, 2022

Uh oh!

Uh oh!

Uh oh!

py/vm.c: Document the SELECTIVE_EXC_IP optimisation. #8870

py/vm.c: Document the SELECTIVE_EXC_IP optimisation. #8870

Uh oh!

Conversation

jimmo commented Jul 6, 2022

Uh oh!

codecov-commenter commented Jul 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

dpgeorge commented Jul 12, 2022

Uh oh!

Uh oh!

codecov-commenter commented Jul 6, 2022 •

edited

Loading