[proof of concept] completely remove nlr from py, extmod and unix port #4131

dpgeorge · 2018-09-13T01:54:50Z

This is a proof of concept and work in progress to investigate the possibility of completely removing nlr (setjmp/longjmp-like exception handling) from the code base.

Instead of doing a longjmp, exception handling is now basically handled as follows:

to raise an exception: create an exception the usual way and store it into MP_STATE_THREAD(cur_exc), then return MP_OBJ_NULL
to propagate an exception: if a function returned MP_OBJ_NULL then just return MP_OBJ_NULL
to catch an exception: if a function returned MP_OBJ_NULL then access the exception in MP_STATE_THREAD(cur_exc) and clear this variable so the exception doesn't propagate further

For functions that don't return an mp_obj_t there are some additional rules about how to indicate an exception, but they are quite simple. There are also some helper functions to make the above rules easier to follow (eg for iterating through iterables). In a lot of cases code doesn't need to change much at all from the existing nlr scheme.
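
As a rough illustration of the three rules above, here is a minimal standalone sketch. It does not use the real MicroPython headers: `obj_t`, `OBJ_NULL`, `cur_exc`, `raise_o` and the division functions are hypothetical stand-ins for `mp_obj_t`, `MP_OBJ_NULL`, `MP_STATE_THREAD(cur_exc)` and `mp_raise_o`, chosen only to show the control flow.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the real MicroPython definitions. */
typedef void *obj_t;
#define OBJ_NULL ((obj_t)NULL)

/* Plays the role of MP_STATE_THREAD(cur_exc): the pending exception, if any. */
static obj_t cur_exc = OBJ_NULL;

static char zero_division[] = "ZeroDivisionError";
static int result_slot;

/* Raise: store the exception, then return the invalid-object marker. */
static obj_t raise_o(obj_t exc) {
    cur_exc = exc;
    return OBJ_NULL;
}

/* A leaf function that may raise. */
static obj_t checked_div(int a, int b) {
    if (b == 0) {
        return raise_o((obj_t)zero_division);
    }
    result_slot = a / b;
    return (obj_t)&result_slot;
}

/* Propagate: a NULL result from the callee is passed straight up. */
static obj_t div_and_double(int a, int b) {
    obj_t r = checked_div(a, b);
    if (r == OBJ_NULL) {
        return OBJ_NULL;
    }
    *(int *)r *= 2;
    return r;
}

/* Catch: read the exception and clear the slot so it stops propagating.
   Returns the caught exception, or OBJ_NULL if the call succeeded. */
static obj_t try_div(int a, int b) {
    obj_t r = div_and_double(a, b);
    if (r == OBJ_NULL) {
        obj_t exc = cur_exc;
        cur_exc = OBJ_NULL;
        return exc;
    }
    return OBJ_NULL;
}
```

Calling `try_div(1, 0)` hands back the stored exception and leaves `cur_exc` cleared, while `try_div(6, 3)` returns `OBJ_NULL` because nothing was raised.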

For this proof of concept the unix port is completely converted to non-nlr code. All tests pass (basics, extmod, io, import, float, etc), excluding threading, and excluding those that use the native emitter (because it still uses nlr).

Change in code size wrt master (absolute bytes and percent bytes):

       bare-arm:  +900 +1.332%
    minimal x86: +1304 +0.831%
       unix x64: +3032 +0.623%
    unix nanbox: +1956 +0.446%
          stm32: +1732 +0.486%
         cc3200: +1352 +0.731%
        esp8266: +1812 +0.280%
          esp32: +1897 +0.178%

Performance seems to be unchanged, if anything slightly improved (only tested with pystone.py).

The benefits of removing nlr are:

no need for any assembler in the core
possible to port to architectures that don't have setjmp/longjmp and where it's not feasible to write custom nlr code in assembler
less stack usage (eg for stm32, the VM stack usage decreases from 120 bytes to 72 bytes)
much simpler catching and handling of exceptions
easier to reason about the code because it will no longer arbitrarily do a longjmp
easier and safer to call subsystem code (eg vfs code) from non-uPy contexts without worrying about nlr

The drawbacks of removing nlr are:

all existing C code (eg stm32, other ports, 3rd party extensions, etc) must be rewritten to handle the new rules for raising exceptions
increased existing code size
all new C code that will be written in the future will be ~1% larger due to the need to explicitly handle exceptions/errors

dpgeorge · 2018-09-13T01:59:17Z

There is still a lot of work to do here, even for the core, mostly to check the return value of all "malloc" calls. The existing test suite doesn't check out-of-memory errors for all possible cases (eg OOM when increasing the size of a list due to a slice insertion) so tests would need to be written for these and then error checking added.

pfalcon · 2018-09-13T07:10:11Z

Ha-ha, looking at your recent efforts to revamp exception handling for native, I also again was thinking whether "pass explicitly" scheme would have been better... Reading thru the description though...

pfalcon · 2018-09-13T07:18:04Z

mp_raise_o

So, what's "o" ?

pfalcon · 2018-09-13T07:24:49Z

py/argcheck.c

        }
+        return 1;


Is it a good idea to return 1 in case of failure, and 0 on success? Why not a bool return?

I was waiting until it was all converted to see what the best thing to return would be for functions like this. It might be best to use errno-like codes, -1 for error, 0 for success, etc. Then there can be common wrappers/macros to deal with it all.
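
One way the errno-like convention with a common wrapper could look is sketched below. This is only an assumption about the eventual shape: `check_num_args`, `RETURN_NULL_ON_ERROR`, `pending_exc` and `builtin_two_args` are hypothetical names, not functions from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the real MicroPython definitions. */
typedef void *obj_t;
#define OBJ_NULL ((obj_t)NULL)

static obj_t pending_exc = OBJ_NULL;   /* stand-in for the per-thread exception slot */
static char type_error[] = "TypeError";
static int dummy_result;

/* Hypothetical wrapper: turn a negative status into an invalid-object return. */
#define RETURN_NULL_ON_ERROR(expr) \
    do { if ((expr) < 0) { return OBJ_NULL; } } while (0)

/* Returns 0 on success, -1 on failure with pending_exc set. */
static int check_num_args(size_t n_args, size_t n_min, size_t n_max) {
    if (n_args < n_min || n_args > n_max) {
        pending_exc = (obj_t)type_error;
        return -1;
    }
    return 0;
}

/* A caller that returns an object only needs one line to propagate the error. */
static obj_t builtin_two_args(size_t n_args) {
    RETURN_NULL_ON_ERROR(check_num_args(n_args, 2, 2));
    return (obj_t)&dummy_result;
}
```

With -1 for error and 0 for success, the same `< 0` test works for every int-returning helper, which is what makes a shared macro practical.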

pfalcon · 2018-09-13T07:41:07Z

Change in code size wrt master (absolute bytes and percent bytes):

IMHO, these are rather modest changes. Just need to cut some code somewhere as a moral compensation ;-).

Performance seems to be unchanged, if anything slightly improved (only tested with pystone.py).

This is harder to believe. Maybe on modern superscalar CPUs it's not noticeable, but on a simple pipelined architecture that would eat at least 2 cycles (cmp + false jump) in many places. It's hard to believe it's not noticed. And pipelining is still an advanced arch, there're CPUs without it.

So, I'd say performance aspect should be studied more, and all steps should be taken to minimize effect of exception checks (MP_UNLIKELY used, and then the check abstracted to a macro (likely a few macros)).
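
A sketch of what such a check macro might look like follows. `UNLIKELY` here is a local stand-in defined with the GCC/Clang `__builtin_expect` builtin (falling back to a plain test elsewhere), and `TRY_OBJ`, `load` and `load_and_increment` are hypothetical names used only for illustration.

```c
#include <stddef.h>

/* Branch hint: tell the compiler the error path is the cold one. */
#if defined(__GNUC__) || defined(__clang__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Hypothetical stand-ins, not the real MicroPython definitions. */
typedef void *obj_t;
#define OBJ_NULL ((obj_t)NULL)

/* Hypothetical helper: evaluate expr into var and return OBJ_NULL from the
   enclosing function if the callee signalled an exception. */
#define TRY_OBJ(var, expr) \
    do { \
        (var) = (expr); \
        if (UNLIKELY((var) == OBJ_NULL)) { return OBJ_NULL; } \
    } while (0)

static int cell = 41;

/* A callee that either yields an object or signals failure. */
static obj_t load(int ok) {
    return ok ? (obj_t)&cell : OBJ_NULL;
}

static obj_t load_and_increment(int ok) {
    obj_t o;
    TRY_OBJ(o, load(ok));
    *(int *)o += 1;
    return o;
}
```

Keeping the check behind one macro means the hint, or the whole checking strategy, can be tuned in a single place while measuring its cost.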

But overall, if the direction of native code generation is to be pursued, I think adopting this change is the only sane way, because NLR is a limiting factor both in terms of complexity, as the recent patches showed, and in achieving greater performance. But it means that generated native code will only take more space.

The good thing is that the conversion can happen gradually, with the 2 systems coexisting while the conversion is performed. But as there's no talk about making this configurable, doing some additional impact analysis before going for the switchover makes sense.

stinos · 2018-09-13T08:01:24Z

I'd say performance aspect should be studied more

Agreed, would be interesting to measure impact on some real-life code.
For the rest I think I'm +1, should make for nicer code overall. The 'think' is just because I realize I'm going to have to re-visit and rewrite quite some code which relies on exceptions. Unless this is configurable. E.g. I use C++ anyway, replacing NLR with C++ exceptions would be interesting to look at as well though I assume it would be slower.

dpgeorge · 2018-09-13T13:02:53Z

mp_raise_o

So, what's "o" ?

It just means "this function returns an invalid object" (o for object) which can be returned by the caller to make it easier to propagate the error up. I was also going to add mp_raise_int etc but it wasn't really needed. Also, I needed to change the name anyway because these functions changed signature/behaviour and I wanted a way for the compiler to catch any code that was not yet converted. Likely it would be renamed back to mp_raise.

dpgeorge · 2018-09-13T13:06:04Z

Thanks for the fast feedback. My general feeling is that this is the right direction to move in, eventually. The reason I'm posting the code here now is that it was much easier than I expected to convert things. But still, I think the biggest barrier is that it will really break the C API and a lot of people are going to have to rewrite (or at least audit) a lot of code. So this would have to wait until a major version bump, eg 2.0 or even 3.0.

pfalcon · 2018-09-30T19:00:45Z

But still, I think the biggest barrier is that it will really break the C API and a lot of people are going to have to rewrite (or at least audit) a lot of code. So this would have to wait until a major version bump, eg 2.0 or even 3.0.

Why putting off this for so long, ain't there enough great patches are sitting in half-done/stale shape? This one already conflicting in a bunch of files.

I was probably too optimistic above re: being able to convert incrementally, without much effort, because any occurrence of nlr_push would need to check for both old and new exception-raising, but then how much "user code" uses nlr_push? Grepping extmod/ should give a good idea:

micropython/extmod$ grep -r nlr_push *
uos_dupterm.c:    if (nlr_push(&nlr) == 0) {
uos_dupterm.c:        if (nlr_push(&nlr) == 0) {
uos_dupterm.c:        if (nlr_push(&nlr) == 0) {
vfs.c:    if (nlr_push(&nlr) == 0) {

Both uos_dupterm.c and vfs.c implement special functionality. No other application-level modules use it.

Just for comparison, I also grepped openmv's repo, it has hits only in "main.c".

pfalcon · 2018-09-30T19:03:43Z

Btw, another missed comment:

to raise an exception: create an exception the usual way and store it into MP_STATE_THREAD(cur_exc), then return MP_OBJ_NULL

That was smart. I always was thinking that it would require returning exception object as a result of each function, and thus digging out an extra bit in object representations, and then noticeable changes/compromises/drawbacks.

dpgeorge · 2018-10-05T05:52:32Z

Why putting off this for so long, ain't there enough great patches are sitting in half-done/stale shape?

Because there's still a lot more to do to get this even working with the core: all existing uses of the GC malloc must be augmented with a check for success/failure, and then handle failure in an appropriate way. And then (coverage) tests need to be written for all these additions.
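
To give an idea of what "augmented with a check" means at a single call site, here is a small standalone sketch using the C library `realloc` rather than the GC allocator. `checked_realloc`, `fail_after` and `int_list_t` are hypothetical names; the failure countdown is there only so the error path can be exercised in a test.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical allocator wrapper with failure injection: when fail_after
   reaches 0 the next allocation returns NULL. -1 disables injection. */
static long fail_after = -1;

static void *checked_realloc(void *ptr, size_t size) {
    if (fail_after == 0) {
        fail_after = -1;
        return NULL;
    }
    if (fail_after > 0) {
        fail_after--;
    }
    return realloc(ptr, size);
}

typedef struct {
    int *items;
    size_t len;
    size_t alloc;
} int_list_t;

/* Returns 0 on success, -1 on out-of-memory. On failure the list is left
   exactly as it was, so the caller can report the error and carry on. */
static int int_list_append(int_list_t *l, int value) {
    if (l->len == l->alloc) {
        size_t new_alloc = l->alloc ? l->alloc * 2 : 4;
        int *p = checked_realloc(l->items, new_alloc * sizeof(int));
        if (p == NULL) {
            return -1;
        }
        l->items = p;
        l->alloc = new_alloc;
    }
    l->items[l->len++] = value;
    return 0;
}
```

Every allocation site needs this kind of check plus a test that forces the failure, which is where most of the remaining effort lies.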

pfalcon · 2018-12-09T21:45:32Z

py/mpstate.h

@@ -242,6 +242,8 @@ typedef struct _mp_state_thread_t {
    mp_obj_dict_t *dict_locals;
    mp_obj_dict_t *dict_globals;

+    mp_obj_base_t *cur_exc;


So, we already have cur_exception. But it's currently in mp_state_vm_t, which is not correct: https://docs.python.org/3/library/sys.html#sys.exc_info

The information returned is specific both to the current thread and to the current stack frame.

Please move it to mp_state_thread_t and apparently use for this patchset.

I added a new entry to not tangle this set of patches too much with the existing code. The idea would be to clean up things like this if the patches get merged. And it's not immediately clear that cur_exc and cur_exception can be the same thing, due to the VM setting/clearing cur_exception at various points.

pfalcon · 2018-12-09T21:47:52Z

py/runtime.h

@@ -158,6 +160,13 @@ NORETURN void mp_raise_NotImplementedError(const char *msg);
 NORETURN void mp_raise_OSError(int errno_);
 NORETURN void mp_raise_recursion_depth(void);

+mp_obj_t mp_raise_o(mp_obj_t exc);


I think these should be macros instead. Or at least, macro variants provided:

#define MP_RAISE_O(exc) { return mp_raise_o(exc); }

As I mentioned above, this was just a temporary name that would be renamed back to mp_raise() once everything was converted. If it's important to have them as macros/inline functions then they should be lower case because of #293 (which is still a to-do). In that case there's no hurry to make them inline functions because it can be easily changed later (eg true function mp_raise changed to inline function/macro mp_raise which calls helper mp_raise_helper).

If it's important to have them as macros/inline functions

So, I'm interested in starting to adopt these sooner rather than later. Then, having that as a macro is important. For example, MP_RAISE_NEW() could be defined as:

#if USE_NEW_EXC
#define MP_RAISE_NEW(exc) { return mp_raise_o(exc); }
#else
#define MP_RAISE_NEW(exc) nlr_raise(exc);
#endif

And gradual conversion can be started right away. Again, the exact naming is not important; the argument is for starting incremental conversion "soon". Whereas waiting for "once everything was converted" in a separate, always-bitrotting patch may take years.
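
For illustration, a standalone sketch of such a switchable raise macro is given below. It is not the patch's code: `RAISE`, `active_exc`, `handler`, `parse_digit` and `call_protected` are hypothetical, and a single `jmp_buf` stands in for the nlr stack. The catch site handles both a NULL return and a `longjmp`, which is the double check that a mixed-mode transition would require.

```c
#include <setjmp.h>
#include <stddef.h>

#ifndef USE_NEW_EXC
#define USE_NEW_EXC 1   /* set to 0 to build the longjmp-based variant */
#endif

/* Hypothetical stand-ins, not the real MicroPython definitions. */
typedef void *obj_t;
#define OBJ_NULL ((obj_t)NULL)

static obj_t active_exc = OBJ_NULL;
static jmp_buf handler;   /* single-level stand-in for the nlr stack */

#if USE_NEW_EXC
#define RAISE(exc) do { active_exc = (exc); return OBJ_NULL; } while (0)
#else
#define RAISE(exc) do { active_exc = (exc); longjmp(handler, 1); } while (0)
#endif

static char value_error[] = "ValueError";
static int ok_value;

/* Written once; the macro decides how the exception leaves the function. */
static obj_t parse_digit(char c) {
    if (c < '0' || c > '9') {
        RAISE((obj_t)value_error);
    }
    ok_value = c - '0';
    return (obj_t)&ok_value;
}

/* Returns the caught exception, or OBJ_NULL if parse_digit succeeded. */
static obj_t call_protected(char c) {
    active_exc = OBJ_NULL;
    if (setjmp(handler) == 0) {
        if (parse_digit(c) != OBJ_NULL) {
            return OBJ_NULL;
        }
    }
    /* Reached by a NULL return (new mode) or by longjmp (old mode). */
    obj_t exc = active_exc;
    active_exc = OBJ_NULL;
    return exc;
}
```

The body of `parse_digit` is identical in both builds, so modules written against the macro would not need touching again at switchover time.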

pfalcon · 2018-12-09T22:06:34Z

So, can again this be refactored into gradual parts, and the first part pushed sooner rather than later? Just a patch which allows native functions to return MP_OBJ_NULL to signify exception, to be handled in vm.c for MP_BC_CALL_FUNCTION and friends. Complete switchover can be left for next steps, which again can be done in similar gradual steps, literally bytecode by bytecode. In the end what will be left is to just turn off nlr_* stuff.

Note that this stuff can be seen as the root cause of arguments like in #4217 - exceptions are known to be heavy so there's pressure to devise APIs which avoid them. (To be fair, even with this patch, exceptions require memalloc, so the need for no-throw functions doesn't go away.)

dpgeorge · 2018-12-10T12:49:33Z

So, can again this be refactored into gradual parts, and the first part pushed sooner rather than later?

There's still a lot of research to do in this direction to see if it's even the right way to go. In particular 1) get close to 100% conversion of py/ code, include all uses of malloc; 2) quantify in more detail the change in performance using a more comprehensive benchmark test suite, not just pystone.

Just a patch which allows native functions to return MP_OBJ_NULL to signify exception, to be handled in vm.c for MP_BC_CALL_FUNCTION and friends.

Even doing something small this will likely need modifications to the native emitter, and likely reduce performance of it.

pfalcon · 2018-12-10T13:18:42Z

There's still a lot of research to do in this direction to see if it's even the right way to go.

#1245 has existed for 3 years. And it's how many other projects do it. So, "a lot of research" could also be a bit less research, relying on others' experience (or more exactly, their reasons for going that way).

Even doing something small this will likely need modifications to the native emitter, and likely reduce performance of it.

Reduce performance and increase generated code size. But the native emitter is already known to be slow, as slow or even slower than just bytecode with lookup caching enabled. There's little to lose there, except for a ball and chain which limits possibilities to improve performance further. And the complexity of NLR makes it quite complicated. Not totally disabled, like your work in a few recent months showed. But nobody else understands what happens here. And that work is catching up with just supporting all of the bytecode features, whereas a key to improving native performance lies in fighting the over-dynamic nature of Python, with things like lookup caching, loop optimizations, tracing JIT, etc. And complications like NLR pull attention to themselves instead of leaving time for more interesting things like the above.

pfalcon · 2019-02-03T08:28:11Z

quantify in more detail the change in performance using a more comprehensive benchmark test suite, not just pystone.

Just noticed there's https://github.com/python/performance now (previous ideas would have been using PyPy benchmarks).

pfalcon · 2019-02-03T08:29:28Z

there's https://github.com/python/performance now

Heh, but won't help, it has stuff like bm_dulwich_log.py ;-)

dpgeorge · 2019-02-03T10:15:51Z

Just noticed there's https://github.com/python/performance now

Yes, I've been (slowly) working on porting this to uPy for a few months now. Have about 13 working. The trick is getting them running on all targets, given the wide range of available RAM and computation power.

dpgeorge · 2019-06-25T07:06:32Z

I rebased this onto latest master and did some benchmarking. For the unix port the results are:

diff of scores (higher is better)
N=5000 M=1000              unix_nlr -> unix_no_nlr         diff      diff% (error%)
bm_chaos.py                27228.57 ->   26802.19 :    -426.38 =  -1.566% (+/-3.48%)
bm_fannkuch.py                13.67 ->      13.59 :      -0.08 =  -0.585% (+/-0.93%)
bm_float.py               539895.29 ->  539184.84 :    -710.45 =  -0.132% (+/-1.08%)
bm_hexiom.py                1037.31 ->    1063.13 :     +25.82 =  +2.489% (+/-1.40%)
bm_nqueens.py             510088.81 ->  520799.21 :  +10710.40 =  +2.100% (+/-0.61%)
bm_pidigits.py              5653.65 ->    5816.84 :    +163.19 =  +2.886% (+/-0.47%)
misc_aes.py                31826.67 ->   33749.58 :   +1922.91 =  +6.042% (+/-1.09%)
misc_mandel.py            203821.71 ->  199391.93 :   -4429.78 =  -2.173% (+/-2.44%)
misc_pystone.py           155957.30 ->  155210.14 :    -747.16 =  -0.479% (+/-3.53%)
misc_raytrace.py            8390.60 ->    8288.99 :    -101.61 =  -1.211% (+/-2.21%)

For PYBD_SF2:

diff of scores (higher is better)
N=100 M=100                pybd_nlr -> pybd_no_nlr         diff      diff% (error%)
bm_chaos.py                  301.39 ->     304.76 :      +3.37 =  +1.118% (+/-0.04%)
bm_fannkuch.py                79.27 ->      76.34 :      -2.93 =  -3.696% (+/-0.05%)
bm_float.py                 4678.00 ->    4775.79 :     +97.79 =  +2.090% (+/-0.03%)
bm_hexiom.py                  38.80 ->      38.29 :      -0.51 =  -1.314% (+/-0.03%)
bm_nqueens.py               4167.27 ->    4167.37 :      +0.10 =  +0.002% (+/-0.02%)
bm_pidigits.py               746.46 ->     698.95 :     -47.51 =  -6.365% (+/-0.04%)
misc_aes.py                  376.95 ->     376.66 :      -0.29 =  -0.077% (+/-0.05%)
misc_mandel.py              3227.21 ->    3148.61 :     -78.60 =  -2.436% (+/-0.08%)
misc_pystone.py             1946.06 ->    1956.00 :      +9.94 =  +0.511% (+/-0.02%)
misc_raytrace.py              36.25 ->      37.09 :      +0.84 =  +2.317% (+/-0.02%)

That's mostly "not much of a change", within error. So far it looks good, but it would be nice to have more benchmark tests and run on more targets, to see how they all are affected.

all unix coverage tests pass

change in size so far:

       bare-arm:  +876 +1.310% [incl +4(bss)]
    minimal x86: +2396 +1.552% [incl +4(data) +4(bss)]
       unix x64: +4768 +0.958% [incl +32(data) +8(bss)]
    unix nanbox: +4952 +1.114% [incl +4(data) +4(bss)]
          stm32: +2468 +0.680% [incl +4(bss)]
         cc3200: +1624 +0.875%
        esp8266: +2456 +0.375% [incl +8(bss)]
          esp32: +1956 +0.173% [incl -32(data)]
            nrf: +1476 +1.012% [incl +4(bss)]
           samd: +1208 +1.186% [incl +4(bss)]

Change in code size up to here:

       bare-arm: +1036 +1.560% [incl +4(bss)]
    minimal x86: +2632 +1.715% [incl +4(data) +4(bss)]
       unix x64: +5304 +1.065% [incl +32(data) +8(bss)]
    unix nanbox: +5488 +1.236% [incl +4(data) +4(bss)]
          stm32: +2664 +0.728% [incl +4(bss)] PYBV10
         cc3200: +1800 +0.974%
        esp8266: +2744 +0.420%
          esp32: +2240 +0.202% GENERIC
            nrf: +1480 +1.018% [incl +4(bss)] pca10040
           samd: +1332 +1.313% [incl +4(bss)] ADAFRUIT_ITSYBITSY_M4_EXPRESS

pmp-p · 2020-05-16T06:05:18Z

@dpgeorge I've been using that model (in a configurable way) for a year now. So will it happen someday or not ?

dpgeorge · 2020-05-16T06:28:21Z

I've been using that model (in a configurable way) for a year now

How did you make the removal of NLR configurable?

So will it happen someday or not ?

Yes, one day. But it needs to wait for MicroPython 2.0 because it is a very big change.

pmp-p · 2020-05-16T06:42:48Z

How did you make the removal of NLR configurable?

To keep the code clean everywhere possible: mostly with macros on mp_raise_*. In NO_NLR mode they { return *_o() }, otherwise they nlr_raise(). It's a bit hacky but very readable.

Some files are specifically tagged _no_nlr.c for two reasons:

they would otherwise drown in useless ifdef spaghetti.
they are in py/* and so they don't (won't) change much (now that the coding style is hardcoded).

Also, my no_nlr vm.c is a clearly separate file because the design is different (calling model, C-stack usage ...) so it does not interfere with the nlr model at all.
Removing nlr allowed me to implement http://man7.org/linux/man-pages/man3/aio_suspend.3.html as a Python call on wasm (so https://pypi.org/project/aio/ is no longer an April prank, at least on MicroPython!).

dpgeorge · 2022-09-23T03:11:16Z

I will close this PR for the following reasons:

It's still a lot of work to get over the line and make it ready to merge (a LOT of work, making all the tests pass, converting all ports).
Emscripten now supports longjmp/setjmp properly, so that use-case is now gone.
We have yet to encounter any other architecture for which we couldn't either use the builtin setjmp/longjmp mechanism, or write a custom NLR handler.
It has a significant code size increase.
There are other ways (eg STACKLESS mode) to reduce C stack usage.
It puts a very big burden on users to rewrite all their C code/extensions.
Users will not see any benefit from this PR (apart from those wanting to deeply embed MicroPython in a non-standard architecture).
The time working on this could be better spent on other things which would have a much bigger positive improvement for MicroPython.

dpgeorge mentioned this pull request Sep 13, 2018

What if we get rid of NLR stuff? #1245

Closed

pfalcon reviewed Sep 13, 2018

dpgeorge force-pushed the py-remove-nlr branch from 93f8ae8 to 9347f34 Compare October 5, 2018 06:26

dpgeorge force-pushed the py-remove-nlr branch from 9347f34 to 882e2b9 Compare October 20, 2018 12:32

dpgeorge force-pushed the py-remove-nlr branch 4 times, most recently from be7def9 to 86e7fd5 Compare December 5, 2018 12:46

pfalcon reviewed Dec 9, 2018

dpgeorge force-pushed the py-remove-nlr branch from 86e7fd5 to 6fd780d Compare December 10, 2018 05:44

GeorgeWort mentioned this pull request May 31, 2019

Remove nlr microbit-foundation/micropython-simulator#3

Closed

dpgeorge force-pushed the py-remove-nlr branch from 6fd780d to 2cded98 Compare June 25, 2019 06:55

dpgeorge added 21 commits December 13, 2019 22:42

WIP return MP_OBJ_NULL to indicate exc raised in more API functions

eb1794c

WIP get native code working without NLR

ac23580

WIP py: rename cur_exc to active_exception

9a6fb4d

WIP extmod: rename cur_exc to active_exception

ca8e055

WIP py: handle some more exceptions

7f42429

WIP extmod: handle some more exceptions

d8b751a

WIP unix: handle OS errno exceptions

0dfd1c6

WIP qemu-arm: get working without NLR

4affcc0

WIP lib/upytesthelper: remove need for NLR

070880f

WIP py: support stackless without NLR

83431c3

WIP py: start to check for OOM in lexer/parser/compiler

50e45bf

WIP add a way to automatically test for OOM errors

599d965

Change the newly-added "if False" to "if True" in tests/run-tests to enable this fuzzing-like test mode that searches for OOM error handling.

WIP py/runtime: Clear active_exception at very start of mp_init().

82d712c

WIP py/vm: Fix access of current exception with sys.settrace enabled.

578f7a2

WIP esp8266: Use minimal manifest for 512k build.

7885d49

WIP extmod/modussl_axtls: Check NULL return from mp_obj_str_get_data.

4561694

WIP py/objstr: Handle invalid iteration when constructing bytes.

b4710b4

WIP extmod/vfs_lfs: Make it work without NLR.

b00107a

WIP updates to work after rebase

c5da53f

dpgeorge force-pushed the py-remove-nlr branch from 16291d4 to c5da53f Compare December 13, 2019 12:35

pmp-p mentioned this pull request Dec 30, 2020

Improve the MicroPython simulator lvgl/lvgl#1320

Closed

dpgeorge added the py-core Relates to py/ directory in source label Nov 30, 2021

dpgeorge closed this Sep 23, 2022

dpgeorge mentioned this pull request Sep 23, 2022

WIP: use a root stack to explicitly track root pointers #4723

Closed
