-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
[proof of concept] completely remove nlr from py, extmod and unix port #4131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There is still a lot of work to do here, even for the core, mostly to check the return value of all "malloc" calls. The existing test suite doesn't check out-of-memory errors for all possible cases (eg OOM when increasing the size of a list due to a slice insertion) so tests would need to be written for these and then error checking added. |
Ha-ha, looking at your recent efforts to revamp exception handling for native, I also again was thinking whether "pass explicitly" scheme would have been better... Reading thru the description though... |
So, what's "o" ? |
} | ||
return 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this good idea to return 1 in case of failure, and 0 on success? Why not bool return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was waiting until it was all converted to see what the best thing to return would be for functions like this. It might be best to use errno-like codes, -1 for error, 0 for success, etc. Then there can be common wrappers/macros to deal with it all.
IMHO, this is rather modest changes. Just need to cut some code somewhere as a moral compensation ;-).
This is harder to believe in. Maybe on modern superscalar CPUs it's not noticeable, but on simple pipelined architecture that would eat at least 2 cycle (cmp + false jump) in many places. It's hard to believe it's not noticed. And pipelining is still an advanced arch, there're CPUs without it. So, I'd say performance aspect should be studied more, and all steps should be taken to minimize effect of exception checks (MP_UNLIKELY used, and then the check abstracted to a macro (likely a few macros)). But overall, if the direction of native code generation is to be pursued, I think that's the adopting this change is the only sane way, because NLR is limiting factor both in terms of complexity, as the recent patches showed, and achieving greater performance. But it means that generated native code will only take more space. A good matter is that the conversion can happen gradually, with 2 systems coexist while conversion is performed. But as there's no talking about making this configurable, doing some additional impact analysis before going for the switchover makes sense. |
Agreed, would be interesting to measure impact on some real-life code. |
It just means "this function returns an invalid object" (o for object) which can be returned by the caller to make it easier to propagate the error up. I was also going to add |
Thanks for fast feedback. My general feeling is that this is the right direction to move in, eventually. The reason I post the code here now is because it was much easier than I expected to convert things. But still, I think the biggest barrier is that it will really break the C API and a lot of people are going to have to rewrite (or at least audit) a lot of code. So this would have to wait until a major version bump, eg 2.0 or even 3.0. |
Why putting off this for so long, ain't there enough great patches are sitting in half-done/stale shape? This one already conflicting in a bunch of files. I was probably was too optimistic above re: being able to convert incrementally, without much effort, because any occurrence of nlr_push would need to check for both old and new exception-raising, but then how much of "user code" uses nlr_push? Grepping extmod/ should give a good idea:
Both uos_dupterm.c and vfs.c implement special functionality. No other application-level modules use it. Just for comparison, I also grepped openmv's repo, it has hits only in "main.c". |
Btw, another missed comment:
That was smart. I always was thinking that it would require returning exception object as a result of each function, and thus digging out an extra bit in object representations, and then noticeable changes/compromises/drawbacks. |
Because there's still a lot more to do to get this even working with the core: all existing uses of the GC malloc must be augmented with a check for success/failure, and then handle failure in an appropriate way. And then (coverage) tests need to be written for all these additions. |
93f8ae8
to
9347f34
Compare
9347f34
to
882e2b9
Compare
be7def9
to
86e7fd5
Compare
py/mpstate.h
Outdated
@@ -242,6 +242,8 @@ typedef struct _mp_state_thread_t { | |||
mp_obj_dict_t *dict_locals; | |||
mp_obj_dict_t *dict_globals; | |||
|
|||
mp_obj_base_t *cur_exc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, we already have cur_exception
. But it's currently in mp_state_vm_t, which is not correct: https://docs.python.org/3/library/sys.html#sys.exc_info
The information returned is specific both to the current thread and to the current stack frame.
Please move it to mp_state_thread_t and apparently use for this patchset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a new entry to not tangle this set of patches too much with the existing code. The idea would be to clean up things like this if the patches get merged. And it's not immediately clear that cur_exc
and cur_exception
can be the same thing, due to the VM setting/clearing cur_exception
at various points.
@@ -158,6 +160,13 @@ NORETURN void mp_raise_NotImplementedError(const char *msg); | |||
NORETURN void mp_raise_OSError(int errno_); | |||
NORETURN void mp_raise_recursion_depth(void); | |||
|
|||
mp_obj_t mp_raise_o(mp_obj_t exc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be macros instead. Or at least, macro variants provided:
#define MP_RAISE_O(exc) { return mp_raise_o(exc); }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned above, this was just a temporary name that would be renamed back to mp_raise()
once everything was converted. If it's important to have them as macros/inline functions then they should be lower case because of #293 (which is still a to-do). In that case there's no hurry to make them inline functions because it can be easily changed later (eg true function mp_raise
changed to inline function/macro mp_raise
which calls helper mp_raise_helper
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's important to have them as macros/inline functions
So, I'm interested in starting to adopt these sooner rather than later. Then, having that as a macro is important. For example, MP_RAISE_NEW() could be defined as:
#if USE_NEW_EXC
#define MP_RAISE_NEW(exc) { return mp_raise_o(exc); }
#else
#define MP_RAISE_NEW(exc) nlr_raise(exc);
#endif
And gradual conversion can be started right away. Again, exact naming is not important, the argument for incremental conversion started "soon". Whereas waiting for "once everything was converted" is a separate, always-bitrotting patch may take years.
So, can again this be refactored into gradual parts, and the first part pushed sooner rather than later? Just a patch which allows native functions to return MP_OBJ_NULL to signify exception, to be handled in vm.c for MP_BC_CALL_FUNCTION and friends. Complete switchover can be left for next steps, which are again can be done in similar gradual steps, literally bytecode by bytecode. In the end what will be left is to just turn off nlr_* stuff. Note that this stuff can be seen as the root cause of arguments like in #4217 - exceptions are known to be heavy so there're a pressure to devise APIs which avoid them. (To be fair, even with this patch, exceptions require memalloc, so the need for no-throw functions doesn't go away.) |
86e7fd5
to
6fd780d
Compare
There's still a lot of research to do in this direction to see if it's even the right way to go. In particular 1) get close to 100% conversion of py/ code, include all uses of malloc; 2) quantify in more detail the change in performance using a more comprehensive benchmark test suite, not just pystone.
Even doing something small this will likely need modifications to the native emitter, and likely reduce performance of it. |
#1245 exists for 3 years. And it's how many other projects do it. So, "a lot of research" could be also a bit less of research relying on others' experience (or more exactly, the reasons to going that way).
Reduce performance and increase generated code size. But native emitter is already known to be slow, as slow or even slower than just bytecode with lookup caching enabled. There's little to lose there, except for a ball and chain which limits possibilities to improve performance further. And complexity of NLR makes it quite complicated. Not totally disabled, like your work in a few recent months showed. But nobody else understands what happens here. And that work is catching up with just supporting all of bytecode features, whereas a key in improving native performance lies in fighting over-dynamic nature of Python, with things like lookup caching, loop optimizations, tracing JIT, etc. And complications like NLR pull attentions to themselves instead of enabling to spend time on more interesting things like the above. |
Just noticed there's https://github.com/python/performance now (previous ideas would have been using PyPy benchmarks). |
Heh, but won't help, it has stuff like bm_dulwich_log.py ;-) |
Yes, I've been (slowly) working on porting this to uPy for a few months now. Have about 13 working. The trick is getting them running on all targets, given the wide range of available RAM and computation power. |
I rebased this onto latest master and did some benchmarking. For the unix port the result are:
For PYBD_SF2:
That's mostly "not much of a change", within error. So far it looks good, but it would be nice to have more benchmark tests and run on more targets, to see how they all are affected. |
all unix coverage tests pass change in size so far: bare-arm: +876 +1.310% [incl +4(bss)] minimal x86: +2396 +1.552% [incl +4(data) +4(bss)] unix x64: +4768 +0.958% [incl +32(data) +8(bss)] unix nanbox: +4952 +1.114% [incl +4(data) +4(bss)] stm32: +2468 +0.680% [incl +4(bss)] cc3200: +1624 +0.875% esp8266: +2456 +0.375% [incl +8(bss)] esp32: +1956 +0.173% [incl -32(data)] nrf: +1476 +1.012% [incl +4(bss)] samd: +1208 +1.186% [incl +4(bss)]
Change in code size up to here: bare-arm: +1036 +1.560% [incl +4(bss)] minimal x86: +2632 +1.715% [incl +4(data) +4(bss)] unix x64: +5304 +1.065% [incl +32(data) +8(bss)] unix nanbox: +5488 +1.236% [incl +4(data) +4(bss)] stm32: +2664 +0.728% [incl +4(bss)] PYBV10 cc3200: +1800 +0.974% esp8266: +2744 +0.420% esp32: +2240 +0.202% GENERIC nrf: +1480 +1.018% [incl +4(bss)] pca10040 samd: +1332 +1.313% [incl +4(bss)] ADAFRUIT_ITSYBITSY_M4_EXPRESS
Change the newly-added "if False" to "if True" in tests/run-tests to enable this fuzzing-like test mode that searches for OOM error handling.
16291d4
to
c5da53f
Compare
@dpgeorge I've been using that model (in a configurable way) for a year now. So will it happen someday or not ? |
How did you make the removal of NLR configurable?
Yes, one day. But it needs to wait for MicroPython 2.0 because it is a very big change. |
to keep the code clean everywhere possible : mostly with macro on mp_raise_* , when in NO_NLR they { return *_o() } otherwise they nlr_raise() it's a bit hacky but very readable . Some files are specifically tagged _no_nlr.c for two reasons :
also my no_nlr vm.c is a clearly separate file because design is different ( calling model , c-stack usage ... ) so it does not interfere with nlr model at all. |
I will close this PR for the following reasons:
|
This is a proof of concept and work in progress to investigate the possibility of completely removing nlr (setjmp/longjmp-like exception handling) from the code base.
Instead of doing a longjmp, exception handling is now basically handled as follows:
MP_STATE_THREAD(cur_exc)
, then returnMP_OBJ_NULL
MP_OBJ_NULL
then just returnMP_OBJ_NULL
MP_OBJ_NULL
then access the exception inMP_STATE_THREAD(cur_exc)
and clear this variable so the exception doesn't propagate furtherFor functions that don't return an
mp_obj_t
there are some additional rules about how to indicate an exception, but they are quite simple. There are also some helper functions to make the above rules easier to follow (eg for iterating through iterables). In a lot of cases code doesn't need to change much at all from the existing nlr scheme.For this proof of concept the unix port is completely converted to non-nlr code. All tests pass (basics, extmod, io, import, float, etc), excluding threading, and excluding those that use the native emitter (because it still uses nlr).
Change in code size wrt master (absolute bytes and percent bytes):
Performance seems to be unchanged, if anything slightly improved (only tested with pystone.py).
The benefits of removing nlr are:
The drawbacks of removing nlr are: