Add "Stackless" support (don't use C stack to call bytecode functions) #1172
…kw_var(). Allow for reuse for stackless design, where preparing args is separate from calling.
I.e. in this mode, C stack will never be used to call a Python function, but if there's no free heap for a call, it will be reported as RuntimeError, not MemoryError, as expected.
Ping.
Yes, I'm reviewing this now :)
It's very clean. I agree that the strict mode is the interesting one, but I'd actually consider using the non-strict mode on stmhal. On pyboard without this patch (i.e. no stackless) I get a max recursion depth of 70 Python calls. With stackless that becomes around 2800! And with non-strict stackless you can still call functions on an interrupt (which you can't do with strict stackless). So I would suggest keeping the option to have strict or non-strict. One major issue: performance. On unix x64 my pystones drop from 87k to 28k with stackless! I'm not sure why it's such a large performance hit... any ideas?
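For anyone who wants to reproduce the recursion-depth comparison, here is a minimal sketch of one way to measure it. The helper name `measure_max_depth` is hypothetical and not part of the patch; the sketch only assumes that running out of call depth is reported as `RuntimeError` (on CPython, `RecursionError` is a subclass of it).

```python
def measure_max_depth():
    """Recurse until the interpreter refuses, then report the depth reached."""
    def recurse(depth):
        try:
            return recurse(depth + 1)
        except RuntimeError:
            # The deepest frame that could not make another call
            # returns its own depth, which propagates back up unchanged.
            return depth
    return recurse(1)


if __name__ == "__main__":
    print("max recursion depth:", measure_max_depth())
```

The absolute number depends on the port, the build options and the stack or heap size, so it is only meaningful as a before/after comparison on the same board.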
Also, there is a slight increase in code size even with stackless disabled (around 30-50 bytes on Thumb2 archs). But I would accept that as okay.
Of course, the slowdown is because memory allocation on the heap is slow. Probably it could be improved by explicitly freeing the allocated codestate when unwinding the call stack.
Thanks for running pystones. Yes, we know that uPy's allocation is not the fastest; now we can see by how much. I'll try to play with explicit freeing. (OTOH, I have an idea to add an #ifdef to skip explicit frees in the compiler to save a handful of flash bytes ;-) ).
Neat idea. I just tried a quick hack of this and it saves around 400 bytes on bare-arm.
Well, for me, with the default unix build, pystones go from 56K to 43K.
32 or 64 bit?
64-bit.
And I don't see quantifiable improvements from freeing the codestate object explicitly in BC_RETURN. It's also kinda not possible to do properly with the existing alloc protocol, because the size of the object is not known. So, skipping that for now.
Also, looking through the patches, they go in incremental development order, so squashing them together would only complicate later code review/forensics. Going to merge as is.
I'm back. If you can send me a map file from before and after, it might be interesting.
This changes a number of things in displayio:
* Introduces BuiltinFont and Glyph so the built in font can be used by libraries. For boards with a font it is available as board.TERMINAL_FONT. Fixes micropython#1172
* Remove _load_row from Bitmap in favor of bitmap[] access. Index can be x/y tuple or overall index. Fixes micropython#1191
* Add width and height properties to Bitmap.
* Add insert and [] access to Group. Fixes micropython#1518
* Add index param to pop on Group.
* Terminal no longer takes unicode character info. It takes a BuiltinFont instead.
* Fix Terminal's handling of [###D vt100 commands used when up arrowing into repl history.
* Add x and y positions to Group plus scale as well.
* Add bitmap accessor for BuiltinFont
I finally think this is ready to be merged. The feature was implemented mostly during a long flight day, and I was pretty surprised it went so smoothly; I expected it to be more complicated. There were still a few issues to resolve after that, but they were tackled too, and I have been rebasing and fixing any uncovered issues for more than a month.
So, there are 2 stackless modes: strict and non-strict. Frankly speaking, only strict is interesting in terms of the formal features it allows; non-strict was separated out to make the testsuite pass. In strict mode, heapalloc.py fails, as expected: what it tests is that after the heap is exhausted it's still possible to do some memory-requiring operations, and that is not possible in stackless mode, because all call-frame allocations come from the heap (non-strict just switches to the C stack in this case). With that in mind, non-strict mode may still be pragmatically interesting for low-RAM ports, because it arguably (but not provably) saves on (bg) size of C stack.
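To make the difference between the two modes concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not the VM code from this patch: `Frame`, `call_native`, `call_stackless` and the `heap_frames` budget are all hypothetical names, and the "function" being called is the trivial recursion f(n) = 0 if n == 0 else 1 + f(n - 1).

```python
class Frame:
    """Hypothetical stand-in for a heap-allocated code state of one call."""
    def __init__(self, n, parent):
        self.n = n
        self.parent = parent  # link to the caller's frame


def call_native(n):
    """Ordinary recursion: every call consumes host (C) stack."""
    return 0 if n == 0 else 1 + call_native(n - 1)


def call_stackless(n, heap_frames, strict=True):
    """Evaluate f(n) with frames taken from a limited 'heap' budget.

    heap_frames models how many frames the heap can hold (an assumption
    made for illustration only).
    """
    frame = None
    allocated = 0
    # Call phase: push one heap frame per nested call, no host recursion.
    while True:
        if allocated >= heap_frames:
            if strict:
                # Strict: the host stack is never used for a call.
                raise RuntimeError("maximum recursion depth exceeded")
            # Non-strict: fall back to the host stack for the rest.
            result = call_native(n)
            break
        frame = Frame(n, frame)
        allocated += 1
        if n == 0:
            result = 0            # base case returns immediately
            frame = frame.parent  # pop the base-case frame
            break
        n -= 1
    # Return phase: unwind the linked frames, each adding 1 to the result.
    while frame is not None:
        result += 1
        frame = frame.parent
    return result


if __name__ == "__main__":
    print(call_stackless(100000, heap_frames=200000))      # deep, no host recursion
    print(call_stackless(5, heap_frames=2, strict=False))  # falls back after 2 frames
```

In this model the recursion depth is limited by the frame budget rather than by the host stack, which mirrors why depth grows so much with stackless enabled, and the per-call allocation is also where the extra cost comes from.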
The original commits are provided to show the development process. For merging, it may make sense to squash them into a smaller number, but probably still not 1.