-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Win32 micropython builds corrupt memory upon repeated execution of a particular function #4652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If it helps, this issue can also be reproduced on the 'win32' build of Micropython 1.9.4. I have not seen the issue in 64-bit builds, nor have I reproduced the problem on the Gemma M0 (CircuitPython) or our own Atmel-based distribution. |
The first port to reproduce this issue would be "unix" port (under Linux). |
Sorry... I wasn't clear (enough) in my first comment. I cannot reproduce the problem on 64-bit Windows builds, 64-bit OS X builds, or the various embedded ports I have access to (and hardware for). I can try to compile a 32-bit OS X build (but that will have to wait until next week). I don't currently have a linux box to play with :(. |
Thanks for the report. I can't reproduce any issue with the code running under a 32-bit unix MicroPython executable. It could be related to GC scanning of the machine registers (and possibly C stack). You'll need to determine if you use |
Here's the process I used to compile the (Release) win32 port:
That said, according to Visual Studio, MICROPY_GCREGS_SETJMP is indeed set (value (1)) for this particular port. I've rebuilt the solution with MICROPY_GCREGS_SETJMP set to (0), with the same result: |
We'll certainly try debugging this on our end - I was hoping that someone more familiar with the code (that doesn't raise the bar much in my case, unfortunately :( ) might have some insight into the problem. |
Thanks for checking. Since such a problem was never seen/reported before, and it's not (so far) reproducible on the unix port or Win 64-bit, it's most likely something to do with the specific configuration of the Win 32-bit build. And my guess is that the GC is freeing some memory that should not be freed, because the code crashes when it explicitly tries to free a GC region, only to find that it's already freed. Thus it's likely to do with scanning of root pointers in the Win 32 build. |
Well... we have a line number for our first breakpoint (gc.c, line 587), and can try to unwind the stack from there. Looks like my team has some fun ahead of it.... |
Hmm, can't reproduce this in x86 nor x64 builds. Using VS2017 though. And the last commit. I'd be happy to debug this but I tried a bunch of things like more iterations but no avail. Just from the looks of it I'd guess it has something to do with using exec, the code itself alone shouldn't cause this, at least I think. |
Not very likely this has anything to do with timing (at least not in the sense of multithreaded/race condition problems where 'insert a sleep statement somwehere' seems to 'solve' the problem :P), there's only one thread which does everything. And there are quite some other differences in debug builds. I'll try again later with executing the loop more than once. |
Tried this a thousand times, no avail. Any chance you can test this with VS2017? And/or check whether it also happens with the current version of the uPy code? Or come up with a more clear reproduction case? |
@stinos: We use the same toolchain across all our Windows builds, so I'm afraid I'm stuck with VS 2015 right now (I could try to spin up a VM with VS 2017, but that will almost certainly have to wait awhile, unfortunately...). The next step for me might be to try to run the original test suite that spawned the problem without the test harness (we weren't "exec"ing anything in that case - the steps in the description above were the simplest I could find in the limited time I had available to reliably reproduce the problem on my end). I spent a bit of time running the steps above in the debugger - I haven't fully wrapped my head around the way the garbage collector works (there appears to be a memory allocation table, a space reserved for 'finalizers', and the actual memory pool itself, and for the example in this issue once the pool is exhausted, the code sweeps through the allocation table and marks blocks as free... the assertion is thrown during (or right after) one of these sweeps, but I still haven't figured out what the trigger is. It doesn't help that the code in question is called thousands of times before the problem occurs... but that's why we have computers manage the memory instead of humans ;-) ) |
From the assertion that fails, it happens in |
@ddiminnie tried again, this time building MicroPython with the VS2015 (v140) toolset (which should be what you are using, right?), still cannot reproduce it with x86 release builds, also not for v1.10 :( |
@stinos: Our build and test systems are Windows 7 (x64)-based machines. We've run into the cited problem on
Toolchain: @dpgeorge:
I'm going to try to switch to the '_ASSERT' macro from 'crtdbg.h' (and link in the corresponding debug runtime libraries) for the offending line (unlike 'assert', '_ASSERT' actually calls DebugBreak, so it should be possible to get a stack trace). (This is just part of the fun of debugging in Windows Land...) |
@dpgeorge: ucrtbase.DLL!wassert+0x16 |
... and here is that stack trace in more human-readable form: ucrtbase.DLL!_wassert+0x16 Of course this isn't the whole picture (I need to see if I can find a way to capture the gc heap at the time of the assertion failure, but even then the source of the problem may have occurred several steps earlier in the execution sequence). |
Welp... I'm an idiot (which should be apparent from following this thread). size_t block = BLOCK_FROM_PTR(ptr);
// Debugging code FIXME!!!
if (ATB_GET_KIND(block) != AT_HEAD) {
printf("Foo!\n");
}
// End of debugging code FIXME!!!
assert(ATB_GET_KIND(block) == AT_HEAD); Moreover, I've been able to dump the 1Mb memory space allocated for the garbage collector: |
@ddiminnie I also tested this on a Windows 7 system with the specs you mention. But can you state the actual version of the compiler please (i.e. output of
I don't think _wassert is a macro, but a function, so all you needed to do was setting a breakpoint on _wassert. Or else, break on abort: as uasual assert is a macro which eventually calls C's abort() function.. You see, debugging in Windows really isn't that different or more difficult than on another OS :)
This is a bit more tricky but with some digging you could still have figured this out: you have a notion of what the CRT is, you know _ASSERT, you can see there's different ways of handling what the program does on asserts, because yes, it's strange that assert just aborts without the more typcial 'Assertion failed' dialog and ways to hook the debugger, right? The documentation of _ASSERT (for instance) has the clue: turns out there's a _CrtSetReportMode function which alters what happens on asserts exactly in the way witnessed here. And indeed, that gets used in MicroPython, see init.c. Just remove the offending lines and you will get a chance to hook the debugger when running the exe directly. |
@stinos: Just to level-set, I'm not really a software developer (its MUCH worse than that - I'm a mathematician... If you've ever worked in an academic environment, you'll know what that entails... :-) ). My day job consists of validation of algorithms (when possible), or at the very least, testing of implementations of mathematical software. So, I know my way around (some of) the basics of an IDE, can write (simple) programs in C, and basically know just enough to get myself in trouble. (I do the vast majority of my programming in python, for the record.) I say this to make communications simpler as we proceed forward. Now, as for your questions: |
@stinos: In response to your much earlier request for a simpler example, I'm still struggling to find one that triggers this issue (well... if it were simple, lots of users would have complained by now, and any underlying problem would have been fixed long ago). We originally ran into the problem on our own customized Windows port of CircuitPython, which sets the heap size to 20480 bytes, and includes a few other tweaks that (hopefully) don't affect the behavior we're chasing here. We had backported a pull request (MicroPython issue #4133), and were running tests to verify we had merged the changes properly. The suite in question used a mix of decimal values from CPython's 'floating_points.txt' file, along with some additional values of interest to us. We ran the tests by writing to the stdin of our python executable and reading from stdout of the executable. Thus, the original motivating example was somewhat simpler (but more complex to communicate):
At (or around) the 302nd test we consistently hit the ATB_GET_KIND(block) != AT_HEAD assertion on our custom win32 build (but NOT our custom win64 build). (We use the 'flt_hex' function to allow us to reconstruct the exact floating-point value that MicroPython created for each example within our external test harness, to allow us to compare the value to reference IEEE-754 binary64 equivalents of the corresponding decimal values... just in case anyone was wondering at this point...) Naturally, the same set of examples doesn't trigger the problem on any of the MicroPython builds we've created (even after modifying the heap size), so we had to come up with another example. The result of some tinkering was the (awful) example in the original description above (which has the added benefit that it also triggers the issue in our custom build). Unfortunately, it appears that the cited problem has (so far) only been reproduced on builds created at our site. (At least we can reproduce the problem on executables built on our official build machine, on my laptop, and on the laptop of one of the (actual) developers assigned to this project...) |
I've been in academic environments more than anywhere else so yes I get your point :P
19.00.24234.1 here but for the rest it's the same.
Sorry should have been more clear. You don't need the source, you can set breakpoints at any address, and VS helps you with translating functions to addresses: use
Well, as I was typing this I thought "let's try it one more time to be sure" and for reasons I don't know I can now suddenly but reliably reproduce the issue (on a Windows 10 VM, both with VS2015 and VS2017 toolsets!, but still nothing on my main pc). Yay. So it's like really really random and maybe it's colder today and things were set loose :) |
Ok, that's good news, that it's now more reproducible. @stinos can you confirm which NLR setting the build is using, either MICROPY_NLR_SETJMP or MICROPY_NLR_X86? It might be that it needs to trace more of the registers in |
This is all with MICROPY_NLR_SETJMP: the assembly syntax of the other implementations is incompatible with cl. |
@stinos: Awesome! (And thanks for the pointers!) |
@dpgeorge the assert happens when freeing the lexer's |
@stinos I tried a 32-bit unix build with MICROPY_NLR_SETJMP enabled, but didn't see any issues with executing the If we assume it's failing to mark and/or trace the Note that the problem could also lie in @stinos are you able to provide me with a complete disassembly of lexer.c and parse.c, for the binary that crashes? If tracing as described above is not the problem then it's probably something like a memory corruption. |
@dpgeorge This is all with MICROPY_GCREGS_SETJMP set to 1 (just like for nlr the assembly code for x86_64 isn't compatible, so even when setting it to 0 in mpconfigport.h for instance, MICROPY_GCREGS_SETJMP gets reset to 1 in gccollect.c so that is probably what @ddiminnie witnessed). Thanks a lot for the explanation; when testing some more yesterday I was slowly coming to a similar conclusion as I saw gc_collect occurrences where So now I simply checked for I'm attaching the compiler-generated assembly/source listings which might not be the exact same thing as disassembling the object files but should be close enough I think, and fairly readable. |
@dpgeorge: |
Thanks @stinos for the disassembly. Some points to note:
@stinos are you able to provide the disassembly of @ddiminnie the title is ok for now. |
Indeed 'Link Time Code Generation' aka LTO is enabled (that's also one of the reasons I can't just disassemble the object files it seems). Btw I enjoy debugging this but there's only some limited amount of time I can spend on this and wasted hours because adding a bunch of statements to make debugging easier often results in an executable which doesn't exhibit the problem. So by now I'm out of time for today which is a pitty because I'm getting closer. The problem I'm seeing now is:
Disassembly for
Definition of
|
Thanks @stinos, you were indeed very close. It seems that the LTO is inlining The assumption made by uPy that is violated here by LTO is that Maybe the way to fix this is to mark this function as non-inlinable, or at least not touchable by LTO. |
@stinos , @dpgeorge: Thanks for taking the time to analyze this issue. In the short time I've worked with MicroPython and CircuitPython, I've been deeply impressed at both the quality of the implementation and the community around it. I'm grateful that the two of you have devoted so mcuh time to a rather obscure bug in a (seemingly) less active configuration. As a (very) quick-and-dirty experiment, I've turned off 'whole program optimization' (/GL compiler switch) and 'link time code generation' (/LTGC:Incremental linker switch) in the Visual Studio 'micropython' project and rebuilt the win32 release configuration (this is akin to driving a tack with a pile driver, but...). After 11000 iterations of the loop from the original description I have not seen the cited failure. Obviously this is not a long-term fix (though it may be good enough for my team, as performance is not really an issue for us, and this change has the advantage of not affecting the other ports we're building). Also, I've learned a few things from this exchange (which is always welcome!) - I look forward to seeing the proposed (official) fix (should you decide to pursue it)! |
@ddiminnie indeed disabling those optimisations should prohibit the linker from inlining |
Sorry for the late comment - I thought I had replied, but it turns out I had not. Further, the disassembly of the gccollect module shows no
Although disabling these switches does make the problem in the original description go away, I'm afraid the underlying problem has just moved around in "memory allocation/de-allocation space" (meaning another as yet undiscovered example will still cause the problem), assuming inlining of |
It's not a small patch (it's over 1000 lines) but #4723 is a comprehensive fix for the issue here. The alternative is do some trickery to ensure that all registers are indeed captured when doing a GC scan. |
Marking functions with
I verified this results in the 'correct' stack layout where @dpgeorge ok to make a patch for this? |
Yes, sounds good. |
When building with link time optimization enabled it is possible both gc_collect() and gc_collect_regs_and_stack() get inlined into gc_alloc() which can result in the regs variable being pushed on the stack earlier than some of the registers. Depending on the calling convention, those registers might however contain pointers to blocks which have just been allocated in the caller of gc_alloc(). Then those pointers end up higher on the stack than regs, aren't marked by gc_collect_root() and hence get sweeped, even though they're still in use. As reported in micropython#4652 this happened for in 32-bit msvc release builds: mp_lexer_new does two consecutive allocations and the latter triggered a gc_collect() which would sweep the memory of the first allocation again.
When building with link time optimization enabled it is possible both gc_collect() and gc_collect_regs_and_stack() get inlined into gc_alloc() which can result in the regs variable being pushed on the stack earlier than some of the registers. Depending on the calling convention, those registers might however contain pointers to blocks which have just been allocated in the caller of gc_alloc(). Then those pointers end up higher on the stack than regs, aren't marked by gc_collect_root() and hence get sweeped, even though they're still in use. As reported in #4652 this happened for in 32-bit msvc release builds: mp_lexer_new() does two consecutive allocations and the latter triggered a gc_collect() which would sweep the memory of the first allocation again.
This should be resolved by 34a7d7e, so I'll close the issue. |
To confirm, I've applied the patch and run through 50000 iterations of the example in the original description without seeing the issue. Thanks again for all your hard work! |
Thank you for discovering it as well, saves future headaches :) |
…hanges Referencing_documentation_other_libraries
(I apologize in advance for not finding a simpler example/use case for this.)
While running tests on our own fork of micropython, we encountered a bug that crashed the python interpreter (on Windows 32-bit builds only). This bug is also present in the 'stock' 1.10 release of Micropython. Rather than upload our entire test setup, the following (somewhat simpler) example should serve to illustrate the problem. Using the python module listed at the end of this comment, try the following (may need to repeat the last instruction a few times to see the issue):
After one or more iterations of the final step, the following assertion is triggered (followed by shutdown of the micropython.exe process):
Listing for file 'flt_hex.py':
The text was updated successfully, but these errors were encountered: