
Work around potential Mono bug, that hangs the runtime when new threads start #1779

Merged
merged 1 commit into from
May 4, 2022

Conversation

lostmsu
Member

@lostmsu lostmsu commented May 2, 2022

I suspect the issue is actually with the Mono GC and its relatively new thread suspension mode, which has a few (closed) bugs filed against it complaining about hangs. In many of those bugs (you can find them here), switching to the old preemptive mode works around the issue. I tried it for our tests, and it seems to solve the hang there too.

The issue is also vaguely mentioned in https://www.mono-project.com/docs/advanced/runtime/docs/coop-suspend/#thread-startfinish-still-bad

So the fix is to simply set MONO_THREADS_SUSPEND to preemptive in CI before running tests in Mono.
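
For anyone launching Mono-hosted tests from a script rather than from the CI config, the same workaround can be sketched roughly as below. This is only an illustration, not the change made in this PR: the helper names `mono_env` and `run_with_preemptive_suspend` are made up for the example, and the only thing taken from the fix itself is the `MONO_THREADS_SUSPEND=preemptive` setting.

```python
import os
import subprocess


def mono_env(base=None):
    """Return a copy of the environment that selects Mono's preemptive suspend mode.

    `base` defaults to the current process environment. The helper name is
    hypothetical; only the variable name and value come from the workaround.
    """
    env = dict(os.environ if base is None else base)
    env["MONO_THREADS_SUSPEND"] = "preemptive"
    return env


def run_with_preemptive_suspend(cmd):
    """Run `cmd` (a list of arguments) with the workaround applied.

    Only the child process sees the variable; the parent environment
    is left untouched.
    """
    return subprocess.run(cmd, env=mono_env(), check=False)
```

A call such as `run_with_preemptive_suspend(["mono", "path/to/test-runner.exe"])` would then start the tests with the setting in place; the command shown is a placeholder, not the project's actual test invocation.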

Fixes #1766

This workaround seems to solve the issue both on my local machine and in CI. I did 5+ full CI matrix reruns to confirm. Before the fix, a few MacOS legs would time out in every run.

Below is some history of finding this workaround.

@lostmsu lostmsu force-pushed the bugs/1766-MacOS-Test-Hang branch 2 times, most recently from 88ecef6 to 64a04d3 Compare May 2, 2022 18:40
@lostmsu
Member Author

lostmsu commented May 2, 2022

:/ it does not fail when the blame mode is on in dotnet test :/

On my own machine I was able to reproduce it on the first try, but then it never reproduced after that. :/

@lostmsu
Member Author

lostmsu commented May 2, 2022

Looks like the issue may be with the TestThread function. Perhaps we are missing some synchronization there. I previously suspected that it needs memory barriers, because AFAIK acquiring the GIL does not insert them. See this run passing on all MacOS versions with that method disabled: https://github.com/pythonnet/pythonnet/actions/runs/2260894105

@lostmsu lostmsu force-pushed the bugs/1766-MacOS-Test-Hang branch 3 times, most recently from 854f20b to 8d9d27e Compare May 3, 2022 20:59
@lostmsu lostmsu force-pushed the bugs/1766-MacOS-Test-Hang branch from 8d9d27e to f730963 Compare May 4, 2022 04:16
@lostmsu lostmsu changed the title WIP MacOS tests hang Work around potential Mono bug, that hangs the runtime when new threads start May 4, 2022
@lostmsu lostmsu requested a review from filmor May 4, 2022 04:33
@lostmsu lostmsu marked this pull request as ready for review May 4, 2022 04:35
@filmor
Member

filmor commented May 4, 2022

This is probably also something that we should include in our docs.

@filmor filmor merged commit bbfa252 into pythonnet:master May 4, 2022
@lostmsu lostmsu mentioned this pull request May 4, 2022
@lostmsu lostmsu deleted the bugs/1766-MacOS-Test-Hang branch May 4, 2022 17:29
Development

Successfully merging this pull request may close these issues.

Embedding tests randomly hang in MacOS
2 participants