Change in OSX Catalina makes matplotlib + multiprocessing crash #15410
Comments
I think we are holding the TTF open because we do not pull the full font up into RAM, but rather go back to the file to pull glyphs out as needed. IIRC, this is done for performance reasons. It does seem wrong that we have it open twice, though (and I also see that behavior). I also have a vague memory that multiprocessing behaves differently on OSX than on Linux; could that be part of the problem here?
Do you know who introduced it / which issue? (I couldn't find it with a quick search)
With or without multiprocessing involved?
My (uneducated -- I don't have a Mac to test this) guess is that this is not a change on mpl's side, but rather due to a change in how the newest macOS handles FILE* pointers after a fork() (which is how multiprocessing is implemented by default on Unices), e.g. what happens if they are close()d in a separate process.
I believe this was done in #5299 and #5410 as part of the 2.0 work. I get two file handles to each font without multiprocessing. What versions of mpl and Python are you testing with? Poking around at this, I found matplotlib/lib/matplotlib/font_manager.py Lines 1329 to 1334 in 19d589c, which suggests py37 and mpl31 may not have a problem? Is this a Mac-only problem?
I've only seen this issue in a very specific case: on my MacBook, after upgrading to macOS 10.15 "Catalina", when running this with Python 3.7 from Anaconda and MPL 3.1.1:
I didn't see the issue on older macOS versions, we don't see it on Linux or Windows, and I couldn't reproduce the issue with a simpler test case or outside of pytest. Does … Probably others will see the same issue as they update to macOS 10.15 and run Python multiprocessing code (and most won't make the connection that MPL trying to close a file from a child process is the issue, because the filename isn't printed). The problem is that we're really stuck here: the only fix I know of is to remove the use of multiprocessing completely from our library (Gammapy). Isn't it very unusual for scientific Python libs to keep files open? Is changing this behaviour up for debate?
I guess the problem is that cache_clear occurs too late: it closes the file after the fork(). We could instead do this before the fork, but then every new process spawn would result in the font cache being flushed, which seems wrong.
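To make the before/after distinction concrete, here is a minimal sketch of the two `os.register_at_fork` (Python 3.7+) hook placements being discussed. `get_resource` and `install_fork_hook` are hypothetical names: `get_resource` only stands in for a cached font loader like `_get_font` (an `lru_cache` around a constructor whose result keeps its file open), and this is not matplotlib's actual code.

```python
import functools
import os


@functools.lru_cache(maxsize=64)
def get_resource(path):
    """Hypothetical stand-in for a cached font loader: each cached entry
    keeps its underlying file open for as long as it stays in the cache."""
    return open(path, "rb")


def install_fork_hook(clear_before_fork=False):
    """Register a cache flush around fork() (os.register_at_fork, Python 3.7+).

    Hooks cannot be unregistered, so call this once per process.
    """
    if not hasattr(os, "register_at_fork"):
        return  # e.g. Windows, or Python < 3.7: no fork hooks available
    if clear_before_fork:
        # Parent empties its cache before every fork: the child inherits no
        # cached file objects, but the parent must refill its cache afterwards.
        os.register_at_fork(before=get_resource.cache_clear)
    else:
        # Child empties the cache it inherited right after the fork: the parent
        # keeps its cache, but the child now releases (and, in CPython, closes)
        # file objects that were opened by the parent.
        os.register_at_fork(after_in_child=get_resource.cache_clear)
```

With `after_in_child`, only the child's cache is emptied and the parent's stays warm; with `before`, the parent's cache is emptied on every fork, which is the performance cost mentioned above.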
Before we go too far down this route, I think we need to confirm that it actually is mpl's file handles that are the problem and not something else in Gammapy. That you can only reproduce it in a test suite run via pytest also makes me think that this is not Matplotlib-specific, or at a minimum that we do not fully understand the source of the problem, as pytest does very magical things under the hood. I think https://bugs.python.org/issue33725 is relevant and suggests that this is not a Matplotlib-specific problem.
Can you try forcing one of the other multiprocessing start methods? I don't think holding files open is that odd. IIRC, Mike benchmarked that getting the glyphs out of the C API was faster than a dictionary lookup. I am also concerned about startup time. It may just be a few extra MB of RAM, but it is something we would have to read into memory on every import. I am deeply skeptical of a major overhaul of our font handling system based on the information we currently have.
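One way to force a different start method for a single pool, without changing the process-wide default, is `multiprocessing.get_context`. This is a minimal sketch; `square` and `run_with_start_method` are hypothetical helpers. The `"spawn"` and `"forkserver"` methods do not fork the parent process, so workers do not inherit its already-open font files.

```python
import multiprocessing as mp


def square(x):
    # Defined at module level so "spawn"/"forkserver" workers can import it.
    return x * x


def run_with_start_method(method="spawn", n=8):
    """Run a small pool with an explicit start method ("fork", "spawn" or
    "forkserver"), leaving the global default untouched."""
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, range(n))
```

With `"spawn"` or `"forkserver"`, call it from under an `if __name__ == "__main__":` guard (e.g. `run_with_start_method("forkserver")`), because workers re-import the main module.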
Will do, but only in a few days.
That's not the same. With
Makes sense. I don't know much about multiprocessing and fork/exec and such things, but I'll try to come up with simple test cases of when this crash does and doesn't occur with MPL.
I am having the same issue, but unfortunately won't be able to contribute much to the problem's resolution. :/ I was simply wondering if there is a workaround available until you guys find a solution for this? Thanks for any help!
It may be worth trying to track down and close all of the file objects that get opened before you fork. The other option is to only import Matplotlib after you have launched your multiprocessing code (so there are no file handles that can leak between processes). If I am reading the issues from upstream correctly, moving to py38 may also fix the problem, but you will need to compile many things from source, as not a lot of wheels are up yet (e.g. ours ;) ).
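As a starting point for tracking down which descriptors are open before you fork, here is a small stdlib-only sketch. `open_fds` is a hypothetical helper, and it assumes a `/dev/fd` directory listing the calling process's descriptors (present on macOS and Linux, not on Windows). It only reports descriptor numbers; the original report uses `psutil.Process().open_files()` to get file paths.

```python
import os


def open_fds():
    """Return the sorted descriptor numbers currently open in this process.

    Assumes /dev/fd exists (macOS, Linux). Listing the directory itself uses a
    temporary descriptor, so entries that are no longer valid are dropped.
    """
    fds = []
    for name in os.listdir("/dev/fd"):
        fd = int(name)
        try:
            os.fstat(fd)
        except OSError:
            continue  # closed since listing (e.g. the directory handle itself)
        fds.append(fd)
    return sorted(fds)
```

Calling it before and after your plotting code and comparing the two lists shows which descriptors a forked child would inherit.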
I too had this issue recently, and I too just upgraded my 2015 MacBook Pro to 10.15 Catalina. I wrote my own object-oriented code from scratch for a physics project, and it uses … Every 5th to 10th time that I ran my code, I would receive the error above (libc++abi.dylib: etc. etc.) and had to crash my terminal to start all over. After searching the internet with a copy/paste of the error output, I found @cdeil's issue page on … I'm telling you all of this to explain that, after seeing what files were open on @cdeil's machine, I turned off the … I had the feeling that it was 'just this one …
@exowanderer Are you using both
Can you try whether #15104 helps? (At least the place throwing the exception shouldn't exist anymore...) If that doesn't help, does the following patch fix the issue?

```diff
diff --git i/lib/matplotlib/font_manager.py w/lib/matplotlib/font_manager.py
index 6d56ed595..3428e831a 100644
--- i/lib/matplotlib/font_manager.py
+++ w/lib/matplotlib/font_manager.py
@@ -1331,7 +1331,7 @@ _get_font = lru_cache(64)(ft2font.FT2Font)
 # would be too complicated to be worth it, the main way FT2Fonts get reused is
 # via the cache of _get_font, which we can empty upon forking (in Py3.7+).
 if hasattr(os, "register_at_fork"):
-    os.register_at_fork(after_in_child=_get_font.cache_clear)
+    os.register_at_fork(before=_get_font.cache_clear)
 
 
 def get_font(filename, hinting_factor=None):
```

(Probably this should be done conditionally on the libc++ version to avoid degrading the performance of fork()ing unless needed.) (#15104 would be preferable, if it works.)
@anntzer in case this is still useful, modifying the
What about #15104?
Had the same issue with matplotlib 3.1.2 and pymc3 3.8 on Catalina. Updating the os.register_at_fork line made the issue disappear for me.
Can you please give #15104 a try?
@anntzer Sorry, I don't have experience with compiling MPL; I get the output below:
Looks like an LLVM version issue; thanks for giving it a try.
Let me know if there is something I can do to fix it and try again.
Quite a few places relating to this error suggest upgrading your LLVM. If you can give it a try, that would be appreciated.
I also just came across this problem (although it took me two days to realize it was a problem with matplotlib). I can confirm that changing the
Does anyone who experienced the issue have a clean, simple repro (preferably simpler than #15410 (comment))? Could any core dev on a Mac (@jklymak? @efiring? :-)) check based on that repro whether #15104 fixes the problem?
Even the complicated repro requires a data directory that I don't know where to find and the failing test skips for me. So I can't test...
I see a comment above about pytest, so to be clear: I get the problem by running a unittest-style test file from the command line. No pytest involved.
See matplotlib/matplotlib#15410: matplotlib's scheme for caching fonts in memory apparently causes random child processes to crash the PyMC3 sampler (on OSX Catalina). The fix is simple (but also bad): *do not use matplotlib and multiprocessing in the same program*.
As requested on Twitter (https://twitter.com/matplotlib/status/1249878438883872768?s=20), here is some code that produces this issue for me on OSX 10.15.4 using Py37:

```python
import numpy as np
from qiskit.tools.parallel import parallel_map
#import matplotlib.pyplot as plt

#x = np.arange(30)
#plt.plot(x, np.sin(x))

def f(x):
    return np.sin(x)/2

for _ in range(10):
    parallel_map(f, np.arange(50))
```

Here it runs fine with all the MPL stuff commented out. However, after uncommenting and plotting, it will fail with the error:
and the terminal (the above is running in a notebook) shows:
We have a similar issue on macOS Catalina, and one workaround that worked for us was upgrading to Python 3.8.2.
@nonhermitian @hisplan Can you please check https://bugs.python.org/issue33725, in particular https://bugs.python.org/issue33725#msg329923, which suggests that adding mp.set_start_method('forkserver') will fix this problem on all versions of Python 3 (the reason upgrading to 3.8.2 works is that a non-fork start method, spawn, is now the default on macOS).
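A minimal sketch of the suggested process-wide switch. `use_non_fork_start_method` is a hypothetical helper: `multiprocessing.set_start_method` may only be called once per program, so the helper checks `get_start_method(allow_none=True)` first and leaves an already-fixed method alone. For reference, Python 3.8 changed the macOS default to `"spawn"`, which likewise avoids forking the parent.

```python
import multiprocessing as mp


def use_non_fork_start_method(preferred="forkserver"):
    """Try to make `preferred` (default "forkserver", which does not fork the
    parent) the process-wide start method, unless one is already fixed.

    Returns the start method that is in effect afterwards.
    """
    current = mp.get_start_method(allow_none=True)
    if current is None and preferred in mp.get_all_start_methods():
        # set_start_method raises RuntimeError if a method was already fixed,
        # hence the check above.
        mp.set_start_method(preferred)
    return mp.get_start_method()
```

Call it once at the top of the `if __name__ == "__main__":` block, before any pool or `Process` is created.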
Using the test notebook provided by @HeyLookItsBrandon (#15410 (comment)) I can reproduce the failure with python 3.7.3. Good news: with @anntzer's #15104, the failure does not occur. |
Includes a Vagrantfile for building on Vagrant. Tests on macOS Catalina run into errors using matplotlib because of the error described in matplotlib/matplotlib#15410.
I am going to close this as fixed by: a) upstream changing its defaults |
…t version of matplotlib to avoid issue with file left open on OSX catalina - see matplotlib/matplotlib#15410.
I'm getting a crash on macOS Catalina when running some multiprocessing code after doing some plotting with matplotlib.
Looks like this:
and using
print(psutil.Process().open_files())
I figured out that the issue is likely that this MPL TTF font file is opened multiple times and (I guess) for some reason the file is closed only later, from my multiprocessing code, and that fails on macOS Catalina. I tried to extract a minimal test case that doesn't involve our software, but couldn't so far, sorry.
But you should be able to reproduce the crash by copy & pasting these commands:
Log: https://gist.github.com/cdeil/a75211856a3bcab751ead707df9708c9#file-gistfile1-txt-L88
I know MPL isn't thread-safe. But it should be possible to do some plotting, and then later after plotting is done to run some multiprocessing code, no?
Is it normal that the file handle for DejaVuSansDisplay.ttf remains open? Shouldn't MPL load the file content and close the file handle directly?
Maybe that is a bug?
I don't think it matters for this issue, but in case it does: we are calling plt.close() after all plotting in our tests, because previously we had a problem with too many open figures (see [here](https://github.com/gammapy/gammapy/blob/d4db11d559b210c0e0d03bba75ddcff9c03e0511/gammapy/utils/testing.py#L209-L230)).