Skip to content

Fix macosx segfault #17084

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 5, 2020
Merged

Conversation

cbrnr
Copy link
Contributor

@cbrnr cbrnr commented Apr 9, 2020

Fixes #17061.

Please note that I don't know if these changes have adverse effects; they fix the segfault described in #17061 though. It would be great if someone familiar with the macosx backend source took a quick look.

@QuLogic
Copy link
Member

QuLogic commented Apr 10, 2020

If you delete the figure, and run a gc, does it also crash? As you noted, the code in FigureManager_dealloc is the same and I would expect it to be called either on exit, or on a garbage collection run.

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 10, 2020

@QuLogic I'm not sure how to do this. I tried with the following code (which works on master):

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plt.show(block=False)
plt.close(fig)
del fig

However, del fig only deletes the name and not the object, after which GC may kick in and eventually remove the fig object (I also tried appending some code to the end to avoid it being the last line). I don't think that this triggers any of the two functions.

@QuLogic
Copy link
Member

QuLogic commented Apr 10, 2020

My thought is either add gc.collect() after deleting it, or never close or delete the figure (then it'd get collected on exit?)

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 10, 2020

No, this doesn't trigger the segfault.

@larsoner
Copy link
Contributor

I think plt.close(fig) followed by del fig and gc.collect() is not necessarily enough to get it to go away (at least in Linux testing). Adding this to the Figure class in matplotlib/figure.py:

    def __del__(self):
        print('deleting %s' % (self,))

And running:

import gc
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plt.show(block=False)
plt.close(fig)
del fig
gc.collect()

Nothing prints until exit() is actually called (and then it prints the "deleting" line). Starting to look at why, I tried:

import gc
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plt.show(block=False)
plt.close(fig)
print(gc.collect())
ref = [x for x in gc.get_referrers(fig) if not isinstance(x, dict)]
print(ref)
del fig
print(gc.collect())

We see:

0
[<bound method Figure.delaxes of <Figure size 640x480 with 1 Axes>>]
0

Commenting this line:

        ax._remove_method = self.delaxes

Makes the list empty, but it's still not GC'ed. Not sure why that would be, maybe someone wants to look deeper. But at least this suggests to me that the del and gc.collect() are not enough to trigger the segfault possibly because the figure is not actually getting GC'ed.

Also perhaps when exiting (which does indeed cause it to get GC'ed and print) the order of cleanup is different in a way that does not cause a segfault -- you could try printing messages for __del__ methods, and/or adding print messages to close statements to figure this out. It might be informative to see if the order matters.

@QuLogic
Copy link
Member

QuLogic commented Apr 13, 2020

Figures should be GC'able once closed and unreferenced, but I guess that's a separate issue from this one. Maybe adding a lambda around delaxes like @anntzer often does will be enough to break the ref.

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 17, 2020

I don't quite understand why there are so many NSAutoreleasePool objects in the code. I've never worked with Cocoa, but the documentation clearly states that this is typically not necessary: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html#//apple_ref/doc/uid/20000047

There are three situations where this might be necessary, but none of them apply to matplotlib AFAICT.

Actually, I removed all NSAutoreleasePool objects in the code, and everything I tried still works (and the segfault is also resolved).

@dopplershift
Copy link
Contributor

Reading those docs you linked, I'm inclined to agree with your conclusions. I'd feel more confident if we could do a stress test of the memory usage, though.

@QuLogic
Copy link
Member

QuLogic commented Apr 18, 2020

Ah, there's one small bug in that test; you need to delete ax too, since it holds a ref to its figure:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
del ax
plt.show(block=False)
plt.close(fig)
del fig
gc.collect()

or never create it:

import matplotlib.pyplot as plt

fig = plt.figure()
plt.show(block=False)
plt.close(fig)
del fig
gc.collect()

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 28, 2020

I'm not sure how to properly stress test this, but I've compared a memory profile using the following code snippet for both this branch and master:

import tracemalloc
from functools import partial
import matplotlib.pyplot as plt
import matplotlib


matplotlib.use("macosx")
print(matplotlib.__version__)


def close_event(event, fig):
    plt.close(fig)


snapshot1 = tracemalloc.take_snapshot()
for n in range(1000):
    fig = plt.figure()
    close_callback = partial(close_event, fig=fig)
    fig.canvas.mpl_connect("close_event", close_callback)
    plt.show(block=False)  # block=True works
    plt.close(fig)
snapshot2 = tracemalloc.take_snapshot()

diff = snapshot2.compare_to(snapshot1, 'lineno')
for stat in diff[:10]:
    print(stat)

Both branches show the same difference (+2156 KiB) when comparing before and after creating and closing 1000 figures. Here's the output for this branch:

3.2.1.post1906+gb96da85cf
/Users/clemens/Repositories/matplotlib/lib/matplotlib/figure.py:474: size=2156 KiB (+2156 KiB), count=1000 (+1000), average=2208 B
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/weakref.py:51: size=938 KiB (+938 KiB), count=10000 (+10000), average=96 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:115: size=937 KiB (+937 KiB), count=15000 (+14997), average=64 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:199: size=867 KiB (+867 KiB), count=12000 (+12000), average=74 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:181: size=820 KiB (+820 KiB), count=5000 (+5000), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/artist.py:74: size=788 KiB (+788 KiB), count=2999 (+2999), average=269 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:172: size=704 KiB (+704 KiB), count=15001 (+15001), average=48 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:179: size=656 KiB (+656 KiB), count=3998 (+3998), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:180: size=539 KiB (+539 KiB), count=7000 (+7000), average=79 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:170: size=539 KiB (+539 KiB), count=6999 (+6999), average=79 B

And this is master:

3.2.1.post1906+gb96da85cf
/Users/clemens/Repositories/matplotlib/lib/matplotlib/figure.py:474: size=2156 KiB (+2156 KiB), count=1000 (+1000), average=2208 B
/usr/local/Cellar/python@3.8/3.8.2/Frameworks/Python.framework/Versions/3.8/lib/python3.8/weakref.py:51: size=938 KiB (+938 KiB), count=10000 (+10000), average=96 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:115: size=937 KiB (+937 KiB), count=15000 (+14997), average=64 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/transforms.py:199: size=867 KiB (+867 KiB), count=12000 (+12000), average=74 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:181: size=820 KiB (+820 KiB), count=5000 (+5000), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/artist.py:74: size=788 KiB (+788 KiB), count=2999 (+2999), average=269 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:172: size=704 KiB (+704 KiB), count=15001 (+15001), average=48 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:179: size=656 KiB (+656 KiB), count=3998 (+3998), average=168 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:180: size=539 KiB (+539 KiB), count=7000 (+7000), average=79 B
/Users/clemens/Repositories/matplotlib/lib/matplotlib/cbook/__init__.py:170: size=539 KiB (+539 KiB), count=6999 (+6999), average=79 B

However, I'm not sure if (1) I'm comparing memory usage correctly, and (2) I'm correctly creating and closing figures in order to test what I want to test.

@cbrnr cbrnr force-pushed the fix-macosx-segfault branch from b96da85 to af58544 Compare April 28, 2020 07:05
@tacaswell
Copy link
Member

I am inclined to merge this conditional on one of our OSX based developers testing it locally


The autorelease pools may be left over from porting the previous mac backend (which had it's own renderer) to one the current version that uses Agg to render the plot.

Copy link
Member

@tacaswell tacaswell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditional on a committer with OSX verifying it works locally.

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 29, 2020

I think that's a good strategy. Even if it turns out that these changes do not work as expected in all scenarios, this can be quickly reverted, and users can always switch to the Qt5 backend. MNE devs recommend the PyQt5 backend even on macOS anyway, so in the long run it might be another option to change the default backend to PyQt5/PySide2 (QtPy) as is already the case on Linux and Windows.

@tacaswell tacaswell requested a review from efiring April 29, 2020 18:43
@efiring
Copy link
Member

efiring commented Apr 29, 2020

This seems to have broken the save dialog. The error below results from clicking on the save-file icon in the toolbar after making a trivial plot.

In [5]: 2020-04-29 09:14:25.097 python[551:23058739] +[NSXPCSharedListener endpointForReply:withListenerName:]: an error occurred while attempting to obtain endpoint for listener 'com.apple.view-bridge': Connection interrupted

@lawrence-danna-apple
Copy link
Contributor

lawrence-danna-apple commented Apr 29, 2020

I can't reproduce the original crash, so I don't understand why this fixes it. But looking over the code it does appear to me that @tacaswell is right -- the autorelease pools were leftover from a previous implementation and are no longer serving a purpose.

Also, by the way, the best way to create an autorelease pool is @autoreleasepool rather than by allocating it directly. See: https://developer.apple.com/library/archive/documentation/Cocoa/Conceptual/MemoryMgmt/Articles/mmAutoreleasePools.html

@efiring
Copy link
Member

efiring commented Apr 29, 2020

https://bugs.openjdk.java.net/browse/JDK-8227836
It seems an identical error arises in a completely different context (java), but again connected with trying to bring up a file dialog.

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 30, 2020

@lawrence-danna-apple after checking out cbrnr:fix-maxosx-segfault, did you compile the Objective-C file (i.e. did you pip install -ve .)? I just tried again, and the segfault still occurs with master. For reference, here's the example script which you need to run in script mode (the issue isn't raised when run in interactive mode), i.e. run python example.py:

from functools import partial
import matplotlib.pyplot as plt
import matplotlib


matplotlib.use("macosx")


def close_event(event, fig):
    plt.close(fig)


fig, ax = plt.subplots()
close_callback = partial(close_event, fig=fig)
fig.canvas.mpl_connect("close_event", close_callback)
plt.show(block=False)

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 30, 2020

@efiring I cannot reproduce the error you encounter with the save dialog. I tried plotting a figure, clicking on save, and saving the figure, and everything worked (on master as well as on this branch). I guess this is a different issue not related to these changes here.

@efiring
Copy link
Member

efiring commented Apr 30, 2020

I couldn't reproduce the original crash, either. Maybe there is some subtle timing involved, such that different hardware and/or OS versions behave differently. Two tests, A and B: for you, A fails and B passes; for me, A passes and B fails. Very strange. I'm on a Macbook pro 16", 6-core I7 at 2.6 GHz, Catalina 10.15.3.

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 30, 2020

@efiring that's really strange, but with macOS you never know what they changed. I'm on 10.15.4 on an older MacBook Pro 15" mid-2014. You did try the example in script mode (#17084 (comment)), right?

@efiring
Copy link
Member

efiring commented Apr 30, 2020

Yes, that's the one. But now I'm baffled. Repeating the tests, I still can't reproduce the original failure on master...but neither can I reproduce the save-dialog failure on your branch. I know it failed for me twice before. I'm stumped.

@cbrnr
Copy link
Contributor Author

cbrnr commented Apr 30, 2020

Oh wow, did I fix this by accident too 😛? On a serious note, does #17263 address the dialog issue?

This keeps happening to me, but most of the times I am on the wrong branch, forget to compile with pip install -ve . or use the wrong matplotlib package (not the dev version). I find that it helps to completely start from scratch in these situations. But given that some people can reproduce the issue whereas others cannot, there seems to be something else (like different hardware, OS versions, and what not) involved. As long as after the fix this example works for everyone, it is not that important that everyone can reproduce the issue in the first place (but at least some people can confirm).

Copy link
Member

@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still can't reproduce my earlier problem with this, so let's assume it wasn't caused by this PR.

Copy link
Contributor

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to reproduce the crash from #17061, and this PR fixes the crash for me. I also tested with a variety of static, animation, and event handling examples; everything seems to be working fine.

@dopplershift dopplershift merged commit cb34934 into matplotlib:master May 5, 2020
@lumberbot-app
Copy link

lumberbot-app bot commented May 5, 2020

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout v3.2.x
$ git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 cb34934c6f9688571508151068c72e535e40162a
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #17084: Fix macosx segfault'
  1. Push to a named branch :
git push YOURFORK v3.2.x:auto-backport-of-pr-17084-on-v3.2.x
  1. Create a PR against branch v3.2.x, I would have named this PR:

"Backport PR #17084 on branch v3.2.x"

And apply the correct labels and milestones.

Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instruction are inaccurate, feel free to suggest an improvement.

@cbrnr cbrnr deleted the fix-macosx-segfault branch May 5, 2020 06:53
QuLogic pushed a commit to QuLogic/matplotlib that referenced this pull request May 6, 2020
QuLogic added a commit that referenced this pull request May 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault with macosx backend
7 participants