Skip to content

🐛 [#423] fix reference cycle #434

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 6, 2019
Merged

🐛 [#423] fix reference cycle #434

merged 9 commits into from
Aug 6, 2019

Conversation

GuangTianLi
Copy link
Contributor

@GuangTianLi GuangTianLi commented Jul 16, 2019

Fixes #423

According to the Python Data Model, the code below will cause a reference cycle between __self__ and __dict__.

class Test:
    def start(self, *a, **kw):
        self.run()

    def run(self):
        ...


old_start = Test.start


def _wrap_run(old_run):
    def run(*a, **kw):
        result = old_run(*a, **kw)
        return result
    return run


def test_start(self, *a, **kw):
    self.run = _wrap_run(self.run)
    return old_start(self, *a, **kw) 


Test.start = test_start

if __name__ == "__main__":
    tmp = Test()
    tmp.start()

The references graph like this::

Meantime Python Standard Library - concurrent.futures used thread to manage tasks queue, therefore when sentry patched start method of Thread, it will cause reference cycle and prevent gc from collecting.

So in concurrent.futures.process, the _queue_management_thread in _threads_wakeups will be discarded until running a full gc collection, if not, exception will be raise when python exited, because the actual manage thread has already stopped.

@untitaker
Copy link
Member

This is quite impressive! I don't yet understand how the reference cycle is created but I will take a closer look tomorrow. Thanks for your work!

@GuangTianLi
Copy link
Contributor Author

In the end, I found more proper way to prevent reference cycle, but it is a sort of tricky.

@untitaker
Copy link
Member

Patching the class itself will cause issues if one instantiates multiple instances of thread and starts them. We don't want to wrap run more than once per instance.

@GuangTianLi
Copy link
Contributor Author

GuangTianLi commented Jul 18, 2019

I fixed it by passing the__class__ of instance into the descriptor.

@shidenggui
Copy link

Is there any progress about this pr? We also meet this problem in production.

@bluetech
Copy link
Contributor

bluetech commented Jul 31, 2019

I fixed it by passing the__class__ of instance into the descriptor.

IIUC, this is still not thread safe. When you do this

                self.__class__.run = PatchedInstanceMethodDescriptor(
                    self.__class__.run, hub_
                )

it affects the entire Thread class, not just the single instance which is intended.

@bluetech
Copy link
Contributor

How about something like this (untested):

diff --git a/sentry_sdk/integrations/threading.py b/sentry_sdk/integrations/threading.py
index 3bd6032..5907b45 100644
--- a/sentry_sdk/integrations/threading.py
+++ b/sentry_sdk/integrations/threading.py
@@ -2,7 +2,7 @@ from __future__ import absolute_import
 
 import sys
 
-from threading import Thread
+from threading import Thread, current_thread
 
 from sentry_sdk import Hub
 from sentry_sdk._compat import reraise
@@ -35,26 +35,24 @@ class ThreadingIntegration(Integration):
                 else:
                     hub_ = Hub(hub)
 
-                self.run = _wrap_run(hub_, self.run)
+                # Use the function object rather than the method
+                # descriptor to avoid a reference cycle (issue #434).
+                old_run = self.run.__func__
+                def sentry_run(*a, **kw):
+                    hub = hub_ or Hub.current
+                    with hub:
+                        try:
+                            self = current_thread()
+                            return old_run(self, *a, **kw)
+                        except Exception:
+                            reraise(*_capture_exception())
+                self.run = sentry_run
 
             return old_start(self, *a, **kw)  # type: ignore
 
         Thread.start = sentry_start  # type: ignore
 
 
-def _wrap_run(parent_hub, old_run):
-    def run(*a, **kw):
-        hub = parent_hub or Hub.current
-
-        with hub:
-            try:
-                return old_run(*a, **kw)
-            except Exception:
-                reraise(*_capture_exception())
-
-    return run
-
-
 def _capture_exception():
     hub = Hub.current
     exc_info = sys.exc_info()

@GuangTianLi
Copy link
Contributor Author

I fixed it by passing the__class__ of instance into the descriptor.

IIUC, this is still not thread safe. When you do this

                self.__class__.run = PatchedInstanceMethodDescriptor(
                    self.__class__.run, hub_
                )

it affects the entire Thread class, not just the single instance which is intended.

In my opinion, pathing Thread.start has already affected the entire Thread class and it has nothing with thread safe. But due to an API of Threading module - current_thread will access current thread instance, we can avoid a reference cycle with an easier way.

@untitaker
Copy link
Member

I really like that last fix!

@untitaker untitaker merged commit 5b3412d into getsentry:master Aug 6, 2019
@untitaker
Copy link
Member

Thanks @GuangTianLi !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception: raise OSError("handle is closed")
4 participants