Track Runtime run number #1074

lostmsu · 2020-03-03T02:31:05Z

Assert that PyObjects are only disposed in the same run they were created in.

What does this implement/fix? Explain your changes.

TBH, I failed to find the relevant documentation, but from a quick ducking around, my understanding is that Py_Finalize will attempt to destroy all objects allocated by Python (e.g. instances of the C type PyObject).

That means that any .NET objects that own handles to Python objects (e.g. PyObject or PythonException) should not try to deallocate those handles after Py_Finalize has been called, even if the Python runtime is later initialized again for a new run.

Right now PyObject.Dispose only checks Py_IsInitialized, without any regard to which runtime run the object belongs to versus which run is current.

In this change I add run tracking to Runtime and PyObject, and force-crash PyObject.Dispose instead of calling XDecref when they do not match. We need to determine what the proper behavior should be in that scenario.
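
To make the idea concrete, here is a minimal, self-contained sketch of run tracking. It does not reproduce the code in this PR: `RunTracker` and `TrackedHandle` are hypothetical stand-ins for Runtime and PyObject, and setting `Released` stands in for XDecref.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for Runtime: counts interpreter "runs".
static class RunTracker
{
    private static long _run; // incremented on every (re)initialization

    // Atomic read, so a finalizer thread never sees a torn 64-bit value.
    public static long CurrentRun => Interlocked.Read(ref _run);

    // Called where the interpreter would be (re)initialized.
    public static long BeginRun() => Interlocked.Increment(ref _run);
}

// Hypothetical stand-in for PyObject: remembers the run it was created in.
sealed class TrackedHandle : IDisposable
{
    private readonly long _createdInRun = RunTracker.CurrentRun;
    private bool _disposed;

    // True once the handle was released in its own run.
    public bool Released { get; private set; }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;

        // The handle is only valid for the interpreter run that produced it.
        if (_createdInRun != RunTracker.CurrentRun)
            throw new InvalidOperationException("handle belongs to a previous run");

        Released = true; // stand-in for XDecref
    }
}
```

Throwing from Dispose mirrors the force-crash described above; silently leaking the handle would be the other obvious choice.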

I believe #958 also suffers from this issue.

Does this close any currently open issues?

Related to #1073, #958

lostmsu · 2020-03-03T05:08:10Z

@amos402 can you take a look at this one? How does it affect Finalizer?

filmor · 2020-03-03T08:26:06Z

The term run is a bit generic, and the private values should be prefixed by an underscore (I know that this is not everywhere the case, I'll try to watch out for it on future PRs). Apart from this I think this is an elegant solution, although I'm not quite sure whether the interlocked read is really required. Can new PyObjects be created while Initialize is running?

lostmsu · 2020-03-03T08:28:45Z

@filmor the finalizer that calls Runtime.GetRun() will run on the .NET finalizer thread, which is always a dedicated thread. So at least the finalizer thread and the thread that runs Runtime.Initialize must somehow synchronize.

lostmsu · 2020-03-03T08:32:05Z

@filmor, BTW, this is not a fix. I just wanted to see if the PR would make the problem stand out in a CI run (which it does, but only a few tests fail; the effect would be much more dramatic if Finalizer did not simply eat all the exceptions, which happen at scale).

filmor · 2020-03-03T09:46:25Z

But can't this be handled by simply locking a mutex in the thread and Initialize?

lostmsu · 2020-03-04T18:31:50Z

> But can't this be handled by simply locking a mutex in the thread and Initialize?

That would have basically the same effect at slightly higher cost.

We need input here to decide what to do with the problem. @amos402 @benoithudson @Martin-Molinero

After giving this some thought, I feel we might need to explicitly call GC.Collect and GC.WaitForPendingFinalizers in PythonEngine.Shutdown so that all dead .NET objects holding references to Python objects would release them to let Py_Finalize collect as much Python memory as possible before the run/generation ends.

filmor · 2020-03-04T21:45:29Z

Huh? The current implementation introduces cost (atomic read) for every single object created, while the lock would only need to be taken during the finalization and in Initialize.

lostmsu · 2020-03-04T22:03:05Z

@filmor you can't really forgo some sort of synchronization on every PyObject created, as without synchronization it might read a stale value. See https://shipilev.net/blog/2014/jmm-pragmatics/

On x86 and AMD64 all reads are atomic AFAIK, so this only affects ARM.

lostmsu · 2020-03-04T22:15:38Z

@filmor actually, in x86 builds this change would have performance impact, as extra steps are needed to guarantee Int64 read atomicity. I need to change long run to int run and use Volatile.Read instead of Interlocked.Read.

benoithudson · 2020-03-04T22:38:14Z

This feels like something that should be on in a Debug build but not in Release.

For cleanup you basically need:

    System.GC.Collect();
    using (Py.GIL())
    {
        dynamic gc = Py.Import("gc");
        gc.collect();
    }
    System.GC.Collect();

The first C# collect cleans garbage on the C# side. That might release some references into Python.

The Python collect cleans garbage on the Python side, which might release some references into C#. It also releases the Python side of handles that the C# collect released.

The second C# collect releases the handles the Python collect released. Not strictly necessary given that we're shutting down C# anyway.

lostmsu · 2020-03-04T22:49:04Z

Just to clarify, this is not talking about CLR domain unload, just PythonEngine.Shutdown.

This is what Py_Finalize doc says:

> Ideally, this frees all memory allocated by the Python interpreter.
> Memory tied up in circular references between objects is not freed. Some memory allocated by extension modules may not be freed.

I think we should strive for "ideally". Unless we force collection of dead .NET PyObjects, Python will treat the objects that are referenced by them as "tied up", when they actually are not.

Alternatively, we could document this and suggest that users call GC.Collect and WaitForPendingFinalizers manually before calling PythonEngine.Shutdown, to avoid forcing this on everyone.
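
A minimal sketch of what that manual step could look like, assuming the caller passes the shutdown routine as a delegate. `ShutdownHelper` and `CollectThenShutdown` are hypothetical names, not part of Python.NET; only the GC calls are real .NET APIs.

```csharp
using System;

// Hypothetical helper, not part of Python.NET.
static class ShutdownHelper
{
    public static void CollectThenShutdown(Action shutdown)
    {
        if (shutdown == null) throw new ArgumentNullException(nameof(shutdown));

        // Find dead .NET wrappers and schedule their finalizers.
        GC.Collect();
        // Block until the finalizer thread has run them.
        GC.WaitForPendingFinalizers();
        // Reclaim objects that only became collectable after finalization.
        GC.Collect();

        // Only now shut the interpreter down.
        shutdown();
    }
}
```

With Python.NET this would presumably be called as `ShutdownHelper.CollectThenShutdown(PythonEngine.Shutdown);`. Whether the released handles are actually decref'ed before Py_Finalize still depends on how Python.NET drains its own finalizer queue.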

lostmsu · 2020-03-05T18:54:20Z

@filmor OK, turns out Volatile is not enough, and one needs Interlocked. Volatile does not guarantee "freshness", i.e. it is only eventually consistent.

benoithudson · 2020-03-05T19:23:34Z

I think garbage collecting on PythonEngine.Shutdown makes loads of sense; otherwise you irreparably leak memory and you slow down Python gc if you restart Python later.

Python automatically does a gc pass during Py_Finalize so you might not need to run Python gc during Shutdown. But I don't think it hurts.

benoithudson · 2020-03-05T19:54:22Z

Oh, just realized you might be reading something different than intended:

> This feels like something that should be on in a Debug build but not in Release.

I was referring to this PR, asserting that PyObject is only finalized in the proper run. Garbage collecting seems like it should always be done.

amos402 · 2020-03-07T16:06:01Z

> TBH, I failed to find the relevant documentation, but from a quick ducking around, my understanding is that Py_Finalize will attempt to destroy all objects allocated by Python (e.g. instances of the C type PyObject).

Py_Finalize only destroys garbage.

> That means that any .NET objects that own handles to Python objects (e.g. PyObject or PythonException) should not try to deallocate those handles after Py_Finalize has been called, even if the Python runtime is later initialized again for a new run.

It is. That is why I always tend not to use pythonnet's own features in the internals.

pythonnet/src/runtime/finalizer.cs

Lines 104 to 109 in 925c166

    if (Runtime.Py_IsInitialized() == 0)
    {
        // XXX: Memory will leak if a PyObject finalized after Python shutdown,
        // for avoiding that case, user should call GC.Collect manual before shutdown.
        return;
    }

Just as I commented: to be safe, they are leaked. Dispose will not be called on another runtime.
FIX: If the domain didn't reload, it may be called on another runtime, but #958 should be able to handle it.

> In this change I add run tracking to Runtime and PyObject, and force-crash PyObject.Dispose instead of calling XDecref when they do not match. We need to determine what the proper behavior should be in that scenario.

As @filmor mentioned, I would also rather not increase the overhead here; maybe just enable it in debug mode.

> I believe #958 also suffers from this issue.

I don't think it's related to #958: since the domain may have changed, Dispose will not be called in another runtime, and these objects are already leaked by the Finalizer.
Also, #958 can release an object created by a previous runtime as long as it wasn't leaked by the Finalizer.

amos402 · 2020-03-07T16:24:07Z

😂 Let me guess: you encountered the crash while testing, and it seems to decrease reference count beyond the object's lifetime, but it may just be a reference count error of PyObject.
At this point, you could turn on TRACE_ALLOC and add a listener to Finalizer.IncorrectRefCntResolver, check the tracebacks of the PyObject's creation and note them, then rerun the test with some tracing for this object to figure it out.

lostmsu · 2020-03-07T21:33:04Z

@amos402 no, "decrease reference count" is not what this is about (or at least I did not check, it may be related). This is about the following scenario:

PythonEngine.Initialize();
var num = 42.ToPyObject();
PythonEngine.Shutdown(); // <- "run" 1 ends
PythonEngine.Initialize(); // <- "run" 2 starts
num = null;
GC.Collect();
GC.WaitForPendingFinalizers(); // <- this will put former `num` into Finalizer queue
Finalizer.Instance.Collect(forceDispose: true);
// ^- this will call num.Dispose, which will call XDecref on `num.Handle`,
// but Python interpreter from "run" 1 is long gone, so it will corrupt memory instead.

In the example above, no class checks that the original interpreter is gone.

amos402 · 2020-03-08T09:22:19Z

If the reference count drops to 0, it will call tp_dealloc, so what matters is determining the type slots, not the place where XDecref is called. #958 should be able to handle it.
Added some simple tests in #958 for this. https://github.com/pythonnet/pythonnet/pull/958/files/183f9d822c17b1cba7f3538b8d4394b7ef1cae44..9d57a82732053f408b8d115d1035177d5c077bef#diff-0731db125845d153331b229612722c5c

lostmsu · 2020-03-09T20:46:52Z

@amos402 the tests you've added only check that the object is collected. But the problem is that collecting it might corrupt memory. To see the effect, you might need to run it in a loop, and also ensure no exception is generated by Finalizer.ErrorHandler.

amos402 · 2020-03-10T05:20:16Z

I updated the tests to run multiple times, and they pass. I tested 1000 iterations locally, but that takes too long because of the GC calls, so I reduced it to 10 in the commit.
Cross-run Dispose should be allowed in #958; if it causes memory corruption, that is a bug.

lostmsu · 2020-03-10T20:57:18Z

Played a bit with the test Amos added to the soft-shutdown branch, and it seems to be working correctly no matter which object survives until the next run/generation of the interpreter.

@amos402 since it appears to be safe to call decref on objects from a previous generation, shouldn't we keep .NET objects that need finalization alive until the runtime is up again?

lostmsu · 2021-11-09T05:16:10Z

Reopening this pull request for the 3.0 milestone. I discovered a trivial example that (at least in Python 3.10, maybe others too) corrupts the Python heap. Being unsure if this is a bug in CPython, I opened an issue in the CPython bug tracker.

A trivial repro using the Python.NET code base and Python 3.10 AMD64 (debug build) on Windows:

Runtime.Py_Initialize();
BorrowedReference builtins = Runtime.PyEval_GetBuiltins();
BorrowedReference iter = Runtime.PyDict_GetItemString(builtins, "iter");
var ownedIter = new NewReference(iter); // this basically does IncRef
Runtime.Py_Finalize();

Runtime.Py_Initialize();
ownedIter.Dispose(); // this basically does DecRef
Runtime.Py_Finalize(); // <- this blows up in PyGC_Collect -> validate_list

@amos402 if you have time, I would like to hear your thoughts on this.

lostmsu · 2021-11-22T21:19:02Z

@filmor with a workaround for Mono hanging in one of the GC calls, all tests pass.

Can you review this pull request?

lostmsu force-pushed the bugs/1073 branch from fa1fdad to ebf971f Compare March 3, 2020 02:32

amos402 added a commit to amos402/pythonnet that referenced this pull request Mar 8, 2020

Add temp tests reference by pythonnet#1074 (comment)

9d57a82

amos402 mentioned this pull request Mar 8, 2020

Add soft shutdown #958

Merged

4 tasks

lostmsu mentioned this pull request Apr 13, 2020

PyObject instances surviving Runtime shutdown should be finalized when Runtime restarts #1112

Closed

lostmsu closed this Apr 27, 2020

lostmsu reopened this Nov 9, 2021

lostmsu mentioned this pull request Nov 9, 2021

Finalizer tries to dispose objects, that belong to previous Runtime "generation" #1073

Closed

lostmsu force-pushed the bugs/1073 branch from ebf971f to cd8abd1 Compare November 9, 2021 22:03

lostmsu force-pushed the bugs/1073 branch 11 times, most recently from ffaa254 to 8658322 Compare November 12, 2021 05:29

lostmsu added 2 commits November 11, 2021 22:02

Track Runtime run number. Assert, that PyObjects are only disposed in the same run they were created in.

4f657d4

dispose registered codecs and interop configuration during shutdown

62e2fb4

lostmsu force-pushed the bugs/1073 branch from 8658322 to 507a2bf Compare November 12, 2021 06:02

lostmsu added 5 commits November 11, 2021 22:48

Finalizer raises FinalizationException when it sees an object from previous run.

1897d1b

allow tests to pass when objects are leaking due to being GCed after Python runtime is shut down

3909639

removed code testing possiblity to dispose objects after domain restart

60d90c6

allow leaking PyObject instances when CLR is stared from Python

9e815a6

WaitForFullGCComplete was never needed, and was used incorrectly

8611dde

lostmsu force-pushed the bugs/1073 branch from fe1b26a to 8611dde Compare November 12, 2021 06:57

lostmsu force-pushed the bugs/1073 branch from 67635b7 to 83033b9 Compare November 22, 2021 20:18

remove finalizer assert for raw pointer value; skip collection assert on shutdown when loaded from Python

6383a28

lostmsu force-pushed the bugs/1073 branch from 83033b9 to 6383a28 Compare November 22, 2021 21:01

filmor reviewed Nov 23, 2021

src/runtime/runtime.cs (outdated)

renamed run system property to __pythonnet_run__ to be consistent with other PythonNET properties

47b3913

filmor approved these changes Nov 23, 2021

filmor merged commit 55abd29 into pythonnet:master Nov 23, 2021

lostmsu deleted the bugs/1073 branch November 23, 2021 22:12
