Skip to content

Track Runtime run number #1074

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Nov 23, 2021
Merged

Track Runtime run number #1074

merged 9 commits into from
Nov 23, 2021

Conversation

lostmsu
Copy link
Member

@lostmsu lostmsu commented Mar 3, 2020

Assert, that PyObjects are only disposed in the same run they were created in.

What does this implement/fix? Explain your changes.

TBH, I failed to find the relevant documentation, but from a quick ducking around my understanding is that Py_Finalize will attempt to destroy all objects, allocated by Python (e.g. instances of C type PyObject).

That means, that any .NET objects, that own handles to Python objects (e.g. PyObject or PythonException) should not try to deallocate those handles after Py_Finalize has been called, even if Python runtime is later initialized again for a new run.

Right now PyObject.Dispose only checks for Py_IsInitialized, without giving any regard to which runtime run does it belong to versus which run is it now.

In this change I add run tracking to Runtime and PyObject, and force-crash PyObject.Dispose instead of calling XDecref when they do not match. We need to determine what should the proper behavior be in that scenario.

I believe #958 also suffers from this issue.

Does this close any currently open issues?

Related to #1073 , #958

@lostmsu
Copy link
Member Author

lostmsu commented Mar 3, 2020

@amos402 can you take a look at this one. How does it affect Finalizer?

@filmor
Copy link
Member

filmor commented Mar 3, 2020

The term run is a bit generic, and the private values should be prefixed by an underscore (I know that this is not everywhere the case, I'll try to watch out for it on future PRs). Apart from this I think this is an elegant solution, although I'm not quite sure whether the interlocked read is really required. Can new PyObjects be created while Initialize is running?

@lostmsu
Copy link
Member Author

lostmsu commented Mar 3, 2020

@filmor the finalizer, that calls Runtime.GetRun() will run from .NET finalizer thread, which is always a dedicated thread. So at least the finalizer thread and the thread, that runs Runtime.Initialize must somehow synchronize.

@lostmsu
Copy link
Member Author

lostmsu commented Mar 3, 2020

@filmor , BTW, this is not a fix. I just wanted to see if the PR would have the problem stand out in a CI run (which it does, but only a few tests fail; the effect would be much more dramatic, if Finalizer would not simply eat all exceptions, that happens on a scale).

@filmor
Copy link
Member

filmor commented Mar 3, 2020

But can't this be handled by simply locking a mutex in the thread and Initialize?

@lostmsu
Copy link
Member Author

lostmsu commented Mar 4, 2020

But can't this be handled by simply locking a mutex in the thread and Initialize?

That would have basically the same effect at slightly higher cost.

We need input here to decide what to do with the problem. @amos402 @benoithudson @Martin-Molinero

After giving this some thought, I feel we might need to explicitly call GC.Collect and GC.WaitForPendingFinalizers in PythonEngine.Shutdown so that all dead .NET objects holding references to Python objects would release them to let Py_Finalize collect as much Python memory as possible before the run/generation ends.

@filmor
Copy link
Member

filmor commented Mar 4, 2020

Huh? The current implementation introduces cost (atomic read) for every single object created, while the lock would only need to be taken during the finalization and in Initialize.

@lostmsu
Copy link
Member Author

lostmsu commented Mar 4, 2020

@filmor you can't really forego some sort of synchronization on every PyObject created, as without synchronization it might read stale value. See https://shipilev.net/blog/2014/jmm-pragmatics/

On x86 and AMD64 all reads are atomic AFAIK, so this only affects ARM.

@lostmsu
Copy link
Member Author

lostmsu commented Mar 4, 2020

@filmor actually, in x86 builds this change would have performance impact, as extra steps are needed to guarantee Int64 read atomicity. I need to change long run to int run and use Volatile.Read instead of Interlocked.Read.

@benoithudson
Copy link
Contributor

This feels like something that should on in a Debug build but not in Release.

For cleanup you basically need:

            System.GC.Collect();
            using (Py.GIL())
            {
                dynamic gc = Py.Import("gc");
                gc.collect();
            }
            System.GC.Collect();

The first C# collect cleans garbage on the C# side. That might release some references into Python.

The Python collect cleans garbage on the Python side, which might release some references into C#. It also releases the Python side of handles that the C# collect released.

The second C# collect releases the handles the Python collect released. Not strictly necessary given that we're shutting down C# anyway.

@lostmsu
Copy link
Member Author

lostmsu commented Mar 4, 2020

Just to clarify, this is not talking about CLR domain unload, just PythonEngine.Shutdown.

This is what Py_Finalize doc says:

Ideally, this frees all memory allocated by the Python interpreter.
Memory tied up in circular references between objects is not freed. Some memory allocated by extension modules may not be freed.

I think we should strive for "ideally". Unless we force collection of dead .NET PyObjects, Python will treat the objects that are referenced by them as "tied up", when they actually are not.

Alternatively, we could document this, and suggest users to call GC.Collect and WaitForPendingFinalizers manually before calling PythonEngine.Shutdown to avoid forcing this on everyone.

@lostmsu
Copy link
Member Author

lostmsu commented Mar 5, 2020

@filmor OK, turns out Volatile is not enough, and one needs Interlocked. Volatile does not guarantee "freshness", e.g. it is only eventually consistent.

@benoithudson
Copy link
Contributor

I think garbage collecting on PythonEngine.Shutdown makes loads of sense; otherwise you irreparably leak memory and you slow down Python gc if you restart Python later.

Python automatically does a gc pass during Py_Finalize so you might not need to run Python gc during Shutdown. But I don't think it hurts.

@benoithudson
Copy link
Contributor

Oh, just realized you might be reading something different than intended:

This feels like something that should on in a Debug build but not in Release.

I was referring to this PR, asserting that PyObject is only finalized in the proper run. Garbage collecting seems like it should always be done.

@amos402
Copy link
Member

amos402 commented Mar 7, 2020

TBH, I failed to find the relevant documentation, but from a quick ducking around my understanding is that Py_Finalize will attempt to destroy all objects, allocated by Python (e.g. instances of C type PyObject).

Py_Finalize only destroy the garbages.

That means, that any .NET objects, that own handles to Python objects (e.g. PyObject or PythonException) should not try to deallocate those handles after Py_Finalize has been called, even if Python runtime is later initialized again for a new run.

It is. Thus I always trend to not using the pythonnet's features in the internal.

if (Runtime.Py_IsInitialized() == 0)
{
// XXX: Memory will leak if a PyObject finalized after Python shutdown,
// for avoiding that case, user should call GC.Collect manual before shutdown.
return;
}

Just as I commented. For safe, they would be leak. Dispose will not be called on another runtime.
FIX: If the domain didn't reload, it may be called on another runtime, but #958 should enable to handle it.

In this change I add run tracking to Runtime and PyObject, and force-crash PyObject.Dispose instead of calling XDecref when they do not match. We need to determine what should the proper behavior be in that scenario.

As @filmor mentioned, I'm not like to increase the overhead here too, maybe just enable it on debug mode.

I believe #958 also suffers from this issue.

I think it's not related to #958, since the domain may changed, it will not call the dispose in another runime, these objects already leak by the Finalizer.
Also #958 can release the the object which created by previous runtime as long as it didn't leaked by Finalizer.

@amos402
Copy link
Member

amos402 commented Mar 7, 2020

😂Let me guess, you encounter the crash when you testing, and it seems decrease reference count beyond the object, but it it may just the reference count error of PyObject.
At this point, you may turn on TRACE_ALLOC and add a listener to Finalizer.IncorrectRefCntResolver, check the tracebacks of PyObject creation and memo it , rerun the test again with some tracks for this object to figure it out.

@lostmsu
Copy link
Member Author

lostmsu commented Mar 7, 2020

@amos402 no, "decrease reference count" is not the what this is about (or at least I did not check, may be related). This is about the following scenario:

PythonEngine.Initialize();
var num = 42.ToPyObject();
PythonEngine.Shutdown(); // <- "run" 1 ends
PythonEngine.Initialize(); // <- "run" 2 starts
num = null;
GC.Collect();
GC.WaitForPendingFinalizers(); // <- this will put former `num` into Finalizer queue
Finalizer.Instance.Collect(forceDispose: true);
// ^- this will call num.Dispose, which will call XDecref on `num.Handle`,
// but Python interpreter from "run" 1 is long gone, so it will corrupt memory instead.

In the example above no class checks, that the original interpreter is gone.

amos402 added a commit to amos402/pythonnet that referenced this pull request Mar 8, 2020
@amos402 amos402 mentioned this pull request Mar 8, 2020
4 tasks
@amos402
Copy link
Member

amos402 commented Mar 8, 2020

If the reference count turns to 0, it will call the tp_dealloc, so the matter is determine the type slots, not the place it call XDecref , #958 should be able to handle it.
Add some simple tests in #958 for this. https://github.com/pythonnet/pythonnet/pull/958/files/183f9d822c17b1cba7f3538b8d4394b7ef1cae44..9d57a82732053f408b8d115d1035177d5c077bef#diff-0731db125845d153331b229612722c5c

@lostmsu
Copy link
Member Author

lostmsu commented Mar 9, 2020

@amos402 the tests you've added only check that the object is collected. But the problem is collecting it might corrupt memory. To see the effect, you might need to run it in a loop, and also ensure no exception is generated by Finalizer.ErrorHandler.

@amos402
Copy link
Member

amos402 commented Mar 10, 2020

I updated the tests for running multi times and passed. I test it for 1000 times locally, but it will take too much time due to calling the GC, some I reduce it to 10 on commit.
Dispose cross should be allow in #958, if it cause corrupted memory, that will be a bug.

@lostmsu
Copy link
Member Author

lostmsu commented Mar 10, 2020

Played a bit with the test Amos added to soft-shutdown branch, and it seem to be working correctly no matter which object survives until the next run/generation of interpreter.

@amos402 since it appears to be safe to call decref on objects from previous generation, should not we keep .NET objects that need finalization alive until the runtime is up again?

@lostmsu
Copy link
Member Author

lostmsu commented Nov 9, 2021

Reopening this pull request for 3.0 milestone. I discovered a trivial example that (at least in Python 3.10, maybe others too) corrupts Python heap. Being unsure if this is a bug in CPython, I opened an issue in CPython bug tracker.

A trivial repro using using Python.NET code base and Python 3.10 AMD64 (debug build) on Windows:

Runtime.Py_Initialize();
BorrowedReference builtins = Runtime.PyEval_GetBuiltins();
BorrowedReference iter = Runtime.PyDict_GetItemString(builtins, "iter");
var ownedIter = new NewReference(iter); // this basically does IncRef
Runtime.Py_Finalize();

Runtime.Py_Initialize();
ownedIter.Dispose(); // this basically does DecRef
Runtime.Py_Finalize(); // <- this blows up in PyGC_Collect -> validate_list

@amos402 if you have time, I would like to hear your thoughts on this.

@lostmsu lostmsu force-pushed the bugs/1073 branch 11 times, most recently from ffaa254 to 8658322 Compare November 12, 2021 05:29
@lostmsu
Copy link
Member Author

lostmsu commented Nov 22, 2021

@filmor with a workaround for Mono hanging in one of the GC calls, all tests pass.

Can you review this pull request?

@filmor filmor merged commit 55abd29 into pythonnet:master Nov 23, 2021
@lostmsu lostmsu deleted the bugs/1073 branch November 23, 2021 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants