-
Notifications
You must be signed in to change notification settings - Fork 747
Add soft shutdown #958
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add soft shutdown #958
Conversation
Can you explain how does soft shutdown work in more details? |
The base concept is, after |
I'm unsure how this relates to #957. Seems to be more about improving the GC handling? The prototype that @BadSingleton did was, in implementation, fairly simple (took us a while to figure it out of course):
|
Well, Python crash is connect with GC handling, after domain unload, when the GC collect occurred, it will traverse all GC objects, but the memory from old domain has been invalid, any access will result to crash. |
* Keep delegate of Thunk in tp_dict * Remove useless code
* Fix ref error at PythoneEngine.With
…nObject) will not care 'u' on Python2.
…hutdown forcedly * Disable restore objects on default
src/runtime/classmanager.cs
Outdated
@@ -39,6 +39,48 @@ public static void Reset() | |||
cache = new Dictionary<Type, ClassBase>(128); | |||
} | |||
|
|||
public static IList<ClassBase> GetManagedTypes() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be internal
. Also, I don't see it used anywhere.
This method also needs a XML doc note, that it is not thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why internal? The class is a internal already, and as a getter, I prefer use public.
Hummm, seems I only use it in my previous version, I will check it latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need to address thread safety. If called from different threads this can cause reading stale values from memory.
src/runtime/extensiontype.cs
Outdated
@@ -28,7 +28,7 @@ public ExtensionType() | |||
|
|||
IntPtr py = Runtime.PyType_GenericAlloc(tp, 0); | |||
|
|||
GCHandle gc = GCHandle.Alloc(this); | |||
GCHandle gc = AllocGCHandle(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AllocGCHandle(true)
- specify the parameter name here - it is unclear what true
means.
E.g. AllocGCHandle(owned: true)
src/runtime/classbase.cs
Outdated
private static IntPtr GetObjectDict(IntPtr ob) | ||
{ | ||
return Marshal.ReadIntPtr(ob, ObjectOffset.DictOffset(ob)); | ||
} | ||
|
||
private static void SetObjectDict(IntPtr ob, IntPtr value) | ||
{ | ||
Marshal.WriteIntPtr(ob, ObjectOffset.DictOffset(ob), value); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a XML doc to both methods, mentioning, that neither of them affect refcounts.
src/runtime/runtime_state.cs
Outdated
var gc = PyImport_ImportModule("gc"); | ||
var get_objects = PyObject_GetAttrString(gc, "get_objects"); | ||
var objs = PyObject_CallObject(get_objects, IntPtr.Zero); | ||
var length = PyList_Size(objs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, check the return values of these calls.
Also, wrap code, operating on objs
and gc
in try .. finally ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, checking return values is necessary although using these builtin functions will be failed by less probability.
But the try is not necessary, they're all C functions actually.
I've summarized my understanding in this document: https://docs.google.com/document/d/1a9OLsdKHXJ6MxHjo0WlVRcfjAiaP5hONNKUKKH6Tr-o I see two PRs that would be easy to pull out of amos' code and check in right now:
I don't yet understand what the gc chains thing is trying to address. I also don't understand how amos handles the modules. I suspect they're related. It seems like amos is turning off python gc for the proxy objects. If that works, we should just turn it all off and then we can eliminate the native code page. But that's yet another PR. There seems to be some refcount fixes as well. If we can get those fixes in separate PRs that makes it easier to review. |
* Drop C module dependency when getting _PyObject_NextNotImplemented * Exception details for SetNoSiteFlag
…mments-2 Soft shutdown review comments 2
src/runtime/BorrowedReference.cs
Outdated
@@ -11,8 +11,7 @@ namespace Python.Runtime | |||
public bool IsNull => this.pointer == IntPtr.Zero; | |||
|
|||
/// <summary>Gets a raw pointer to the Python object</summary> | |||
public IntPtr DangerousGetAddress() | |||
=> this.IsNull ? throw new NullReferenceException() : this.pointer; | |||
public IntPtr DangerousGetAddress() => this.pointer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be changed. If a function is needed, that allows access to the raw pointer when it is null
, it should be a separate function for clarity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add an unchecked version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That checking is not necessary, because the method already named DangerousXXX, why it even can't obtain a null pointer. If it needs some null validation, it should do by the invoker not the internal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amos402 if the check is missed by accident by the invoker, the bug would be much easier to trace if it fails at the first access rather than much later on, when the raw null pointer is actually dereferenced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I concern is the method conformed the single responsibility or not. It is unfit that a method like this can't obtain a null pointer. For example, std::shared_ptr<T>::get
can illustrate this point.
Anyway, I don't want to argue about this, if you insist, just check it, I don't mind it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tend to agree with @amos402, a method prefixed with Dangerous
should let me do dangerous things. However, I think this modification should go in another PR as it modifies the current behaviour, and adding a checked version would add up a lot of changes all over the place.
if (_isInitialized) | ||
{ | ||
return; | ||
} | ||
_isInitialized = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably not an issue in this PR, but static methods generally should be thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressing this is not necessary for acceptance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's to avoid double initialization of the runtime. With domain reloads we can't rely anymore on Py_IsInitialized
.
src/runtime/runtime.cs
Outdated
|
||
// Initialize modules that depend on the runtime class. | ||
AssemblyManager.Initialize(); | ||
#if !NETSTANDARD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should avoid compile-time checks, as we eventually want to produce a single .NET Standard DLL. Can you replace this with a runtime check for AppDomain
unloading support.
src/runtime/runtime_data.cs
Outdated
Marshal.WriteIntPtr(mem, (IntPtr)ms.Length); | ||
Marshal.Copy(data, 0, mem + IntPtr.Size, (int)ms.Length); | ||
|
||
IntPtr capsule = PySys_GetObject("clr_data").DangerousGetAddress(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't this work with BorrowedReference
directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could, but it would not improve readability. While we do get a borrowed reference, we really do own this reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not so sure. It took some time to figure out, that in 73-80 capsule
is borrowed, in 80-82 it is a new object, and that in 81 the reference to it is not being stolen, and required decref in 82.
Maybe extract clearing of existing clr_data
into a method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I get what you are saying now. I'll factor out the clearing.
And other code review changes
Even if netstandard lacks (for now) the necessary APIs for domain (un)loading, it is still possible to (un)load domains via the Mono C runtime.
In Runtime_data.cs
…e pointer" We Don't support python 2 anymore, but the CI machines may still be using it to build.
Also adds implicit IntPtr conversion operators to simplify their use.
…mments-3 * Remove compile-time check on NETSTANDARD * Extract method `ClearCLRData`
@BadSingleton / @amos402 can you please merge from master? |
* Fix dead lock occur by calling PyEval_RestoreThread without check * Ignore `CollectOnShutdown` temporarily
Codecov Report
@@ Coverage Diff @@
## master #958 +/- ##
=======================================
Coverage 86.25% 86.25%
=======================================
Files 1 1
Lines 291 291
=======================================
Hits 251 251
Misses 40 40
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
if (!FullGCCollect()) | ||
{ | ||
Assert.IsTrue(WaitForCollected(op, hash, 10000)); | ||
} | ||
Assert.IsFalse(shortWeak.IsAlive); | ||
garbage = Finalizer.Instance.GetCollectedObjects(); | ||
Assert.IsNotEmpty(garbage, "The garbage object should be collected"); | ||
Assert.IsTrue(garbage.Any(r => ReferenceEquals(r.Target, longWeak.Target)), | ||
"Garbage should contains the collected object"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, avoid leaving partial changes in the history.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not partial, it's a complete one, although it can't pass on all runtimes.
What does this implement/fix? Explain your changes.
An upgraded version for #754
An implementation for #957
Also included some minor fixes.
Still simple and roughly, reset all GC objects by a save state, the old domain objects will be leaking and unaccessible.
Here's my next plan, I think I can replace all the clr objects slots to solid callbacks, that may help the memory leaking.
How to use: use
PythonEngine.Initialize(softShutdown: true)
for startup.Updated Details:
RuntimeState.ShouldRestoreObjects
, and modules which not insys.modules
after just call initial CPython will be remove fromsys.modules
on soft shutdown mode.tp_travese
andtp_clear
to release their sub objects, then useSlotsHolder.ReleaseTypeSlots
reset their slots.Points can be optimize for the above mechanism:
tp_travese
andtp_clear
are not necessary for all time, they will become overhead for CPython's GC, maybe we can just use C# methods instead.SlotsHolder
is not suitable because it also handle the memory release.tp_is_gc
and some other slots are no need to be override at start, they can inherit fromPyType_Type
by default.Interop
do not holding the delegate objects by callingGetThunk
anymore, theSlotsHolder
will take care of them.__import__
will restore at shutdown.Does this close any currently open issues?
#957
Any other comments?
Checklist
Check all those that are applicable and complete.
AUTHORS
CHANGELOG