Skip to content

Add PyHandle #1087

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Add PyHandle #1087

wants to merge 1 commit into from

Conversation

amos402
Copy link
Member

@amos402 amos402 commented Mar 10, 2020

What does this implement/fix? Explain your changes.

  • Add a wrapper named PyHandle to replace IntPtr of C API.
  • Refactor Rutime.XDecref and Runtime.XIncref.
  • Add implcit operator for BorrowedReference and NewReference

NewReference and BorrowedReference seem be good for helping user comprehend the type of return value quickly. But it should not interfere the general usage.
When I tried to merge the code, I found that the DangerousGetAddress just keep annoying me. In many cases we have to use the raw IntPtr(e.g. put the Python object to a clr collection). Keep typing the long method name DangerousGetAddress just verbose since I must know that C API does return borrowed reference or new reference before using the C API.
Also I can't the NewReference or BorrowedReference to the collection, because they're non-copyable. This may good cause no matter which put to the collection wouldn't make sense(e.g. List<NewReference>`, I still have to iterate it and Decref the objects when I clear the List).

So, here by I introduce the PyHandle. Be more readable compare with the raw IntPtr, it contains some basic functions for the Python object for easy usage, can be implicit convert from NewReference or BorrowedReference. And it doesn't conflict with current code.
Use the PyHandle instead of BorrowedReference as the parameter type should be more rantional.
For example, such as int PyList_Append(BorrowedReference pointer, IntPtr value);, why the pointer should be a BorrowedReference? What about the value what's type it should be? Either New nor Borrowed both doesn't match.
I may just created a PyList from PyList_New and it will return a NewReference, but I can't even use the return value through to the other functions, seems that doesn't match the general usage cases.

Does this close any currently open issues?

#1068
#1043

Any other comments?

...

Checklist

Check all those that are applicable and complete.

  • Make sure to include one or more tests for your change
  • If an enhancement PR, please create docs and at best an example
  • Add yourself to AUTHORS
  • Updated the CHANGELOG

@lostmsu
Copy link
Member

lostmsu commented Mar 10, 2020

As far as I can see, PyHandle does not add much over IntPtr, except ToString overload, which is helpful for debugging and conversions to and from the new *Reference types. In every other way it is the same IntPtr type.

However, implicit conversions destroy the whole point of *Reference types, which is to track lifetime of objects received from and given to Python interpreter. Automatic conversion to PyHandle, which can later be stored elsewhere or passed to Python allows it to potentially outlive the object it is pointing to.

There are two reasons I did not add more explicit type to int PyList_Append(BorrowedReference pointer, IntPtr value);:

  1. documentation does not explicitly say if it does or does not IncRef it internally, e.g. it is unlcear if a BorrowedReference will suffice for value, or we have to pass a NewReference, and somehow have it consumed.
  2. If it would have to be a NewReference, how can we ensure it is consumed? E.g. how we can ensure that after extern Py_SomeFunc(NewReference value) the value is set to null. It is a problem you can help on. Ideally we could find a way to make PInvoke do it for us.

Overall the way I see correct refcount handling could work is:

  • keep using BorrowedReference for something that was borrowed from Python. It can not outlive the current .NET function (which is to say generally is insufficient, but better than no guarantees at all).
  • use NewReference to distinguish references we have to Dispose after use
  • use PyObject for anything else, as it allows to safely store objects indefinitely (e.g. outside of the current function scope), including in collections
  • we can make specialized collection(s) for borrowed references when needed (this is only when having a collection of PyObject would have too much impact on performance). Those collections would have to have the same lifetime restrictions as BorrowedReference.


namespace Python.Runtime
{
struct PyHandle : IDisposable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having regular structs IDisposable is a bad idea. The following two cases would cause the pointer to be copied and potentially lead to double free:

var x = GetNewPyHandleToSomething();
var y = x;
x.Dispose();
y.Dispose();
var x = GetNewPyHandleToSomething();
void DoSomethingWithHandle(PyHandle handle) {
  ...
  handle.Dispose();
}
DoSomethingWithHandle(x);
x.Dispose();

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merely for easy using statement like using(var list = PyList_New()), people should know what they're doing when they using the C API, if they're not, everything on language sight are just smoke and mirrors. Just like you can always decref a wrong object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amos402 but they only have to use raw C API if you expose it. In that sense *Reference types are fool-proof, unless you use DagerousGetAddress.

C-style API is a error-prone mental burden, which is totally unnecessary in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not agree with it, since you can't control the user how write the code, these fool-proof are helpless.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amos402 users of Python.NET do not need to work with pointers at all.

In PRs for Python.NET I can safely skip attempts to verify correct refcount balancing if the code does not use .DangerousGetAddress(), which will make them easier to read.

@lostmsu
Copy link
Member

lostmsu commented Mar 10, 2020

Also, I started with little conversions to *Reference so we could see pain points and address them before jumping on using *Reference types for all PInvoke. If you face one, open an issue, describe your scenario, and give a code example where *Reference types do not work well for you, and we can discuss how to fix it.

@lostmsu lostmsu mentioned this pull request Mar 10, 2020
4 tasks
@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

As far as I can see, PyHandle does not add much over IntPtr, except ToString overload, which is helpful for debugging and conversions to and from the new *Reference types. In every other way it is the same IntPtr type.

That's what I wanted, we need a equivalent type for IntPtr to distinguish the Python object and raw pointer for memory or other usage(e.g. size_t) and provides some basic method for Python object for OOP.

I don't like current usages of the NewReference and BorrowedReference, it only increase the verbose code. I barely don't want to use them at all.
Just as I said, people who use these API should know what value their returned, a <summary/> comment should be enough.

var obj = PyList_New();
// do something
// return and didn't call Decref or Dispose

As a result, the same as you use IntPtr, didn't help much at all.
On my opinion, I didn't need these useless things, they only increase my use-cost, I can do well as well as use PyHandle even IntPtr.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 actually, I recently upstreamed a change to FxCop, that will make your example fail at compile time because you did not call Dispose. Once the new version of the analyzers package is released, we can add it to our build.

But even without it, the new types already prevent accidental copying without corresponding increfs.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

That means I even can just return the returned value. I have to incref it first, and store its pointer from DangerousGetAddress and call Dispose to make it decref, then I can return it. One simple step become four steps.
That could be more worse.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 no, it does not. Static escape analyzer is smart enough to extend scope when the value is returned.

You can actually see how this works right now with [NonCopyable] on NewReference.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

I don't like that NonCopyable thing cause I just want to put the object into collection directly.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

cause I just want to put the object into collection directly

Can you show a scenario in Python.NET code?

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

NewReference pylist = Runtime.PyList_New(10);
Runtime.PyList_Append(pylist, pyobj); // X

using (NewReference pylist = Runtime.PyList_New(10)) // X
{
}

List<NewReference> list1; // X
list1.Add(pylist); // X

List<BorrowoedReference> list2; // X
list2.Add(PySys_GetObject("modues") // X

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 I meant a full method, that does something useful.

BTW, the first X should not fail. I think NewReference needs a Borrow() method, that would return BorrowedReference, or, perhaps, even an implicit conversion (I am still torn about this one).

The second X fails because we use C# 7.3. It would work in C# 8.0, which we could switch to after 2.4 is released.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

I thought the usages above are just some general usages, but it can't do it.

If you need the concrete usages, like these:
Chains call: X
https://github.com/amos402/pythonnet/blob/66ab7192222e382faf6cff808a23964eb846fc34/src/runtime/runtime_data.cs#L248-L269

Generator call: X
https://github.com/amos402/pythonnet/blob/66ab7192222e382faf6cff808a23964eb846fc34/src/runtime/runtime_state.cs#L146-L159

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 the first example will have only one DangerousGetAddress after we add an implicit conversion from NewReference to BorrowedReference, and update PInvoke signatures.

The second scenario is indeed dangerous, and I feel like it is good to have it indicated as such. You are serializing pointers, that would live across domain boundary and Python interpreter instantiations.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

BTW, the first X should not fail. I think NewReference needs a Borrow() method, that would return BorrowedReference, or, perhaps, even an implicit conversion (I am still torn about this one).

Why you have to specific they're New or Borrowed precisely when they as parameters, wouldn't it make confuse to others? Is not easy to treat the parameters as PyHandle and return NewReference or BorrowedReference? Of course, for better usage, I tend to make them can be convert a PyHandle automatically.

The second X fails because we use C# 7.3. It would work in C# 8.0, which we could switch to after 2.4 is released.

Please don't rely on the language version much.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 most parameters are BorrowedReference, but, for example, PyList_SetItem steals the reference to the item. So we can mark it as NewReference, and ensure, that PInvoke will not allow us to use item after it is passed to PyList_SetItem.

@lostmsu lostmsu mentioned this pull request Mar 11, 2020
2 tasks
@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

That means it may confuse the concept of new reference and borrowed reference, steals the reference means it just would not add reference count, now you say the parameter is a new reference object, things become jumble.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 I think it is pretty clear. If a parameter is BorrowedReference, you can pass a borrowed reference to it. If it is a NewReference, you have to "create" a new reference to the object you want to pass, internally method will steal it. Usually we would do it by incref'ing the object.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

BTW, opened a PR for allowing to borrow from a NewReference #1088, which should remove DangerousGetAddress from your first example, except 1 instance when you put it into a collection.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

var xxx = PySysGetObject("xxx"); // borrowed
PyTuple_SetItem(t, xxx); // no no no, you're passing a BorrowedReference, you need to make it to NewReference

var xxx = PyObject_GetString(obj, "list"); // new
PyList_Append(xxx, obj); // no no no, you're passing a NewReference, you need to make it to BorrowedReference. Oh, seems it can be upgrade automatically, fine.

PyList_SetItem(xxx, xxx); // no no no, you're passing a NewReference, you need to make it to BorrowedReference. Can you see that, it can be upgrade automatically, OK... Wait, `xxx` is what, it's a new or borrowed, or something else..... Now then, let's check the manual and ignore what the signature said.....

You are increasing the use-cost.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

For reference stealing we can have another non-copyable ref struct StolenReference, and add

public StolenReference Steal() => ...

Methods to both BorrowedReference and NewReference.
Then you'd use it like this:

PyList_SetItem(listReference, index, itemReference.Steal());

Alternatively, we could find a PInvoke trick, that would allows us to use NewReference directly, and somehow automatically null it afterwards.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

Oh no, another type, that's what am saying, things become jumble. Do we really need this? I though just enough by using PyHandle, no tricks.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

var xxx = PySysGetObject("xxx"); // borrowed
PyTuple_SetItem(t, xxx); // no no no, you're passing a BorrowedReference, you need to make it to NewReference

And that's exactly the point! You have to construct a suitable new reference for xxx here. You missed an IncRef!

var xxx = PyObject_GetString(obj, "list"); // new
PyList_Append(xxx, obj); // no no no, you're passing a NewReference, you need to make it to BorrowedReference. Oh, seems it can be upgrade automatically, fine.

As you say, can be fixed by #1080.

PyList_SetItem(xxx, xxx); // no no no, you're passing a NewReference, you need to make it to BorrowedReference. Can you see that, it can be upgrade automatically, OK... Wait, `xxx` is what, it's a new or borrowed, or something else..... Now then, let's check the manual and ignore what the signature said.....

That scenario I am still thinking about. With the StolenReference type above this would look like:

PyList_SetItem(xxx, xxx.Steal());
// a person reading code can see reference is stolen without referring to Python docs
PyList_SetItem(xxx, xxx); // <- won't compile! excellent: you forgot to IncRef xxx!

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

And that's exactly the point! You have to construct a suitable new reference for xxx here. You missed an IncRef!

I mean, I wouldn't say that's a NewReference, why it named New? It's not NEW.

All I just want to use the Python API just as write the C/C++ extension. People who use these API should always care about the reference. Since there's no smart pointer here, the operates handle the reference count does necessity, you can control them automatically either. But these types make code become verbose and ambiguous.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 , again, this part (where parameter is to be stolen) is not settled yet.
Ideally, I'd like it to look either like this:

PyTuple_SetItem(t, xxx.NewReference());

or like this:

PyTuple_SetItem(t, xxx.Steal());

But I am afraid we'd have to resort to something like PyTuple_SetItem(t, ref xxx), because xxx here has to be nulled after use, but only if call does not fail.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

Return value is just fine, but introduce them into parameters, I refuse.
BorrowedReference PyList_GetItem(PyHandle pointer, IntPtr index)
int PyList_SetItem(PyHandle pointer, IntPtr index, PyHandle value)
This's enough.

@lostmsu
Copy link
Member

lostmsu commented Mar 11, 2020

@amos402 any specific reasons?

I have one for the distinct types: when I read a pull request, I'd like to avoid looking up every single Python function call I encounter in docs to ensure the pull request author counted IncRefs and DecRefs correctly.

@amos402
Copy link
Member Author

amos402 commented Mar 11, 2020

Yes, the type checking may be a little help for that. But it should not stuck the general usage. So I tend to use implicit operator to make them become PyHandle, user should know that they're just using the raw pointer when they use it, because you have to declare it as PyHandle op = PyList_New(10), not var, not NewReference, no need to call that long verbose function PyHandle op = PyList_New(10).DangerousGetAddress().
And if you must Dispose before leave the scope, when I want to put a IntPtr into collection, I may want to steal it, so I must incref it first then call Dispose to decrease it, these +- operates become meaningless.
Be ware the reference count does necessary even you use these wrapped types. Generally, I don't need to check the manual everytime, you can find there's a priciple for making the return value as NEW or BORROWED.
Steal reference only appeared by few XXX_Set container APIs, it means it transfer the ownership to the container, not mean it should a New, also if you introduce a StoleReference, hard to explain why this can represent as a type. You can think all parameters are borrowed. If you do too much things for the signature, that may interfere the general usage.

@filmor
Copy link
Member

filmor commented Mar 11, 2020

All I just want to use the Python API just as write the C/C++ extension. People who use these API should always care about the reference. Since there's no smart pointer here, the operates handle the reference count does necessity, you can control them automatically either. But these types make code become verbose and ambiguous.

I'm sorry, but no. This mindset got us exactly all of those reference count issues we (and in particular you) have been working on for years now to resolve. There is no shame in getting the compiler to help out on this. Naming can be discussed, but with this the code becomes /less/ ambiguous because we don't (ab)use IntPtr for semantically different things. With var you won't see much of a difference in verbosity either.

@amos402
Copy link
Member Author

amos402 commented Mar 12, 2020

I have to declare that I didn't critical the Rererence types.

  • Return values for New/Borrowed references?
    I agree with that. I never said no.

  • Why adding this PyHandle?
    I have no intend to replace New/Borrowed reference with it for the return values. This is a equivalent type for IntPtr, it's use for replace IntPtr for OOP and prevent ambiguous from other pointer types.

  • Why PyHandle can be implicit convert from IntPtr?
    They're equivalent.

  • Why replace all IntPtr(PyObject* in C) into PyHandle?
    I don't want to interfere current code too much. Old code can be compatible without any modifies. Besides, it's easy to batch replace IntPtr types for the P/Invoke signatures.

  • Why PyHandle can be implicit convert from New/Borrowed reference types?
    I don't want to interfere current code too much. Even you change the signatures of return values to reference types, it don't need any changes.
    Besides, it's for the easy usages. I said, PyHandle is equivalent for IntPtr. To trigger the implicit converter, you have to declare the variable type explicit, you won't get a PyHandle when you just using var. If you did that, that means you knew it became a raw pointer, and you have to take care of its lifecycle.
    At another point, you can say IntPtr/PyHandle/NewReference/BorrowedReference are all equivalent.
    And I can easy put them into clr container. No matter what they were, you still need to care about the reference count.
    Of course, if you really really hate the implicit operator, you can add a move segment instead it latter.

  • Why PyHandle implemented IDisposable?
    So it can be use by using statement replace those verbose try {} finnaly{ decref }.

  • Why refactor the XIncref/XDecref?
    Just write that incidentally(not sure I'm using the write word). You can see it's faster than before.

  • Why add Incref/XDecref methods for PyHandle?
    So it can easy to use them when you want to assert the pointer isn't null. For reducing a custom assert checking or if (is null) throw.

  • Parameters for New/Borrowed references?
    Not agree. I suggest limit them only in caller's scope except return it. Use them as return value for C# method is fit, I think.

  • Introduce a new type for those function of steal reference?
    I object insistently.

  • Other suggests?
    DangerousGetAddress is too long for type and read, make a Handle property instead.

p.s. If I used any wrong English grammar or words before, please point out them if it won't bother you too much😝.

@lostmsu
Copy link
Member

lostmsu commented Feb 21, 2021

We are not taking this change.

@lostmsu lostmsu closed this Feb 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants