Skip to content

failure to convert numpy.float64 into System.Double #1936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rzindler opened this issue Sep 12, 2022 · 24 comments
Closed

failure to convert numpy.float64 into System.Double #1936

rzindler opened this issue Sep 12, 2022 · 24 comments

Comments

@rzindler
Copy link

Environment

  • Pythonnet version: 3.0
  • Python version: 3.x
  • Operating System: Windows/Linux
  • .NET Runtime: x

Details

Passing in a numpy array to a C# function that takes a double[] parameter fails.

python code:
test_array = np.array([123, 345], np.float)
CSharpClass.Test(test_array)

C# code
public static void Test(double[] TestData) {...}

I already have a fix for this issue.

In Runtime.cs, change PyFloat_Check(...) to

    internal static bool PyFloat_Check(BorrowedReference ob)
    {
        return PyObject_TypeCheck(ob, PyFloatType);
        //return PyObject_TYPE(ob) == PyFloatType; // existing code
    }

This will convert any type that is derived from PyFloatType, which includes numpy.float64

@lostmsu
Copy link
Member

lostmsu commented Sep 12, 2022

By design. NumPy array is not a C# array and vice versa.

@lostmsu lostmsu closed this as not planned Won't fix, can't repro, duplicate, stale Sep 12, 2022
@rzindler
Copy link
Author

But this worked in 2.5. So what is the "correct" way of doing this now then?

@lostmsu
Copy link
Member

lostmsu commented Sep 12, 2022

One fast method is listed here: #514 (comment)

A faster and perhaps more convenient way would be to write a .NET function using PyBuffer, see tests. If you want it to work transparently, maybe make a codec.

@rzindler
Copy link
Author

Thanks!

@rzindler
Copy link
Author

I'm going to push back on this.
The core problem is that the numpy scalar types are not being converted to C# types. np.float64 should be freely interchangeable with a double. Likewise, np.int32 and C# int.
This is not just for passing numpy arrays themselves, but for passing elements of numpy arrays as scalars.

So I can't call:
A.Func(numpy_array[0])

I have to call:
A.Func(float(numpy_array[0]))
which adds execution overhead, and coding annoyance (poorer readability, forgetting, etc.)

I get the desire to make the conversion explicit rather than implicit. But blocking conversion of the elements is not the way to do that.

You should also note that you can pass Python lists as a double[] parameter, and it works fine. If you want to make conversion explicit, then why not lists too? Your claim that this is by design is therefore suspect.

Also, your idea of making a codec does not work. There does not appear to be a way to make a codec to convert to a primitive type. The decoder is not getting called, because Converter.DecodableByUser() is returning false for primitive types. I guess that's a performance choice?

The simple change I suggested will make it so np.float64 will get converted to C# double. Presumably, a similar change can be made for np.int32.

@filmor
Copy link
Member

filmor commented Sep 14, 2022

Please, don't act like we have never given this any thought.

There is #1908 that I still want to get in (in some form) for 3.0 which tries to make things a bit better.

I have to see whether I can clear up @lostmsu's concerns, it should definitely still be possible to hijack the conversion (though I think we do this part, checking against codecs, as the first step).

@lostmsu
Copy link
Member

lostmsu commented Sep 14, 2022

@rzindler as I said before, make a codec. A codec would allow you to pass numpy.float64 as a parameter to a method that accepts System.Double without any explicit casts.

@rzindler
Copy link
Author

rzindler commented Sep 14, 2022

I did try that. As I said above, codecs are not being called when converting TO a primitive type, because Converter.DecodableByUser() returns false.
I'm guessing it was done to enhance performance, since checking through all the codecs is potentially expensive. Otherwise, why would any type not be decodable by the user?

This is the current code:

        if (DecodableByUser(obType))
        {
            BorrowedReference pyType = Runtime.PyObject_TYPE(value);
            if (PyObjectConversions.TryDecode(value, pyType, obType, out result))
            {
                return true;
            }
        }

@filmor sorry if I gave the impression, that I thought you hadn't thought about stuff.

@lostmsu
Copy link
Member

lostmsu commented Sep 14, 2022

@rzindler you are right, we did explicitly forbid codecs for the primitive types, and performance is indeed the reason.

The fact, that lists are still converted is quite dubious too, but one questionable design decision should not be used as an excuse to add more.

@rzindler
Copy link
Author

In fact, any iteratable object will be implicitly converted to a System.Array. numpy arrays are the only ones that fail, and only because their elements are not convertible.

Not only are all iteratables implicitly converted, This bit of code below happens before the codecs are called, so you cannot make a codec for array conversions. Meaning no codecs are possible for either numpy arrays, or their elements.

            if (obType.IsArray)
            {
                return ToArray(value, obType, out result, setError);
            }

If you really want to prevent all implicit conversion to arrays, all you have to do is delete this code (and all other references to ToArray(...)). Also, if you deleted it, then we could make a codec, which we can't right now.

Perhaps you'd want to make a public version of ToArray() that is accessible from Python. This would enable explicit conversion, which AFAIK doesn't exist right now (except in extra Python code). Of course, this would fail for numpy arrays, unless the elements are convertible.

So in any case, it is definitely desirable to have implicit conversions of the numpy element types. My original suggestion works for np.float64, but maybe you want to use the .item() member that all numpy element types have to get at their core value.

Right now, even if I take a PyObject as a parameter for a numpy array, I'd have to bypass your entire conversion infrastructure and convert it manually. All calls to ToManaged(...) will fail, and there's no place to jump in without rewriting it.

@rzindler
Copy link
Author

One more comment:
If, as you've been saying, disabling implicit numpy to System.Array conversions is deliberate, then I'd say it deserves some prominent mention in the release notes, as it's a significant break in backward compatibility. I didn't see anything in the release notes suggesting this change.

@lostmsu
Copy link
Member

lostmsu commented Sep 15, 2022

@rzindler you're overreacting. It is mentioned in the release notes

@rzindler
Copy link
Author

rzindler commented Sep 15, 2022

Firstly, I appreciate you guys taking the time to answer here.

However, I am a bit frustrated that you seem to not be actually reading what I wrote when you reply.
So please read this carefully.

Our use case is we are converting chunks of a very large Python code base to C#, mostly for performance gains. There's a common situation where a callback function is given to a Python function. These callback functions take numpy arrays as parameters, and can be either Python or C# functions. Therefore, it's not practical to use explicit conversion on the Python side, because we don't know if the function is C# or Python.

As the (3.0) code is now, codecs are not getting called, so there's basically no way to get this code working without modifying your code base.

You have chosen to move in a direction that requires type conversions to be explicit. While this is in line with modern computer science thought, I'm going to push you to reconsider this direction for your situation. Especially the support for converting numpy arrays. I think you're making a big mistake.

  • When I was trying out v2.5 for the first time, the first thing I did was try passing a numpy array to a C# function, and it just worked. This was what convinced me to start using this tool! If it didn't work right away, I would have moved on! You can bet that I won't be the only one.
  • Your entire product exists to make implicit conversions between Python and C#. I use it because I want to call a function, and not have to think about whether it's C# or Python code I'm calling. Your product bills itself as seamless integration between Python and .NET, but not supporting implicit numpy conversions puts a large seam in that integration.
  • Strict typing works and works well within C#, partly because it is strongly typed from end to end, and partly because there are trivial ways to bypass the strictness via things like casting operators. The closest to casting operators in Python is the __float__ member. Given the intended purpose of __float__, it is a complete mystery to me why you'd want to block that conversion.
  • If you want to be completely pedantic about type conversions, then a Python float is not the same as a C# double. So why not require explicit conversion for that too? The answer is because it would be super annoying, and offer nothing in return. So why did you choose to block the numpy element types (e.g. numpy.float64)? It's just as annoying, and you still get nothing in return.
  • Repeating myself for emphasis: a numpy.int32 is just as close to a C# int as the Python int is. If it's ok to make one conversion implicit, then why not the other? The numpy element types only exist for performance optimization within numpy itself.
  • I understand that there are downsides to implicit array conversion. They include potential performance issues, as the caller may not realize a heavy conversion is happening. In theory, requiring explicit conversion can also prevent bugs. But we must measure what we gain by what we lose. In this case we kinda lose the whole reason for Python.NET's existence. If I have to write code to convert arrays, it is no longer anything like seamless integration.
  • Python uses the exact opposite of strict typing, while C# is very strict. Python.NET is the glue layer between them. Ironically, people often choose Python as a glue layer between stuff precisely because its typing is so loose. I'm saying it's not such a bad thing to be a bit loose with types in a glue layer, when it adds flexibility and ease of use. As a glue layer between loose and strict, you can choose which world to live in. All the drawbacks of loose typing will already be in all of the Python code, and requiring strictness on C# calls will do little to help all the problems caused by Python's looseness.
  • I'm sure many (most?) people using numpy are blissfully unaware that numpy.float64 even exists. It's a good thing if they never have to learn about it. It's the kind of thing you learn about only when it breaks your code. Try not to be the one that makes people learn something they really don't need to know.

Anyways, at this point, if you won't add implicit conversion of numpy element types, or at least support a way to add codecs for primitive types, I am either going to make my own bastardized version of your code (not a good option), or abandon its use. I have been pushing my team to use Python.NET. Please don't make me regret that.

Thanks for your time.

@filmor
Copy link
Member

filmor commented Sep 15, 2022

Firstly, I appreciate you guys taking the time to answer here.

However, I am a bit frustrated that you seem to not be actually reading what I wrote when you reply. So please read this carefully.

Come on. I already gave you a clear hint that this tone is not appreciated here. Leave the attitude out and let's stick to the technical discussion (where you are making valid points, it's just not an easy topic and lostmsu has valid points for the "other" side).

Our use case is we are converting chunks of a very large Python code base to C#, mostly for performance gains. There's a common situation where a callback function is given to a Python function. These callback functions take numpy arrays as parameters, and can be either Python or C# functions. Therefore, it's not practical to use explicit conversion on the Python side, because we don't know if the function is C# or Python.

To improve performance of this approach /massively/ (as long as there is no implementation for #1838, though there is already a PyBuffer type), you should consider either wrapping these calls manually, passing around pointers, or using https://github.com/fdieulle/pandasnet. If pandasnet is indeed broken as well, I would consider this a bug on our end.

As the (3.0) code is now, codecs are not getting called, so there's basically no way to get this code working without modifying your code base.

This should be fixable as well, I'd consider the list -> Array conversion optional.

You have chosen to move in a direction that requires type conversions to be explicit. While this is in line with modern computer science thought, I'm going to push you to reconsider this direction for your situation. Especially the support for converting numpy arrays.

We didn't choose this direction for some vague CS-academia reason. The problem with implicit conversions is that it gets /very/ difficult to adjust them, like the case that is mentioned here: #1908 (comment). Also mentioned there is the fact that we don't have a proper overload resolution. Overload resolution currently works by iterating through all signatures, checking if all given parameters can be converted one way or another, and then using the first one where that works. Adding a scoring functionality could help, but the code for this is already quite complicated and brittle, simply because Python does not have any notion of overloading.

  • When I was trying out v2.5 for the first time, the first thing I did was try passing a numpy array to a C# function, and it just worked. This was what convinced me to start using this tool! If it didn't work right away, I would have moved on! You can bet that I won't be the only one.

I am not sure how to answer this. We are not providing you a product that we advertise for. We are simply trying to maintain a 15 year old library in a manner that allows us to go further (note that codecs are a relatively new feature, before all conversions had to be explicitly added directly in the codebase, which led to massive grievances e.g. with regard to automated conversions between lists and arrays) while keeping breakage to a minimum. Version 2.4 wouldn't have worked for you either.

  • Your entire product exists to make implicit conversions between Python and C#. I use it because I want to call a function, and not have to think about whether it's C# or Python code I'm calling. Your product bills itself as seamless integration between Python and .NET, but not supporting implicit numpy conversions puts a large seam in that integration.

Again we are just maintaining, we are not trying to sell anything to you. You are more than welcome to help out improving the situation. Your suggestion in the initial issue wouldn't work for the case @lostmsu lined out in the given comment. If you (or we together) can find a good way to fix this situation, we will be happy to merge it, and I consider a solution to this a blocker for 3.0 (which has been dragging on for a while, though, so I'd like to get this fixed soon).

  • Strict typing works and works well within C#, partly because it is strongly typed from end to end, and partly because there are trivial ways to bypass the strictness via things like casting operators. The closest to casting operators in Python is the __float__ member. Given the intended purpose of __float__, it is a complete mystery to me why you'd want to block that conversion.

Then read our answers as well. A PR for __float__ exists, I referenced it already. Since our overload resolution can not prioritise right now, merging it in its current state (apart from the breaking on 3.7 that I still need to investigate) will make selecting the correct overload in ambiguous situations pure luck. The issue is not whether the type system is strict or not, it's whether methods can be overloaded (C#) or not (Python).

  • If you want to be completely pedantic about type conversions, then a Python float is not the same as a C# double. So why not require explicit conversion for that too? The answer is because it would be super annoying, and offer nothing in return. So why did you choose to block the numpy element types (e.g. numpy.float64)? It's just as annoying, and you still get nothing in return.

We don't choose anything here. We don't want to introduce special-casing for Numpy types as that introduces a (weak) dependency on Numpy and we had quite bad experiences with depending on non-builtin types in the past. It would also be very confusing to have this work for Numpy and not for other "floaty" types. Either we fix it for everything with __float__ or not at all.

  • Repeating myself for emphasis: a numpy.int32 is just as close to a C# int as the Python int is. If it's ok to make one conversion implicit, then why not the other? The numpy element types only exist for performance optimization within numpy itself.

I don't think this particular case works for numpy.int32 either.

  • I understand that there are downsides to implicit array conversion. They include potential performance issues, as the caller may not realize a heavy conversion is happening. In theory, requiring explicit conversion can also prevent bugs. But we must measure what we gain by what we lose. In this case we kinda lose the whole reason for Python.NET's existence. If I have to write code to convert arrays, it is no longer anything like seamless integration.

Well, if there are impedance mismatches, it's less seamless. That's how it works for all "inter-language" libraries, and we don't have a magical recipe for that either. Our current direction is that (by default, configuration should be done through codecs) we only want to provide mostly exact matches, by allowing both sides to use facade types. I'll have to read through our code again to see why this is applied "half-way" here, my expectation would be that this doesn't work without a codec. @lostmsu Do we have default codecs set up right now?

  • Python uses the exact opposite of strict typing, while C# is very strict. Python.NET is the glue layer between them. Ironically, people often choose Python as a glue layer between stuff precisely because its typing is so loose. I'm saying it's not such a bad thing to be a bit loose with types in a glue layer, when it adds flexibility and ease of use. As a glue layer between loose and strict, you can choose which world to live in. All the drawbacks of loose typing will already be in all of the Python code, and requiring strictness on C# calls will do little to help all the problems caused by Python's looseness.

This doesn't add anything to the discussion at hand. Python's type system is strong in the sense that an object will never act as if it was of a different type depending on the context. Again, if that hasn't been clear before: I agree with you that this should be solvable via codecs. I disagree that some vague notion of what the library is supposed to be for should dictate the defaults. There is no way how we'd be able to emulate full duck-typing in C#.

  • I'm sure many (most?) people using numpy are blissfully unaware that numpy.float64 even exists. It's a good thing if they never have to learn about it. It's the kind of thing you learn about only when it breaks your code. Try not to be the one that makes people learn something they really don't need to know.

Anyways, at this point, if you won't add implicit conversion of numpy element types, or at least support a way to add codecs for primitive types, I am either going to make my own bastardized version of your code (not a good option), or abandon its use. I have been pushing my team to use Python.NET. Please don't make me regret that.

I'm not commenting further on these two points, but I have read them.

@lostmsu
Copy link
Member

lostmsu commented Sep 15, 2022

At the very least we can allow codec calls before IsArray check or, perhaps better, we could try codecs if and only if ToArray fails.

As for having NumPy arrays autoconvert to C# arrays it has the same issue as the list to array conversion: you can mutate the object in C# but changes will not be reflected in Python.

Generally, I recommend to wrap NumPy types on the C# side (which is what I do and SciSharp does).

@rzindler
Copy link
Author

OK, thanks guys for the thoughtful replies.

Written word does not accurately convey tone. I'm really not trying to give you snark. I appreciate your time. I was mostly frustrated when I said codecs weren't working and the reply was that I should use codecs.

Now moving on in the spirit of collaboration...

Your replies definitely paint a clearer picture to me. In particular the ambiguous overloading is not something I thought of.

As an offhand suggestion, I'd say that rather than dealing with conversion priorities of the codecs, it should just fail if there is more than one possible overload (and none are exact matches of the types). This is what happens in compilers with ambiguous calls to overloaded functions. It is the perfect time to require explicit conversions. This would simplify and clean up a lot. Your priority and grouping scheme could all go away.

Anyways, I'm really mostly interested in implicit conversion of the numpy primitive types (not just float).

As for how I might do it, with the requirement of not depending on numpy being installed, and keeping performance fast. Here are some suggestions:

  • Since you don't want to iterate through the codec list for all primitives, I would suggest having a codec check only after the normal primitive conversion has failed in the big switch inside Converter.ToPrimitive(). The way your code is structured, it means you'll have to have the check in multiple cases, but that shouldn't be too unwieldy. Basically every time there's a goto type_error;, make a check for primitive codecs first. So each case might look something like this:
case TypeCode.xxx:
    if(!XXX_TypeCheck(value))
    {
        if(!TryPrimitiveCodecs(xxx, value, ref result))  // This is the new part
            goto type_error;
    }
    else
        result = (xxx)Runtime.PyConversionXXX(value);
    return true;

This would add no performance overhead for values that don't need codecs.

  • Now as for implementing the conversion codecs of numpy primitives without access to numpy at compile time. I would maybe start with a string compare of the type name. If I find a string match, I would store the type object within the codec object itself in order to speed all subsequent tests (no more string compares). Then I would call the .item() member of the numpy primitive objects to get the actual value (or np.asscalar(value) if .item() doesn't exists for older numpy versions).

Maybe there's a better way, but this would work, would not really slow anything down, and I think would meet your criteria as well.
Am I wrong?

@lostmsu
Copy link
Member

lostmsu commented Sep 15, 2022

I think allowing codecs to run on primitives after ToPrimitive fails is a possible solution. The only problem I see with it is issues with multiple codecs being registered for the same type. E.g. say you rely on a np.float64 <-> double codec, and somebody like SciSharp has a np.float64 <-> SciSharp.NumPy.Float64 codec. Now theirs won't be a problem, because it simply won't be invoked for double parameters, but yours will break overload resolution between Foo(double) and Foo(SciSharp.NumPy.Float64). So I am not convinced.

Why can't you make wrappers for NumPy types with custom codecs? Or maybe even just use SciSharp's NumPy wrapper and go from it?

@rzindler
Copy link
Author

I really think you should throw an exception if there's ambiguity between overloads. Then the whole problem goes away. Anyone making both Foo(double) and Foo(SciSharp.NumPy.Float64) kinda deserves to see an exception.

Also, I was thinking that there could be a separate list of codecs just for primitives. That would speed things up a bit, since there'd be shorter lists to parse, and you could potentially have different interfaces (i.e. simpler for the primitives).

@rzindler
Copy link
Author

About using SciSharp's numpy wrapper, has that been done? Numpy.NET appears to embed Python within it. That's very different than being called from Python. It's not clear to me that it would work. Is there a way for it to use the Python instance it was called from? I couldn't find anything.
Or are you talking about something else besides Numpy.NET?

@lostmsu
Copy link
Member

lostmsu commented Sep 16, 2022

If your C# library depends on Numpy.NET and is correctly deployed, Numpy.NET will be automatically loaded along with it. Mind you though that Numpy.NET currently does not target Python.NET 3.0.0 yet, so some work is required to bring it up to date.

@lostmsu
Copy link
Member

lostmsu commented Sep 16, 2022

BTW, we already have this issue: #1276

@lostmsu
Copy link
Member

lostmsu commented Sep 16, 2022

@rzindler considering @filmor is not planning to release until October, and that Hacktoberfest is coming, I think I might want to spent some time trying to figure out some out-of-box conversion here for popular libraries with exact types. Only for primitives, not for arrays though.

@filmor filmor reopened this Sep 17, 2022
@filmor
Copy link
Member

filmor commented Sep 17, 2022

I'll reopen this, there were plenty of issues with this already. I actually like the suggestion to never try to solve ambiguous situations. I know that this may be a hassle at the beginning, but it has the advantage that one can progress much much easier from that stance than from the current one.

My suggestion would be:

  • If a method is overloaded such that a call is ambiguous in any way (i.e. we try our resolution and it matches for multiple options), raise an AmbiguousOverloadError that contains information on the matching overloads
  • Don't apply any kind of overload resolution if .__overloads__ is used

We could progressively add more rules to resolve ambiguous situations (e.g. count the number of exact matching types, etc.) which should not break any existing code if done carefully.

Our main issue right now (I think) is that we just use the first matching overload and stop trying at that point.

@filmor
Copy link
Member

filmor commented Sep 20, 2022

Since #1908 is merged and released as rc6, I'll close this one again.

@filmor filmor closed this as completed Sep 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants