Skip to content

math functions fail confusingly on long integers (and object arrays generally) #368

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
njsmith opened this issue Jul 29, 2012 · 22 comments

Comments

@njsmith
Copy link
Member

njsmith commented Jul 29, 2012

If you pass a 'long' integer greater than sys.maxint to np.sqrt, then it blows up with a confusing error message:

>>> np.__version__
'1.8.0.dev-1234d1c'
>>> np.sqrt(sys.maxint)
3037000499.9760499
>>> np.sqrt(sys.maxint + 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: sqrt

This was noticed because it breaks SciPy's kendalltau function: http://mail.scipy.org/pipermail/scipy-user/2012-July/032652.html

It was hard to diagnose because the error message is so misleading.

It looks like what's going on is that ndarray conversion on very large 'long' objects gives up and simply returns an object array. So... that seems reasonable, not sure what else we could do:

>>> np.asarray(sys.maxint + 1)
array(9223372036854775808L, dtype=object)

And then when called on an object array, np.sqrt does its weird ad-hoc fallback thing, and tries calling a .sqrt() method on each object. Which of course doesn't exist, since it's just something we made up. (But this is where the confusing error message comes from.) In fact, our handling of object arrays is pretty broken all around -- we can't even take the square root of float objects:

>>> np.sqrt(np.array(1.0, dtype=object))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: sqrt

The math module versions of sqrt and friends accept long integers:

>>> math.sqrt(sys.maxint + 1)
3037000499.97605
>>> math.cos(sys.maxint + 1)
0.011800076512800236

Mostly this just works by calling PyFloat_AsDouble, which goes via the float method if defined (as it is for longs). However, the math module does have special code for longs in some cases (log in 2.7, maybe more in future versions, who knows):

>>> math.sqrt(sys.maxint ** 100)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to float
>>> math.log(sys.maxint ** 100)
4366.827237527655

So in conclusion: np.sqrt and friends, when operating on object arrays, should fall back to the stdlib math functions. (This would be in addition to the current .sqrt method fallback. I guess the current fallback should probably be tried, since anyone defining our ad-hoc methods is presumably doing so specifically because they want numpy to respect that.)

@charris
Copy link
Member

charris commented Jul 29, 2012

Perhaps it would make sense to raise a warning when an object array is created in this way, I suspect it is usually unintended and leads to these unexpected results.

@teoliphant
Copy link
Member

I'm not sure about raising a warning. There are a lot of cases where asarray will return OBJECT arrays as the default. Perhaps we could detect a failing "float" or "int" conversion Having object arrays look for attributes to implement the ufuncs has been a "feature" of Numeric from 1995. I think it would make sense to have the ufuncs which are also in the standard library to call the standard library function. One challenge is that you have cmath and math rather than a unified interface. I suppose if we always used cmath then we would get expected behavior for both real and complex "objects".

@dan-blanchard
Copy link

This would be in addition to the current .sqrt method fallback

If the fallback for sqrt is already implemented, why does np.sqrt raise this exception with longs?

I've been encountering this bug via scipy's kendalltau and would love a workaround.

@njsmith
Copy link
Member Author

njsmith commented Feb 15, 2013

Because the existing fallback is to look for a .sqrt method on the given
objects, but nobody except numpy has ever heard of this method. In
particular standard python objects like long don't implement it. So we
need to add a second fallback. If you'd like to submit a PR then that would
dramatically increase the chance that this gets implemented soon :-)

(The math versus cmath thing is a bit of a red herring, I think, since
np.sqrt already acts like math.sqrt specifically rather than cmath.sqrt,
except when given specifically complex input? np.sqrt(-1) -> nan.)

On Fri, Feb 15, 2013 at 12:57 PM, Dan Blanchard notifications@github.comwrote:

This would be in addition to the current .sqrt method fallback

If the fallback for sqrt is already implemented, why does np.sqrt raise
this exception with longs?

I've been encountering this bug via scipy's kendalltau and would love a
workaround.


Reply to this email directly or view it on GitHubhttps://github.com//issues/368#issuecomment-13627606.

@dan-blanchard
Copy link

Does anyone have a clue where this fallback is implemented? I will gladly take a swing at fixing this, but I cannot for the life of me find where this happens.

@seberg
Copy link
Member

seberg commented Feb 18, 2013

Don't really know it too well, but as a hint, that is what the P in numpy/core/code_generators/generate_umath.py means. The actual code running it is in numpy/core/src/umath/loops.c.src where you can find PyUFunc_O_O_method (which is that P as far as I can see). The best may be to add a specialized ufunc though like the logical object ones at the end of the file, which tries calling item.__pow__(0.5), since I don't think you can pass the 0.5 otherwise, though maybe there are some more general purpose ufunc's there so maybe you can teach one of them to do it too.

Btw. does np.conjugate work for you on object arrays? Doesn't seem to me, but maybe I am just trying wrong right now.

@dan-blanchard
Copy link

I believe I've enumerated all the functions that are broken for objects arrays.

np.logical_not(x)
np.degrees(x)
np.rad2deg(x)
np.radians(x)
np.deg2rad(x)
np.arccos(x)
np.arccosh(x)
np.arcsin(x)
np.arcsinh(x)
np.arctan(x)
np.arctanh(x)
np.cos(x)
np.sin(x)
np.tan(x)
np.cosh(x)
np.sinh(x)
np.tanh(x)
np.exp(x)
np.exp2(x)
np.expm1(x)
np.log(x)
np.log2(x)
np.log10(x)
np.log1p(x)
np.sqrt(x)
np.ceil(x)
np.trunc(x)
np.fabs(x)
np.floor(x)
np.rint(x)
np.fmod(x, x)
np.logical_and(x, x)
np.logical_or(x, x)
np.logical_xor(x, x)
np.arctan2(x, x)
np.hypot(x, x)

all fail with an AttributeError when x = np.array(1.0, dtype=object).

A related issue---but I'm not sure if this one is as much a bug, since there's a somewhat sensible error message---is that all of these:

np.isnan(x)
np.isinf(x)
np.isfinite(x)
np.signbit(x)
np.spacing(x)
np.modf(x)
np.logaddexp(x, x)
np.logaddexp2(x, x)
np.copysign(x, x)
np.nextafter(x, x)

fail with a TypeError like "TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule 'safe'". It is still a little strange that they say they cannot be coerced using casting rule 'safe', since the casting parameter defaults to unsafe.

I believe the AttributeErrors are all being caused by lines 409 and 445 in loops.c.src, which both feature something like this:

PyObject *ret = PyObject_CallMethod(in1 ? in1 : Py_None, meth, NULL);

I think it would make the most sense to change them to something like this:

PyObject *ret = NULL;
if (PyObject_HasAttrString(in1, meth)) {
    ret = PyObject_CallMethod(in1 ? in1 : Py_None, meth, NULL);
} else {
    // Call standard library math function if it exists, otherwise die
}

I'm pretty new to the whole Python C API (although I'm very familiar with Python and C separately) and the Numpy source code, so I'm not sure how best to implement the else part of that conditional statement. Anyone have any ideas/suggestions? Is there a simple way to see if we have a function named meth in the current namespace and call it?

@bfroehle
Copy link
Contributor

bfroehle commented Mar 6, 2013

@dan-blanchard I think the bug you are hitting in scipy.stats.kendalltau was already fixed in scipy/scipy@ce14ddb. Perhaps you should try upgrading to 0.11.0?

@dan-blanchard
Copy link

@bfroehle I actually have already upgraded scipy, but the fix for kendalltau is just a workaround to deal with this more general bug. I'd like to try to rectify the bug in general, because it is bizarre that you can't use any of the above math functions on an array of long integers (or objects).

@ewmoore
Copy link
Contributor

ewmoore commented Mar 7, 2013

@dan-blanchard, You need to do exactly what you would do in python, load the math module, and call the appropriate function. An example of doing something similar from the ndarray.dot method is:

static PyObject *
array_dot(PyArrayObject *self, PyObject *args, PyObject *kwds)
{
    PyObject *fname, *ret, *b, *out = NULL;
    static PyObject *numpycore = NULL;
    char * kwords[] = {"b", "out", NULL };

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "O|O", kwords, &b, &out)) {
        return NULL;
    }

    /* Since blas-dot is exposed only on the Python side, we need to grab it
* from there */
    if (numpycore == NULL) {
        numpycore = PyImport_ImportModule("numpy.core");
        if (numpycore == NULL) {
            return NULL;
        }
    }
    fname = PyUString_FromString("dot");
    if (out == NULL) {
        ret = PyObject_CallMethodObjArgs(numpycore, fname, self, b, NULL);
    }
    ret = PyObject_CallMethodObjArgs(numpycore, fname, self, b, out, NULL);
    Py_DECREF(fname);
    return ret;
}

As you can see the module is imported using PyImport_ImportModule and the reference is stored in a static variable so that it will be available for future calls. Then a PyObject containing the function name is constructed (using PyUString_FromString, a numpy macro that calls PyString_FromString on Python2 and PyUnicode_FromString on Python3.) and used to call the appropriate function, by calling PyObject_CallMethodObjArgs. In your case, you'll probably want to do the function look up yourself, and then call PyObject_CallFunctionObjArgs to save the look up overhead since you're looping.

There are two wrinkles I see for doing this here: 1) not everything you might want is located in the math module, some are in operator, some have different names and some don't exist AFAIK, so you'll need a mapping. 2) Object arrays can be heterogeneous, what happens when an exception is raised for one object? I.E.

a = np.array(['foo', 37L, 0.37], np.object)
np.sqrt(a) # what happens?

@ewmoore
Copy link
Contributor

ewmoore commented Mar 7, 2013

Also, unless math vs. cmath is handled you'll leave a different confusing corner case than what we have now:

import math, cmath

math.sqrt(47) # gives real
Out[2]: 6.855654600401044

cmath.sqrt(47) # gives complex
Out[3]: (6.855654600401044+0j)

math.sqrt(47 + 10j) # raises
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-ecff6b007476> in <module>()
----> 1 math.sqrt(47 + 10j) # raises

TypeError: can't convert complex to float

cmath.sqrt(47 + 10j) # gives complex 
Out[5]: (6.893912354640718+0.7252775699467939j)

Perhaps just calling the math version is the best solution, since otherwise we need to do type checking on each object. Although maybe such typechecking is necessary.

@dan-blanchard
Copy link

@ewmoore Thank you so much for the tips. That should be more than enough for me to get a handle on this.

  1. Object arrays can be heterogeneous, what happens when an exception is raised for one object? I.E.
a = np.array(['foo', 37L, 0.37], np.object)
np.sqrt(a) # what happens?

I think that in that case we would want to raise a TypeError (or maybe a ValueError since the expected type is just object here) and say something like "Element 0 in array does not support sqrt". The exact wording would be a little tricky, because we first check if the object has a method named sqrt, but then we call math.sqrt as a fallback.

Perhaps just calling the math version is the best solution, since otherwise we need to do type checking on each object. Although maybe such typechecking is necessary.

I'm inclined to think that typechecking is the better solution here, but it's obviously slower.

@njsmith
Copy link
Member Author

njsmith commented Mar 8, 2013

The speed issues seems irrelevant to me -- checking the type is not really
going to be noticeable by the time we're already calling Python functions,
checking for methods and discovering that they don't exist, etc.

One possible objection to type-checking is that regular python never
chooses between math and cmath based on the type of the argument... math
and cmath work differently even for floats:

In [3]: math.sqrt(-1)
ValueError: math domain error

In [4]: cmath.sqrt(-1)
Out[4]: 1j

OTOH, there's:

In [9]: (-1) ** 0.5
ValueError: negative number cannot be raised to a fractional power

In [10]: complex(-1) ** 0.5
Out[10]: (6.123031769111886e-17+1j)

And our goal here isn't to replicate math.sqrt or cmath.sqrt, but to
implement numpy.sqrt's semantics for arbitrary objects. And np.sqrt does
dispatch based on the type:

In [11]: np.sqrt(-1)
/home/njs/.user-python2.7-64bit/bin/ipython:1: RuntimeWarning: invalid
value encountered in sqrt
#!/home/njs/.user-python2.7-64bit/bin/python
Out[11]: nan

In [12]: np.sqrt(complex(-1))
Out[12]: 1j

So I guess I'm +1 on type-checking.

On Thu, Mar 7, 2013 at 4:51 PM, Dan Blanchard notifications@github.comwrote:

@ewmoore https://github.com/ewmoore Thank you so much for the tips.
That should be more than enough for me to get a handle on this.

  1. Object arrays can be heterogeneous, what happens when an exception is
    raised for one object? I.E.

a = np.array(['foo', 37L, 0.37], np.object)np.sqrt(a) # what happens?

I think that in that case we would want to raise a TypeError (or maybe a
ValueError since the expected type is just object here) and say something
like "Element 0 in array does not support sqrt". The exact wording would be
a little tricky, because we first check if the object has a method named
sqrt, but then we call math.sqrt as a fallback.

Perhaps just calling the math version is the best solution, since
otherwise we need to do type checking on each object. Although maybe such
typechecking is necessary.

I'm inclined to think that typechecking is the better solution here, but
it's obviously slower.


Reply to this email directly or view it on GitHubhttps://github.com//issues/368#issuecomment-14571953
.

@njsmith
Copy link
Member Author

njsmith commented Mar 8, 2013

Oh, and for errors, I think the logic we want is:

import numbers
def sqrt_fallback(obj):
if hasattr(obj, "sqrt"):
# Any exception here is allowed to escape:
return obj.sqrt()
transformed_obj = None
mod = None
try:
transformed_obj = float(obj)
mod = math
except TypeError:
try:
transformed_obj = complex(obj)
mod = cmath
except TypeError:
raise TypeError("object has no .sqrt() method, and isn't
convertible to float or complex")
# Any exception here is allowed to escape
return mod.sqrt(transformed_obj)

(We may well want to literally implement the whole fallback lookup logic in
Python as above, though maybe as a somewhat generalized function that takes
the function name as a parameter instead of copying all that code above for
all the different ufuncs, like ufunc_obj_fallback(ufunc_name, obj). And
then the C level code could just call this Python function unconditionally.)

On Fri, Mar 8, 2013 at 10:32 PM, Nathaniel Smith njs@pobox.com wrote:

The speed issues seems irrelevant to me -- checking the type is not really
going to be noticeable by the time we're already calling Python functions,
checking for methods and discovering that they don't exist, etc.

One possible objection to type-checking is that regular python never
chooses between math and cmath based on the type of the argument... math
and cmath work differently even for floats:

In [3]: math.sqrt(-1)
ValueError: math domain error

In [4]: cmath.sqrt(-1)
Out[4]: 1j

OTOH, there's:

In [9]: (-1) ** 0.5
ValueError: negative number cannot be raised to a fractional power

In [10]: complex(-1) ** 0.5
Out[10]: (6.123031769111886e-17+1j)

And our goal here isn't to replicate math.sqrt or cmath.sqrt, but to
implement numpy.sqrt's semantics for arbitrary objects. And np.sqrt does
dispatch based on the type:

In [11]: np.sqrt(-1)
/home/njs/.user-python2.7-64bit/bin/ipython:1: RuntimeWarning: invalid
value encountered in sqrt
#!/home/njs/.user-python2.7-64bit/bin/python
Out[11]: nan

In [12]: np.sqrt(complex(-1))
Out[12]: 1j

So I guess I'm +1 on type-checking.

On Thu, Mar 7, 2013 at 4:51 PM, Dan Blanchard notifications@github.comwrote:

@ewmoore https://github.com/ewmoore Thank you so much for the tips.
That should be more than enough for me to get a handle on this.

  1. Object arrays can be heterogeneous, what happens when an exception is
    raised for one object? I.E.

a = np.array(['foo', 37L, 0.37], np.object)np.sqrt(a) # what happens?

I think that in that case we would want to raise a TypeError (or maybe
a ValueError since the expected type is just object here) and say something
like "Element 0 in array does not support sqrt". The exact wording would be
a little tricky, because we first check if the object has a method named
sqrt, but then we call math.sqrt as a fallback.

Perhaps just calling the math version is the best solution, since
otherwise we need to do type checking on each object. Although maybe such
typechecking is necessary.

I'm inclined to think that typechecking is the better solution here, but
it's obviously slower.


Reply to this email directly or view it on GitHubhttps://github.com//issues/368#issuecomment-14571953
.

@jjhelmus
Copy link
Contributor

jjhelmus commented Nov 3, 2015

The original issue reported was fixed by 6966bdc but the underlying issue is still present (at least as of version 1.11.0.dev0+2bc5cb9, which can be seen in the following snippet:

>>> import numpy
>>> numpy.version.version
'1.11.0.dev0+2bc5cb9'
>>> numpy.sqrt(2**64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'long' object has no attribute 'sqrt'

@eric-wieser
Copy link
Member

As of #12700, the error message has been improved.

@njsmith, do you think this can now be closed?

@harrow
Copy link

harrow commented Apr 14, 2021

In my opinion it's still very confusing. If the number is bigger or smaller than 2^64 python always calls it an "int" but in the former case it fails and in the latter case it works. See below for an example

>>> import numpy as np
>>> np.sqrt(np.math.factorial(20))
1559776268.6284978
>>> np.sqrt(np.math.factorial(21))
AttributeError: 'int' object has no attribute 'sqrt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: loop of ufunc does not support argument 0 of type int which has no callable sqrt method

>>> type(np.math.factorial(20))
<class 'int'>
>>> type(np.math.factorial(21))
<class 'int'>

@jreyesco16
Copy link

Hello everyone,

I'm new to open source and I would like to contribute to resolving this issue. Not sure where to start but I'll read the thread thus far to catch myself up to speed. Please let me know how I can be of use.

@mattip
Copy link
Member

mattip commented Jun 19, 2023

@jreyesco16 welcome. Take a look at the contributors page for some hints on how to get going.

@JohannaTrost
Copy link

Hi all,
this topic was also discussed in #13666 and #13875 (both closed).

@harrow I agree that (at least for a regular Numpy user like myself) 'int' object has no attribute 'sqrt' plus the following TypeError message is confusing!

For anyone encountering this problem now, a simple conversion avoids the error, e.g. np.sqrt(np.float32(2**64)).
Here @eric-wieser explains that when you do e.g.:

>>> np.sqrt(2**64)
AttributeError: 'int' object has no attribute 'sqrt'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
TypeError: loop of ufunc does not support argument 0 of type int which has no callable sqrt method
  • 2**64 is not an array and is coerced to one
  • 2**64 is too large to fit in an int64 array, so an object array type is chosen
  • np.sqrt(object_array) is only supported if the objects within have a sqrt method
  • so you'd get the same error on np.sqrt(np.array(1, dtype=object))

Would it be possible to have Numpy do the conversion internally or output a more comprehensive error message?

@njsmith I am new here, would it be appropriate to open a new issue and close this one, since things have developed in the meantime?

luyahan pushed a commit to plctlab/numpy that referenced this issue Apr 25, 2024
feat: Add vrecpe[q]_f64 and vrsqrte[q]_f64
@aerobio
Copy link

aerobio commented Jun 20, 2024

Another confusing case, dealing with very large integers, which require using object arrays:

>>> a = np.array([1, -2], dtype=object)
>>> b = np.array([-5, 3], dtype=object)
>>> np.copysign(a, b)
Traceback (most recent call last):
[...]
TypeError: ufunc 'copysign' not supported for the input types, and the inputs could not be safely coerced
to any supported types according to the casting rule ''safe''

Specifying the casting option doesn't help:

>>> np.copysign(a, b, casting='unsafe')
[...]
TypeError: ufunc 'copysign' not supported for the input types, and the inputs could not be safely coerced
to any supported types according to the casting rule ''safe''

It's unclear whether the function doesn't support the input types (why?) or there is a bug related to the casting argument.

@rkern
Copy link
Member

rkern commented Jun 20, 2024

copysign doesn't support dtype=object. Most objects don't have signs, and even for the ones that do, there isn't a real way to write an implementation for the dtype=object case that handles all of them.

If you want a copysign that works with actual builtin int objects, one could probably write a new DType that is basically dtype=object in terms of storage but only allows int objects, so you can implement the ufunc loops knowing what each object is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests