Skip to content

BUG: Fix dot and inner type/value exception failures #6988

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jakirkham
Copy link
Contributor

Partial fix: #6990

TL;DR: Bug fix to catch a possible second exception while building an exception message. Includes tests for dot and inner where this bug was first discovered.

History: Initially, this PR began as tests for the current behavior of dot and inner when they have some mismatched types that cannot be coerced into some common working type. However, this resulted in discovering a special class of bugs in C where an exception could occur during the creation of the exception message, but was not being handled. This bug has been carefully described with a list of instances in this bug report ( #6990 ). As a result, this contains a fix for one instance of this bug and tests to verify that this instance is fixed.

@jakirkham
Copy link
Contributor Author

Seeing failure on USE_DEBUG=1 only here, as well. Not sure what is going on. Maybe related to this ( http://bugs.python.org/issue20500 ). Though it is supposedly fixed. Maybe related to this ( #6741 ).

@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch 2 times, most recently from 0c77ec5 to de1e234 Compare January 10, 2016 01:05
@jakirkham
Copy link
Contributor Author

This definitely feels like an interpreter bug. I tried running this with python-dbg version 2.7.11 and did not encounter this error.

@jakirkham
Copy link
Contributor Author

For clarity, this is what I am seeing...

python3-dbg: ../Objects/object.c:463: PyObject_Repr: Assertion `!PyErr_Occurred()' failed.
./tools/travis-test.sh: line 108:  5108 Aborted                 (core dumped) $PYTHON ../tools/test-installed-numpy.py

@seberg
Copy link
Member

seberg commented Jan 10, 2016

It sounds a bit like an error set without an error return (i.e. a C-function does PyErr_SetString, but then does not return NULL). The only thing I can think of is that it wants a PyErr_Clear() before setting the new error, though I don't remember that this was necessary.

@jakirkham
Copy link
Contributor Author

What about returning -1? It seems there are some cases -1 is returned instead.

@seberg
Copy link
Member

seberg commented Jan 10, 2016

Same as NULL in that case, there are two calling conventions for python function, those functions just don't return a python object in general.

@jakirkham
Copy link
Contributor Author

So, I have looked, but nothing obviously sticks out to me. The only thing I can think of is PyArray_DescrFromType returning NULL in some cases, but doesn't use PyErr_SetString in all cases. Here is the one time it is used. (

PyErr_SetString(PyExc_ValueError,
"Invalid data-type for array");
)

@jakirkham
Copy link
Contributor Author

The same can be said for PyArray_FromAny.

@jakirkham
Copy link
Contributor Author

I added PyErr_Clear above PyErr_SetString and it made no difference.

A = np.array((1,1), dtype='i,i')

assert_raises(ValueError, np.dot, c, A)
assert_raises(TypeError, np.dot, A, c)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I managed to get this built so I can debug with python3-dbg interactively and discovered this line will cause the strange exception. However, the line before does not cause the exception.

@jakirkham
Copy link
Contributor Author

So, the error in the second case looks something like this.

TypeError: Cannot cast array data from dtype([('f0', '<i4'), ('f1', '<i4')]) to dtype('float64') according to the rule 'safe'

Based on what @charris has put forward, my guess is trying to create the representations of these dtypes to fill into the exception is causing the problem. Do either of you know where this is happening?

@jakirkham
Copy link
Contributor Author

In short, things like this are the problem. Somehow the exception needs to be stored/cleared before doing this and then recreated afterwards. (

PyObject_Repr((PyObject *)src_dtype));
)

@jakirkham
Copy link
Contributor Author

I have opened a new issue ( #6990 ) related to this problem.

@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from 9cfffe6 to 0ef99e0 Compare January 10, 2016 22:13
@jakirkham jakirkham changed the title TST: Test dot and inner type/value failures BUG, TST: Test dot and inner type/value failures Jan 10, 2016
@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch 2 times, most recently from f41fdaf to 21e1e27 Compare January 10, 2016 22:23
@jakirkham
Copy link
Contributor Author

So, I tried to add a bug fix, but I am seeing a segmentation fault. However, I don't see it the first time I test the dot or inner case of this form assert_raises(TypeError, <func>, A, c), but only the second time. Does anything look obviously wrong to either of you with this bug fix?

@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from 21e1e27 to 860474b Compare January 10, 2016 22:40
@jakirkham
Copy link
Contributor Author

Tried running gdb and it points me to this line ( https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/convert_datatype.c#L1010 ). Weird thing is I don't think we are calling that function at all.

@@ -1926,14 +1926,35 @@ PyArray_FromArray(PyArrayObject *arr, PyArray_Descr *newtype, int flags)
/* Raise an error if the casting rule isn't followed */
if (!PyArray_CanCastArrayTo(arr, newtype, casting)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more debugging with gdb apparently the segmentation fault is occurring at this line. The real line responsible is in PyArray_EquivTypes. Somehow PyArray_EquivTypes is getting NULL for its first type, which is the cause of the problem.

@jakirkham
Copy link
Contributor Author

So, this is kind of interesting. If I run the code below, I don't encounter the segmentation fault, but I do see the error message change after the first call. Namely, it changes to the error message raised by PyArray_MatrixProduct2 from the one that we have tried to fix here. The ValueError will continue to show up every time I run it. A similar thing happens with inner. However, if I try to reassign a value to A (not B) it will segment fault.

>>> import numpy
>>> import numpy as np
>>> A = np.array((1,1), dtype='i,i')
>>> B = np.array((1.,1.), dtype='f,f')
>>> np.dot(A, B)
TypeError: Cannot cast array data from dtype([('f0', '<i4'), ('f1', '<i4')]) to dtype('V8') according to the rule 'safe'
>>> np.dot(A, B)
ValueError: Cannot find a common data type.
...
>>> A = np.array((1,1), dtype='i,i')

@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from 860474b to 2835d7f Compare January 11, 2016 00:13
errmsg = PyUString_FromString("Cannot cast array data from ");
PyUString_ConcatAndDel(&errmsg,
PyObject_Repr((PyObject *)PyArray_DESCR(arr)));
arr_descr = PyArray_DESCR(arr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think calling Py_DECREF(arr_descr); may have been the cause of the segmentation faults. At the time, I didn't realize PyArray_DESCR returned a borrowed reference. So, I was effectively deleting part of the array, but leaving the rest of the array around. As a result, we saw some weird behavior due to this.

@jakirkham
Copy link
Contributor Author

I didn't realize that PyArray_DESCR shared a reference and was decrefing the returned object. I have fixed this so I am not doing this any more. Segmentation fault appears to be fixed locally. I think this will check out.

@jakirkham
Copy link
Contributor Author

Alright, looks like it checks out. Please let me know if you guys see anything else here that needs changing.

@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from 2835d7f to 860474b Compare January 11, 2016 00:55
return NULL;
}
arr_descr_repr = PyObject_Repr((PyObject *)arr_descr);
Py_DECREF(arr_descr);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may have been the cause of the segmentation faults. At the time, I didn't realize this returned a borrowed references. So, I was effectively deleting part of the array, but leaving the rest of the array around. As a result, we saw some weird behavior due to this.

@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from 860474b to 2835d7f Compare January 11, 2016 00:56
@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from 2835d7f to b704320 Compare January 11, 2016 05:30
@jakirkham
Copy link
Contributor Author

Failures on AppVeyor ( #6991 ) had not previously occurred for this PR with exception of the segmentation fault issue (also happened on Travis), which has since been resolved. I believe these to be unrelated to the content of this PR.

@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from b704320 to 031028f Compare January 11, 2016 14:40
@jakirkham
Copy link
Contributor Author

As AppVeyor is merging with master, which is broken, it is currently failing the tests there. So, I ran my own AppVeyor build without this merge (this is rebased on a commit on master before the bad commit) and it looks like it checks out ( https://ci.appveyor.com/project/jakirkham/numpy/build/1.0.2 ).

@njsmith
Copy link
Member

njsmith commented Jan 11, 2016

I'm starting to get really confused at what I'm looking at here -- the PR description says it's new tests, the discussion is about some gnarly C bug fixes... Maybe in the future it would work better to bundle up the test changes and code changes and big fixes into one combined PR so they stay together as a unit? Or maybe I just haven't been paying enough attention, that happens too sometimes :-)

@jakirkham
Copy link
Contributor Author

It is pretty easy to get lost here. This is what happened.

Sorry this is super confusing. It was never my intent to confuse anyone like this. The bug discovered was really surprising and quite subtle. Hope this clarifies what happened. If you have more questions, please ask.

@jakirkham jakirkham changed the title BUG, TST: Test dot and inner type/value failures BUG: Test dot and inner type/value failures Jan 11, 2016
@jakirkham
Copy link
Contributor Author

Updated the PR title and description to better explain what is going on here now.

@njsmith, is this making more sense yet? If not please just ask, this bug was surprisingly tricky to find. The fix ends up being straightforward.

@jakirkham
Copy link
Contributor Author

Maybe in the future it would work better to bundle up the test changes and code changes and big fixes into one combined PR so they stay together as a unit? Or maybe I just haven't been paying enough attention, that happens too sometimes :-)

They are. It's ok. :)

@jakirkham jakirkham changed the title BUG: Test dot and inner type/value failures BUG: Fix dot and inner type/value exception failures Jan 11, 2016
…Object_Repr`. Also, do a better job of handling any errors raised while constructing the error message.
@jakirkham jakirkham force-pushed the test_dot_inner_type_failures_value_failure branch from 031028f to bab118d Compare January 12, 2016 00:42
@charris charris merged commit bab118d into numpy:master Jan 12, 2016
@jakirkham jakirkham deleted the test_dot_inner_type_failures_value_failure branch January 12, 2016 19:11
@jakirkham
Copy link
Contributor Author

Thanks everyone for your help. In particular, thanks to @seberg and @charris for helping me locate and fix this bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Clear errors before contructing error message that use PyObject_Repr
4 participants