-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Fix dot and inner type/value exception failures #6988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Fix dot and inner type/value exception failures #6988
Conversation
Seeing failure on |
0c77ec5
to
de1e234
Compare
This definitely feels like an interpreter bug. I tried running this with |
For clarity, this is what I am seeing...
|
It sounds a bit like an error set without an error return (i.e. a C-function does PyErr_SetString, but then does not return NULL). The only thing I can think of is that it wants a PyErr_Clear() before setting the new error, though I don't remember that this was necessary. |
What about returning |
Same as NULL in that case, there are two calling conventions for python function, those functions just don't return a python object in general. |
So, I have looked, but nothing obviously sticks out to me. The only thing I can think of is numpy/numpy/core/src/multiarray/arraytypes.c.src Lines 4442 to 4443 in 5dab927
|
The same can be said for |
I added |
A = np.array((1,1), dtype='i,i') | ||
|
||
assert_raises(ValueError, np.dot, c, A) | ||
assert_raises(TypeError, np.dot, A, c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I managed to get this built so I can debug with python3-dbg
interactively and discovered this line will cause the strange exception. However, the line before does not cause the exception.
So, the error in the second case looks something like this.
Based on what @charris has put forward, my guess is trying to create the representations of these |
In short, things like this are the problem. Somehow the exception needs to be stored/cleared before doing this and then recreated afterwards. (
|
I have opened a new issue ( #6990 ) related to this problem. |
9cfffe6
to
0ef99e0
Compare
f41fdaf
to
21e1e27
Compare
So, I tried to add a bug fix, but I am seeing a segmentation fault. However, I don't see it the first time I test the |
21e1e27
to
860474b
Compare
Tried running |
@@ -1926,14 +1926,35 @@ PyArray_FromArray(PyArrayObject *arr, PyArray_Descr *newtype, int flags) | |||
/* Raise an error if the casting rule isn't followed */ | |||
if (!PyArray_CanCastArrayTo(arr, newtype, casting)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some more debugging with gdb
apparently the segmentation fault is occurring at this line. The real line responsible is in PyArray_EquivTypes
. Somehow PyArray_EquivTypes
is getting NULL
for its first type, which is the cause of the problem.
So, this is kind of interesting. If I run the code below, I don't encounter the segmentation fault, but I do see the error message change after the first call. Namely, it changes to the error message raised by
|
860474b
to
2835d7f
Compare
errmsg = PyUString_FromString("Cannot cast array data from "); | ||
PyUString_ConcatAndDel(&errmsg, | ||
PyObject_Repr((PyObject *)PyArray_DESCR(arr))); | ||
arr_descr = PyArray_DESCR(arr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think calling Py_DECREF(arr_descr);
may have been the cause of the segmentation faults. At the time, I didn't realize PyArray_DESCR
returned a borrowed reference. So, I was effectively deleting part of the array, but leaving the rest of the array around. As a result, we saw some weird behavior due to this.
I didn't realize that |
Alright, looks like it checks out. Please let me know if you guys see anything else here that needs changing. |
2835d7f
to
860474b
Compare
return NULL; | ||
} | ||
arr_descr_repr = PyObject_Repr((PyObject *)arr_descr); | ||
Py_DECREF(arr_descr); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may have been the cause of the segmentation faults. At the time, I didn't realize this returned a borrowed references. So, I was effectively deleting part of the array, but leaving the rest of the array around. As a result, we saw some weird behavior due to this.
860474b
to
2835d7f
Compare
2835d7f
to
b704320
Compare
Failures on AppVeyor ( #6991 ) had not previously occurred for this PR with exception of the segmentation fault issue (also happened on Travis), which has since been resolved. I believe these to be unrelated to the content of this PR. |
b704320
to
031028f
Compare
As AppVeyor is merging with master, which is broken, it is currently failing the tests there. So, I ran my own AppVeyor build without this merge (this is rebased on a commit on |
I'm starting to get really confused at what I'm looking at here -- the PR description says it's new tests, the discussion is about some gnarly C bug fixes... Maybe in the future it would work better to bundle up the test changes and code changes and big fixes into one combined PR so they stay together as a unit? Or maybe I just haven't been paying enough attention, that happens too sometimes :-) |
It is pretty easy to get lost here. This is what happened.
Sorry this is super confusing. It was never my intent to confuse anyone like this. The bug discovered was really surprising and quite subtle. Hope this clarifies what happened. If you have more questions, please ask. |
Updated the PR title and description to better explain what is going on here now. @njsmith, is this making more sense yet? If not please just ask, this bug was surprisingly tricky to find. The fix ends up being straightforward. |
They are. It's ok. :) |
…to a common type.
…into a common type.
…Object_Repr`. Also, do a better job of handling any errors raised while constructing the error message.
031028f
to
bab118d
Compare
Partial fix: #6990
TL;DR: Bug fix to catch a possible second exception while building an exception message. Includes tests for
dot
andinner
where this bug was first discovered.History: Initially, this PR began as tests for the current behavior of
dot
andinner
when they have some mismatched types that cannot be coerced into some common working type. However, this resulted in discovering a special class of bugs in C where an exception could occur during the creation of the exception message, but was not being handled. This bug has been carefully described with a list of instances in this bug report ( #6990 ). As a result, this contains a fix for one instance of this bug and tests to verify that this instance is fixed.