Skip to content

tp_compare warning on sort in Python 2.6/2.7 with OSX with object array of length >= 20 with objects that raise TypeError #3879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jtratner opened this issue Oct 7, 2013 · 14 comments

Comments

@jtratner
Copy link

jtratner commented Oct 7, 2013

We (@cpcloud and I) were investigating this strange problem with pandas where we were getting RuntimeWarning: tp_compare didn't return -1 or -2 for exception only on OSX. We narrowed down the issue all the way to this specific set of circumstances (it may occur other ways, but this at least triggers it). Given that it still fails with an Exception, not sure if it actually matters:

  • Definitely occurs on recent Mac OSX (not sure if version of OSX matters). Does not occur on Arch Linux 64 bit.
  • Python 2.6 or 2.7 (doesn't occur with Python 3)
  • Must be object array with length >= 20 (and at least through 1,000,000)
  • Must raise some kind of error with comparison
  • doesn't matter if builtin class (like datetime.datetime or a Cythonized cdef'd class) or a pure Python class

Here's an example:

import numpy as np
class Raiser(object):
    def raises_anything(*args, **kwargs):
        raise TypeError("SOMETHING ERRORED")
    __eq__ = __ne__ = __lt__ = __gt__ = __ge__ = __le__ = raises_anything
arr = np.array([Raiser() for n in range(10)] + list(range(10)))
arr.sort()

generates an Exception and a warning about tp_compare:

test5.py:8: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
  arr.sort()
Traceback (most recent call last):
  File "test5.py", line 8, in <module>
    arr.sort()
  File "test5.py", line 4, in raises_anything
    raise TypeError("SOMETHING ERRORED")
TypeError: SOMETHING ERRORED

Same thing occurs if you create an array of datetimes + integers.

We're not sure what actually causes the warning, but we're thinking the reason length matters has to do with the choice of sort algorithm:

#define SMALL_QUICKSORT 15
#define SMALL_MERGESORT 20
#define SMALL_STRING 16
@jtratner
Copy link
Author

jtratner commented Oct 7, 2013

Occurs with numpy 1.7.1 and 1.6

@jtratner
Copy link
Author

jtratner commented Oct 7, 2013

Occurs on '1.9.0.dev-ec3603f' too.

@juliantaylor
Copy link
Contributor

when its done comparing the objects which always fail (and set PyErr) it starts comparing the integers which returns -1, 0, 1. But the PyErr is still set which causes python warn on the 1 return.
looks like the OBJECT_compare should clear the python error after its done.

also happens on linux if one reverses the integer list (mac quicksort probably compares in reverse order sometimes)

@juliantaylor
Copy link
Contributor

clearing the error in OBJECT_compare obviously can't be done just for quicksort, maybe one should disable runtime warnings in before npy_quicksort and restore later, or add a special OBJECT_compare which clears python errors to be used by only by quicksort.

@njsmith
Copy link
Member

njsmith commented Oct 7, 2013

Why do we want to clear the error? Shouldn't we just bail out early if a
comparison fails?
On 7 Oct 2013 13:52, "Julian Taylor" notifications@github.com wrote:

clearing the error in OBJECT_compare obviously can't be done just for
quicksort, maybe one should disable runtime warnings in before
npy_quicksort and restore later, or add a special OBJECT_compare which
clears python errors to be used by only by quicksort.


Reply to this email directly or view it on GitHubhttps://github.com//issues/3879#issuecomment-25805639
.

@jtratner
Copy link
Author

jtratner commented Oct 7, 2013

Your comment made me wonder if sorting that causes an error causes the array to be left in a different state than it was before (at least on Windows with numpy 1.6, it does). Not sure if that's an issue.

Apologies that it's a bit hard to read - just note that the integers move.

import numpy as np
import random
class Raiser(object):
    def raises_anything(*args, **kwargs):
        raise TypeError("SOMETHING ERRORED")
    __eq__ = __ne__ = __lt__ = __gt__ = __ge__ = __le__ = raises_anything
arr = np.array([Raiser() for n in range(10)] + list(range(10)))
random.shuffle(arr)


arr
Out[6]: 
array([<__main__.Raiser object at 0x0000000007F7BA20>,
       <__main__.Raiser object at 0x0000000007F7B908>, 5, 8,
       <__main__.Raiser object at 0x0000000007F7B940>, 9,
       <__main__.Raiser object at 0x0000000007F7B978>,
       <__main__.Raiser object at 0x0000000007F7B9E8>, 3,
       <__main__.Raiser object at 0x0000000007F7BA58>, 2,
       <__main__.Raiser object at 0x0000000007F7BA90>,
       <__main__.Raiser object at 0x0000000007F7B9B0>, 1,
       <__main__.Raiser object at 0x0000000007F7B898>, 6, 7, 4, 0,
       <__main__.Raiser object at 0x0000000007F7BAC8>], dtype=object)

original = arr.copy()

arr.sort()
-c:1: RuntimeWarning: tp_compare didn't return -1 or -2 for exception
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-f2555d8c7d46> in <module>()
----> 1 arr.sort()

<ipython-input-1-1f1f20d5336e> in raises_anything(*args, **kwargs)
      2 class Raiser(object):
      3     def raises_anything(*args, **kwargs):
----> 4         raise TypeError("SOMETHING ERRORED")
      5     __eq__ = __ne__ = __lt__ = __gt__ = __ge__ = __le__ = raises_anything
      6 arr = np.array([Raiser() for n in range(10)] + list(range(10)))

TypeError: SOMETHING ERRORED

arr
Out[9]: 
array([<__main__.Raiser object at 0x0000000007F7B908>, 5, 8,
       <__main__.Raiser object at 0x0000000007F7B940>, 9,
       <__main__.Raiser object at 0x0000000007F7B978>,
       <__main__.Raiser object at 0x0000000007F7B9E8>,
       <__main__.Raiser object at 0x0000000007F7BA20>, 3,
       <__main__.Raiser object at 0x0000000007F7BA58>, 2,
       <__main__.Raiser object at 0x0000000007F7BA90>,
       <__main__.Raiser object at 0x0000000007F7B9B0>, 1,
       <__main__.Raiser object at 0x0000000007F7B898>, 6, 7, 4, 0,
       <__main__.Raiser object at 0x0000000007F7BAC8>], dtype=object)

original
Out[10]: 
array([<__main__.Raiser object at 0x0000000007F7BA20>,
       <__main__.Raiser object at 0x0000000007F7B908>, 5, 8,
       <__main__.Raiser object at 0x0000000007F7B940>, 9,
       <__main__.Raiser object at 0x0000000007F7B978>,
       <__main__.Raiser object at 0x0000000007F7B9E8>, 3,
       <__main__.Raiser object at 0x0000000007F7BA58>, 2,
       <__main__.Raiser object at 0x0000000007F7BA90>,
       <__main__.Raiser object at 0x0000000007F7B9B0>, 1,
       <__main__.Raiser object at 0x0000000007F7B898>, 6, 7, 4, 0,
       <__main__.Raiser object at 0x0000000007F7BAC8>], dtype=object)

@juliantaylor
Copy link
Contributor

Why do we want to clear the error? Shouldn't we just bail out early if a comparison fails?

we can't as we are using C quicksort for object arrays which has no early bailout option. We could reimplement it of course.

The only way to prevent objects moving is to do a copy before sorting, which is explicitly what one does not want to do with inplace sorting. If the array should be unchanged after a failure the user must use a out of place sort (np.sort)
It is same with pure python inplace sorting.

@jtratner
Copy link
Author

jtratner commented Oct 7, 2013

The only way to prevent objects moving is to do a copy before sorting,
which is explicitly what one does not want to do with inplace sorting. If
the array should be unchanged after a failure the user must use a out of
place sort (np.sort)
It is same with pure python inplace sorting.

Okay, just wanted to check. Thanks for clarifying!

@cpcloud
Copy link

cpcloud commented Oct 7, 2013

@juliantaylor Is that the reason why npy_quicksort always returns 0? Does that matter?

@njsmith
Copy link
Member

njsmith commented Oct 7, 2013

Yeah, I think objects moving is a non-issue, this isn't an ACID database
and failing compares are pretty weird.

Re: lack of bailout: okay, but then we should still propagate the error
sooner or later? Or is the issue that as soon as we re-enter the
interpreter on the next comparison function, it gets annoyed when it
discovers the lingering exception?

On Mon, Oct 7, 2013 at 3:56 PM, Julian Taylor notifications@github.comwrote:

Why do we want to clear the error? Shouldn't we just bail out early if a
comparison fails?

we can't as we are using C quicksort for object arrays which has no early
bailout option. We could reimplement it of course.

The only way to prevent objects moving is to do a copy before sorting,
which is explicitly what one does not want to do with inplace sorting. If
the array should be unchanged after a failure the user must use a out of
place sort (np.sort)
It is same with pure python inplace sorting.


Reply to this email directly or view it on GitHubhttps://github.com//issues/3879#issuecomment-25815223
.

@juliantaylor
Copy link
Contributor

@cpclout yes it uses qsort which does not have a return value, the error is propagated via pythons global error state instead.

@njsmith the latter, after each comparison cpython goes into adjust_tp_compare(comparison_result)
the error from the previous failing comparison still lingers causing it to warn about correct results from the current non failing comparison. It is assuming you bail out on error or reset it instead of continuing comparing stuff.
From what I can tell nothing is actually broken by this, you just get a spurious warning.
python3 doesn't seem to care about it (numpy there it treats the error cases as compared equal and errors out later)

@cpcloud
Copy link

cpcloud commented Oct 7, 2013

python3 doesn't seem to care about it

I think this is because tp_compare isn't allowed in Py3. There are uses of the convenience PyObject_Cmp function but tp_compare seems to be only used in with numpy scalars in Py2

@charris
Copy link
Member

charris commented Oct 7, 2013

We could reimplement it of course.

I've been planning to do that for the sort functions for several years now ;) All as part of cleaning up the ugly calling code for those sorting functions that get passed a comparison function.

@charris
Copy link
Member

charris commented Oct 7, 2013

we can't as we are using C quicksort for object arrays

I do have a numpy version of quicksort that has been sitting in a branch for the last 18 mos that, IIRC, does fix that problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants