MAINT: Use AxisError in more places #8843
Conversation
numpy/core/src/multiarray/common.h
Outdated
}

/* Invoke the AxisError constructor */
exc = PyObject_CallObject(AxisError_cls, args);
Implementing the string formatting in C seemed like way too much effort for little gain - no need for performance once something starts going wrong
Wait, I should just use `PyObject_CallFunction` here.
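For context, a minimal sketch of what the Python side can look like once the formatting lives in the exception class itself (a simplified illustration, not necessarily the exact code in this PR):

```python
class AxisError(ValueError, IndexError):
    """Axis supplied was invalid."""

    def __init__(self, axis, ndim=None, msg_prefix=None):
        if ndim is None and msg_prefix is None:
            # Treat the first argument as a fully-formed message.
            msg = axis
        else:
            # Build the standard out-of-bounds message here in Python,
            # so the C caller only has to invoke the constructor.
            msg = "axis {} is out of bounds for array of dimension {}".format(
                axis, ndim)
            if msg_prefix is not None:
                msg = "{}: {}".format(msg_prefix, msg)
        super(AxisError, self).__init__(msg)
```

With that shape, `AxisError(5, 3, 'destination')` renders as `destination: axis 5 is out of bounds for array of dimension 3`, matching the message compared further down in this thread.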
assert_raises(ValueError, np.percentile, d, axis=(1, 1), q=25)
assert_raises(ValueError, np.percentile, d, axis=(-1, -1), q=25)
assert_raises(ValueError, np.percentile, d, axis=(3, -1), q=25)
These test the better duplicate-axis detection
assert_raises(ValueError, gradient, x, axis=3)
assert_raises(ValueError, gradient, x, axis=-3)
assert_raises(TypeError, gradient, x, axis=[1,])
assert_raises(np.AxisError, gradient, x, axis=3)
These exceptions were found by temporarily removing the base classes from `AxisError`, so that tests for `ValueError` and `IndexError` would not catch them.
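A quick illustration of why the pre-existing assertions keep passing (a sketch assuming a 2-d input; `np.gradient` here stands in for any function that now raises `AxisError`):

```python
import numpy as np
from numpy.testing import assert_raises

x = np.zeros((2, 3))

# AxisError subclasses both ValueError and IndexError, so the old
# assertions still pass alongside the more specific one.
assert_raises(np.AxisError, np.gradient, x, axis=3)
assert_raises(ValueError, np.gradient, x, axis=3)
assert_raises(IndexError, np.gradient, x, axis=3)
```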
@@ -4764,6 +4748,8 @@ def delete(arr, obj, axis=None):
else:
    return arr.copy(order=arrorder)

axis = normalize_axis_index(axis, ndim)
This guards against `IndexError`.
                    np.moveaxis, x, -4, 0)
assert_raises_regex(np.AxisError, 'invalid axis .* `destination`',
assert_raises_regex(np.AxisError, 'destination.*out of bounds',
Error message comparison:
ValueError: invalid axis for this array in `destination` argument
AxisError: destination: axis 5 is out of bounds for array of dimension 3
(force-pushed from c850df2 to 2847ada)
assert_raises(TypeError, gradient, x, axis=[1,])
assert_raises(np.AxisError, gradient, x, axis=3)
assert_raises(np.AxisError, gradient, x, axis=-3)
# assert_raises(TypeError, gradient, x, axis=[1,])
This used to error, and ideally it would still error for consistency with `ufunc.reduce`. But all the functions using `_ureduce` do not error in this situation, so we've already failed at consistency.

Perhaps we should add a deprecation warning (in a future commit) about passing lists of axes?
This looks great! Only rather minor comments.
numpy/core/_internal.py
Outdated
# do the string formatting here, to save work in the C code
else:
    msg = "axis {} is out of bounds for array of dimension {}".format(
Nitpicking, but I find the following much more readable
msg = ("axis {} is out of bounds for array of dimension {}"
       .format(axis, ndim))
numpy/core/numeric.py
Outdated
def _validate_axis(axis, ndim, argname):
def normalize_axis_tuple(axis, ndim, argname=None, allow_duplicate=False):
    """
    Normalizes a tuple of axes, such that they are valid positive indices
Should have a single line as "title" (with a full stop at the end)
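As a sketch of the suggested shape (wording illustrative, not a proposed docstring):

```python
def normalize_axis_tuple(axis, ndim, argname=None, allow_duplicate=False):
    """
    Normalize an axis argument into a tuple of valid positive axis indices.

    The longer explanation, parameter list, etc. follow after a blank line.
    """
```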
numpy/core/numeric.py
Outdated
    A prefix to put before the error message, typically the name of the
    argument
allow_duplicate : bool, optional
    If True, the default, disallow an axis from being specified twice
If `False`!
Oops!
numpy/core/numeric.py
Outdated
Parameters
----------
axis : int, tuple of int, or list of int
    The un-normalized index or indices of the axis.
The explanation should have a full stop at the end (here and below); I think this actually matters for sphinx rendering...
numpy/core/numeric.py
Outdated
axis = [normalize_axis_index(ax, ndim, argname) for ax in axis]
if not allow_duplicate and len(set(axis)) != len(axis):
    if argname:
        raise ValueError('repeated axis in `%s` argument' % argname)
May as well change to new-style formatting.
Sensible
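For illustration, the new-style equivalent of that line (a trivial sketch; `argname` chosen arbitrarily):

```python
argname = 'axis'

# old-style % formatting
old = 'repeated axis in `%s` argument' % argname

# new-style str.format equivalent
new = 'repeated axis in `{}` argument'.format(argname)

assert old == new
```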
@@ -1544,17 +1542,59 @@ def rollaxis(a, axis, start=0):
    return a.transpose(axes)


def _validate_axis(axis, ndim, argname):
def normalize_axis_tuple(axis, ndim, argname=None, allow_duplicate=False):
I slightly wonder -- mostly because of #8819 -- whether it would be useful to have this in C...
> whether it would be useful to have this in C

Yes, I think it would - and I think we even already have it for `ufunc.reduce`. I think this PR is a good stepping stone to implementing that, since it pulls all the python codepaths through one point for parsing axes - that point can now be moved to C later.
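For example, once every Python code path funnels through the helper, a caller might look roughly like this (a hypothetical caller; only `normalize_axis_tuple` and its location are taken from this PR):

```python
from numpy.core.numeric import normalize_axis_tuple

def _example_reduction(a, axis=None):
    # Hypothetical caller: all axis parsing funnels through one helper, so
    # that helper could later be reimplemented in C without touching callers.
    if axis is None:
        axis = tuple(range(a.ndim))
    axis = normalize_axis_tuple(axis, a.ndim, argname='axis')
    for ax in axis:
        pass  # reduce over each normalized, non-negative axis
    return a
```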
(force-pushed from 2847ada to cb8f713)
Ok, all fixed up
@@ -631,4 +631,17 @@ class TooHardError(RuntimeError):
    pass


class AxisError(ValueError, IndexError):
    pass
    """ Axis supplied was invalid. """
Now with a docstring, which IPython shows when you type an exception name. This is in a similar style to the builtin ones.
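For instance, assuming the class lands exactly as in the diff above:

```python
>>> import numpy as np
>>> np.AxisError.__doc__
' Axis supplied was invalid. '
>>> issubclass(np.AxisError, ValueError), issubclass(np.AxisError, IndexError)
(True, True)
```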
This should be in `numpy/_globals.py`.
And imported into `numpy/__init__.py`.
I put it here in #8584 because it needs to be imported from C code, and that's what `_internal` seems to be used for. Is it safe to load `_globals.py` from C?

`AxisError` is already visible at the global scope as `np.AxisError`, as it goes through the `__all__` chain.
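A quick check of that visibility, assuming the import chain described here (true at the time of this PR; the location would change if the class moves):

```python
import numpy as np
from numpy.core import _internal

# AxisError is defined in numpy/core/_internal.py and re-exported up to the
# top-level namespace through the __all__ chain.
assert np.AxisError is _internal.AxisError
```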
You can import from practically anything. We also import from multiarray itself into C code. I'd import this from `"numpy"`. The `_globals` module is for singletons that may also be used by Python code.
Should `TooHardError` move there too?
Right now, `numeric.py` has `from ._internal import TooHardError, AxisError`. `core/__init__.py` also contains `from . import _internal  # for freeze programs`, for some reason.
Probably, but it can be done in a separate PR as the imports also need to be changed. Maybe `_globals.py` should be renamed `_global_singletons` or `_global_exceptions` ;) There are other errors scattered about that maybe we should move at some point.
Or maybe `_numpy_exceptions`.
Do you want me to do both in a separate PR? `AxisError` being in that file is not new to this PR.
(either way, fix coming for something else, so don't merge yet!)
☔ The latest upstream changes (presumably #8861) made this pull request unmergeable. Please resolve the merge conflicts.
(force-pushed from cb8f713 to 17eec08)
- Moving the string formatting to python makes this a lot easier
- They were only separate before due to a requirement of a better error message
- Previously we did not, as this wouldn't say which axis argument was the cause of the problem.
- This also means that its axis argument invokes operator.index like others do. _validate_axis currently accepts lists of axes, which is a bug.
- Interestingly np.roll allows axes to be repeated.
- This fixes an omission where duplicate axes would only be detected when positive
(force-pushed from 17eec08 to 23e4cce)
                            *axis, ndim);
/* Invoke the AxisError constructor */
exc = PyObject_CallFunction(AxisError_cls, "iiO",
                            *axis, ndim, msg_prefix);
This was the way I should have written this the first time...
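In Python terms, that C call is roughly equivalent to constructing the exception directly (argument order taken from the `"iiO"` format string; values chosen to match the message comparison earlier in the thread):

```python
import numpy as np

axis, ndim, msg_prefix = 5, 3, "destination"

# Rough Python equivalent of the C call
#   PyObject_CallFunction(AxisError_cls, "iiO", *axis, ndim, msg_prefix)
exc = np.AxisError(axis, ndim, msg_prefix)
print(exc)  # destination: axis 5 is out of bounds for array of dimension 3
```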
OK, this is rebased over merge conflicts, using more appropriate bits of the Python API to construct the exception, and passes. As @charris points out, this exception might be in the wrong place, but that should probably be moved to another issue/PR?
numpy/core/numeric.py
Outdated
                               argname)
if len(set(axis)) != len(axis):
    raise ValueError('repeated axis in `%s` argument' % argname)
axis = [normalize_axis_index(ax, ndim, argname) for ax in axis]
In the docstring, you write that you'll return a tuple, but you actually construct a list. Just replace `[...]` with `tuple(...)` (also in the try/except above, for completeness).
Yep, that's a bug.
Done - fixed to return a tuple. This is actually pretty important, given that things like `sum` require `axis` to be a tuple, so it would previously not work for any kind of normalize-and-forward usage.

Thanks for the catch!
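A sketch of the normalize-and-forward pattern this enables, assuming the helper lives at `numpy.core.numeric.normalize_axis_tuple` as in this PR:

```python
import numpy as np
from numpy.core.numeric import normalize_axis_tuple

a = np.arange(24).reshape(2, 3, 4)

# Normalize once; the result is now a tuple of non-negative ints...
axis = normalize_axis_tuple([-1, 0], a.ndim)
assert axis == (2, 0)

# ...which is exactly the form that np.sum (via ufunc.reduce) accepts
# for reducing over multiple axes.
np.sum(a, axis=axis)
```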
static NPY_INLINE int
check_and_adjust_axis(int *axis, int ndim)
{
    return check_and_adjust_axis_msg(axis, ndim, Py_None);
I think you have to `Py_INCREF(Py_None)` before passing it on. Since you then would have to `Py_DECREF` it as well, I suggest passing in `NULL` here, and doing the translation to `Py_None` above.
`Py_None` is guaranteed to exist, and as long as it doesn't escape to the user and isn't decref'd, everything should be fine. `PyObject_CallFunction` increments its reference count for the exception, so what the user gets is a new reference.

Using `NULL` seems cleaner though.
I don't need to here, because `check_and_adjust_axis_msg` takes a borrowed reference. It doesn't actually need a refcount at all until it makes it to the Python code, and `PyObject_CallFunction()` increments the refcount for me, AFAICT.

It took a lot of iteration to get this working, and attempts at doing what you describe only caused segfaults.
> Using NULL seems cleaner though.

Why? We already need to handle `Py_None` being passed in from user code. Furthermore, when we get `Py_None` from user code, it's a borrowed reference - so like here, it doesn't need an incref unless it escapes to Python code.

The only thing that's unclear to me is whether I need to `INCREF` `msg_prefix` on the line immediately before `PyObject_CallFunction`. I think the answer is no (it was no when I used `Py_BuildValue` here previously), but that's not clear from the documentation.
@eric-wieser How would it cause segfaults? You could just initialize it as `NULL`, and when you call the "exception" you do `PyObject_CallFunction(AxisError_cls, "iiO", *axis, ndim, msg_prefix ? msg_prefix : Py_None);`
Segfaults came from other attempts at refcounting. I wasn't aiming to be passing NULL originally - sorry if that was unclear. You're correct, what you have would add support for passing NULL, and would work correctly assuming that this works correctly - but I see no purpose in adding that as a special case.
Yeah, that's the reason why it's generally not a good idea to pass borrowed references around. But I didn't check the numpy source code, maybe it's common here. In that case please ignore my comments.
I guess my argument would be that if I manually inlined the contents of `check_and_adjust_axis_msg` into `normalize_axis_index`, you would not expect me to add any additional `Py_INCREF`s.

I'm not sure what you mean there - can you elaborate on exactly why it is a bad idea? (And perhaps link me to some recommendations for working with Python refcounting.)

Counter-point to it generally not being a good idea: every function in a `PyMethodDef` object is passed borrowed references, so Python seems to favor it internally.
OK, this goes a bit off-topic. I still think the code is correct wrt reference counting.
Some confusion added here by my first comment being written before yours appeared (and not directed at you), I think :).
    int axis;
    int ndim;
    PyObject *msg_prefix = Py_None;
Here, again, I think you cannot just use `Py_None`. Also solve by initializing to `NULL`?
This is definitely correct. `PyArg_ParseTuple` does not `INCREF` `O` arguments, so this line is exactly equivalent to passing `None` from Python.

The question remains if I should `INCREF` before calling `check_and_adjust_axis_msg`, but let's discuss that above.
(force-pushed from 23e4cce to 17466ad)
@eric-wieser - I think you convinced me that your approach works ref-counting wise, but I'm still not quite convinced it is the handiest. My suggestion would be to simply pass a string around in the C-facing routines. Anyway, it is not a big deal.
Seems reasonable. Although I didn't find myself needing any extra info in the C code, since everything there seems to be working with a single axis argument. So I think we do that if we need it, and otherwise don't bother.
Alternatively, the C code could just have a static pre-allocated error-message string.

You were right to question it - it's something I'm not too experienced in, so the default there should be to assume I'm wrong!
Looks good to me.
The only issue with this is that if you leave this in place, if anyone in the C code is going to use […]
I think that if anyone doing that themselves finds they do want to be able to pass a C string, they'll be in a more capable position to decide if they should just do it in place. We already create Python strings in lots of places for error messages. Besides, to support this, I think we'd need a […]
OK, fair enough. Since both @juliantaylor and I approved this, let's just go ahead and merge.
Thanks @mhvk!
Behavioural changes:

- Bad axis arguments now raise `AxisError` instead of `IndexError` or `ValueError` - but this is backwards-compatible, since `AxisError` derives from both (MAINT: Use the same exception for all bad axis requests #8584).
- `_ureduce` detects duplicate arguments correctly - previously, `np.median([1, 2], axis=(0,0))` would fail, but `np.median([1, 2], axis=(-1,-1))` and `np.median([1, 2], axis=(0,-1))` would be fine. Now all three raise `ValueError`.
- `np.gradient` now accepts `axis=[1,2]` as well as `axis=(1,2)`. This might not be desirable, but is consistent with the majority of other functions. Now that this all goes through one place, we could deprecate it in future.
- `np.core.numeric._validate_axis` is slightly more public at `np.core.numeric.normalize_axis_tuple`, to complement `np.core.multiarray.normalize_axis_index`.
- All axis arguments now go through `operator.index`, rather than just a random subset of them.
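A few of these changes in test form (a sketch mirroring the assertions used elsewhere in this PR):

```python
import numpy as np
from numpy.testing import assert_raises

a = np.arange(6).reshape(2, 3)

# Duplicate axes are now detected regardless of how they are spelled.
assert_raises(ValueError, np.median, a, axis=(0, 0))
assert_raises(ValueError, np.median, a, axis=(-1, -1))
assert_raises(ValueError, np.median, a, axis=(0, -2))  # 0 and -2 are the same axis

# Out-of-bounds axes raise AxisError, which subclasses ValueError and IndexError.
assert_raises(np.AxisError, a.sum, axis=2)
assert_raises(IndexError, a.sum, axis=2)  # still passes, for backwards compatibility
```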