-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
MAINT: cleanup np.average #7382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -898,15 +898,14 @@ def average(a, axis=None, weights=None, returned=False): | |||
TypeError: Axis must be specified when shapes of a and weights differ. | |||
|
|||
""" | |||
if not isinstance(a, np.matrix): | |||
a = np.asarray(a) | |||
a = np.asanyarray(a) | |||
|
|||
if weights is None: | |||
avg = a.mean(axis) | |||
scl = avg.dtype.type(a.size/avg.size) | |||
else: | |||
a = a + 0.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never liked lines like these much. I think the purpose is just to ensure a
is float. How about
a = np.asanyarray(a, dtype=np.result_dtype(a, 0.))
This at least doesn't make a copy unless it is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can probably skip the whole asanyarray
call here, and just use the dtype argument later in the ufunc call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About the asanyarray. Do you think that weights is likely to have a unit or so? Because if we make it asanyarray, the weights would also have the same priority at deciding the output array type. Just wanted to point it out that sometimes we may want to not make give a secondary argument as much influence as the primary one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, weights can definitely have units, e.g., if it is the inverse of the measurement error squared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the dtype argument seems a much better idea indeed -- but needs a bit of care in the same usage for weight below; it may be easiest to just precalculate the result dtype with result_dtype = np.result_type(a, 0.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was, that you can just add a 0. to the line below: scl = wgt.sum(axis=axis, dtype=np.result_type(a.dtype, wgt.dtype, 0.))
to the same effect probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like a = np.asanyarray(a, dtype=np.result_dtype(a, 'f8'))
because I am worried about the behavior of the np.multiply(a, wgt)
call. If both a
and wgt
happen to be integer arrays we have to worry about overflow etc.
I like f8
because that is what np.mean
explicitly casts to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, ok, that forces it to at least double precision, having the same behaviour of upcasting as np.mean makes sense in any case I guess. Though again, i think you can also just plug the casting into that later result_type call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. Upcast to double makes sense for things like short integers, which is why it upcasts those explicitly.
Note that this PR would supersede gh-5551. |
cef50df
to
d2b0df6
Compare
Updated based on comments. For the casting of The behavior of |
d2b0df6
to
d4c9030
Compare
@ahaldane - this looks good! I did a quick try and it works with |
LGTM on first read through. |
Should add some tests for the preservation of subclasses. |
wgt = np.asarray(weights) | ||
wgt = np.asanyarray(weights) | ||
|
||
if issubclass(a.dtype.type, (np.integer, np.bool_)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be an or
or and
with the same for wgt? Hmmm, actually, if, which one, hmmm :), I guess or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the type of wgt
doesn't matter. The current code does as follows for the four combinations to think about (float
means f4
or f8
):
a wgt result_dtype
--- --- ---
int int f8
int float f8
float int f8
float float2 biggest(float, float2)
I suppose in the 2nd line we could cast to whatever wgt's floating type was (f4 or f8) but I'm not sure whether that's helpful or unnecessarily complicated.
(Edit: Actually the 2nd line will give back f8
no matter what we do, since int
+f4
coerces to f8
anyway)
e11fefe
to
6f6c40c
Compare
6f6c40c
to
5ceab8f
Compare
Updated with tests. |
a = np.array([[1,2],[3,4]]).view(subclass) | ||
w = np.array([[1,2],[3,4]]).view(subclass) | ||
|
||
assert_equal(type(np.average(a, weights=w)), subclass) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that not be better as assert_(type(np.average(a, weights=w)) is subclass)
?
EDIT: Although that only works for new style classes. Hmm, assert_equal does seem to work properly there.
EDIT: But not for old style classes either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what assert_equal
does for types, so maybe is ok, if not obvious.
Thanks Allan. |
MAIN: fix to #7382, make scl in np.average writeable
These changes to
np.average
were suggested by @mhvk in #5706.Rather than only make the changes in
np.ma.average
, in this PR I'd like to first add them tonp.average
, so that we end up with a situation wherenp.ma.average
andnp.average
are more or less copies.