
MAINT: cleanup np.average #7382


Merged
merged 1 commit into from
Mar 7, 2016

Conversation

ahaldane
Member

@ahaldane ahaldane commented Mar 5, 2016

These changes to np.average were suggested by @mhvk in #5706.

Rather than only make the changes in np.ma.average, in this PR I'd like to first add them to np.average, so that we end up with a situation where np.ma.average and np.average are more or less copies.

@@ -898,15 +898,14 @@ def average(a, axis=None, weights=None, returned=False):
     TypeError: Axis must be specified when shapes of a and weights differ.

     """
-    if not isinstance(a, np.matrix):
-        a = np.asarray(a)
+    a = np.asanyarray(a)

     if weights is None:
         avg = a.mean(axis)
         scl = avg.dtype.type(a.size/avg.size)
     else:
         a = a + 0.0
Contributor

I never liked lines like these much. I think the purpose is just to ensure a is float. How about

a = np.asanyarray(a, dtype=np.result_type(a, 0.))

This at least doesn't make a copy unless it is needed.
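
A minimal sketch of the difference between the two spellings, assuming a recent NumPy and using `np.result_type`. The array names are illustrative and not taken from the PR:

```python
import numpy as np

# Illustrative inputs (assumptions): one float32 array, one integer array.
f = np.arange(3, dtype=np.float32)
i = np.arange(3)

# `a + 0.0` always allocates a new array, even if `a` is already floating point.
f_added = f + 0.0
copied_by_add = not np.shares_memory(f_added, f)

# Converting with an explicit dtype hands back the input when no cast is needed...
f_conv = np.asanyarray(f, dtype=np.result_type(f, 0.))
reused = f_conv is f

# ...and converts (and so copies) only when the input is not floating point.
i_conv = np.asanyarray(i, dtype=np.result_type(i, 0.))
```

The float32 input keeps its dtype here, while the integer input is promoted to a floating-point dtype.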

Member

You can probably skip the whole asanyarray call here, and just use the dtype argument later in the ufunc call.

Member

About the asanyarray: do you think weights is likely to carry a unit or something similar? If we make it asanyarray, the weights would have the same priority as a in deciding the output array type. I just wanted to point out that sometimes we may not want to give a secondary argument as much influence as the primary one.

Contributor

Yes, weights can definitely have units, e.g., if it is the inverse of the measurement error squared.
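
To illustrate what is at stake, here is a small sketch of how `np.asarray` and `np.asanyarray` treat an ndarray subclass. The `Tagged` class is a hypothetical stand-in for a unit-carrying type; it is not part of NumPy or of this PR:

```python
import numpy as np

class Tagged(np.ndarray):
    """Hypothetical stand-in for a unit-carrying subclass such as Quantity."""

weights = np.array([1.0, 2.0, 3.0]).view(Tagged)

as_base = np.asarray(weights)     # returns a plain ndarray, subclass is dropped
as_any = np.asanyarray(weights)   # passes the subclass through unchanged
```

With `asanyarray`, any unit information attached to the weights can still take part in later operations.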

Contributor

Using the dtype argument seems a much better idea indeed -- but needs a bit of care in the same usage for weight below; it may be easiest to just precalculate the result dtype with result_dtype = np.result_type(a, 0.)

Member

What I meant was that you can just add a 0. to the line below, scl = wgt.sum(axis=axis, dtype=np.result_type(a.dtype, wgt.dtype, 0.)), probably to the same effect.
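
A rough sketch of that suggestion: precompute one result dtype and pass it to the reductions instead of converting `a` up front. The sample arrays and the `axis` value are illustrative assumptions, and the `avg` line is one plausible way to complete the calculation, not necessarily the merged code:

```python
import numpy as np

# Illustrative integer inputs (assumptions, not from the PR).
a = np.array([[1, 2], [3, 4]])
wgt = np.array([[1, 1], [1, 3]])
axis = 0

# The trailing 0. makes the promoted dtype floating point.
result_dtype = np.result_type(a.dtype, wgt.dtype, 0.)

# Hand the dtype to the ufunc and the reductions; `a` itself is never copied.
scl = wgt.sum(axis=axis, dtype=result_dtype)
avg = np.multiply(a, wgt, dtype=result_dtype).sum(axis=axis) / scl
```

For these inputs the result agrees with `np.average(a, axis=0, weights=wgt)`.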

Member Author

I like a = np.asanyarray(a, dtype=np.result_type(a, 'f8')) because I am worried about the behavior of the np.multiply(a, wgt) call. If both a and wgt happen to be integer arrays, we have to worry about overflow.

I like f8 because that is what np.mean explicitly casts to.
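
The overflow concern can be reproduced with small integer types. This is a sketch with made-up values, chosen so that the product does not fit in int8:

```python
import numpy as np

a = np.array([100, 100], dtype=np.int8)
wgt = np.array([2, 2], dtype=np.int8)

# Computed in int8: 200 does not fit, so the result silently wraps around.
wrapped = np.multiply(a, wgt)

# Computed in float64: the inputs are cast before multiplying.
safe = np.multiply(a, wgt, dtype='f8')
```

Casting to `f8` before or during the multiplication avoids the wraparound.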

Member

Oh, OK, that forces it to at least double precision. Having the same upcasting behaviour as np.mean makes sense in any case, I guess. Though again, I think you can also just plug the casting into that later result_type call.

Member

Ah, right. Upcast to double makes sense for things like short integers, which is why it upcasts those explicitly.

@seberg
Member

seberg commented Mar 6, 2016

Note that this PR would supersede gh-5551.

@ahaldane ahaldane force-pushed the tidy_average_median branch 3 times, most recently from cef50df to d2b0df6 Compare March 6, 2016 17:35
@ahaldane
Member Author

ahaldane commented Mar 6, 2016

Updated based on comments.

For the casting of a, in this update I decided to do something like what np.mean does: only upcast to f8 for integer types. This way, if a is f4, we get f4 back.

The behavior of np.average is slightly different from np.mean because np.average doesn't have dtype, out or keepdims keyword args.
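
As a quick illustration of the behaviour described here, the snippet below compares output dtypes for integer and float32 input. It assumes a NumPy release that includes this change; the sample arrays are made up:

```python
import numpy as np

ints = np.arange(4, dtype=np.int16)
singles = np.arange(4, dtype=np.float32)

# np.mean: integer input is returned as float64, float32 stays float32.
mean_of_ints = np.mean(ints)
mean_of_singles = np.mean(singles)

# np.average with weights is expected to follow the same pattern.
avg_of_ints = np.average(ints, weights=ints)
avg_of_singles = np.average(singles, weights=singles)
```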

@ahaldane ahaldane force-pushed the tidy_average_median branch from d2b0df6 to d4c9030 Compare March 6, 2016 19:07
@mhvk
Contributor

mhvk commented Mar 6, 2016

@ahaldane - this looks good! I did a quick try and it works with Quantity (both for data and weight).

@charris
Member

charris commented Mar 7, 2016

LGTM on first read through.

@charris
Member

charris commented Mar 7, 2016

Should add some tests for the preservation of subclasses.

-        wgt = np.asarray(weights)
+        wgt = np.asanyarray(weights)
+
+        if issubclass(a.dtype.type, (np.integer, np.bool_)):
Member

Should this be combined, via "or" or "and", with the same check on wgt? Hmm, and if so, which one? I guess "or"?

Member Author

I think the type of wgt doesn't matter. The current code does as follows for the four combinations to think about (float means f4 or f8):

a     wgt        result_dtype
---   ---        ---
int   int        f8
int   float      f8
float int        f8
float float2     biggest(float, float2)

I suppose in the 2nd line we could cast to whatever wgt's floating type was (f4 or f8) but I'm not sure whether that's helpful or unnecessarily complicated.

(Edit: Actually the 2nd line will give back f8 no matter what we do, since int+f4 coerces to f8 anyway)
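
The table can be checked with `np.result_type`. The helper below is a hypothetical reconstruction of the rule under discussion: the `issubclass` test comes from the diff, and the two branches are an assumption based on the comments above. It is a sketch, not the merged implementation:

```python
import numpy as np

def average_result_dtype(a_dtype, wgt_dtype):
    """Hypothetical helper mirroring the dtype rule discussed above."""
    a_dtype, wgt_dtype = np.dtype(a_dtype), np.dtype(wgt_dtype)
    if issubclass(a_dtype.type, (np.integer, np.bool_)):
        # Integer or boolean data: force at least double precision.
        return np.result_type(a_dtype, wgt_dtype, 'f8')
    # Floating-point data: let ordinary promotion decide.
    return np.result_type(a_dtype, wgt_dtype)

# Keys are (a, wgt) kinds from the table; 'i8' stands in for "int".
table = {
    ("int", "int"): average_result_dtype('i8', 'i8'),
    ("int", "float"): average_result_dtype('i8', 'f4'),
    ("float", "int"): average_result_dtype('f4', 'i8'),
    ("float", "float2"): average_result_dtype('f4', 'f8'),
}

# Promotion depends on the integer width: f4 data with i2 weights stays f4.
narrow = average_result_dtype('f4', 'i2')
```

With 64-bit integers all four rows come out as f8. The last line shows that a narrow integer weight type can leave f4 data at f4, so the "float int" row depends on the integer width.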

@ahaldane ahaldane force-pushed the tidy_average_median branch 2 times, most recently from e11fefe to 6f6c40c Compare March 7, 2016 17:36
@ahaldane ahaldane force-pushed the tidy_average_median branch from 6f6c40c to 5ceab8f Compare March 7, 2016 17:41
@ahaldane
Member Author

ahaldane commented Mar 7, 2016

Updated with tests.

a = np.array([[1,2],[3,4]]).view(subclass)
w = np.array([[1,2],[3,4]]).view(subclass)

assert_equal(type(np.average(a, weights=w)), subclass)
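
A self-contained version of this check might look as follows. The `subclass` definition is not shown in the excerpt, so a trivial ndarray subclass is assumed here, and plain `assert` is used in place of `assert_equal`:

```python
import numpy as np

class subclass(np.ndarray):
    """Trivial ndarray subclass, assumed because the test's own definition is not shown."""

a = np.array([[1, 2], [3, 4]]).view(subclass)
w = np.array([[1, 2], [3, 4]]).view(subclass)

# With asanyarray in np.average, the subclass should survive the computation.
result = np.average(a, weights=w)
assert type(result) is subclass
```

The weighted average itself is (1 + 4 + 9 + 16) / 10 = 3.0.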
Member

Should that not be better as assert_(type(np.average(a, weights=w)) is subclass)?

EDIT: Although that only works for new style classes. Hmm, assert_equal does seem to work properly there.
EDIT: But not for old style classes either.

Member

It's not clear to me what assert_equal does for types, so maybe is ok, if not obvious.

charris added a commit that referenced this pull request Mar 7, 2016
@charris charris merged commit 10caf75 into numpy:master Mar 7, 2016
@charris
Member

charris commented Mar 7, 2016

Thanks Allan.

@ahaldane
Member Author

ahaldane commented Mar 7, 2016

Thanks for reviews @charris @seberg @mhvk

@eric-wieser
Member

@seberg: Can we close gh-5551 then?

@ahaldane ahaldane deleted the tidy_average_median branch January 17, 2018 23:40