MAINT: cleanup np.average #7382

ahaldane · 2016-03-05T21:37:04Z

These changes to np.average were suggested by @mhvk in #5706.

Rather than only making the changes in np.ma.average, in this PR I'd like to first add them to np.average, so that np.ma.average and np.average end up as more or less copies of each other.

mhvk · 2016-03-05T23:32:16Z

numpy/lib/function_base.py

@@ -898,15 +898,14 @@ def average(a, axis=None, weights=None, returned=False):
    TypeError: Axis must be specified when shapes of a and weights differ.

    """
-    if not isinstance(a, np.matrix):
-        a = np.asarray(a)
+    a = np.asanyarray(a)

    if weights is None:
        avg = a.mean(axis)
        scl = avg.dtype.type(a.size/avg.size)
    else:
        a = a + 0.0


I never liked lines like these much. I think the purpose is just to ensure a is float. How about

a = np.asanyarray(a, dtype=np.result_type(a, 0.))

This at least doesn't make a copy unless it is needed.
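To illustrate the copy-avoidance point, here is a minimal sketch. It assumes np.result_type is the intended function and uses made-up array names; it is not the code from the PR.

```python
import numpy as np

ints = np.arange(4)                      # integer input (hypothetical data)
floats = np.arange(4, dtype=np.float64)  # already floating point

# `a + 0.0` always allocates a new array, even when `a` is already float.
added = floats + 0.0
copies_on_add = not np.shares_memory(added, floats)

# asanyarray with an explicit dtype only converts when the dtype differs.
same = np.asanyarray(floats, dtype=np.result_type(floats, 0.0))
converted = np.asanyarray(ints, dtype=np.result_type(ints, 0.0))
```

Here `same` refers to the original float array, while `converted` is a new float64 array, because the integer input has to be cast.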

You can probably skip the whole asanyarray call here, and just use the dtype argument later in the ufunc call.

About the asanyarray: do you think that weights is likely to have a unit or so? If we make it asanyarray, the weights would have the same priority as a in deciding the output array type. I just wanted to point out that sometimes we may not want to give a secondary argument as much influence as the primary one.

Yes, weights can definitely have units, e.g., if it is the inverse of the measurement error squared.

Using the dtype argument seems a much better idea indeed -- but needs a bit of care in the same usage for weight below; it may be easiest to just precalculate the result dtype with result_dtype = np.result_type(a, 0.)

What I meant was, that you can just add a 0. to the line below: scl = wgt.sum(axis=axis, dtype=np.result_type(a.dtype, wgt.dtype, 0.)) to the same effect probably

I like a = np.asanyarray(a, dtype=np.result_type(a, 'f8')) because I am worried about the behavior of the np.multiply(a, wgt) call. If both a and wgt happen to be integer arrays we have to worry about overflow etc.

I like f8 because that is what np.mean explicitly casts to.
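The overflow concern can be sketched as follows. The int8 arrays are hypothetical, chosen only so that the product no longer fits in the integer type; the dtype argument to np.multiply is the approach suggested earlier in the thread, not necessarily the final code.

```python
import numpy as np

a = np.array([100, 100], dtype=np.int8)   # hypothetical small-integer data
wgt = np.array([100, 1], dtype=np.int8)   # hypothetical integer weights

# The integer product stays int8, so 100 * 100 wraps around silently.
wrapped = np.multiply(a, wgt)

# Requesting a float64 result makes the ufunc cast the inputs first.
safe = np.multiply(a, wgt, dtype=np.float64)
```

Only the second call yields 10000.0 for the first element, which is why forcing a floating-point dtype before or during the multiplication matters.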

Oh, OK, that forces it to at least double precision. Having the same upcasting behaviour as np.mean makes sense in any case, I guess. Though again, I think you can also just plug the casting into that later result_type call.

Ah, right. Upcast to double makes sense for things like short integers, which is why it upcasts those explicitly.

seberg · 2016-03-06T13:49:58Z

Note that this PR would supersede gh-5551.

ahaldane · 2016-03-06T17:37:18Z

Updated based on comments.

For the casting of a, in this update I decided to do something like what np.mean does: Only upcast to f8 for integer types. This way if a is f4 we get f4 back.
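A small check of the behaviour described here, assuming a NumPy release that includes this change. The arrays are made up for illustration.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
f4 = np.array([1, 2, 3, 4], dtype=np.float32)

mean_of_ints = np.mean(ints)  # integer input is upcast to float64
mean_of_f4 = np.mean(f4)      # float32 input stays float32

# Weighted averages are expected to follow the same pattern.
avg_of_ints = np.average(ints, weights=ints)
avg_of_f4 = np.average(f4, weights=f4)
```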

The behavior of np.average is slightly different from np.mean because np.average doesn't have dtype, out or keepdims keyword args.

mhvk · 2016-03-06T19:34:20Z

@ahaldane - this looks good! I did a quick try and it works with Quantity (both for data and weight).

charris · 2016-03-07T03:52:51Z

LGTM on first read through.

charris · 2016-03-07T03:55:58Z

Should add some tests for the preservation of subclasses.

seberg · 2016-03-07T09:49:59Z

numpy/lib/function_base.py

-        wgt = np.asarray(weights)
+        wgt = np.asanyarray(weights)
+
+        if issubclass(a.dtype.type, (np.integer, np.bool_)):


Should this be combined with the same check for wgt, using either "or" or "and"? Hmmm, if so, which one? I guess "or"?

I think the type of wgt doesn't matter. The current code does as follows for the four combinations to think about (float means f4 or f8):

| a | wgt | result_dtype |
| --- | --- | --- |
| int | int | f8 |
| int | float | f8 |
| float | int | f8 |
| float | float2 | biggest(float, float2) |

I suppose in the 2nd line we could cast to whatever wgt's floating type was (f4 or f8) but I'm not sure whether that's helpful or unnecessarily complicated.

(Edit: Actually the 2nd line will give back f8 no matter what we do, since int+f4 coerces to f8 anyway)
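The four combinations above can be reproduced with a short sketch. The helper name average_result_dtype is hypothetical; it follows the integer/bool check shown in the diff and is not the function's actual code.

```python
import numpy as np

def average_result_dtype(a_dtype, wgt_dtype):
    """Hypothetical helper: pick the dtype used for the weighted sum."""
    a_dtype, wgt_dtype = np.dtype(a_dtype), np.dtype(wgt_dtype)
    if issubclass(a_dtype.type, (np.integer, np.bool_)):
        # Integer or boolean data: force at least double precision.
        return np.result_type(a_dtype, wgt_dtype, np.dtype("f8"))
    # Floating data: ordinary promotion between data and weights.
    return np.result_type(a_dtype, wgt_dtype)

pairs = [("i8", "i8"), ("i8", "f4"), ("f4", "i8"), ("f4", "f4"), ("f4", "f8")]
combos = {(a, w): average_result_dtype(a, w).name for a, w in pairs}
```

Only the float/float case can return something smaller than f8, and then only when both inputs are narrower than f8.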

ahaldane · 2016-03-07T18:50:49Z

Updated with tests.

charris · 2016-03-07T20:00:00Z

numpy/lib/tests/test_function_base.py

+        a = np.array([[1,2],[3,4]]).view(subclass)
+        w = np.array([[1,2],[3,4]]).view(subclass)
+
+        assert_equal(type(np.average(a, weights=w)), subclass)


Should that not be better as assert_(type(np.average(a, weights=w)) is subclass)?

EDIT: Although that only works for new-style classes. Hmm, assert_equal does seem to work properly there.
EDIT: But not for old-style classes either.

It's not clear to me what assert_equal does for types, so maybe it is OK, if not obvious.
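For reference, a self-contained version of the subclass check using an identity comparison. This is a sketch that assumes a NumPy release in which np.average preserves subclasses; the class name Sub is made up.

```python
import numpy as np

class Sub(np.ndarray):
    """Trivial ndarray subclass, used only for this illustration."""

a = np.array([[1, 2], [3, 4]]).view(Sub)
w = np.array([[1, 2], [3, 4]]).view(Sub)

result = np.average(a, weights=w)

# Identity comparison on the type, as suggested in the review.
is_same_class = type(result) is Sub
```

On Python 3 every class is new-style, so the old-style caveat no longer applies to the identity check.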

MAINT: cleanup np.average

charris · 2016-03-07T20:19:46Z

Thanks Allan.

ahaldane · 2016-03-07T20:41:32Z

Thanks for reviews @charris @seberg @mhvk

MAIN: fix to #7382, make scl in np.average writeable

eric-wieser · 2017-11-29T17:50:02Z

@seberg: Can we close gh-5551 then?

mhvk reviewed Mar 5, 2016

seberg added the "component: numpy.lib" and "03 - Maintenance" labels Mar 6, 2016

ahaldane force-pushed the tidy_average_median branch 3 times, most recently from cef50df to d2b0df6 Compare March 6, 2016 17:35

ahaldane force-pushed the tidy_average_median branch from d2b0df6 to d4c9030 Compare March 6, 2016 19:07

seberg reviewed Mar 7, 2016

ahaldane force-pushed the tidy_average_median branch 2 times, most recently from e11fefe to 6f6c40c Compare March 7, 2016 17:36

MAINT: cleanup np.average

5ceab8f

ahaldane force-pushed the tidy_average_median branch from 6f6c40c to 5ceab8f Compare March 7, 2016 17:41

charris reviewed Mar 7, 2016

charris added a commit that referenced this pull request Mar 7, 2016

Merge pull request #7382 from ahaldane/tidy_average_median

10caf75

MAINT: cleanup np.average

charris merged commit 10caf75 into numpy:master Mar 7, 2016

homu mentioned this pull request Mar 7, 2016

ENH: Increase numpy.average performance mentioned in #5507. #5551

Closed

ahaldane mentioned this pull request Mar 19, 2016

MAINT: FutureWarning for changes to np.average subclass handling #7433

Merged

ahaldane mentioned this pull request Apr 4, 2016

ENH: make some masked array methods behave more like ndarray methods #5706

Merged

ahaldane added a commit to ahaldane/numpy that referenced this pull request Apr 4, 2016

MAIN: fix to numpy#7382, make scl in np.average writeable

b740018

ahaldane mentioned this pull request Apr 4, 2016

MAIN: fix to #7382, make scl in np.average writeable #7505

Merged

charris added a commit that referenced this pull request Apr 5, 2016

Merge pull request #7505 from ahaldane/fixup_7382

9c59de6

MAIN: fix to #7382, make scl in np.average writeable

mhvk mentioned this pull request Nov 18, 2016

MAINT: let average preserve subclass information. #8290

Merged

ahaldane deleted the tidy_average_median branch January 17, 2018 23:40

skuschel mentioned this pull request Jan 29, 2019

Performance of numpy average and numpy.mean function #5507

Closed
