
ENH: add padding options to diff #8206


Merged

merged 1 commit into numpy:master from mattharrigan:diff-to-begin on Sep 26, 2018

Conversation

mattharrigan
Contributor

Add kwargs to_begin and to_end, allowing values to be inserted
on either end of the differences. Similar to the options for ediff1d.
Closes #8132

@shoyer
Member

shoyer commented Oct 24, 2016

I agree with @madphysicist -- this needs to be able to handle N-dimensional to_begin/to_end, with appropriate handling.

Ideally, this would be done with broadcasting, so you could write np.diff(a, to_begin=0, axis=1) even if a.ndim > 1. np.broadcast_to should be helpful here.

Also, please make sure that this works with n>1. Right now the test coverage for higher order differentiation with to_begin/to_end feels pretty sparse.
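
For illustration, a minimal sketch of the broadcasting approach described above; the helper name _expand_pad is hypothetical, not part of the PR:

import numpy as np

def _expand_pad(value, a, axis):
    # Hypothetical helper: broadcast a scalar pad value to a's shape with
    # length 1 along `axis`, so it can be concatenated onto `a`.
    value = np.asanyarray(value)
    if value.ndim == 0:
        shape = list(a.shape)
        shape[axis] = 1
        value = np.broadcast_to(value, tuple(shape))
    return value

a = np.arange(12).reshape(3, 4)
pad = _expand_pad(0, a, axis=1)                          # shape (3, 1)
out = np.diff(np.concatenate([pad, a], axis=1), axis=1)  # shape (3, 4)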

@mattharrigan
Contributor Author

Just to make sure I understand: currently a can have arbitrary dimensions, but the suggestion is to additionally allow to_begin and to_end to have arbitrary dimensions? I think to_begin and to_end shouldn't support arbitrary dimensions, but could potentially be allowed up to a.ndim dimensions. Correct?

For a.shape = (3,4,5), axis=0, to_begin.shape = (2,4), should diff error or insert like to_begin[:, :, newaxis]? I think the axis argument really complicates broadcasting.

@madphysicist
Contributor

madphysicist commented Oct 24, 2016

to_begin and to_end should just broadcast correctly. They should only support a limited number of dimensions. In your example of a.shape = (3, 4, 5), I would expect any of the following to work for to_begin or to_end:

  • scalar -> broadcast to (1, 4, 5)
  • 1D array of shape (x,) -> broadcast to (x, 4, 5)
  • 2D array of shape (x, 4)
  • 3D array of shape (x, 1, 5)
  • 3D array of shape (x, 4, 5)

I hope I did not get the broadcasting rules backwards here.

My original comment referred to the fact that the shapes are not well documented, however you choose to implement them. I expect a to have an arbitrary number of dimensions, and the endcaps to match that.

@mattharrigan
Contributor Author

mattharrigan commented Oct 24, 2016

I apologize if I'm showing my ignorance of broadcasting rules, but I think the second and third examples are not compatible with typical broadcasting rules, which align shapes from right to left and require each pair of dimensions to be equal or for one of them to be 1.

Basically the axis argument complicates things to where I can't think of a simple rule to reliably allow higher dimensions for to_begin and to_end. I think the critical step is determining the shape of the output when it is initialized. I am definitely open to suggestions though!
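
As a quick illustration of the right-to-left alignment being described (not part of the PR):

import numpy as np

a = np.zeros((3, 4, 5))
a + np.zeros(5)         # OK: (5,) aligns with the trailing axis
a + np.zeros((4, 1))    # OK: (4, 1) aligns with the last two axes
try:
    a + np.zeros(4)     # fails: (4,) must align with the last axis (length 5)
except ValueError as exc:
    print(exc)          # operands could not be broadcast together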

@madphysicist
Contributor

I think you are right and I am being the ignorant one here. My expectation is based more on intuition/wishful thinking in this case. However, I have an open PR for a function called atleast_nd (#7804), which might actually help in this situation.

@mattharrigan
Contributor Author

I think atleast_nd would help tremendously in this case. Thanks for pointing that out.

@madphysicist
Contributor

Too bad that PR is just sitting there on hold... :-)

@madphysicist
Contributor

Thanks for finally finding a legitimate use case for my pet function.

@seberg
Member

seberg commented Oct 24, 2016

array(..., ndmin=blah) might be enough here ;p.

@madphysicist
Contributor

madphysicist commented Oct 24, 2016

That's all that atleast_nd really does, anyway, at least in this case, where you are prepending dims.
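
For reference, a small sketch of what ndmin does (the copy=False / subok=True behavior shown reflects numpy semantics at the time of this discussion):

import numpy as np

x = np.array([1, 2, 3])
np.array(x, ndmin=3).shape                        # (1, 1, 3): ones are prepended
y = np.array(x, ndmin=3, copy=False, subok=True)  # no copy, subclasses preserved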

@mattharrigan
Contributor Author

The latest code has a known test failure related to returning subclasses. See this. Solutions should probably be similar. I will update once that is determined.

@mattharrigan
Contributor Author

mattharrigan commented Oct 27, 2016

I ventured into the C internals to understand what concatenate actually does. I think the applicable code is here. Two things it does that this diff implementation does not are checking the dtype and the subclass/priority of all three potential input arrays. Here I am forcing the dtype and subclass of the output to match the primary input array "a". I think that is OK and would match a typical user's expectation. It is still probably broken for certain types of subarrays, but that is a much bigger issue. Please let me know what you think.

result = np.empty(tuple(shape), dtype=a.dtype)

# wrap ndarray subclasses
wrap = getattr(a, "__array_prepare__", a.__array_wrap__)
Contributor

I think you should use __array_wrap__ directly here; as I understand it, __array_prepare__ is for ufuncs.

Contributor Author

I'm hesitant to not follow what is done in other sections of the numpy code base. Unless an expert weighs in of course

Member

@mhvk is an expert :)

Contributor Author

sorry!

Contributor

But not an expert with a good memory -- so do point me to where else __array_prepare__ is used! It may well help me make Quantity work with something it didn't work with before!!

Contributor Author

Contributor

Ha! Mine were https://github.com/numpy/numpy/blob/master/numpy/core/fromnumeric.py#L42 and https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L4608
Some grepping suggests __array_wrap__ is used more often than __array_prepare__, but there is no obvious conclusion.

The documentation is not terribly clear -- see https://docs.scipy.org/doc/numpy-1.11.0/reference/arrays.classes.html -- but it does suggest that by default __array_prepare__ does nothing while __array_wrap__ changes the class to be that of the instance to which it is attached. I think that makes slightly more sense here. Since it also simplifies the code, I would suggest just using a.__array_wrap__ unconditionally.
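
A minimal sketch of the __array_wrap__ behavior described above, using a bare ndarray subclass for illustration:

import numpy as np

class MyArray(np.ndarray):
    pass

a = np.arange(4).view(MyArray)
result = np.empty(3, dtype=a.dtype)   # plain ndarray scratch buffer
wrapped = a.__array_wrap__(result)    # re-views result as type(a)
type(wrapped)                         # <class '__main__.MyArray'>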

Contributor Author

array_prepare it is

Contributor

__array_wrap__ I hope you meant ;-)

Contributor Author

sorry, copy paste error

# make to_begin a 1D array
if to_begin is None:
    l_begin = 0
elif isinstance(to_begin, str) and to_begin == 'first':
Contributor

Is there a reason not to simply write if to_begin == 'first': (i.e., omit the isinstance)? Python guarantees that equality checks never fail, i.e., that this evaluates to False if to_begin is not stringy.

Contributor Author

numpy arrays have special semantics when tested for equality; without the isinstance guard it spews user warnings
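
A short illustration of those semantics (exact warning behavior varies across numpy versions):

import numpy as np

to_begin = np.array([1, 2])
# Comparing an array against a string cannot broadcast; depending on the
# numpy version this evaluates to plain False and may emit a FutureWarning.
# The isinstance guard short-circuits before any array comparison happens:
if isinstance(to_begin, str) and to_begin == 'first':
    pass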

Contributor

Duh, yes, of course, you get back an array of bool... Should have thought of that.

@mhvk
Contributor

mhvk commented Oct 27, 2016

Broader question: should first be applied repeatedly for higher-order differences (such that repeated cumsum recovers the data)?

Also, the option I would probably use myself, if it were available, would be to_begin='extrapolate' (or something similar), which would set the first item equal to the first difference (arguably, one could then have a to_end='extrapolate' as well). Though it is absolutely fine not to implement those! It is not exactly much work to do to_begin=0. and then follow the call with result[0] = result[1]. (Indeed, for the same reason perhaps first should not be implemented either.)
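
A sketch of that workaround, written with the prepend spelling that was eventually merged (to_begin='extrapolate' itself was never implemented):

import numpy as np

a = np.array([3., 5., 9., 10.])
d = np.diff(a, prepend=0.)  # pad first, then overwrite the padding
d[0] = d[1]                 # 'extrapolate': first item equals first difference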

# compute the length of the diff'd portion
# force length to be non negative
l_diff = a.shape[axis] - 1
if l_diff < 0:
Contributor

Just write l_diff = max(a.shape[axis] - 1, 0)?

Contributor Author

sure

if to_end is None:
    l_end = 0
else:
    to_end = np.atleast_1d(to_end)
Contributor

Do you need atleast_1d here? I'd write np.asanyarray(to_end) (might as well pass subclasses through here too).

Contributor Author

I think it's required for scalar inputs. For example, len(np.array(0)) errors out.

Contributor

Good point, but you can also rewrite l_end = to_end.size, which works for scalars as well. (Obviously, this doesn't matter much in the broader perspective!)
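
A two-line check of that point (illustrative only):

import numpy as np

np.asanyarray(0).size        # 1 -- works where len(np.array(0)) raises
np.asanyarray([1, 2]).size   # 2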

Contributor Author

OK, I'll make that change. Curiously, np.asanyarray(0).ndim returns 0, so I switched the == 1 to < 2.

Contributor

yes, scalars are zero-dimensional, but with your change, that should work fine.

if to_end.ndim == 1:
    l_end = len(to_end)
else:
    to_end = np.array(to_end, ndmin=nd)
Contributor

Here, add copy=False, subok=True to avoid unnecessary copy and allow subclasses.

Contributor

Potentially another use case for atleast_nd :)

Contributor

I'm actually confused why the ndmin is necessary here: all those prepended ones don't matter for the broadcasting anyway.

Contributor Author

I'll add those kwargs, I mistakenly thought that was the default.

The extra prepended ones are required for the next line, but to your point there are probably smarter ways of doing that

else:
    to_end = np.atleast_1d(to_end)
    if to_end.ndim == 1:
        l_end = len(to_end)
Contributor

For this case, I think you'd still need to reshape to_end so that its dimension with values is at the right axis, no? I.e.,

to_end.shape = (l_end,) + (1,) * (nd - axis - 1)

Contributor Author

broadcasting magic seems to take care of that

Contributor

Are you sure that is true for a 3-d array where you take the diff on axis=1 and have a 1-dimensional to_end with more than 1 element?

a = np.zeros((5,4,3))
to_end = np.array([1,2])
a[:, 2:, :] = to_end
# ValueError: could not broadcast input array from shape (2) into shape (5,2,3)
to_end.shape = (2, 1)
a[:, 2:, :] = to_end
# works

Contributor Author

Sounds like a good test case. Does this answer it sufficiently?

Contributor

No, because your x.take returns a multidimensional array which will already have a consistent shape. You'd need to test with a one-dimensional array with length > 1. But it certainly is a good place to add the test! (And I must add that I admire how thorough your tests already are.)

Contributor Author

Good catch, you're right. I missed that test case. Should work now

# copy values to end
if l_end > 0:
    end_slice = [slice(None)] * nd
    end_slice[axis] = slice(l_begin + l_diff, None)
Contributor

Maybe just

end_slice = (slice(None),) * axis + (slice(l_begin + l_diff, None),)

(or even just put that whole expression inside the square brackets in the line below)

Contributor

Or calculate end_slice in the earlier if to_end is None... else clause.

Contributor Author

I don't think that works; the slice which isn't slice(None) must be at axis, not necessarily at the end.

Contributor

My suggestion puts it at axis and leaves out all the slice(None) following it (which are not required).
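
A quick demonstration that the trailing slice(None) entries are indeed redundant (not part of the PR):

import numpy as np

a = np.arange(24).reshape(2, 3, 4)
axis = 1
full = a[(slice(None),) * axis + (slice(1, None),) + (slice(None),) * (a.ndim - axis - 1)]
short = a[(slice(None),) * axis + (slice(1, None),)]
assert (full == short).all()   # a[:, 1:, :] equals a[:, 1:]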

Contributor Author

Honestly, I'm not expert enough to comment on the reasoning, but that change causes some tests to fail.

Member

Can we convert back to tuple before indexing, so as not to add more work to #4434?
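
A sketch of the tuple-indexing form being requested, with illustrative values for nd, axis, and the lengths:

import numpy as np

nd, axis = 3, 1
l_begin, l_diff = 1, 3
result = np.zeros((2, 5, 3))
end_slice = [slice(None)] * nd
end_slice[axis] = slice(l_begin + l_diff, None)
result[tuple(end_slice)] = 7.0   # tuple indexing, not the deprecated list form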

@mattharrigan
Contributor Author

There were quite a few comments; thanks for the good feedback. I think I got them all, but let me know if I missed some. I am holding out for atleast_nd though.

@mhvk
Contributor

mhvk commented Oct 28, 2016

@mattharrigan - apart from the trivial comment (and not quite understanding why my suggestion for end_slice failed...), two more general questions:

  1. I'm not sure that treating a 1-d array as something that will just extend the result along axis is a good idea, as it breaks standard broadcasting rules. E.g., consider
np.diff([[1, 2], [4, 8]], to_begin=[1, 4])
# with your PR:
array([[1, 4, 1],
       [1, 4, 4]])
# but from regular broadcasting I would expect
array([[1, 1],
       [4, 4]])
# i.e., the same as if I did to_begin=[[1, 4]]

I think it is slightly odd to break the broadcasting expectation here, especially since the regular use case surely is just to add a single element so that one keeps the original shape. The advantage of assuming this is that you do not have to do any array shaping of to_begin and to_end (which perhaps also suggests it is the right thing to do).

  2. As I mentioned above, I think it may be worth thinking through a little what to do with higher-order differences, at least for to_begin='first'. If the goal is to ensure that with that option diff becomes the inverse of cumsum, then for higher order one should add multiple elements in front, i.e., for that case the recursive call should be
return np.diff(np.diff(a, to_begin='first'), n-1, to_begin='first')
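
For the n=1 case, the inverse-of-cumsum property is easy to see with the prepend API that was ultimately merged ('first' itself was not added):

import numpy as np

x = np.array([1, 4, 9, 16])
np.cumsum(np.diff(x, prepend=0))   # array([ 1,  4,  9, 16]) -- recovers x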

@mattharrigan
Contributor Author

Should those questions be posed to the numpy mailing list?

@mhvk
Contributor

mhvk commented Oct 28, 2016

should those questions be posed to the numpy mailing list?

Seems sensible, so I wrote a reply to the chain you started originally.

@@ -1094,6 +1094,18 @@ def diff(a, n=1, axis=-1):
axis : int, optional
The axis along which the difference is taken, default is the
last axis.
prepend : array_like
Values to prepend to the beginning of "a" along axis before
Member

@eric-wieser commented Jun 24, 2018

I think it's worth noting that this only exists as a more efficient option than np.concatenate

Edit: never mind, that's not even true.

@@ -1139,6 +1151,8 @@ def diff(a, n=1, axis=-1):
array([ 1, 2, 3, -7])
>>> np.diff(x, n=2)
array([ 1, 1, -10])
>>> np.cumsum(np.diff(x, prepend=0))
Member

@eric-wieser commented Jun 24, 2018

Can you show the equivalent np.concatenate invocation in an example too?
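
For reference, one plausible equivalence along the lines being requested (using the x from the docstring example above):

import numpy as np

x = np.array([1, 2, 4, 7, 0])
np.diff(x, prepend=0)              # array([ 1,  1,  2,  3, -7])
np.diff(np.concatenate([[0], x]))  # same result via explicit concatenation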

@mattip
Member

mattip commented Jun 24, 2018

Needs a rebase to fix merge conflicts with the release notes?

@mattharrigan
Contributor Author

@eric-wieser and @mattip: done

@@ -1094,6 +1094,12 @@ def diff(a, n=1, axis=-1):
axis : int, optional
The axis along which the difference is taken, default is the
last axis.
prepend, append : array_like, optional
Values to prepend or append to the beginning of "a" along axis
Member

Drop "to the beginning" - it's wrong for "append", and implied for "prepend"

@@ -26,6 +26,11 @@ Compatibility notes
C API changes
=============

``np.diff`` Added kwargs prepend and append
Member

This isn't relevant to the C API. Perhaps move it to "improvements"?

@mattharrigan
Contributor Author

@eric-wieser: done, sorry for the errors. I don't understand why there is a merge conflict in the release notes.

@mattharrigan
Contributor Author

ready to merge

@mattharrigan
Contributor Author

I would greatly appreciate a maintainer reviewing this PR and merging or commenting. Thank you

@charris
Member

charris commented Aug 27, 2018

@mattharrigan I fixed the merge conflict.

@mattharrigan
Contributor Author

@charris thank you.

I think it is ready to merge

performing the difference. Scalar values are expanded to
arrays with length 1 in the direction of axis and the shape
of the input array along all other axes. Otherwise the
dimension and shape must match "a" except along axis.
Member

Can we add a test verifying that ValueError is raised if the shape doesn't match?
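
A sketch of such a test (the shapes here are illustrative, not taken from the PR's test suite):

import numpy as np
from numpy.testing import assert_raises

x = np.zeros((3, 4))
# prepend must match x everywhere except along the diff axis
assert_raises(ValueError, np.diff, x, axis=0, prepend=np.zeros((1, 5)))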

@@ -1173,6 +1183,29 @@ def diff(a, n=1, axis=-1):
"order must be non-negative but got " + repr(n))

a = asanyarray(a)

combined = list()
Member

Nit: combined = [] might be a little more idiomatic here.

@mattharrigan mattharrigan force-pushed the diff-to-begin branch 4 times, most recently from 3026bed to 7d29e1c on September 26, 2018 00:47
@mattharrigan
Contributor Author

@shoyer updated per your comments. I believe this is ready to commit

@shoyer
Member

shoyer commented Sep 26, 2018

OK, I'm just waiting on CI to pass before merging

Member

@eric-wieser left a comment

This risks throwing an IndexError when it should throw an AxisError


if len(combined) > 1:
    a = np.concatenate(combined, axis)

nd = a.ndim
axis = normalize_axis_index(axis, nd)
Member

This line needs to come before your additions

Contributor Author

Can you please show me a short example? I'll make this change and also add a test.

Contributor Author

@mattharrigan commented Sep 26, 2018

I assume it's just a variation of this, from TestDiff.test_axis:
assert_raises(np.AxisError, diff, x, axis=3)
assert_raises(np.AxisError, diff, x, axis=-4)

Member

Yep, those tests, but with your new arguments
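
A sketch of what those tests might look like with the new arguments:

import numpy as np
from numpy.testing import assert_raises

x = np.zeros((10, 20, 30))
assert_raises(np.AxisError, np.diff, x, axis=3, prepend=0)
assert_raises(np.AxisError, np.diff, x, axis=-4, append=0)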

@mattharrigan
Contributor Author

@shoyer merge?

@shoyer shoyer merged commit fe1c1fb into numpy:master Sep 26, 2018
@shoyer
Member

shoyer commented Sep 26, 2018

thanks @mattharrigan, especially for your patience with us!

@mattharrigan
Contributor Author

Thanks for all your help and patience with me too. It feels good to actually finish this.
