ENH: gradient support for unevenly spaced data #8446


Merged: 2 commits into numpy:master on Feb 22, 2017

Conversation

apbard (Contributor) commented Jan 4, 2017

This effectively reverts #7618 and solves #6847 and #7548 by implementing support for unevenly spaced data. The behaviour/signature is now similar to that of Matlab/Octave. As spacing arguments it can take:

  1. A single scalar to specify a sample distance for all dimensions.
  2. N scalars to specify a constant sample distance for each dimension.
    i.e. dx, dy, dz, ...
  3. N arrays to specify the coordinates of the values along each
    dimension of F. The length of each array must match the size of
    the corresponding dimension.
  4. Any combination of N scalars/arrays with the meaning of 2. and 3.

This means that, e.g., you can now do the following:

>>> f = np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)
>>> dx = 2.
>>> y = [1., 1.5, 3.5]
>>> np.gradient(f, dx, y)
[array([[ 1. ,  1. , -0.5], [ 1. ,  1. , -0.5]]), 
 array([[ 2. ,  2. ,  2. ], [ 2. ,  1.7,  0.5]])]
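(As a check of the interior value above: for the second row along axis 1, the spacings are h1 = 1.5 - 1 = 0.5 and h2 = 3.5 - 1.5 = 2, so the nonuniform central difference gives (-2/(0.5*2.5))*3 + (1.5/(0.5*2))*4 + (0.5/(2*2.5))*5 = -4.8 + 6 + 0.5 = 1.7.)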

As a side note, #8292 still does not work since np.isscalar is still used.
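For context, #8292 is about np.isscalar returning False for 0-d arrays. A common workaround (a sketch only, not what this PR does) is to test np.ndim instead:

>>> dx = np.array(2.)   # a 0-d array
>>> np.isscalar(dx)     # rejects 0-d arrays
False
>>> np.ndim(dx) == 0    # accepts both Python scalars and 0-d arrays
True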

charris (Member) commented Jan 7, 2017

Needs an entry in the release notes.

apbard (Contributor Author) commented Jan 7, 2017

@charris, maybe a silly question: does it mean I have to update this:
https://github.com/numpy/numpy/blob/master/doc/release/1.13.0-notes.rst
or the 1.12 one?

Is something like the opening post fine?

charris (Member) commented Jan 7, 2017

The 1.13 one is for current master, so use that.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch 12 times, most recently from 88c64c1 to 4350df0 Compare January 8, 2017 14:35
apbard (Contributor Author) commented Jan 8, 2017

@charris Done. I have already rebased on master.

shoyer (Member) left a comment

I'm not super happy with overloading the arguments on np.gradient further, but this does seem like useful functionality. The main question in my mind around the API is whether it would be surprising that arguments are differences in the scalar case vs. coordinate values in the array case. Arguably, it would be more consistent to be coordinate differences in both cases. On the other hand, this is what MATLAB/Octave already does.
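To make the asymmetry concrete (a sketch using the proposed API; the values are illustrative):

>>> f = np.array([1., 2., 4., 7.])
>>> np.gradient(f, 2.)          # scalar: a constant sample *distance*
array([ 0.5 ,  0.75,  1.25,  1.5 ])
>>> np.gradient(f, np.array([0., 1., 3., 6.]))   # array: the sample *coordinates*
array([ 1.,  1.,  1.,  1.])

Since f equals x + 1 at those coordinates, the second call recovers a gradient of exactly 1 everywhere.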


>>> x = np.arange(f)
Member:

Should be something like np.arange(f.size) (the current example is not valid)

Contributor Author:

Of course, thanks
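With that fix, the docstring example would read something like this (a sketch assuming a 1-D f with unit spacing):

>>> f = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
>>> x = np.arange(f.size)
>>> np.gradient(f, x)
array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])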

uniform for axis=0 and non uniform for axis=1

>>> dx = 2.
>>> y = [1.,1.5,3.5]
Member:

per PEP8, please add spaces after commas.

Contributor Author:

thanks

slice4[axis] = slice(2, None)
dx1 = dx[i][0:-1]
dx2 = dx[i][1:]
# Laplace polynomial coefficients
Member:

can you give a reference here?

Contributor Author:

ok
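For reference, the coefficients in the next hunk implement the standard one-sided second-order formula at the right boundary; with $h_1$ and $h_2$ the last two spacings, a Taylor expansion gives

$$f'(x_n) \approx \frac{h_2}{h_1(h_1+h_2)}\,f(x_{n-2}) \;-\; \frac{h_1+h_2}{h_1 h_2}\,f(x_{n-1}) \;+\; \frac{h_1+2h_2}{h_2(h_1+h_2)}\,f(x_n),$$

matching a, b and c below.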

dx2 = dx[i][-1]
a = (dx2) / (dx1 * (dx1 + dx2))
b = - (dx2 + dx1) / (dx1 * dx2)
c = (2. * dx2 + dx1) / (dx2 * (dx1 + dx2))
Member:

This is starting to look a little repetitive. Can you maybe break this out into a few helper functions so it's easier to follow the high level logic?

Alternatively, maybe you can reuse some of the logic for the scalar case dx vs. array of coordinates x? e.g., just set a, b and c differently?
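A minimal sketch of the kind of helper being suggested (the name and structure are hypothetical, not the merged code; the weights are the standard nonuniform central-difference coefficients):

def _interior_coeffs(h1, h2):
    # Second-order weights for f'(x_i) with uneven spacings
    # h1 = x_i - x_{i-1} and h2 = x_{i+1} - x_i.
    a = -h2 / (h1 * (h1 + h2))
    b = (h2 - h1) / (h1 * h2)
    c = h1 / (h2 * (h1 + h2))
    return a, b, c

# Sanity check: uniform spacing recovers the central-difference weights.
assert _interior_coeffs(1.0, 1.0) == (-0.5, 0.0, 0.5)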

apbard (Contributor Author) commented Jan 10, 2017

Done. And also optimised further... now when entering the uniform-spacing branches it is ~10-30% faster than the original np.gradient:

np.gradient 13.1845779419
gradient_opt vector 10.7180240154
gradient_noopt vector 20.1845788956
gradient_opt scalar 10.6630008221
gradient_noopt scalar 13.2290189266

Member:

Much better, thanks

raise ValueError("distances must be either scalars or match "
                 "the length of the corresponding dimension")
diffx = np.diff(varargs[0])
# if distances are constant reduce to the scalar case
Member:

I would avoid this sort of optimization if possible -- it adds more complexity to the implementation and function behavior for a questionable performance benefit.

Contributor Author:

Uhm, IMHO it doesn't add too much complexity and may help save a bit of memory, but OK, I'll remove it.

Contributor Author:

I have done a couple of tests and it actually seems to provide a non-negligible speed-up:

np.gradient 0.835474014282
gradient_opt scalar 0.849312067032
gradient_noopt scalar 0.865785121918
gradient_opt vector 0.984735012054 
gradient_noopt vector 1.54027509689

I think it is probably worth keeping.

for i, distances in enumerate(dx):
    if np.isscalar(distances):
        continue
    if len(distances) != f.shape[i]:
Member:

Write this in a way that avoids the need to write these checks twice.

charris (Member) commented Jan 9, 2017

@shoyer My initial reaction was the same -- omit Matlab function rant here -- but I didn't see any cleaner way to do this without a new function and the overloading didn't seem too egregious.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from 4350df0 to b56a6ac Compare January 10, 2017 22:11

Notes:
For more about finite differences and their derivation you can check for instance:
D. R. Durran (1999) Numerical Methods for Wave Equations in Geophysical Fluid Dynamics. Springer
Contributor Author:

Added a reference for finite difference derivation.

Member:

Is there a good online reference, e.g., on Wikipedia? That is preferable to a book if possible.

Contributor Author: (replied with a link to a wiki page)

Member:

I would prefer a source that seems more likely to stick around than a random wikipage, e.g., Wikipedia or a journal article that can be accessed online.

Member:

An academic paper available online that describes the method in detail would be ideal, including a description of all necessary assumptions.

Contributor Author:

Totally agree, but I have searched a bit and, while you can find tons of books/slides/lectures, I haven't found many papers that provide a clear introduction/presentation for our specific case.
The paper I linked provides a generic method to compute the coefficients, but in our case the Taylor-expansion one is enough.
That's why in the end I opted for the book.
I could derive it myself and add it directly to the documentation, but I don't know if that is appropriate.

Member:

Is the discussion on this page helpful?
http://scicomp.stackexchange.com/questions/480/how-can-i-numerically-differentiate-an-unevenly-sampled-function

If this is equivalent to a special case of the method described in the paper you link, that would probably be a better reference here. It would still be good to briefly summarize the assumptions as well (i.e., piece-wise polynomial interpolation, if I understand correctly).

Contributor Author:

I have added the Notes section and explained how the approximation formula for the interior points is derived. I also tried to clarify why it has second-order accuracy in the case of evenly spaced data, to answer the questions raised in #8605.
I have removed the references, but I can add them back (maybe both the book and the paper) if you think they are needed.
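For reference, the interior-point formula documented in the Notes is the three-point nonuniform central difference; with $h_1 = x_i - x_{i-1}$ and $h_2 = x_{i+1} - x_i$, matching coefficients in the Taylor expansions of $f(x_{i\pm 1})$ around $x_i$ gives

$$\hat f'(x_i) = \frac{-h_2}{h_1(h_1+h_2)}\,f(x_{i-1}) + \frac{h_2-h_1}{h_1 h_2}\,f(x_i) + \frac{h_1}{h_2(h_1+h_2)}\,f(x_{i+1}),$$

which for $h_1 = h_2 = h$ reduces to $\bigl(f(x_{i+1}) - f(x_{i-1})\bigr)/(2h)$ with second-order truncation error.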

dx = list(varargs)
for i, distances in enumerate(dx):
Contributor Author:

I am not sure whether I like this version of the ifs more than the previous one with duplicated checks. What do you think?

Member:

I think this is a step in the right direction. I'll have more detailed suggestions, but let's settle on the overall API first.

diffx = np.diff(dx[i])
# if distances are constant reduce to the scalar case
if (diffx == diffx[0]).all():
    diffx = diffx[0]
Contributor Author:

This optimisation leads to a non-negligible speedup.

Member:

Can you put this note in a comment in the code here?

Contributor Author:

Done

apbard (Contributor Author) commented Jan 10, 2017

@shoyer, @charris what do you think?

shoyer (Member) commented Jan 10, 2017

One reason to prefer taking arguments as differences is that the function signature remains consistent: it is always np.gradient(f, dx), rather than np.gradient(f, dx) (for scalar dx) vs. np.gradient(f, x) (for array x).

In any case, because this is new API, per our policy it needs to go through discussion on the numpy-discussion mailing list before being finalized. @apbard, can you please send something out, with a link to this PR? I think this change (in some form) will be very welcome, but it would be helpful to discuss tradeoffs between the proposed APIs to attempt to reach consensus.

shoyer (Member) commented Jan 20, 2017

Maybe ping the scipy-dev list on this, too? It would be good to understand how this change would interact with the proposed scipy.diff (if at all). For context: we try to keep the more complex numerical methods in scipy, with only basic array manipulation routines in numpy.

apbard (Contributor Author) commented Jan 20, 2017

@shoyer thanks for the suggestion. Done.
From what I read here, scipy.diff aims to do much more (e.g., symbolic and automatic differentiation) and to be far more generic also for what concerns finite differences. Keeping a version of gradient with basic features in numpy as well might not be such a bad idea.
It seems that in that proposal, too, the signature for gradient supports both h as a step and x as coordinates.

apbard (Contributor Author) commented Feb 8, 2017

@shoyer @charris @josef-pkt any other thoughts/suggestions for this PR?

homu (Contributor) commented Feb 12, 2017

☔ The latest upstream changes (presumably 5de1a82) made this pull request unmergeable. Please resolve the merge conflicts.

apbard (Contributor Author) commented Feb 12, 2017

Thanks @homu. I have looked at the commit you pointed out and I think those changes (besides a typo) are not correct. I have opened a revert PR.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from eae1288 to a0dd28a Compare February 16, 2017 19:54

References
----------
.. [1] Quarteroni A., Sacco R., Saleri F. (2007) Numerical Mathematics (Texts in Applied Mathematics). New York: Springer.
Member:

The lines should be broken to keep them < 80 characters. It is OK to do that for references as long as you keep the indentation.

Contributor Author:

Fixed, thanks.

charris (Member) commented Feb 16, 2017

Just another style nit. @shoyer I haven't reviewed the details, but if you are happy with this go ahead and merge.

shoyer (Member) left a comment

One nit aside, I think this looks good now. It's right at the complexity limit for what is appropriate for numpy (rather than scipy) but I think it falls on the right side of that line.

if any([not np.isscalar(dxi) for dxi in dx]):
    raise ValueError("distances must be scalars")

raise SyntaxError("invalid number of arguments")
Member:

This should be TypeError, which is what you get when you don't supply required arguments to a function.

SyntaxError is reserved for Python itself (e.g., if you forget to close parentheses).

Contributor Author:

Fixed, thanks
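For context, a minimal sketch of the corrected dispatch (hypothetical helper name and simplified logic, not the merged code):

def _parse_spacing(f, *varargs):
    # Turn gradient's *varargs into a per-axis spacing list.
    n = len(varargs)
    if n == 0:
        return [1.0] * f.ndim           # default: unit spacing on every axis
    if n == 1:
        return [varargs[0]] * f.ndim    # one spacing shared by all axes
    if n == f.ndim:
        return list(varargs)            # one spacing (or coordinate array) per axis
    # A wrong argument count is a calling error, hence TypeError;
    # SyntaxError is reserved for the Python parser itself.
    raise TypeError("invalid number of arguments")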

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch 2 times, most recently from a1d23b3 to 7c64670 Compare February 16, 2017 21:27
charris (Member) commented Feb 16, 2017

@apbard I don't want to nag, but there are still a lot of long lines that need fixing, both in the references and the examples, etc. The <80 character limit was chosen for readability in terminals. You should be able to set up your editor to track that for you and/or make reformatting easy.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from 7c64670 to 45af9e1 Compare February 16, 2017 22:07
apbard (Contributor Author) commented Feb 17, 2017

@charris you are right, sorry. They should be fixed now. I have also split several lines elsewhere in the two files. There are still a few lines around 82-85 characters (links or similar); if 80 is a strict limit I'll fix them too.

charris (Member) commented Feb 22, 2017

Yeah, 80 is strict. Or it should be; there is old code like f2py where a fair amount of refactoring is still needed, but eventually we would like to run PEP8 on the code as part of the testing, like scipy does now. That said, I think it looks nice as is and there are only five lines over the limit that I noticed in the tests, so you could probably talk me into letting it through ;) But I would rather not start making exceptions.

@shoyer I'm happy with this apart from the style nits, merge at your pleasure.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from 45af9e1 to c0949d1 Compare February 22, 2017 09:32
This somehow reverts numpy#7618 and solves numpy#6847, numpy#7548 by implementing
support for unevenly spaced data. Now the behaviour is similar to
that of Matlab/Octave function. As argument it can take:
1. A single scalar to specify a sample distance for all dimensions.
2. N scalars to specify a constant sample distance for each dimension.
   i.e. `dx`, `dy`, `dz`, ...
3. N arrays to specify the coordinates of the values along each
   dimension of F. The length of the array must match the size of
   the corresponding dimension
4. Any combination of N scalars/arrays with the meaning of 2. and 3.
@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from c0949d1 to 9520de9 Compare February 22, 2017 09:51
apbard (Contributor Author) commented Feb 22, 2017

@charris I have fixed all lines but 2 urls (in old code), which I think are among the cases contemplated by PEP8:

    Some other good reasons to ignore a particular guideline:

        When applying the guideline would make the code less readable, even
        for someone who is used to reading code that follows this PEP.

P.S. I have still split the url at line 1637 because it would exceed the limit by a huge margin.

                      'sturges': 20, 'auto': 20},
               5000: {'fd': 33, 'scott': 33, 'rice': 69,
                      'sturges': 27, 'auto': 33}
               }
Member:

Not a fan of this heavy indentation. Why not:

basic_test = {
    50:   {'fd': 8,  'scott': 8,  'rice': 15, 'sturges': 14,
           'auto': 14},
    500:  {'fd': 15, 'scott': 16, 'rice': 32, 'sturges': 20,
           'auto': 20},
    5000: {'fd': 33, 'scott': 33, 'rice': 69, 'sturges': 27,
           'auto': 33}
}

Contributor Author:

I believe it's just personal preference. As long as the splitting does not excessively reduce the space left before the EOL, I usually prefer to align visually with the brackets, since I believe it improves readability.

P.S. In my own code I would have kept the original despite the line length, for readability.

Member:

Alignment is actually not PEP8, but for test data it is often useful so we aren't quite as strict about it. That said, the non-aligned versions aren't as bad as one might think and one gets used to them.

Contributor Author:

@charris If I am not wrong, the alignment I used IS PEP8 (https://www.python.org/dev/peps/pep-0008/#id17); it is the first style presented.

Member:

I was referring to the multiple spaces after the : to align the opening brackets. There isn't an example, but aligning on = and multiple spaces are both discouraged, so by extension...

OT: I note on rereading PEP8 that breaking before binary operators is now preferred for new code. I don't recall that from before, but that would be my own preference. Unfortunately, the after recommendation remains for C code...

Contributor Author:

Oh, I see now... sorry, I misinterpreted. You are totally right on multiple spaces.

P.S. I like breaking before binary operators too ;)

charris (Member) commented Feb 22, 2017

There was something screwy about the Travis tests: they passed but didn't notify here. I restarted them, but if the problem persists I'll put this in anyway.

@charris charris merged commit 171866e into numpy:master Feb 22, 2017
charris (Member) commented Feb 22, 2017

Thanks @apbard.

shoyer (Member) commented Feb 22, 2017

Thanks @apbard!

apbard (Contributor Author) commented Feb 22, 2017

"One is glad to be of service" (cit.)
thank you both (@shoyer @charris) for your suggestions and corrections ;)
