ENH: gradient support for unevenly spaced data #8446
Conversation
Needs an entry in the release notes.
@charris maybe a silly question: does it mean I have to update this? Is something like the post fine?
The 1.13 one is for current master, so use that. |
Force-pushed 88c64c1 to 4350df0
@charris Done. I have already rebased on master.
I'm not super happy with overloading the arguments on np.gradient further, but this does seem like useful functionality. The main question in my mind around the API is whether it would be surprising that arguments are differences in the scalar case vs. coordinate values in the array case. Arguably, it would be more consistent to be coordinate differences in both cases. On the other hand, this is what MATLAB/Octave already does.
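To make the scalar-vs-array distinction concrete, here is a small sketch assuming the NumPy >= 1.13 behaviour proposed in this PR (a spacing in the scalar case, coordinates in the array case):

```python
import numpy as np

f = np.array([0., 1., 4., 9., 16.])  # f(x) = x**2 sampled at x = 0..4

# Scalar case: the argument is a *spacing* (a difference between samples).
g_scalar = np.gradient(f, 1.0)

# Array case: the argument is the *coordinates* of the samples themselves.
x = np.array([0., 1., 2., 3., 4.])
g_coords = np.gradient(f, x)

# With uniformly spaced coordinates the two forms agree.
print(np.allclose(g_scalar, g_coords))  # → True
```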
numpy/lib/function_base.py (outdated)
>>> x = np.arange(f)
Should be something like np.arange(f.size)
(the current example is not valid)
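For illustration, the corrected docstring line: np.arange needs the number of samples, not the array itself, to build index coordinates matching f:

```python
import numpy as np

f = np.array([1., 2., 4., 7., 11.])
# np.arange(f) is invalid; the intended index coordinates are:
x = np.arange(f.size)
print(x)  # → [0 1 2 3 4]
```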
Of course, thanks
numpy/lib/function_base.py (outdated)
uniform for axis=0 and non uniform for axis=1

>>> dx = 2.
>>> y = [1.,1.5,3.5]
per PEP8, please add spaces after commas.
thanks
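The snippet under review is the mixed case: a scalar spacing along axis 0 and coordinate values along axis 1. A sketch of how the call works, assuming the proposed signature:

```python
import numpy as np

f = np.array([[1., 2., 6.],
              [3., 4., 5.]])
dx = 2.             # uniform spacing along axis 0
y = [1., 1.5, 3.5]  # non-uniform coordinates along axis 1

g_axis0, g_axis1 = np.gradient(f, dx, y)
print(g_axis0)  # one-sided differences divided by dx
print(g_axis1)  # second-order non-uniform differences along axis 1
```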
numpy/lib/function_base.py (outdated)
slice4[axis] = slice(2, None)
dx1 = dx[i][0:-1]
dx2 = dx[i][1:]
# Laplace polynomial coefficients
can you give a reference here?
ok
dx2 = dx[i][-1]
a = (dx2) / (dx1 * (dx1 + dx2))
b = - (dx2 + dx1) / (dx1 * dx2)
c = (2. * dx2 + dx1) / (dx2 * (dx1 + dx2))
This is starting to look a little repetitive. Can you maybe break this out into a few helper functions so it's easier to follow the high-level logic?
Alternatively, maybe you can reuse some of the logic for the scalar case dx vs. array of coordinates x? E.g., just set a, b and c differently?
Done. And also optimised further... now when entering the uniform-spacing branches it is ~10-30% faster than the original np.gradient:
np.gradient 13.1845779419
gradient_opt vector 10.7180240154
gradient_noopt vector 20.1845788956
gradient_opt scalar 10.6630008221
gradient_noopt scalar 13.2290189266
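For context, the a/b/c expressions being reviewed are the one-sided, second-order coefficients at the trailing edge of a non-uniform grid. A sketch checking that they recover the exact derivative of a quadratic, as expected since the stencil fits a parabola through three points:

```python
import numpy as np

dx1, dx2 = 0.5, 2.0                 # the two trailing spacings
x = np.array([0., dx1, dx1 + dx2])  # sample locations
f = x**2                            # f'(x) = 2x

# One-sided second-order coefficients at the last point (from the PR).
a = dx2 / (dx1 * (dx1 + dx2))
b = -(dx2 + dx1) / (dx1 * dx2)
c = (2. * dx2 + dx1) / (dx2 * (dx1 + dx2))

deriv = a * f[0] + b * f[1] + c * f[2]
print(deriv, 2 * x[-1])  # both ≈ 5.0
```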
Much better, thanks
numpy/lib/function_base.py (outdated)
raise ValueError("distances must be either scalars or match "
                 "the length of the corresponding dimension")
diffx = np.diff(varargs[0])
# if distances are constant reduce to the scalar case
I would avoid this sort of optimization if possible -- it adds more complexity to the implementation and function behavior for a questionable performance benefit.
uhm, IMHO it doesn't add too much complexity and may help save a bit of memory, but ok I'll remove it
I have done a couple of tests and it actually seems to provide a non-negligible speed-up:
np.gradient 0.835474014282
gradient_opt scalar 0.849312067032
gradient_noopt scalar 0.865785121918
gradient_opt vector 0.984735012054
gradient_noopt vector 1.54027509689
I think that it is probably worth keeping it.
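The optimisation under discussion is essentially the following check; `reduce_spacing` is a hypothetical name used only for this illustrative sketch:

```python
import numpy as np

def reduce_spacing(coords):
    """If the supplied coordinates are evenly spaced, return the scalar
    spacing so the cheaper uniform-grid branch can be taken; otherwise
    return the per-interval differences. (Illustrative sketch only.)"""
    diffx = np.diff(coords)
    if (diffx == diffx[0]).all():
        return diffx[0]   # constant spacing: reduce to the scalar case
    return diffx          # genuinely non-uniform spacings

print(reduce_spacing([0., 2., 4., 6.]))  # → 2.0
print(reduce_spacing([0., 1., 3.]))      # → [1. 2.]
```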
numpy/lib/function_base.py (outdated)
for i, distances in enumerate(dx):
if np.isscalar(distances):
continue
if len(distances) != f.shape[i]:
Write this in a way that avoids the need to write these checks twice.
@shoyer My initial reaction was the same -- omit Matlab function rant here -- but I didn't see any cleaner way to do this without a new function and the overloading didn't seem too egregious.
Force-pushed 4350df0 to b56a6ac
numpy/lib/function_base.py (outdated)
Notes:
For more about finite differences and their derivation you can check for instance:
D. R. Durran (1999) Numerical Methods for Wave Equations in Geophysical Fluid Dynamics. Springer
Added a reference for finite difference derivation.
Is there a good online reference, e.g., on Wikipedia? That is preferable to a book if possible.
I would prefer a source that seems more likely to stick around than a random wikipage, e.g., Wikipedia or a journal article that can be accessed online.
An academic paper available online that describes the method in detail would be ideal, including a description of all necessary assumptions.
Totally agree, but I have searched a bit, and while you can find tons of books/slides/lectures, I haven't found many papers that provide a clear introduction/presentation for our specific case.
This provides a generic method to compute the coefficients, but in our case the Taylor-expansion one is enough.
That's why in the end I opted for the book.
I could derive it myself and add it directly to the documentation, but I don't know if that is appropriate.
Is the discussion on this page helpful?
http://scicomp.stackexchange.com/questions/480/how-can-i-numerically-differentiate-an-unevenly-sampled-function
If this is equivalent to a special case of the method described in the paper you link, that would probably be a better reference here. It would still be good to briefly summarize the assumptions as well (i.e., piece-wise polynomial interpolation, if I understand correctly).
I have added the Notes section and explained how the approximation formula for the interior points is derived. I also tried to clarify why it has a second order of accuracy in case of evenly spaced data to answer questions raised with #8605.
I have removed the references but I can add them back (maybe both the book and the paper) if you think that they are needed.
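For reference, a sketch of the Taylor-expansion derivation summarized above, with h_1 and h_2 denoting the spacings to the left and right of the interior point x_i:

```latex
f(x_{i-1}) = f(x_i) - h_1 f'(x_i) + \frac{h_1^2}{2} f''(x_i) + O(h_1^3)
f(x_{i+1}) = f(x_i) + h_2 f'(x_i) + \frac{h_2^2}{2} f''(x_i) + O(h_2^3)
```

Multiplying the first expansion by h_2^2 and the second by h_1^2, then subtracting, eliminates f''(x_i):

```latex
f'(x_i) \approx
\frac{h_1^2 f(x_{i+1}) + (h_2^2 - h_1^2) f(x_i) - h_2^2 f(x_{i-1})}
     {h_1 h_2 (h_1 + h_2)}
```

For evenly spaced data (h_1 = h_2 = h) this reduces to the familiar second-order central difference (f(x_{i+1}) - f(x_{i-1})) / (2h).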
dx = list(varargs)
for i, distances in enumerate(dx):
I am not sure whether I like this version of the ifs more than the previous one with duplicated checks. What do you think?
I think this is a step in the right direction. I'll have more detailed suggestions, but let's settle on the overall API first.
numpy/lib/function_base.py (outdated)
diffx = np.diff(dx[i])
# if distances are constant reduce to the scalar case
if (diffx == diffx[0]).all():
diffx = diffx[0]
This optimisation leads to a non-negligible speed-up.
Can you put this note in a comment in the code here?
Done
One reason to prefer taking arguments as differences is that the function signature remains consistent, e.g. … In any case, because this is new API, per our policy it needs to go through discussion on the numpy-discussion mailing list before being finalized. @apbard can you please send something out, with a link to this PR? I think this change (in some form) will be very welcome, but it would be helpful to discuss tradeoffs between the proposed APIs to attempt to reach consensus.
Maybe ping the scipy-dev list on this, too? It would be good to understand how this change would interact with the proposed …
@shoyer thanks for the suggestion. Done.
@shoyer @charris @josef-pkt any other thoughts/suggestions for this PR?
☔ The latest upstream changes (presumably 5de1a82) made this pull request unmergeable. Please resolve the merge conflicts.
Thanks @homu. I have looked at the commit you pointed out and I think that those changes (besides a typo) are not correct. I have opened a revert PR.
Force-pushed eae1288 to a0dd28a
numpy/lib/function_base.py (outdated)
References
----------
.. [1] Quarteroni A., Sacco R., Saleri F. (2007) Numerical Mathematics (Texts in Applied Mathematics). New York: Springer.
The lines should be broken to keep them < 80 characters. It is OK to do that for references as long as you keep the indentation.
fixed. thanks
Just another style nit. @shoyer I haven't reviewed the details, but if you are happy with this go ahead and merge.
One nit aside, I think this looks good now. It's right at the complexity limit for what is appropriate for numpy (rather than scipy) but I think it falls on the right side of that line.
numpy/lib/function_base.py (outdated)
if any([not np.isscalar(dxi) for dxi in dx]):
raise ValueError("distances must be scalars")

raise SyntaxError("invalid number of arguments")
This should be TypeError, which is what you get when you don't supply required arguments to a function. SyntaxError is reserved for Python itself (e.g., if you forget to close parentheses).
Fixed, thanks
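The distinction is easy to check from plain Python: TypeError is what the interpreter itself raises for a bad call signature, so an argument-count check should mirror it. `check_varargs` below is a hypothetical sketch, not the actual implementation:

```python
import numpy as np

def check_varargs(f, *varargs):
    """Hypothetical sketch of the argument-count validation discussed
    above: 0, 1 or ndim spacing arguments are accepted."""
    n = len(varargs)
    if n not in (0, 1, np.ndim(f)):
        raise TypeError("invalid number of arguments")
    return n

f = np.zeros((3, 4))
print(check_varargs(f, 1.0, 2.0))  # → 2

try:
    check_varargs(f, 1.0, 2.0, 3.0)
except TypeError as e:
    print(e)  # → invalid number of arguments
```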
Force-pushed a1d23b3 to 7c64670
@apbard I don't want to nag, but there are still a lot of long lines that need fixing, both in the references and the examples, etc. The <80 character limit was chosen for readability in terminals. You should be able to set up your editor to track that for you and/or make reformatting easy.
Force-pushed 7c64670 to 45af9e1
@charris you are right, sorry. They should be fixed now. I have also split several lines elsewhere in the 2 files. There are still a few lines around 82-85 characters (links or similar). If 80 is a strict limit I'll fix them too.
Yeah, 80 is strict. Or should be, there is old code like f2py where a fair amount of refactoring is still needed, but eventually we would like to run PEP8 on the code as part of the testing like scipy does now. That said, I think it looks nice as is and there are only five lines over that I noticed in the tests, so you could probably talk me into letting it through ;) But I would rather not start making exceptions. @shoyer I'm happy with this apart from the style nits, merge at your pleasure.
Force-pushed 45af9e1 to c0949d1
This somehow reverts numpy#7618 and solves numpy#6847, numpy#7548 by implementing support for unevenly spaced data. Now the behaviour is similar to that of the Matlab/Octave function. As argument it can take:
1. A single scalar to specify a sample distance for all dimensions.
2. N scalars to specify a constant sample distance for each dimension, i.e. `dx`, `dy`, `dz`, ...
3. N arrays to specify the coordinates of the values along each dimension of F. The length of the array must match the size of the corresponding dimension.
4. Any combination of N scalars/arrays with the meaning of 2. and 3.
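A sketch exercising the four argument forms listed above (linear data is used so the second-order formulas are exact even on the non-uniform grid):

```python
import numpy as np

x = np.array([0., 1., 3., 6.])  # non-uniform coordinates, axis 0
y = np.array([0., 2., 4.])      # uniform coordinates, axis 1
F = 3. * x[:, None] + 2. * y    # dF/dx = 3, dF/dy = 2 everywhere

g1 = np.gradient(F, 2.)         # 1. one scalar for all dimensions
g2 = np.gradient(F, 1., 2.)     # 2. one scalar per dimension
g3 = np.gradient(F, x, y)       # 3. coordinate arrays
g4 = np.gradient(F, x, 2.)      # 4. mixed scalars and arrays

# Forms 3 and 4 recover the exact derivatives of the linear field.
print(np.allclose(g3[0], 3.), np.allclose(g3[1], 2.))  # → True True
print(np.allclose(g4[0], 3.), np.allclose(g4[1], 2.))  # → True True
```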
Force-pushed c0949d1 to 9520de9
@charris I have fixed all lines but 2 urls (in old code) and I think these are the cases contemplated by pep8
p.s. I have still split the url at line 1637 because it would exceed the limit by a huge margin.
'sturges': 20, 'auto': 20},
5000: {'fd': 33, 'scott': 33, 'rice': 69,
'sturges': 27, 'auto': 33}
}
Not a fan of this heavy indentation. Why not:
basic_test = {
50: {'fd': 8, 'scott': 8, 'rice': 15, 'sturges': 14,
'auto': 14},
500: {'fd': 15, 'scott': 16, 'rice': 32, 'sturges': 20,
'auto': 20},
5000: {'fd': 33, 'scott': 33, 'rice': 69, 'sturges': 27,
'auto': 33}
}
I believe it's just personal preference. As long as splitting doesn't excessively reduce the space left before the EOL, I usually prefer to visually align to brackets, since I believe it improves readability.
p.s. In my own code I would have kept the original despite the line length, for readability.
Alignment is actually not PEP8, but for test data it is often useful so we aren't quite as strict about it. That said, the non-aligned versions aren't as bad as one might think and one gets used to them.
@charris If I am not wrong, the alignment I used IS pep8 (https://www.python.org/dev/peps/pep-0008/#id17) (it is the first one presented).
I was referring to the multiple spaces after the : to align the opening brackets. There isn't an example, but aligning on = and multiple spaces are both discouraged, so by extension...
OT: I note on rereading PEP8 that breaking before binary operators is now preferred for new code. I don't recall that from before, but that would be my own preference. Unfortunately, the after recommendation remains for C code...
oh, I see now... sorry... I misinterpreted... you are totally right on multiple spaces.
p.s. I like the before breaking too ;)
There was something screwy about the travis tests: they passed but didn't notify here. I restarted them, but if the problem persists I'll put this in anyway.
Thanks @apbard.
Thanks @apbard!
This somehow reverts #7618 and solves #6847, #7548 by implementing support for unevenly spaced data. Now the behaviour/signature is similar to that of Matlab/Octave. As argument it can take:
1. a single scalar to specify a sample distance for all dimensions;
2. N scalars to specify a constant sample distance for each dimension, i.e. dx, dy, dz, ...;
3. N arrays to specify the coordinates of the values along each dimension of F. The length of the array must match the size of the corresponding dimension;
4. any combination of N scalars/arrays with the meaning of 2. and 3.
This means that, e.g., you can now pass per-axis coordinate arrays directly.
As a side note, #8292 still does not work since np.isscalar is still used.
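To illustrate the headline feature, a sketch comparing the unevenly-spaced gradient against the analytic derivative (the grid and tolerance here are arbitrary choices for illustration):

```python
import numpy as np

# Unevenly spaced sample points on [0, 1] (denser near 0).
x = np.linspace(0., 1., 51) ** 2
f = np.sin(x)

g = np.gradient(f, x)  # pass the coordinates directly

# Interior points use the second-order non-uniform formula, so the
# error against the analytic derivative cos(x) is tiny.
err = np.abs(g - np.cos(x))[1:-1].max()
print(err < 1e-2)  # → True
```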