ENH: gradient support for unevenly spaced data #8446


Merged: 2 commits into numpy:master on Feb 22, 2017

Conversation

apbard (Contributor) commented Jan 4, 2017

This effectively reverts #7618 and solves #6847 and #7548 by implementing support for unevenly spaced data. The behaviour/signature is now similar to that of Matlab/Octave. As spacing arguments it can take:

  1. A single scalar to specify a sample distance for all dimensions.
  2. N scalars to specify a constant sample distance for each dimension.
    i.e. dx, dy, dz, ...
  3. N arrays to specify the coordinates of the values along each
    dimension of F. The length of each array must match the size of
    the corresponding dimension.
  4. Any combination of N scalars/arrays with the meaning of 2. and 3.

This means that, e.g., you can now do the following:

>>> f = np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)
>>> dx = 2.
>>> y = [1., 1.5, 3.5]
>>> np.gradient(f, dx, y)
[array([[ 1. ,  1. , -0.5], [ 1. ,  1. , -0.5]]), 
 array([[ 2. ,  2. ,  2. ], [ 2. ,  1.7,  0.5]])]
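(As a check of the interior value above: for the second row along axis 1, the spacings are h1 = 1.5 - 1 = 0.5 and h2 = 3.5 - 1.5 = 2, so the nonuniform central difference gives (-2/(0.5*2.5))*3 + (1.5/(0.5*2))*4 + (0.5/(2*2.5))*5 = -4.8 + 6 + 0.5 = 1.7.)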

As a side note, #8292 still does not work since np.isscalar is still used.
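For context, #8292 is about np.isscalar returning False for 0-d arrays. A common workaround (a sketch only, not what this PR does) is to test np.ndim instead:

>>> dx = np.array(2.)   # a 0-d array
>>> np.isscalar(dx)     # rejects 0-d arrays
False
>>> np.ndim(dx) == 0    # accepts both Python scalars and 0-d arrays
True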

charris (Member) commented Jan 7, 2017

Needs an entry in the release notes.

apbard (Contributor Author) commented Jan 7, 2017

@charris, maybe a silly question: does it mean I have to update this:
https://github.com/numpy/numpy/blob/master/doc/release/1.13.0-notes.rst
or the 1.12 one?

Is something like the opening post fine?

charris (Member) commented Jan 7, 2017

The 1.13 one is for current master, so use that.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch 12 times, most recently from 88c64c1 to 4350df0 Compare January 8, 2017 14:35
apbard (Contributor Author) commented Jan 8, 2017

@charris Done. I have already rebased on master.

shoyer (Member) left a comment

I'm not super happy with overloading the arguments on np.gradient further, but this does seem like useful functionality. The main question in my mind around the API is whether it would be surprising that arguments are differences in the scalar case vs. coordinate values in the array case. Arguably, it would be more consistent to be coordinate differences in both cases. On the other hand, this is what MATLAB/Octave already does.
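To make the asymmetry concrete (a sketch using the proposed API; the values are illustrative):

>>> f = np.array([1., 2., 4., 7.])
>>> np.gradient(f, 2.)          # scalar: a constant sample *distance*
array([ 0.5 ,  0.75,  1.25,  1.5 ])
>>> np.gradient(f, np.array([0., 1., 3., 6.]))   # array: the sample *coordinates*
array([ 1.,  1.,  1.,  1.])

Since f equals x + 1 at those coordinates, the second call recovers a gradient of exactly 1 everywhere.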


>>> x = np.arange(f)
Member:

Should be something like np.arange(f.size) (the current example is not valid)

Contributor Author:

Of course, thanks
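With that fix, the docstring example would read something like this (a sketch assuming a 1-D f with unit spacing):

>>> f = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
>>> x = np.arange(f.size)
>>> np.gradient(f, x)
array([ 1. ,  1.5,  2.5,  3.5,  4.5,  5. ])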

uniform for axis=0 and non uniform for axis=1

>>> dx = 2.
>>> y = [1.,1.5,3.5]
Member:

per PEP8, please add spaces after commas.

Contributor Author:

thanks

slice4[axis] = slice(2, None)
dx1 = dx[i][0:-1]
dx2 = dx[i][1:]
# Laplace polynomial coefficients
Member:

can you give a reference here?

Contributor Author:

ok
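For reference, the coefficients in the next hunk implement the standard one-sided second-order formula at the right boundary; with $h_1$ and $h_2$ the last two spacings, a Taylor expansion gives

$$f'(x_n) \approx \frac{h_2}{h_1(h_1+h_2)}\,f(x_{n-2}) \;-\; \frac{h_1+h_2}{h_1 h_2}\,f(x_{n-1}) \;+\; \frac{h_1+2h_2}{h_2(h_1+h_2)}\,f(x_n),$$

matching a, b and c below.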

dx2 = dx[i][-1]
a = (dx2) / (dx1 * (dx1 + dx2))
b = - (dx2 + dx1) / (dx1 * dx2)
c = (2. * dx2 + dx1) / (dx2 * (dx1 + dx2))
Member:

This is starting to look a little repetitive. Can you maybe break this out into a few helper functions so it's easier to follow the high level logic?

Alternatively, maybe you can reuse some of the logic for the scalar case dx vs. array of coordinates x? e.g., just set a, b and c differently?
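A minimal sketch of the kind of helper being suggested (the name and structure are hypothetical, not the merged code; the weights are the standard nonuniform central-difference coefficients):

def _interior_coeffs(h1, h2):
    # Second-order weights for f'(x_i) with uneven spacings
    # h1 = x_i - x_{i-1} and h2 = x_{i+1} - x_i.
    a = -h2 / (h1 * (h1 + h2))
    b = (h2 - h1) / (h1 * h2)
    c = h1 / (h2 * (h1 + h2))
    return a, b, c

# Sanity check: uniform spacing recovers the central-difference weights.
assert _interior_coeffs(1.0, 1.0) == (-0.5, 0.0, 0.5)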

apbard (Contributor Author) commented Jan 10, 2017

Done. And also optimised further... now when entering the uniform-spacing branches it is ~10-30% faster than the original np.gradient:

np.gradient 13.1845779419
gradient_opt vector 10.7180240154
gradient_noopt vector 20.1845788956
gradient_opt scalar 10.6630008221
gradient_noopt scalar 13.2290189266

Member:

Much better, thanks

raise ValueError("distances must be either scalars or match "
                 "the length of the corresponding dimension")
diffx = np.diff(varargs[0])
# if distances are constant reduce to the scalar case
Member:

I would avoid this sort of optimization if possible -- it adds more complexity to the implementation and function behavior for a questionable performance benefit.

Contributor Author:

Uhm, IMHO it doesn't add too much complexity and may help save a bit of memory, but OK, I'll remove it.

Contributor Author:

I have done a couple of tests and it actually seems to provide a non-negligible speed-up:

np.gradient 0.835474014282
gradient_opt scalar 0.849312067032
gradient_noopt scalar 0.865785121918
gradient_opt vector 0.984735012054 
gradient_noopt vector 1.54027509689

I think it is probably worth keeping.

for i, distances in enumerate(dx):
    if np.isscalar(distances):
        continue
    if len(distances) != f.shape[i]:
Member:

Write this in a way that avoids the need to write these checks twice.

charris (Member) commented Jan 9, 2017

@shoyer My initial reaction was the same -- omit Matlab function rant here -- but I didn't see any cleaner way to do this without a new function and the overloading didn't seem too egregious.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from 4350df0 to b56a6ac Compare January 10, 2017 22:11

Notes:
For more about finite differences and their derivation you can check for instance:
D. R. Durran (1999) Numerical Methods for Wave Equations in Geophysical Fluid Dynamics. Springer
Contributor Author:

Added a reference for finite difference derivation.

Member:

Is there a good online reference, e.g., on Wikipedia? That is preferable to a book if possible.

Contributor Author: (replied with a link to a wiki page)

Member:

I would prefer a source that seems more likely to stick around than a random wikipage, e.g., Wikipedia or a journal article that can be accessed online.

Member:

An academic paper available online that describes the method in detail would be ideal, including a description of all necessary assumptions.

Contributor Author:

Totally agree, but I have searched a bit and, while you can find tons of books/slides/lectures, I haven't found many papers that provide a clear introduction/presentation for our specific case.
The paper I linked provides a generic method to compute the coefficients, but in our case the Taylor-expansion one is enough.
That's why in the end I opted for the book.
I could derive it myself and add it directly to the documentation, but I don't know if that is appropriate.

Member:

Is the discussion on this page helpful?
http://scicomp.stackexchange.com/questions/480/how-can-i-numerically-differentiate-an-unevenly-sampled-function

If this is equivalent to a special case of the method described in the paper you link, that would probably be a better reference here. It would still be good to briefly summarize the assumptions as well (i.e., piece-wise polynomial interpolation, if I understand correctly).

Contributor Author:

I have added the Notes section and explained how the approximation formula for the interior points is derived. I also tried to clarify why it has second-order accuracy in the case of evenly spaced data, to answer the questions raised in #8605.
I have removed the references, but I can add them back (maybe both the book and the paper) if you think they are needed.
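For reference, the interior-point formula documented in the Notes is the three-point nonuniform central difference; with $h_1 = x_i - x_{i-1}$ and $h_2 = x_{i+1} - x_i$, matching coefficients in the Taylor expansions of $f(x_{i\pm 1})$ around $x_i$ gives

$$\hat f'(x_i) = \frac{-h_2}{h_1(h_1+h_2)}\,f(x_{i-1}) + \frac{h_2-h_1}{h_1 h_2}\,f(x_i) + \frac{h_1}{h_2(h_1+h_2)}\,f(x_{i+1}),$$

which for $h_1 = h_2 = h$ reduces to $\bigl(f(x_{i+1}) - f(x_{i-1})\bigr)/(2h)$ with second-order truncation error.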

dx = list(varargs)
for i, distances in enumerate(dx):
Contributor Author:

I am not sure whether I like this version of the ifs more than the previous one with duplicated checks. What do you think?

Member:

I think this is a step in the right direction. I'll have more detailed suggestions, but let's settle on the overall API first.

diffx = np.diff(dx[i])
# if distances are constant reduce to the scalar case
if (diffx == diffx[0]).all():
    diffx = diffx[0]
Contributor Author:

This optimisation leads to a non-negligible speedup.

Member:

Can you put this note in a comment in the code here?

Contributor Author:

Done

apbard (Contributor Author) commented Jan 10, 2017

@shoyer, @charris what do you think?

shoyer (Member) commented Jan 10, 2017

One reason to prefer taking arguments as differences is that the function signature remains consistent: it is always np.gradient(f, dx), rather than np.gradient(f, dx) (for scalar dx) vs. np.gradient(f, x) (for array x).

In any case, because this is new API, per our policy it needs to go through discussion on the numpy-discussion mailing list before being finalized. @apbard, can you please send something out, with a link to this PR? I think this change (in some form) will be very welcome, but it would be helpful to discuss tradeoffs between the proposed APIs to attempt to reach consensus.

shoyer (Member) commented Jan 20, 2017

Maybe ping the scipy-dev list on this, too? It would be good to understand how this change would interact with the proposed scipy.diff (if at all). For context: we try to keep the more complex numerical methods in scipy, with only basic array manipulation routines in numpy.

apbard (Contributor Author) commented Jan 20, 2017

@shoyer thanks for the suggestion. Done.
From what I read here, scipy.diff aims to do much more (e.g., symbolic and automatic differentiation) and to be far more generic also for what concerns finite differences. Keeping a version of gradient with basic features in numpy as well might not be such a bad idea.
It seems that in that proposal, too, the signature for gradient supports both h as a step and x as coordinates.

apbard (Contributor Author) commented Feb 8, 2017

@shoyer @charris @josef-pkt any other thoughts/suggestions for this PR?

homu (Contributor) commented Feb 12, 2017

☔ The latest upstream changes (presumably 5de1a82) made this pull request unmergeable. Please resolve the merge conflicts.

apbard (Contributor Author) commented Feb 12, 2017

Thanks @homu. I have looked at the commit you pointed out and I think those changes (besides a typo) are not correct. I have opened a revert PR.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from eae1288 to a0dd28a Compare February 16, 2017 19:54

References
----------
.. [1] Quarteroni A., Sacco R., Saleri F. (2007) Numerical Mathematics (Texts in Applied Mathematics). New York: Springer.
Member:

The lines should be broken to keep them < 80 characters. It is OK to do that for references as long as you keep the indentation.

Contributor Author:

Fixed, thanks.

charris (Member) commented Feb 16, 2017

Just another style nit. @shoyer I haven't reviewed the details, but if you are happy with this go ahead and merge.

shoyer (Member) left a comment

One nit aside, I think this looks good now. It's right at the complexity limit for what is appropriate for numpy (rather than scipy) but I think it falls on the right side of that line.

if any([not np.isscalar(dxi) for dxi in dx]):
    raise ValueError("distances must be scalars")

raise SyntaxError("invalid number of arguments")
Member:

This should be TypeError, which is what you get when you don't supply required arguments to a function.

SyntaxError is reserved for Python itself (e.g., if you forget to close parentheses).

Contributor Author:

Fixed, thanks
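For context, a minimal sketch of the corrected dispatch (hypothetical helper name and simplified logic, not the merged code):

def _parse_spacing(f, *varargs):
    # Turn gradient's *varargs into a per-axis spacing list.
    n = len(varargs)
    if n == 0:
        return [1.0] * f.ndim           # default: unit spacing on every axis
    if n == 1:
        return [varargs[0]] * f.ndim    # one spacing shared by all axes
    if n == f.ndim:
        return list(varargs)            # one spacing (or coordinate array) per axis
    # A wrong argument count is a calling error, hence TypeError;
    # SyntaxError is reserved for the Python parser itself.
    raise TypeError("invalid number of arguments")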

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch 2 times, most recently from a1d23b3 to 7c64670 Compare February 16, 2017 21:27
charris (Member) commented Feb 16, 2017

@apbard I don't want to nag, but there are still a lot of long lines that need fixing, both in the references and the examples, etc. The <80 character limit was chosen for readability in terminals. You should be able to set up your editor to track that for you and/or make reformatting easy.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from 7c64670 to 45af9e1 Compare February 16, 2017 22:07
apbard (Contributor Author) commented Feb 17, 2017

@charris you are right, sorry. They should be fixed now. I have also split several lines elsewhere in the two files. There are still a few lines around 82-85 characters (links or similar); if 80 is a strict limit I'll fix them too.

charris (Member) commented Feb 22, 2017

Yeah, 80 is strict. Or it should be; there is old code like f2py where a fair amount of refactoring is still needed, but eventually we would like to run PEP8 on the code as part of the testing, like scipy does now. That said, I think it looks nice as is and there are only five lines over the limit that I noticed in the tests, so you could probably talk me into letting it through ;) But I would rather not start making exceptions.

@shoyer I'm happy with this apart from the style nits, merge at your pleasure.

@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from 45af9e1 to c0949d1 Compare February 22, 2017 09:32
This somehow reverts numpy#7618 and solves numpy#6847, numpy#7548 by implementing
support for unevenly spaced data. Now the behaviour is similar to
that of Matlab/Octave function. As argument it can take:
1. A single scalar to specify a sample distance for all dimensions.
2. N scalars to specify a constant sample distance for each dimension.
   i.e. `dx`, `dy`, `dz`, ...
3. N arrays to specify the coordinates of the values along each
   dimension of F. The length of the array must match the size of
   the corresponding dimension
4. Any combination of N scalars/arrays with the meaning of 2. and 3.
@apbard apbard force-pushed the ENH-gradient-with-uneven-spacing branch from c0949d1 to 9520de9 Compare February 22, 2017 09:51
apbard (Contributor Author) commented Feb 22, 2017

@charris I have fixed all lines but 2 urls (in old code), which I think are among the cases contemplated by PEP8:

    Some other good reasons to ignore a particular guideline:

        When applying the guideline would make the code less readable, even
        for someone who is used to reading code that follows this PEP.

P.S. I have still split the url at line 1637 because it would exceed the limit by a huge margin.

                      'sturges': 20, 'auto': 20},
               5000: {'fd': 33, 'scott': 33, 'rice': 69,
                      'sturges': 27, 'auto': 33}
               }
Member:

Not a fan of this heavy indentation. Why not:

basic_test = {
    50:   {'fd': 8,  'scott': 8,  'rice': 15, 'sturges': 14,
           'auto': 14},
    500:  {'fd': 15, 'scott': 16, 'rice': 32, 'sturges': 20,
           'auto': 20},
    5000: {'fd': 33, 'scott': 33, 'rice': 69, 'sturges': 27,
           'auto': 33}
}

Contributor Author:

I believe it's just personal preference. As long as the splitting does not excessively reduce the space left before the EOL, I usually prefer to align visually with the brackets, since I believe it improves readability.

P.S. In my own code I would have kept the original despite the line length, for readability.

Member:

Alignment is actually not PEP8, but for test data it is often useful so we aren't quite as strict about it. That said, the non-aligned versions aren't as bad as one might think and one gets used to them.

Contributor Author:

@charris If I am not wrong, the alignment I used IS PEP8 (https://www.python.org/dev/peps/pep-0008/#id17); it is the first style presented.

Member:

I was referring to the multiple spaces after the : to align the opening brackets. There isn't an example, but aligning on = and multiple spaces are both discouraged, so by extension...

OT: I note on rereading PEP8 that breaking before binary operators is now preferred for new code. I don't recall that from before, but that would be my own preference. Unfortunately, the after recommendation remains for C code...

Contributor Author:

Oh, I see now... sorry, I misinterpreted. You are totally right on multiple spaces.

P.S. I like breaking before binary operators too ;)

charris (Member) commented Feb 22, 2017

There was something screwy about the Travis tests: they passed but didn't notify here. I restarted them, but if the problem persists I'll put this in anyway.

@charris charris merged commit 171866e into numpy:master Feb 22, 2017
charris (Member) commented Feb 22, 2017

Thanks @apbard.

shoyer (Member) commented Feb 22, 2017

Thanks @apbard!

apbard (Contributor Author) commented Feb 22, 2017

"One is glad to be of service" (cit.)
thank you both (@shoyer @charris) for your suggestions and corrections ;)
