-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
variance nonzero for constant array #9631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
FWIW, |
If you want to check for 0 variance, you should use |
I see, the statistics approach is indeed too high a price to pay for numpy. I figured there might be a reformulated numerical algorithm where loss of precision does not cause the same problem but I might be wrong there. I ran into it in a vectorised situation, where I would have noticed a bug in my code months ago if my Perhaps I'm asking too much here. FWIW, I confirmed that Matlab also gives a nonzero answer (although not the same as numpy!). |
I'm not aware of any general-purpose numerical algorithms that can give that kind of precision guarantee. I think the only ways you could guarantee exact 0 from a variance calculation are by using an infinite precision calculation, or by doing a preprocessing pass to explicitly check for all the values being equal. I guess the latter isn't even necessarily all that expensive in common cases where you can bail out after checking a few values, but it's extra complexity (esp. in the vectorized case) for a pretty specific and unusual case. I dunno. |
FWIW, it is not hard to show that Welford's algorithm [1] will give a variance that is exactly 0 when the input is constant. Numpy uses a two-pass algorithm, and the underlying problem appears to be that numpy's calculation of the mean of
[1] https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm |
Ah, you're quite right -- I actually checked that one before posting, but misread the details :-). Do we know how it compares to our current algorithm in terms of speed or accuracy? |
I don't know about the speed. John D. Cook blogged about the accuracy: https://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/ |
Related paper with accuracy and runtime comparisons of various methods: https://dl.acm.org/citation.cfm?id=3223036 |
@sk1p - thanks for the link! I think it would be great to use that one-pass algorithm. Perhaps ideally for a new |
Following up on this: a quick gufunc implementation by @0x0L of the algorithm referenced by @sk1p above correctly gives a variance of zero for the example on top. |
NumPy may give a nonzero variance (and thus standard deviation) for a constant array. This may be due to loss of numerical precision, but Pythons builtin variance routine gives the correct 0 answer, so clearly it's a preventable loss:
I think it would be highly desirable if
x.std()
andx.var()
for a constant array could be assumed to be exactly identical to zero. I'm aware one should not compare floating point numbers but zero is a bit of a special case.The text was updated successfully, but these errors were encountered: