RFC: add support for closeness comparison #170
👍 for the adoption of a symmetric proximity comparator.
Can you indicate where this might be useful? PEP 485 stipulates "NaN is not considered close to any other value, including NaN."
True, and I haven't encountered a use case for this yet; I just wanted to add it for completeness. I'm only familiar with the PyTorch test suite, and there it is only used in very few cases. I'll try to dig up why it is necessary there, and also why it was added to NumPy in the first place. I believe all the other libraries simply adopted the flag.
Thanks @pmeier!
For library code though, it's typically the opposite: you want the regular IEEE 754 behavior, where `NaN != NaN`. That does of course make it a little questionable to have `equal_nan`.
Good point. Since boolean operations and `isnan` are part of the standard, `equal_nan=True` can be emulated:

```python
matches = np.isclose(a, b, equal_nan=False)
if equal_nan:
    matches |= np.isnan(a) & np.isnan(b)
```
I think there's a discussion elsewhere about nanfunctions. In NumPy, the `where` keyword can be used to mask out `NaN`s:

```python
>>> x1 = np.array([1.5, 2.5, np.nan])
>>> x2 = x1
>>> np.equal(x1, x1)
array([ True,  True, False])
>>> np.equal(x1, x1, where=~np.isnan(x1))
array([ True,  True,  True])
>>> np.sum(x1, where=~np.isnan(x1))
4.0
>>> np.sum(x1)
nan
```

Most other libraries do not support a `where` keyword, though. And as you show, it's easy enough to work around for now.
I think
Sent too fast: If we remove
We had a discussion about this today. The most important point was: while the PEP 485 definition would be preferred overall if we were designing this API today, all libraries use the NumPy definition. Overall there was agreement that these are important functions to have, and we should add them. Regarding the definition differences, tl;dr: see if there's a way to introduce the "nicer" behavior.
True, and unfortunately I can't offer one apart from "we would be compatible with the standard library" and "it is objectively better". From my own perspective (and we should definitely hear other opinions here), the only times I fiddled with the default tolerances were when I knew I had to expect more deviation than normal. In those cases I didn't really care about the actual values as long as they were "low enough", and I manually checked that this was fine. I imagine the libraries' defaults are used in a similar fashion.
Trying to look at this without bias, the name "close" does not imply asymmetric behavior to me. It is in the same category as "equal", not "greater than" or the like. From that perspective the current implementation can be confusing:

```python
import numpy as np

a = 1
b = 10
rtol = 1
atol = 0
np.isclose(a, b, rtol=rtol, atol=atol)  # True
np.isclose(b, a, rtol=rtol, atol=atol)  # False
```

The only practical difference between the two implementations shows up when `rtol` is large.
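For contrast, the symmetric PEP 485 test accepts both argument orders with the same tolerances; a quick check with the standard library (where the parameters are named `rel_tol` and `abs_tol`):

```python
import math

math.isclose(1, 10, rel_tol=1, abs_tol=0)  # True
math.isclose(10, 1, rel_tol=1, abs_tol=0)  # True
```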
xref numpy/numpy#10161. Changing the existing definition looks like a nonstarter, and I haven't been able to come up with a good introduction strategy. In the linked NumPy issue, even the PEP author says he would himself have preferred compatibility rather than doing something incompatible for the sake of what is objectively best: "But is it "better" enough to break backward compatibility? I don't think so." This has been very extensively litigated; we should stay with the existing definition that NumPy and other libraries have.
To investigate the impact of the proposed change, I've implemented it for
IMO, this means two things:
Cool, thanks for looking into this @pmeier. Just wanna clarify: your patch used the
Yes, here are the details: I've used
I've run this against the test suites of
As this proposal lacks consensus on the best path forward (e.g., (1) standardizing what many consider a less-than-optimal closeness definition as implemented by NumPy, or (2) moving the ecosystem to a more desirable definition), I'll go ahead and close this issue. Given the general agreement that the NumPy definition is not ideal, I'm personally leery of codifying such behavior in the specification and thus forcing every future array library that adopts the standard to follow suit. IMO, libraries should be free to innovate here and implement more ideal APIs and behavior.
Due to the limitations of floating point arithmetic, comparing floating point values for bitwise equality is appropriate in only very few situations. In usual situations, for example comparing the output of a function against an expected result, it has thus become best practice to compare the values for closeness rather than equality. Python added built-in support for closeness comparisons (`math.isclose`) with PEP 485, which was introduced in Python 3.5.

With this I'm proposing to add an elementwise `isclose` operator. Similar to `equal`, `x1` and `x2` as well as the return value are arrays. The returned array will be of type `bool`. The relative tolerance `rtol` and absolute tolerance `atol` should have default values, which are discussed below.
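A minimal sketch of what the proposed signature could look like; `Array` stands in for the standard's array type, and the tolerance defaults shown here are placeholders (see "Default tolerances" below), not part of the proposal:

```python
from typing import Any

Array = Any  # stand-in for the standard's array type

def isclose(
    x1: Array,
    x2: Array,
    /,
    *,
    rtol: float = 1e-05,  # placeholder default, see "Default tolerances"
    atol: float = 1e-08,  # placeholder default, see "Default tolerances"
    equal_nan: bool = False,
) -> Array:
    """Element-wise closeness comparison; returns a bool array."""
    ...
```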
Status quo

All actively considered libraries already at least partially support closeness comparisons. In addition to the elementwise `isclose` operation, usually an `allclose` operation is also defined. Since `allclose(a, b) == all(isclose(a, b))` and `all` is already part of the standard, I don't think adding `allclose` is helpful. Otherwise, we would also need to consider `allequal` and so on.

| `isclose` | `allclose` |
| --- | --- |
| `numpy.isclose` | `numpy.allclose` |
| `tensorflow.experimental.numpy.isclose` | `tensorflow.experimental.numpy.allclose` |
| `torch.isclose` | `torch.allclose` |
|  | `mxnet.contrib.ndarray.allclose` |
| `jax.numpy.isclose` | `jax.numpy.allclose` |
| `dask.array.isclose` | `dask.array.allclose` |
| `cupy.isclose` | `cupy.allclose` |
Closeness definition
All the libraries above define closeness via the test documented by NumPy:
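```python
# the test used by numpy.isclose and the other libraries above;
# note that it is asymmetric: x1 and x2 are not interchangeable
abs(x1 - x2) <= atol + rtol * abs(x2)
```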
PEP 485 points out that this test is asymmetric: whether two values are considered close can change when the arguments are swapped. `math.isclose` overcomes this and additionally is symmetric; its test is sketched below. Thus, in addition to adding the `isclose` operator, I think it should stick to the objectively better definition of `math.isclose`.
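```python
# the symmetric test used by math.isclose, per PEP 485
# (there the parameters are named rel_tol and abs_tol)
abs(x1 - x2) <= max(rtol * max(abs(x1), abs(x2)), atol)
```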
Non-finite numbers
In addition to finite numbers, the standard should also define how non-finite numbers (`NaN`, `inf`, and `-inf`) are to be handled. Again, I propose to stick to the rationale of PEP 485, which in turn is based on IEEE 754:

- `NaN` is never close to anything. All library implementations add an `equal_nan: bool = False` flag to their functions; if `True`, two `NaN` values are considered close. Still, a comparison between any other value and a `NaN` is never considered close.
- `inf` and `-inf` are only close to themselves.
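For illustration, NumPy's existing `isclose` already behaves this way for non-finite values (the `equal_nan` flag only affects `NaN`-to-`NaN` comparisons):

```python
>>> import numpy as np
>>> np.isclose(np.nan, np.nan)
False
>>> np.isclose(np.nan, np.nan, equal_nan=True)
True
>>> np.isclose(np.inf, np.inf)
True
>>> np.isclose(np.inf, -np.inf)
False
```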
Default tolerances

In addition to fixed default values (`math.isclose`: `rtol=1e-9, atol=0.0`; all libraries: `rtol=1e-5, atol=1e-8`), the default tolerances could also vary with the promoted dtype. For example, arrays of dtype `float64` could use stricter default tolerances than `float32` ones.

For integer dtypes, I propose using `rtol = atol = 0.0`, which would be identical to comparing them for equality. For floating point dtypes, I would use the rationale of PEP 485 as a base (a sketch follows the list):

- `rtol`: approximately half the precision of the promoted dtype
- `atol`: `0.0`