Skip to content

BUG: Incorrect handling of not-equal comparison to nan #21685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
WarrenWeckesser opened this issue Jun 7, 2022 · 10 comments · Fixed by #21687
Closed

BUG: Incorrect handling of not-equal comparison to nan #21685

WarrenWeckesser opened this issue Jun 7, 2022 · 10 comments · Fixed by #21687
Labels
00 - Bug component: SIMD Issues in SIMD (fast instruction sets) code or machinery

Comments

@WarrenWeckesser
Copy link
Member

WarrenWeckesser commented Jun 7, 2022

Describe the issue:

For a sufficiently large array x, the comparison x != 0 returns False for elements of x that are nan. The result should be True. For example,

In [1]: import numpy as np

In [2]: x = np.zeros((4, 8))

In [3]: x[0,0] = np.nan

In [4]: x != 0  # The first element of the result should be True
Out[4]: 
array([[False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False]])

In [5]: x[:2] != 0  # The result with a smaller array works as expected.
Out[5]: 
array([[ True, False, False, False, False, False, False, False],
       [False, False, False, False, False, False, False, False]])

The bug appears to have been introduced in gh-21483.

NumPy/Python version information:

In [6]: import sys, numpy; print(numpy.__version__, sys.version)
1.24.0.dev0+136.g15b92f7ab 3.10.1 (main, Jan 14 2022, 02:27:20) [GCC 11.2.0]
@seberg
Copy link
Member

seberg commented Jun 7, 2022

Ping @rafaelcfsousa, could you have a look, since it was likely your PR that introduced this?

@rafaelcfsousa
Copy link
Contributor

@seberg: Yes, I am already checking the issue.

@rafaelcfsousa
Copy link
Contributor

rafaelcfsousa commented Jun 7, 2022

Ok, I already know what exactly is causing this bug.

The comparison kernels implemented with AVX and AVX512 use the universal intrinsic npyv_cmpneq_f64 (see below) to compute np.not_equal for dtype=double.

#define npyv_cmpneq_f64(A, B) _mm256_castpd_si256(_mm256_cmp_pd(A, B, _CMP_NEQ_OQ))

The universal intrinsic npyv_cmpneq_f64 for both SIMD extensions listed above uses the flag _CMP_NEQ_OQ to indicate how the comparison with NaN has to be done (more info here: https://stackoverflow.com/questions/16988199/how-to-choose-avx-compare-predicate-variants).

For _CMP_NEQ_OQ (Ordered comparisons returns false for NaN operands):

  • nan != nan --> false
  • nan != 0 --> false

For _CMP_NEQ_UQ (Unordered comparison returns true for NaN operands):

  • nan != nan --> true
  • nan != 0 --> true

@charris
Copy link
Member

charris commented Jun 7, 2022

I assume this doesn't need a backport?

@rafaelcfsousa
Copy link
Contributor

rafaelcfsousa commented Jun 7, 2022

In my point of view, we could follow one of the 4 options below to have this issue resolved:

  1. Use _CMP_NEQ_UQ instead of _CMP_NEQ_OQ (but I do not know yet the impact of doing that)

  2. Disable the optimization for AVX[2,512] so that SSE2 is used instead

  3. Move the dtypes f32 and f64 to a new dispatchable source file (loops_comparison_fp.dispatch.c.src) and enable on it only SSE (and the other architectures)

  4. Use scalar execution for AVX[2,512] when dtype=f32 or f64

@seberg
Copy link
Member

seberg commented Jun 7, 2022

Using _CMP_NEQ_OQ seems right to me on first sight, unless this is used in places where the unorderd version is needed (in which case, I guess we may need both universal intrinsics?)

EDIT: Whoops, copied the wrong intrinsic there...

@rafaelcfsousa
Copy link
Contributor

rafaelcfsousa commented Jun 7, 2022

@seberg and @charris :

I think we should use _CMP_NEQ_UQ instead of _CMP_NEQ_OQ.

>>> import numpy as np
>>> np.nan != np.nan
True
>>> np.nan != 0
True

The result when I compare anything with NaN is True.

Btw, npyv_cmpneq_f32 and npyv_cmpneq_f64 are only used by loops_comparison.dispatch.c.src.

@seberg
Copy link
Member

seberg commented Jun 7, 2022

Sorry, yes. I got the wrong intrinsic, I meant: I do think we should swap out the intrinsic/flag used and that seems like it should be the right solution. Are you planning on making a PR?

@rafaelcfsousa
Copy link
Contributor

Yes, I will submit a PR. Give me ~1h and I will submit it (I will test the other comparison function as well).

@WarrenWeckesser
Copy link
Member Author

FWIW: Locally, I just tried switching _CMP_NEQ_OQ to _CMP_NEQ_UQ for npyv_cmpneq_f32 and npyv_cmpneq_f64. The existing tests still passed, and the example that I gave above worked correctly for float32 and float64. @rafaelcfsousa, thanks for tracking down the issue so quickly!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
00 - Bug component: SIMD Issues in SIMD (fast instruction sets) code or machinery
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants