Raise errors on float->int conversions with NaN, inf #14412

rth · 2019-09-03T06:38:37Z

With ndarray.astype(..., casting="unsafe") float arrays with NaN/inf will be converted to np.iinfo(dtype).min/max,

>>> np.array([1, np.nan], dtype=np.float32).astype(np.int32)
array([          1, -2147483648], dtype=int32)
>>> np.array([1, np.inf], dtype=np.float32).astype(np.int32)
array([          1, -2147483648], dtype=int32)

(there are also some inconsistencies there cf #6109). This is very bad in practical applications as output data will be wrong by orders of magnitude. Other casting options simply disallow casting float to int.

At the same time, casting from dtype=np.object works as expected,

>>> np.array([1, np.inf], dtype=np.object).astype(np.int32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: cannot convert float infinity to integer

It could be useful to have some additional casting option allowing to convert int to float, but would error on NaN or inf. This could be done manually,

if array.dtype.kind == 'f' and np.dtype(dtype).kind == 'i':
   # check for overflows and NaN
   _assert_all_finite(array)
result = array.astype(dtype)

but having it in numpy would be useful.

I also wonder if there is really a case when not raising on such conversions is meaningful (even with casting='unsafe'). For instance pandas does this conversions as expected (using numpy dtypes),

>>> pd.Series([1, np.nan], dtype=np.float64).astype(np.int)
[...]
ValueError: Cannot convert non-finite values (NA or inf) to integer

Numpy/Python version information:

>>> import sys, numpy; print(numpy.__version__, sys.version)
1.16.4 3.7.3 (default, Mar 27 2019, 22:11:17) [GCC 7.3.0]

The text was updated successfully, but these errors were encountered:

seberg · 2019-09-03T13:52:14Z

Yeah, I have been wondering about that myself. But on the other hand, blowing up casting options is a bit annoying. For integers specifically, it could be part of a error flags or so (i.e. if we had integer overflow errors, that could clearly handle this part).

aiqc · 2021-06-16T14:36:08Z

I agree about raising an error. Here is a simpler example to follow:

import numpy as np

arr2D = [
    [9,9,9],
    [9,np.nan,9],
    [9,9,9]
]

arr2D = np.array(arr2D)
arr2D = arr2D.astype('int32')

>>> arr2D
array([[          9,           9,           9],
       [          9, -2147483648,           9],
       [          9,           9,           9]], dtype=int32)

⚠️⚠️⚠️ -2147483648 ⚠️⚠️⚠️

I think this is C's int minimum?

seberg · 2021-06-16T14:47:03Z

@aiqc that returns undefined behaviour (as defind by the C standard). It is a bit scary/annoying, but right now you must not rely on the minimum value to be returned. Typical values are either min-int or 0. Most importantly, IIRC, tthis can fluctuate on the same computer (because the CPU has different ways to do the same thing except for this type of undefined behaviour, and we switch depending on whats faster – that is, we may or may not use SIMD instructions).

xref gh-19248. If we would not silence floating point warnings would at least warn you about this happening:

In [1]: import numpy as np
   ...: 
   ...: arr2D = [
   ...:     [9,9,9],
   ...:     [9,9e100,9],
   ...:     [9,9,9]
   ...: ]

In [2]: np.positive(arr2D, casting="unsafe", signature="l")
<ipython-input-2-7fc1fa2347c4>:1: RuntimeWarning: invalid value encountered in positive
  np.positive(arr2D, casting="unsafe", signature="l")
Out[2]: 
array([[                   9,                    9,                    9],
       [                   9, -9223372036854775808,                    9],
       [                   9,                    9,                    9]])

layne-sadler · 2021-06-16T14:54:25Z

@seberg if it can’t be reliably parsed then that’s even more reason to raise error on dtype conversion?

seberg · 2021-06-16T14:57:23Z

Yes, I don't mind raising an error, but I think a warning would be a good improvement. There are two things we may need to consider:

There may be quite a bit of code assuming it gets the smallest possible int value (and gets away with that, even if it isn't always true)
It would be good to know if there are any speed implications to sanitizing the values. Even if we just ensure the result is indeed the smallest possible int.

seberg · 2022-06-14T17:48:15Z

We are now giving a warning in this cases after gh-21437. Lets keep this one issue open about possibly raising an error. Or even sanitizing the output.
(Although, I currently doubt it will top the priority list in the forseeable future.)

aiqc · 2022-06-14T17:51:03Z

>>> np.nan == -2147483648
False

parejkoj · 2024-09-26T23:23:49Z

Pinging this, as we discovered some failures in our code due to:

np.full(n, np.nan, dtype=np.int64)

returning an architecture-dependent value (-MAX_INT on Linux x86-64, 0 on macos arm64), as expected for an undefined operation. I strongly support making this raise, and/or adding an unsafe_cast option to seterr for all float->int casts, so that users can choose to raise for all such unsafe casts. I don't see how anyone would want to maintain the existing undefined non-finite value casting behavior.

The warning message added earlier is a start, but it can be hard to track down just where in the code that warning is coming from. If one could make float->int casts always, that would at least make it very obvious what needed to be fixed in the user code.

seberg · 2024-09-27T04:34:26Z

That example fails in new NumPy versions @parejkoj. Although, not if you use e.g. np.float64(np.nan).

parejkoj · 2024-09-27T18:25:37Z

Oh, I'm glad that that's now a failure! We're currently on numpy version 1.26.4, and working on compatibility checks for 2.0. Is there a PR with this particular fix I could look at?

I'm not sure what you mean by "not if you use np.float64(np.nan)", though?

seberg · 2024-09-29T12:45:36Z

If you pass a NumPy Nan, rather than the Python float nan it will cast normally, and normal casts are still allowed/unchanged.

rth changed the title ~~astype casting option to error on float->dtype conversions with NaN, inf~~ Raise errors on float->int conversions with NaN, inf Sep 3, 2019

rth mentioned this issue Sep 3, 2019

Fix float->int casting with NaN, inf in check_array scikit-learn/scikit-learn#14872

Merged

1 task

seberg mentioned this issue Sep 7, 2021

numpy.ndarray.astype converts np.nan differently across OSs #19816

Closed

seberg mentioned this issue Sep 16, 2021

Inconsistent behavior in casting from float32 to uint8 depending on platform #19881

Closed

seberg mentioned this issue Nov 8, 2021

Nan are converted to int with slicing #4592

Closed

grlee77 mentioned this issue Dec 6, 2021

strange astype casting behavior on MacOS x86_64 conda-forge/numpy-feedstock#249

Open

1 task

This was referenced Jun 14, 2022

np.nan astype np.int32 gives -2147483648 #13101

Closed

unexpected behaviour on full_like(<int_array>, nan) #8017

Closed

astype(dtype) converts NaN to np.iinfo(dtype).min for int32 and int64, but 0 for int16 #6109

Closed

seberg mentioned this issue Aug 3, 2022

BUG: np.nan converts to np.int64 without error. #22076

Closed

DocOtak mentioned this issue Sep 29, 2022

Don't cast NaN to integer pydata/xarray#7098

Closed

aganders3 mentioned this issue Oct 8, 2022

Avoid possible divide-by-zero in Vectors layer thumbnail update napari/napari#5192

Merged

6 tasks

seberg mentioned this issue Feb 12, 2025

BUG: numpy.ndarray.astype treats float( sys.maxsize ) unexpectedly #28317

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Raise errors on float->int conversions with NaN, inf #14412

Raise errors on float->int conversions with NaN, inf #14412

rth commented Sep 3, 2019 •

edited

Loading

seberg commented Sep 3, 2019

aiqc commented Jun 16, 2021 •

edited

Loading

seberg commented Jun 16, 2021

layne-sadler commented Jun 16, 2021

seberg commented Jun 16, 2021

seberg commented Jun 14, 2022

aiqc commented Jun 14, 2022

parejkoj commented Sep 26, 2024 •

edited

Loading

seberg commented Sep 27, 2024

parejkoj commented Sep 27, 2024

seberg commented Sep 29, 2024

Raise errors on float->int conversions with NaN, inf #14412

Raise errors on float->int conversions with NaN, inf #14412

Comments

rth commented Sep 3, 2019 • edited Loading

Numpy/Python version information:

seberg commented Sep 3, 2019

aiqc commented Jun 16, 2021 • edited Loading

seberg commented Jun 16, 2021

layne-sadler commented Jun 16, 2021

seberg commented Jun 16, 2021

seberg commented Jun 14, 2022

aiqc commented Jun 14, 2022

parejkoj commented Sep 26, 2024 • edited Loading

seberg commented Sep 27, 2024

parejkoj commented Sep 27, 2024

seberg commented Sep 29, 2024

rth commented Sep 3, 2019 •

edited

Loading

aiqc commented Jun 16, 2021 •

edited

Loading

parejkoj commented Sep 26, 2024 •

edited

Loading