Unexpected truncation when storing float to int array #8733


Closed
OliverEvans96 opened this issue Mar 2, 2017 · 24 comments

@OliverEvans96

I was caught off guard by the following behavior using numpy 1.12:

Current behavior

In  [1]:  a=array([1,0])
          a[0] = 1 + 1.1  
          print(a.dtype)       
          print(a)
          print(a.dtype)               
Out [1]:  int64
          [ 2  0 ]
          int64 

Expected behavior

In  [2]:  a=array([1,0])
          a[0] = 1 + 1.1  
          print(a.dtype)       
          print(a)
          print(a.dtype)               
Out [2]:  int64
          [ 2.1  0. ]
          float64

It may be that this is intentional behavior, but I would like to suggest automatic conversion of integer arrays to floating point arrays when a float is saved to an element or slice of the array. I'm not sure how straightforward implementation of this feature would be.

At the very least, there ought to be a warning when truncation occurs, especially since at no point in the above code do I explicitly declare the dtype of a to be int64, so I (and I think most users) would expect it to default to float64.

I'm interested in opinions on the matter.

Thanks,
Oliver

@eric-wieser
Member

Automatic conversion can be a very expensive operation if the array is large, so I'm definitely -1 on that. A warning seems pretty reasonable though, and it wouldn't surprise me if a mechanism already exists to do that.

@OliverEvans96
Author

OliverEvans96 commented Mar 2, 2017 via email

@eric-wieser
Member

eric-wieser commented Mar 2, 2017

That choice is mostly irrelevant - the important thing is that it's best to default to the backwards-compatible behaviour.

@OliverEvans96
Author

OliverEvans96 commented Mar 2, 2017 via email

@eric-wieser
Member

eric-wieser commented Mar 2, 2017

Ok, so here's where it's already implemented:

>>> a=array([1,0])
>>> np.copyto(a[0,...], 1 + 1.1)
TypeError: Cannot cast scalar from dtype('float64') to dtype('int32') according to the rule 'same_kind'

a[0,...] is a trick to return a view of a single element, rather than create a scalar
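For readers following along, the `casting` keyword of `np.copyto` is what selects the rule; a minimal sketch of both the rejection and the explicit opt-in (the exact dtype names in the error message vary by platform):

```python
import numpy as np

a = np.array([1, 0])  # dtype is inferred as a platform integer

# The default rule 'same_kind' rejects the float -> int assignment:
try:
    np.copyto(a[0, ...], 1 + 1.1)  # a[0, ...] is a 0-d view, not a scalar
except TypeError as exc:
    print(exc)

# Opting in to the truncation explicitly is still possible:
np.copyto(a[0, ...], 1 + 1.1, casting='unsafe')
print(a)  # first element was truncated to 2
```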

@OliverEvans96
Author

OliverEvans96 commented Mar 3, 2017 via email

@eric-wieser
Member

array_assign_subscript is most likely the place to look.

@eric-wieser
Member

Digging deeper into copyto, it looks like you'd want to add

if (!PyArray_CanCastTypeTo(PyArray_DESCR(src), PyArray_DESCR(dst), casting))

somewhere.

I think if you did implement this, it should take the form of np.setconfig(casting='same-kind') or some global setting like floating point warnings.
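The "floating point warnings" mechanism referred to here is `np.seterr`/`np.errstate`; the suggested `np.setconfig(casting=...)` does not exist, but an analogous switch might behave like this existing one:

```python
import numpy as np

# np.errstate scopes floating-point error handling to a block; a global
# casting setting would presumably be toggled the same way.
with np.errstate(invalid='raise'):
    try:
        np.sqrt(np.array([-1.0]))  # invalid operation now raises
    except FloatingPointError as exc:
        print(exc)
```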

@charris
Member

charris commented Mar 3, 2017

We should definitely raise an error for this. ISTR that this issue may have been reported before.

@pv
Member

pv commented Mar 3, 2017 via email

@eric-wieser
Member

eric-wieser commented Mar 3, 2017

Hmm, what's the end goal here?

If I have this code

a = np.zeros((100, 100), dtype=np.int8)
b = np.zeros((100,), dtype=np.float64)

a[0] = b

We want it to throw a warning, right? What should that warning encourage if this behaviour is deliberate? a[0] = b.astype(np.int8) incurs an extra copy, and np.copyto(a[0], b, casting=...) doesn't work for advanced indexing.
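The advanced-indexing limitation comes from the fact that fancy indexing returns a copy rather than a view, so `np.copyto` writes into a throwaway temporary; a small sketch:

```python
import numpy as np

a = np.zeros((100, 100), dtype=np.int8)
b = np.zeros((100,), dtype=np.float64)

# a[0] is a view (basic indexing), so copyto can write through it:
np.copyto(a[0], b, casting='unsafe')

# a[idx, 0] is a copy (advanced indexing), so copyto modifies only a
# temporary array and the original is left untouched:
idx = np.array([0, 2, 4])
np.copyto(a[idx, 0], np.array([1.5, 2.5, 3.5]), casting='unsafe')
print(a[idx, 0])  # still all zeros
```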

@charris
Member

charris commented Mar 3, 2017

The idea is that most such occurrences will be programming errors. That was certainly the case for complex numbers. Being explicit also has the advantage that the conversion, truncation or rounding, will be clear. The code will also be easier to understand with an explicit cast.

@eric-wieser
Member

Right, but my point is that we have no way of specifying that cast without also doing a copy, do we?

@charris
Member

charris commented Mar 3, 2017

I think you can use functions for that, copyto for instance. The default conversion of floats is truncation. For inplace addition you can use the np.add with the out parameter, etc.
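As a concrete illustration of the `out` parameter approach (a sketch; the explicit `casting='unsafe'` is what makes the truncation deliberate):

```python
import numpy as np

a = np.arange(10)      # integer array
b = np.full(10, 0.5)

# a += b raises under the 'same_kind' rule; routing the operation
# through np.add with out= and an explicit casting mode spells out
# that the fractional parts will be discarded:
np.add(a, b, out=a, casting='unsafe')
print(a)
```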

@seberg
Member

seberg commented Mar 3, 2017

For in-place ufuncs, we already have the warning. Assignment is the only thing that is still "unsafe". (Though possibly it's just a deprecation warning; I don't remember the current state.)

@eric-wieser
Member

out and copyto only work for views, not advanced indexing

@OliverEvans96
Author

OliverEvans96 commented Mar 3, 2017 via email

@eric-wieser
Member

Also, I guess I'm making assumptions here about what __setitem__ does under the hood. It might be that casting incurs a copy even implicitly.

@seberg
Member

seberg commented Mar 3, 2017

Depends a lot on the code paths taken; the most involved ones end up doing much the same as copyto (or even call it for the view cases). Plus, object arrays do get an extra copy in some cases. But no, normally there is no copy made. You could add indexing support to copyto in principle. I am not quite sure I am convinced that this is the only right thing. OTOH, it is one of those traps when learning numpy.

@njsmith
Member

njsmith commented Mar 3, 2017

Ugh, I thought we had deprecated this. But @seberg's right, we seem to have gone all the way through the deprecation cycle for in-place operations, but missed simple assignment:

In [4]: a = np.arange(10)

In [5]: a += 0.5
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-633f1f7b6cb9> in <module>()
----> 1 a += 0.5

TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'

In [6]: a[:] = 0.5

In [7]: 

I agree with @charris et al that we should deprecate these unsafe conversions on assignments and eventually make them an error.
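The rule being proposed for assignment is the same `same_kind` rule that in-place ufuncs already enforce; it can be queried directly with `np.can_cast`:

```python
import numpy as np

# float64 -> int64 is not a 'same_kind' cast, but int64 -> float64 is:
print(np.can_cast(np.float64, np.int64, casting='same_kind'))  # False
print(np.can_cast(np.int64, np.float64, casting='same_kind'))  # True
```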

@OliverEvans96
Author

I'm not sure whether this directly addresses the issue or if it's reasonable, but it just occurred to me that there would be no ambiguity in my original situation if a=array([1,0]) defaulted to a float array. If that were the case, then somebody who intended to truncate the value they're saving in an int array would simply explicitly declare a=array([1,0],dtype=int64).

Is there a reason this is not presently the case? Would it break backwards-compatibility or be significantly less efficient for most use cases?

@charris
Member

charris commented Mar 9, 2017

if a=array([1,0]) defaulted to a float array. If that were the case,

Terrible compatibility break. The default is to use the minimum suitable kind compatible with the given elements.
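The "minimum suitable kind" rule can be seen directly from dtype inference (exact sizes are platform-dependent, so only the kind is shown):

```python
import numpy as np

# The inferred kind escalates only as far as the elements require:
print(np.array([1, 0]).dtype.kind)         # 'i' (integer)
print(np.array([1, 0.5]).dtype.kind)       # 'f' (float)
print(np.array([1, 0.5 + 1j]).dtype.kind)  # 'c' (complex)
```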

@stevengj

stevengj commented Mar 24, 2022

FWIW, Julia has always thrown an error when you try to implicitly assign a non-integer value to an integer-typed location:

julia> a = [1,0]
2-element Vector{Int64}:
 1
 0

julia> a[1] = 1.1
ERROR: InexactError: Int64(1.1)

and I haven't heard anyone say that they would prefer silent truncation. (a[1] = 2.0 succeeds because the floating-point value 2.0 has an exact conversion to an integer type.)

Deprecating (emitting a warning) then throwing an error seems like a good route for numpy too.

@seberg
Member

seberg commented Jan 18, 2024

Closing this; part of the discussion here predates gh-25621 by a long time, but even back then it seems like this was more of a question anyway.

@seberg closed this as not planned on Jan 18, 2024
7 participants