
linspace behavior is underspecified for integer dtypes #392

Closed
kgryte opened this issue Feb 17, 2022 · 1 comment · Fixed by #393
Labels
topic: Creation Array creation.

Comments

@kgryte
Contributor

kgryte commented Feb 17, 2022

The specification for linspace is currently underspecified for integer output array dtypes. Namely, the specification is silent concerning the following scenarios:

  1. what should happen when start and stop are floating-point numbers with non-integral values and dtype is an integer data type?

    linspace( 1.4, 100.75, num=50, dtype="int32" )
  2. what should happen when start and stop are Python ints and dtype is an integer data type, but the computed spacing between adjacent elements is not an integral value?

    linspace( 0, 10, num=100, dtype="int32" )
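
To make the two scenarios concrete, the spacing between adjacent elements can be computed exactly. The following is a minimal pure-Python sketch, not part of the specification or of any array library. `spacing` is a hypothetical helper, and it assumes the endpoint is included, so that the spacing is `(stop - start) / (num - 1)`.

```python
from fractions import Fraction


def spacing(start, stop, num):
    """Exact spacing between adjacent elements (hypothetical helper).

    Assumes the endpoint is included and num >= 2.
    """
    return (Fraction(stop) - Fraction(start)) / (num - 1)


# Scenario 1: non-integral start and stop (written as exact decimals).
s1 = spacing(Fraction("1.4"), Fraction("100.75"), 50)

# Scenario 2: integer start and stop, but a non-integral spacing.
s2 = spacing(0, 10, 100)

# A denominator of 1 would mean the spacing is an integer.
print(s1, s1.denominator == 1)
print(s2, s2.denominator == 1)
```

In both cases the spacing is not an integer, so an integer output array cannot hold evenly spaced values without rounding.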

For both scenarios, one option is to add a note to the specification stating that behavior in these instances is implementation-dependent: a conforming implementation would be free to round or truncate, raise an exception, or do something else.

Otherwise, for (1), this could be explicitly disallowed in the specification; i.e., if start and stop are floating-point, then only floating-point dtypes are allowed.

For (2), this is somewhat trickier to resolve as the spacing between adjacent elements is an implicit variable which users may find difficult to reason about without explicitly computing the spacing and determining whether, say, a specific argument combination might trigger an exception. However, we could opt to be pedantic and only allow integer dtypes when start, stop, and the computed spacing are all integer values. Otherwise, an exception should be raised.
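
The strict option for (2) could look roughly like the sketch below. It is pure Python and purely illustrative: `strict_integer_linspace` is a hypothetical name, the choice of `ValueError` is an assumption (the proposal only says "an exception"), and the endpoint is assumed to be included.

```python
from fractions import Fraction


def strict_integer_linspace(start, stop, num):
    """Return num evenly spaced ints over [start, stop], or raise.

    Hypothetical helper: accepts the arguments only when start, stop,
    and the computed spacing are all integer-valued.
    """
    if num < 1:
        raise ValueError("num must be at least 1")
    start_f, stop_f = Fraction(start), Fraction(stop)
    if start_f.denominator != 1 or stop_f.denominator != 1:
        raise ValueError("start and stop must be integer-valued")
    if num == 1:
        return [int(start_f)]
    step = (stop_f - start_f) / (num - 1)
    if step.denominator != 1:
        raise ValueError(f"computed spacing {step} is not an integer")
    return [int(start_f + i * step) for i in range(num)]


print(strict_integer_linspace(0, 10, 6))

try:
    strict_integer_linspace(0, 10, 100)
except ValueError as exc:
    print("rejected:", exc)
```

The second call illustrates the usability concern: whether a given combination of `start`, `stop`, and `num` raises depends on a value the caller never passes explicitly.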

A final option is that linspace could be restricted to only floating-point data types. If a user wants to return an output array with an integer data type, they could use arange and explicitly specify the step.

Prior Art

  • NumPy's linspace floors non-integral values when the output array data type is an integer dtype.

    >>> np.linspace(1.0,10,21,dtype="int32")                                                            
    array([ 1,  1,  1,  2,  2,  3,  3,  4,  4,  5,  5,  5,  6,  6,  7,  7,  8,
        8,  9,  9, 10], dtype=int32)
  • TensorFlow's linspace only supports floating-point data types.

  • PyTorch's linspace floors non-integral values when the output array data type is an integer dtype.

    >>> torch.linspace(1.0,10,21,dtype=torch.int32)
    tensor([ 1,  1,  1,  2,  2,  3,  3,  4,  4,  4,  4,  5,  5,  6,  6,  7,  8,  8,
         9,  9, 10], dtype=torch.int32)
  • MXNet's np.linspace attempts to match NumPy's API.

Observations

Returning non-evenly spaced values due to integer rounding seems undesirable and not particularly useful. If the specification were more restrictive, users wanting to replicate the current functionality of NumPy and PyTorch could generate a floating-point output array using linspace and then perform an explicit cast to the desired integer data type.
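
As a rough illustration of that two-step workaround, the sketch below uses a pure-Python stand-in for a floating-point linspace followed by an explicit conversion. The `linspace` function here is a hypothetical helper, not any library's implementation, and `math.floor` is used as the conversion because flooring is the behavior attributed to NumPy above. A real library's cast may instead truncate toward zero, which gives the same result for non-negative values.

```python
import math


def linspace(start, stop, num):
    """Evenly spaced floats over [start, stop], endpoint included.

    Pure-Python stand-in for a floating-point linspace (hypothetical).
    """
    if num < 1:
        raise ValueError("num must be at least 1")
    if num == 1:
        return [float(start)]
    span = stop - start
    values = [start + i * span / (num - 1) for i in range(num)]
    values[-1] = float(stop)  # pin the endpoint exactly
    return values


# Step 1: floating-point output. Step 2: explicit conversion to integers.
floats = linspace(1.0, 10, 21)
ints = [math.floor(x) for x in floats]
print(ints)
```

For these arguments the result matches the NumPy output shown under Prior Art, and the rounding step is visible in user code rather than hidden inside linspace.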

@kgryte
Contributor Author

kgryte commented Feb 21, 2022

This issue was discussed during the consortium meeting on 2022-02-17. The general consensus is that the primary intended use case of linspace is for floating-point output dtypes. While array libraries, such as NumPy, may have historical reasons for supporting integer output dtypes, other APIs, such as arange, may be better suited for generating integer arrays.

Accordingly, the recommendation is to update the specification to recommend only floating-point output data types for linspace, while allowing conforming implementations to keep support for integer data types should they have a compelling reason to do so. For those libraries choosing to support integer output data types, behavior is implementation-defined.
