Issues with "Mixing arrays and Python scalars" section #98


Closed
asmeurer opened this issue Dec 5, 2020 · 14 comments
Labels: topic: Type Promotion


asmeurer commented Dec 5, 2020

A few issues with the "Mixing arrays and Python scalars" section that came up from testing:

  • The matmul operator (@) does not (and should not) support scalar operands, so it should be excluded (see the snippet after this list).
  • Should boolean arrays be required to work with arithmetic operators? NumPy gives errors in some cases, for example:
>>> import numpy as np
>>> np.array(False) - False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: numpy boolean subtract, the `-` operator, is not supported, use the bitwise_xor, the `^` operator, or the logical_xor function instead. 
  • How should casting work? For example, int8 array + large Python int. Should it cast the int, or give an error? Or should this be unspecified?
  • The above question also applies to Python float -> float32 casting, which is implicitly supported by the text that is currently there, but is unsafe in general (is there a "safe" variant of float64 -> float32, or is it always unsafe? Either way, this would be value-based casting, which I thought we were trying to avoid).
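
For reference, here is what NumPy does today when @ receives a Python scalar (the exact error message may vary by NumPy version):

>>> import numpy as np
>>> np.ones((2, 2)) @ 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: matmul: Input operand 1 does not have enough dimensions ...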

asmeurer commented Dec 5, 2020

Actually, I think - is the only operator that NumPy doesn't allow for booleans. But there's also the reverse: bitwise operators are not supported for floats. So I think there should be some text like "for operations that are supported by the given dtypes", and the operators text should specify that (I guess the bitwise ones already do, but there's nothing in __sub__ about boolean support or non-support).
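
For example (with NumPy; the error message is abbreviated):

>>> np.array(True) + True   # boolean + is allowed (acts as logical or)
True
>>> np.array(1.0) & np.array(2.0)   # but bitwise operators reject floats
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'bitwise_and' not supported for the input types ...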


asmeurer commented Dec 5, 2020

IMHO, the spec should split the operators into four classes:

  • Arithmetic operators (+, -, *, /, etc. (including unary + and -))
  • Boolean operators (~, ^, |, etc.)
  • Comparison operators (==, <=, ...)
  • Array operators (@ is the only one)

Arithmetic operators should be defined for integer and floating point dtypes, but not required for the boolean dtype. Boolean operators should be defined for integer (bitwise) and boolean dtypes, but are not required for floating point dtypes. Comparison operators are defined for all arrays regardless of dtype. The array operator (matmul) is defined only for integer and floating point dtypes, and also has additional shape constraints (the input arrays must have at least 1 (2?) dimensions, which align in a certain way).

Also, arithmetic, boolean, and array operators participate in type promotion. Comparison operators may error on invalid type combinations (which aren't specified anyway), but always return an array with a bool dtype.
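
For concreteness, here is a rough sketch of how the four classes might map onto Python's dunder methods (the grouping is illustrative only, not spec text):

# Illustrative grouping; membership is a sketch, not spec text.
ARITHMETIC = {"__add__", "__sub__", "__mul__", "__truediv__", "__floordiv__",
              "__mod__", "__pow__", "__pos__", "__neg__"}
BOOLEAN = {"__invert__", "__and__", "__or__", "__xor__",
           "__lshift__", "__rshift__"}
COMPARISON = {"__eq__", "__ne__", "__lt__", "__le__", "__gt__", "__ge__"}
ARRAY = {"__matmul__"}

# Required dtype support under the proposal:
#   ARITHMETIC -> integer and floating-point (boolean optional)
#   BOOLEAN    -> integer and boolean (floating-point optional)
#   COMPARISON -> all dtypes
#   ARRAY      -> integer and floating-point, plus shape constraints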

But I may also be missing or forgetting prior discussions that happened with this.


asmeurer commented Dec 5, 2020

A similar split could be made for the 2-argument array functions.
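
For example (function names as in NumPy; the grouping is a sketch of the idea, not spec text):

FUNCTION_CLASSES = {
    "arithmetic": ["add", "subtract", "multiply", "divide", "power"],
    "boolean":    ["bitwise_and", "bitwise_or", "bitwise_xor"],
    "comparison": ["equal", "not_equal", "less", "greater"],
    "array":      ["matmul"],
}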


asmeurer commented Dec 5, 2020

Also, it looks like NumPy doesn't do what the spec says for float scalars. The spec says:

The expected behavior is then equivalent to:

  1. Convert the scalar to a 0-D array with the same dtype as that of the array used in the expression.
  2. Execute the operation for array <op> 0-D array (or 0-D array <op> array if scalar was the left-hand argument).
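
In code, the quoted rule is roughly equivalent to the following minimal sketch (scalar_binop is a hypothetical helper, not spec API):

import operator
import numpy as np

def scalar_binop(x, scalar, op=operator.mul):
    # Step 1: convert the scalar to a 0-D array with the array's dtype.
    s = np.asarray(scalar, dtype=x.dtype)
    # Step 2: execute the operation as array <op> 0-D array.
    return op(x, s)

# Under this rule, float32 is preserved:
print(scalar_binop(np.array(0.0, dtype=np.float32), 0.0).dtype)  # float32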

But NumPy apparently converts float scalars into float64:

>>> (np.array(0.0, dtype=np.float32) * 0.0).dtype
dtype('float64')
>>> (np.array(0.0, dtype=np.float32) * np.array(0.0, dtype=np.float32)).dtype
dtype('float32')

Is it intentional to give a different behavior from NumPy here (I didn't check the other array libraries yet)?

asmeurer added a commit to data-apis/array-api-tests that referenced this issue Dec 5, 2020
This is based on what the spec currently says, but there are some issues with
it (see data-apis/array-api#98).

shoyer commented Dec 17, 2020

Is it intentional to give a different behavior from NumPy here (I didn't check the other array libraries yet)?

Yes, this is an intentional deviation, copied from libraries such as PyTorch and JAX. On GPU/TPU it is important to avoid inadvertent type promotion to float64.
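
For comparison, PyTorch keeps the array's dtype in this case (JAX behaves similarly):

>>> import torch
>>> (torch.ones(2, dtype=torch.float32) * 0.0).dtype
torch.float32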

@asmeurer

The NumPy behavior is also apparently only that way for shape () (i.e., 0-D) arrays:

>>> (np.array(0.0, dtype=np.float32) * 0.0).dtype
dtype('float64')
>>> (np.array([0.0], dtype=np.float32) * 0.0).dtype
dtype('float32')

So we should consider the NumPy promotion to float64 as incorrect (I intended to raise this issue on the NumPy tracker but didn't get around to it yet).


shoyer commented Dec 17, 2020

So we should consider the NumPy promotion to float64 as incorrect (I intended to raise this issue on the NumPy tracker but didn't get around to it yet).

NumPy's type promotion rules treat 0-D arrays the same as scalars:

>>> (np.float32(0.0) * 0.0).dtype
dtype('float64')

@asmeurer

Yeah, the behavior is the same whether it's an "array scalar" or "scalar constant" (or whatever NumPy calls them). But we don't make any such distinction in the spec, and don't special-case 0-D arrays as far as type promotion is concerned:

Type promotion rules must apply when determining the common result type for two array operands during an arithmetic operation, regardless of array dimension. Accordingly, zero-dimensional arrays must be subject to the same type promotion rules as dimensional arrays.


shoyer commented Dec 17, 2020 via email

@rgommers

IMHO, the spec should split the operators into four classes:

  • Arithmetic operators (+, -, *, /, etc. (including unary + and -))
  • Boolean operators (~, ^, |, etc.)
  • Comparison operators (==, <=, ...)
  • Array operators (@ is the only one)

Arithmetic operators should be defined for integer and floating point dtypes, but not required for the boolean dtype. Boolean operators should be defined for integer (bitwise) and boolean dtypes, but are not required for floating point dtypes. Comparison operators are defined for all arrays regardless of dtype. The array operator (matmul) is defined only for integer and floating point dtypes, and also has additional shape constraints (the input arrays must have at least 1 (2?) dimensions, which align in a certain way).

I agree with all of this; it seems like that would help make the operator description and behaviour clearer.

Also, arithmetic, boolean, and array operators participate in type promotion. Comparison operators may error on invalid type combinations (which aren't specified anyway), but always return an array with a bool dtype.

I would not mix this into the type promotion section; that would make it harder to understand for little gain. The individual functions already describe this, and I don't think anyone will be surprised by, for example, equal(int_array, int_array) -> bool_array.

@asmeurer

Yes, as far as organization in the spec document, I think the above distinctions should go on the page about operators, not the page about type promotion (or somewhere more general if we also apply it to functions). The type promotion page describes those operator classes that participate in type promotion, but doesn't talk about the other ones.


rgommers commented Dec 19, 2020

How should casting work? For example, int8 array + large Python int. Should it cast the int, or give an error. Or should this be unspecified.

Here's how libraries currently behave, testing an int32 array added to Python ints (three results per library: small, too large for 32-bit, too large for 64-bit).

Code:

import numpy as np
import dask.array as da
import torch
import tensorflow as tf
import jax.numpy as jnp
import mxnet
try:
    import cupy as cp
except ImportError:
    # CuPy is GPU-only, so may not be available
    cp = None


def ones(mod):
    # Create a (3, 2)-shaped array of int32 1's
    if mod in (da, mxnet.nd):
        x = mod.ones((3, 2), dtype=np.int32)  # dask and MXNet take NumPy dtypes
    else:
        x = mod.ones((3, 2), dtype=mod.int32)

    return x


def get_firstelem(x, mod):
    # Get element (0, 0) from 2-D arrays
    if isinstance(x, da.Array):
        x = x.compute()

    if mod == mxnet.nd:
        x0 = int(x[0, 0].asscalar())  # MXNet doesn't return a scalar
    else:
        x0 = int(x[0, 0])

    return x0



libraries = {
    'numpy': np,
    'pytorch': torch,
    'MXNet': mxnet.nd,
    'dask': da,
    'tensorflow': tf,
    'jax': jnp,
}

if cp is not None:
    libraries['cupy'] = cp


results = {}


int_small = 1
int_above32bit = int(np.iinfo(np.int32).max) + 10
int_above64bit = int(np.iinfo(np.int64).max) + 10
int_vals = [int_small, int_above32bit, int_above64bit]
for name, mod in libraries.items():
    res = []
    for val in int_vals:
        x = ones(mod)
        try:
            tmp = val + x
            out = (tmp.dtype, get_firstelem(tmp, mod))
        except Exception as e:
            out = e

        res.append(out)

    results[name] = res


for key, val in results.items():
    print('')
    print(key + ':\n' + (len(key) + 1) * '-')
    for v in val:
        print(v)

Results:

numpy:
------
(dtype('int32'), 2)
(dtype('int64'), 2147483658)
(dtype('float64'), 9223372036854775808)

pytorch:
--------
(torch.int32, 2)
(torch.int32, -2147483638)
Overflow when unpacking long

MXNet:
------
(<class 'numpy.int32'>, 2)
(<class 'numpy.int32'>, -2147483647)
(<class 'numpy.int32'>, -2147483647)

dask:
-----
(dtype('int32'), 2)
(dtype('int64'), 2147483658)
(dtype('float64'), 9223372036854775808)

tensorflow:
-----------
(tf.int32, 2)
(tf.int32, -2147483638)
Can't convert Python sequence with out-of-range integer to Tensor.

jax:
----
(dtype('int32'), 2)
(dtype('int32'), -2147483638)
Python int too large to convert to C long

cupy:
-----
(dtype('int32'), 2)
(dtype('int64'), 2147483658)
(dtype('int64'), -9223372036854775798)

  • NumPy does value-based casting.
  • MXNet consistently returns int32 (with overflows).
  • PyTorch, TensorFlow, and JAX mix overflowing int32 with raising an error, depending on whether the Python int fits into int64.
  • CuPy upcasts to int64 if the Python int fits into int64, and overflows beyond that.
  • Dask behaves like NumPy here, but that's inherited behaviour - for a Dask array of CuPy arrays it would behave like CuPy.

So this is a mess. I personally think NumPy and MXNet are behaving the most consistently here; what CuPy does is also not unreasonable, and what PyTorch/TensorFlow/JAX do looks bad.

The above question also applies to Python float -> float32 casting, which is implicitly supported by the text that is currently there, but is unsafe in general (is there a "safe" variant of float64 -> float32, or is it always unsafe? Either way, this would be value-based casting, which I thought we were trying to avoid).

Yes indeed. Here there are only two flavors: upcasting to float64, or keeping float32, in which case the result will be positive or negative inf:

>>> float_above32bit
5.104235199577933e+38

>>> r = torch.ones(2, dtype=torch.float32) + float_above32bit
>>> r.dtype
torch.float32
>>> r
tensor([inf, inf])

>>> r = np.ones(2, dtype=np.float32) + float_above32bit
>>> r.dtype
dtype('float64')
>>> r
array([5.1042352e+38, 5.1042352e+38])

Only NumPy and CuPy upcast; MXNet, TensorFlow, JAX and PyTorch all keep float32 and turn the Python float into inf.

In both the int and float cases, I think we have little choice but to simply state in the spec that the casting behaviour will be implementation-dependent.
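
Given that, code that needs deterministic behaviour across libraries can cast the Python scalar explicitly before the operation; a minimal sketch with NumPy:

>>> x = np.ones(2, dtype=np.float32)
>>> x + np.float32(float_above32bit)  # explicit cast overflows to inf but stays float32
array([inf, inf], dtype=float32)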


shoyer commented Dec 19, 2020

Signed integer overflow is technically undefined behavior in C/C++, and that seems to work out OK. It's not great, but I think these inconsistencies are mostly an indication that nobody expects Python's big ints to do something sensible when combined with fixed-size arrays.


kgryte commented Nov 4, 2021

Wrt the OP:

  1. Disallowing scalar operands when using the matmul operator was addressed in #307.
  2. Whether boolean arrays should be allowed in arithmetic operations has been addressed. The spec currently includes guidance that operands should have numeric data types.
  3. Casting and overflow behavior is left unspecified and implementation-dependent.
  4. Same as (3).

Grouping operators into separate classes was addressed in #308.

As the concerns raised in this issue have been resolved, I will close this out. Any further concerns and discussion can be raised in a new issue.

kgryte closed this as completed Nov 4, 2021