Add `dtype` keyword argument to `sum` and `prod` #238

kgryte · 2021-07-26T18:53:45Z

This PR

- adds a `dtype` keyword argument to `sum` and `prod` based on discussions in gh-202 and in consortium meetings. Adding such a keyword aids in preventing overflows during accumulation.
- specifies that the default behavior (i.e., with `dtype=None`) be that the returned array have the default data type of the same kind as the data type of the input array, so long as the default data type has a range of values equal to or larger than the input array data type.

For example,
- if provided an int8 array and the default integer data type is int32, the returned array should have data type int32.
- if provided an int64 array and the default integer data type is int32, the returned array should have data type int64.
- if provided a float32 array and the default floating-point data type is float64, the returned array should have data type float64.
- if provided a float64 array and the default floating-point data type is float32, the returned array should have data type float64.

If a user wants the returned array to have the same data type as the input array, they can explicitly specify the return array `dtype`. The default behavior is intended to prevent potential accumulation footguns.
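
The four cases above can be sketched as a small lookup. This is only an illustration: `resolve_default_dtype` and the two tables are hypothetical names, not part of the specification or of any library, and bit width is used as a stand-in for "range of values" within a single kind (signed integers or floats).

```python
# Bit widths stand in for value ranges; within one kind, more bits means a
# larger range. These tables are illustrative, not taken from the spec.
SIGNED_INT_BITS = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}
FLOAT_BITS = {"float32": 32, "float64": 64}


def resolve_default_dtype(input_dtype, default_int="int32", default_float="float64"):
    """Hypothetical helper: dtype of the result when dtype=None."""
    if input_dtype in SIGNED_INT_BITS:
        table, default = SIGNED_INT_BITS, default_int
    elif input_dtype in FLOAT_BITS:
        table, default = FLOAT_BITS, default_float
    else:
        raise ValueError(f"dtype not covered by this sketch: {input_dtype}")
    # Use the default dtype only if it is at least as wide as the input dtype.
    return default if table[default] >= table[input_dtype] else input_dtype


print(resolve_default_dtype("int8", default_int="int32"))
print(resolve_default_dtype("int64", default_int="int32"))
print(resolve_default_dtype("float32", default_float="float64"))
print(resolve_default_dtype("float64", default_float="float32"))
```

The sketch deliberately covers only the signed-integer and floating-point cases listed in the examples.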

leofang

LGTM except for a few nitpicks.

asmeurer · 2021-09-17T22:48:18Z

This looks good to me.

asmeurer · 2021-09-17T22:51:32Z

Actually scratch that, I just noticed a potential problem. NumPy's dtype argument to sum is documented as follows:

dtype : dtype, optional
   The type of the returned array and of the accumulator in which the
   elements are summed.  The dtype of `a` is used by default unless `a`
   has an integer dtype of less precision than the default platform
   integer.  In that case, if `a` is signed then the platform integer
   is used while if `a` is unsigned then an unsigned integer of the
   same precision as the platform integer is used.

As written here now, a uint64 array would cast to int64 by default, which would be a downcast (uint64 can represent values that don't fit in the signed int64).
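
The range concern can be checked with plain Python integers. The helper names below are hypothetical, and the wraparound function only models a two's-complement reinterpretation; what a given array library actually does on such a cast is not specified here.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MAX = 2**64 - 1


def fits_in_int64(value):
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def reinterpret_as_int64(value):
    """Model a two's-complement wraparound of an unsigned 64-bit value."""
    return (value + 2**63) % 2**64 - 2**63


print(fits_in_int64(UINT64_MAX))         # False: the top half of uint64 does not fit
print(reinterpret_as_int64(UINT64_MAX))  # -1 under the wraparound model
```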

asmeurer · 2021-09-17T22:59:15Z

OK, I think that wording makes it clearer. This does look good now.

kgryte · 2021-09-17T22:59:24Z

@asmeurer I've updated the text to clarify promotion guidance.

kgryte · 2021-09-17T23:14:42Z

This has been reviewed. If any further changes are necessary, we can add them in a follow-up PR.

seberg · 2021-09-17T23:14:49Z

As the quoted docs say, numpy only applies this to integers (and bool). It does not upcast floats, in particular float32. Is that part intentional, or just a typo?
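
The NumPy behavior described here can be observed directly. This is a minimal sketch assuming NumPy is installed; `default_sum_dtype` is a hypothetical helper, and the integer result depends on the platform's default integer, so it is compared against `np.dtype(int)` rather than a fixed width.

```python
import numpy as np


def default_sum_dtype(input_dtype):
    """Return the dtype np.sum produces for a small array of input_dtype."""
    return np.sum(np.ones(4, dtype=input_dtype)).dtype


int8_result = default_sum_dtype(np.int8)        # upcast to the default integer
float32_result = default_sum_dtype(np.float32)  # stays float32 in NumPy

# An explicit dtype overrides the default in either direction.
explicit_result = np.sum(np.ones(4, dtype=np.float32), dtype=np.float64).dtype

print(int8_result, float32_result, explicit_result)
```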

kgryte · 2021-09-17T23:19:59Z

@seberg Upcasting floats is intentional. In principle (and while not currently required by the spec), the same reasoning also applies to float16 (and bfloat16) in accelerator libraries, where overflow and precision concerns are still valid, albeit less commonly a concern than for, e.g., int8.

seberg · 2021-09-17T23:28:34Z

@kgryte ah OK. I have not thought much about what I would consider ideal, it is/was just a bit surprising.

kgryte · 2021-09-17T23:39:33Z

@seberg Were we to make a distinction between integral and floating dtypes for how promotion is handled relative to the default dtypes, this would add complexity to user mental models in terms of when promotion occurs.

And further, the same concerns motivating promotion to the default dtype for smaller integral dtypes apply to floating dtypes. Namely, accumulation can lead to overflow and erroneous results. For example, for float32, the maximum safe integer is 2^24 = 16,777,216 (less than 2e7), leading to potential user footguns when summing and/or multiplying many large positive numbers. If the default floating dtype for a specification-compliant array library is float64 and `dtype` is `None`, the library should promote to float64 to avoid overflow and/or precision errors.

When an array library's default floating dtype is float64, if a user wants to return a float32 array when providing a float32 array, they can explicitly specify via the dtype kwarg that they wish to receive an output array of the same dtype. (This same logic applies to a user who provides an int8 array and wants an int8 array in return). Whether an array library chooses to accumulate internally in float32 is, however, an implementation detail.
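
The float32 limit mentioned above can be reproduced without any array library. This is a sketch using the standard-library `struct` module to round values to single precision; `to_float32` and `accumulate_float32` are hypothetical helpers that model a float32 accumulator, not how any particular library accumulates internally.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate_float32(values, start=0.0):
    """Sum values while rounding the running total to float32 at every step."""
    total = to_float32(start)
    for v in values:
        total = to_float32(total + to_float32(v))
    return total


# Start just below 2**24 so the effect shows up after a handful of additions.
start = float(2**24 - 2)
ones = [1.0] * 5

narrow = accumulate_float32(ones, start)  # float32 accumulator
wide = start + sum(ones)                  # float64 accumulator

print(narrow)  # 16777216.0: the total stops growing at 2**24
print(wide)    # 16777219.0
```

Once the running total reaches 2^24, adding 1.0 no longer changes it in float32, which is the kind of silent precision loss the default promotion is meant to avoid.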

kgryte added 2 commits July 26, 2021 11:43

Add dtype keyword to sum and prod

b6b86cd

Update copy

ce75682

kgryte added the "API change" label (Changes to existing functions or objects in the API.) Jul 26, 2021

Fix copy

8bb356b

leofang requested changes Jul 27, 2021

asmeurer mentioned this pull request Jul 27, 2021

ENH: Implementation of the NEP 47 (adopting the array API standard) numpy/numpy#18585

Merged

asmeurer reviewed Jul 27, 2021

spec/API_specification/statistical_functions.md

kgryte and others added 7 commits July 27, 2021 22:42

Remove empty lines

6e23ff1

Co-authored-by: Leo Fang <leofang@bnl.gov>

Remove duplicate word

f6e7f9f

Co-authored-by: Leo Fang <leofang@bnl.gov>

Remove empty lines

0cd7bf4

Merge branch 'sum-prod-dtype' of https://github.com/pydata-apis/array-api into sum-prod-dtype

f57c042

Fix promotion rules for sum and prod

d40f70c

Merge branch 'main' of https://github.com/pydata-apis/array-api into sum-prod-dtype

69756c2

Merge branch 'main' of https://github.com/pydata-apis/array-api into sum-prod-dtype

7320f46

asmeurer reviewed Sep 17, 2021

spec/API_specification/statistical_functions.md

Update desc

db7a266

Co-authored-by: Aaron Meurer <asmeurer@gmail.com>

kgryte commented Sep 17, 2021

spec/API_specification/statistical_functions.md

Update desc

ae7e79b

kgryte commented Sep 17, 2021

spec/API_specification/statistical_functions.md

kgryte commented Sep 17, 2021

spec/API_specification/statistical_functions.md

Clarify casting rules

feb5fe4

kgryte merged commit 25e24ca into main Sep 17, 2021

kgryte deleted the sum-prod-dtype branch September 17, 2021 23:14

kgryte mentioned this pull request Sep 17, 2021

Specify casting rules and accepted input dtypes for reductions better #202

Closed

seberg mentioned this pull request Sep 29, 2021

ENH: Add the linalg extension to the array_api submodule numpy/numpy#19980

Merged

rgommers mentioned this pull request Jan 17, 2024

Reconsider sum/prod/trace upcasting for floating-point dtypes #731

Closed
