
Fixes to mean and better precision on sum (cpu) #3687


Open: wants to merge 2 commits into base master

Conversation

willyborn
Contributor

When strided weights were used, the mean function typically returned random results.
The result of mean no longer depends on platform or processing style (linear or strided, on CPU or GPU).

Description

b74f28f sum on CPU now has the same precision as on the other platforms

  • Introduction of the Kahan summation algorithm for series. This algorithm reduces the error when a long series of additions is performed on values of varying magnitude.
    With this change, the difference between the results for a linear array and a strided array is minimal, so all platforms now generate similar results, independent of parallel or serial processing, and the allowed tolerance of many tests could be tightened.

9307d6f Fixes to mean on all platforms
All:

  • Introduction of the Kahan summation algorithm for series. This algorithm reduces the error when a long series of additions is performed on values of varying magnitude.
    With this change, the difference between the results for a linear array and a strided array is minimal, and the precision of the tests is improved accordingly.
  • Negative weights can no longer cause a division by zero.
  • In most cases, sub-arrays generated random results when strided weights were used.

CPU:

  • The indices of the weights array are now calculated from its own strides instead of the strides of the input array.

CUDA:

  • Param objects were used without full initialisation. Because direct construction of Param is deprecated, they are replaced by createEmptyArray() constructions.
  • Indices of the weights array are now calculated from its own strides; the same applies to the offsets.
  • Small arrays are processed on the CPU. Strided arrays are now processed on the GPU regardless of size, since the CPU version only handles linear arrays.

ONEAPI:

  • Indices of the weights array are now calculated from its own strides; the same applies to the offsets.
  • Param objects were used without full initialisation. Because direct construction of Param is deprecated, they are replaced by createEmptyArray() constructions.
  • Small arrays are processed on the CPU. Strided arrays are now processed on the GPU regardless of size, since the CPU version only handles linear arrays.
  • Offsets are now handled in the linear processing path.

OPENCL:

  • Indices of the weights array are now calculated from its own strides; the same applies to the offsets.
  • When converting a linear array to a vector, the weight offsets were overwritten.
  • The tmpOut and tmpWeight arrays are now correctly sized. Intermediate results were being overwritten, resulting in the mean of only part of the array.
  • Small arrays are processed on the CPU. Strided arrays are now processed on the GPU regardless of size, since the CPU version only handles linear arrays.
  • Param objects were used without full initialisation. Because direct construction of Param is deprecated, they are replaced by createEmptyArray() constructions.

TEST:
- Extra tests added for all temporary formats.
- The allowed fault tolerance of the tests is lowered, since the correct mean is now calculated on all backends.
- MeanAllTest now uses random data instead of constants for testing; the constants masked the detection of partial processing of the input.

Additional information about the PR answering following questions:

  • Is this a new feature or a bug fix? BUG
  • More detail if necessary to describe all commits in pull request.
  • Why these changes are necessary. CORRECT RESULTS WERE NOT GUARANTEED
  • Potential impact on specific hardware, software or backends. NONE
  • New functions and their functionality. NONE
  • Can this PR be backported to older versions? NO
  • Future changes not implemented in this PR. NO
    Fixes:

Changes to Users

Better precision in all circumstances, independent of platform or array format.

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass
  • Functions added to unified API
  • Functions documented

@christophe-murphy
Contributor

Thanks for this willyborn. FYI, our Buildbot CI system failed to build the PR with the following error:

/var/lib/buildbot/workers/ubuntu2204-cuda-toolkit-12-9/linux-all/build/src/backend/cuda/kernel/mean.hpp(65): error: more than one operator "*" matches these operands:
            built-in operator "arithmetic * arithmetic"
            function "operator*(const __half &, const __half &)" (declared at line 223 of /usr/local/cuda/include/cuda_fp16.hpp)
        operand types are: __half * float
    To y = (rhs - (*lhs)) * r_wt / (*l_wt) - (*c);

@willyborn
Contributor Author

I'm on Ubuntu 22.04 LTS with GCC, and I cannot reproduce the warning/error you mentioned.
As a result, I encapsulated every element inside stable_mean so that this kind of ambiguity for the compiler is eliminated. I hope this solves the error.

PS: Just to be certain, I made the same change on all platforms.

@christophe-murphy
Contributor

This is happening when NVCC compiles the stable_mean CUDA device kernel. Which version of CUDA are you using? Could you try with 12.9?

@willyborn
Copy link
Contributor Author

willyborn commented Aug 8, 2025

I am using CUDA 12.9.86 on Ubuntu 24.04.3 LTS, kernel 6.14.
I have a GTX 1080, which is compute architecture 6.1 (stores halves, calculates in float).

As soon as I included all major architectures in the nvcc options, I got the error as well; it is now fixed and committed.
