Ragged reduction #2786

syurkevi · 2020-03-09T22:56:11Z

Addresses #2782.

TODO:

- tests
- other ragged functions (min, sum, product, count, any, all)

umar456 · 2020-03-10T16:28:04Z

include/af/algorithm.h

+
+       \note NaN values are ignored
+    */
+    AFAPI void max(array &val, array &idx, const array &in, const int dim, const array &ragged_len);


Opinion: I think ragged_len should come in after the in array.
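For readers unfamiliar with the semantics under review, here is a minimal host-side sketch of a ragged max along dim 0, written in plain C++ rather than against ArrayFire. The names `ragged_max_ref` and `RaggedMaxResult` are hypothetical, the column-major layout is an assumption, and the result for an empty or all-NaN column (negative infinity, index 0) is an illustrative choice rather than documented library behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical reference result: one value and one index per column.
struct RaggedMaxResult {
    std::vector<float> val;     // maximum found in each column
    std::vector<unsigned> idx;  // row position of that maximum
};

// Hypothetical host-side reference, not ArrayFire code.
// `in` is column-major with `rows` elements per column. Only the first
// ragged_len[c] elements of column c take part in the reduction.
RaggedMaxResult ragged_max_ref(const std::vector<float> &in, std::size_t rows,
                               std::size_t cols,
                               const std::vector<unsigned> &ragged_len) {
    assert(in.size() == rows * cols);
    assert(ragged_len.size() == cols);

    RaggedMaxResult out;
    // Assumption: empty or all-NaN columns report -infinity at index 0.
    out.val.assign(cols, -std::numeric_limits<float>::infinity());
    out.idx.assign(cols, 0u);

    for (std::size_t c = 0; c < cols; ++c) {
        // Clamp so an oversized length never reads past the column.
        const std::size_t len = std::min<std::size_t>(ragged_len[c], rows);
        for (std::size_t r = 0; r < len; ++r) {
            const float v = in[c * rows + r];
            if (std::isnan(v)) continue;  // NaN values are ignored
            if (v > out.val[c]) {
                out.val[c] = v;
                out.idx[c] = static_cast<unsigned>(r);
            }
        }
    }
    return out;
}
```

With a 3x2 input whose columns are {1, NaN, 3} and {2, 7, 9} and lengths {3, 2}, the second column's 9 lies outside its ragged range and does not take part.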

umar456 · 2020-03-10T17:39:37Z

src/api/c/reduce.cpp

@@ -804,6 +820,68 @@ af_err af_imax(af_array *val, af_array *idx, const af_array in, const int dim) {
    return ireduce_common<af_max_t>(val, idx, in, dim);
 }

+template<af_op_t op>
+static af_err rreduce_common(af_array *val, af_array *idx, const af_array in,


Will an overload to reduce_common not work?

umar456 · 2020-03-10T17:46:54Z

src/backend/cpu/ireduce.cpp

+
+template<af_op_t op, typename T>
+void rreduce(Array<T> &out, Array<uint> &loc, const Array<T> &in,
+             const int dim, const Array<uint> &rlen) {


Is this function still needed? It looks like you can combine this with ireduce.
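One way to picture the suggested merge is a single indexed-max routine in which the ragged lengths are optional. The sketch below is plain host-side C++ and not the backend code; `imax_columns` and its parameters are hypothetical, the layout is assumed column-major, and NaN handling is left out to keep it short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical sketch: one routine serves both the regular and ragged case.
// Passing lengths == nullptr reduces each column in full.
void imax_columns(std::vector<float> &val, std::vector<unsigned> &loc,
                  const std::vector<float> &in, std::size_t rows,
                  std::size_t cols,
                  const std::vector<unsigned> *lengths = nullptr) {
    assert(in.size() == rows * cols);
    val.assign(cols, -std::numeric_limits<float>::infinity());
    loc.assign(cols, 0u);

    for (std::size_t c = 0; c < cols; ++c) {
        std::size_t limit = rows;
        // Guard the lengths lookup so a short lengths array is never
        // read out of bounds; such columns fall back to the full range.
        if (lengths != nullptr && c < lengths->size()) {
            limit = std::min<std::size_t>((*lengths)[c], rows);
        }
        for (std::size_t r = 0; r < limit; ++r) {
            const float v = in[c * rows + r];
            // The first element in range always seeds the result.
            if (r == 0 || v > val[c]) {
                val[c] = v;
                loc[c] = static_cast<unsigned>(r);
            }
        }
    }
}
```

The only difference between the two modes is how `limit` is chosen, which is why a separate ragged function may be unnecessary.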

WilliamTambellini · 2020-03-10T22:12:04Z

@syurkevi sounds good:
ArrayFire v3.7.0 (CUDA, 64-bit Linux, build 4206fa3)
Platform: CUDA Runtime 10.2, Driver: 440.33.01
[0] Quadro RTX 3000, 5935 MB, CUDA Compute 7.5
Benchmark max at f32
in max ragmax
4 16384 0.038 0.037
8 16384 0.034 0.036
16 16384 0.035 0.037
32 16384 0.037 0.040
64 16384 0.074 0.081
128 16384 0.148 0.159
256 16384 0.293 0.309

Though f16 is not really faster than f32:
Device 0 isHalfAvailable ? yes
ArrayFire v3.7.0 (CUDA, 64-bit Linux, build 4206fa3)
Platform: CUDA Runtime 10.2, Driver: 440.33.01
[0] Quadro RTX 3000, 5935 MB, CUDA Compute 7.5
Benchmark max at f16
in max ragmax
4 16384 0.040 0.037
8 16384 0.035 0.037
16 16384 0.036 0.039
32 16384 0.038 0.040
64 16384 0.067 0.070
128 16384 0.143 0.152
256 16384 0.282 0.298

Could you please just add a minimal benchmark like this one?
https://gist.github.com/WilliamTambellini/0f9309ab27b7076369f24c3217af4ffd

WilliamTambellini · 2020-03-15T21:54:59Z

@syurkevi could you please at least rebase, so that I can see what still needs to be finished and advise accordingly?

9prady9 · 2020-03-16T12:31:17Z

@syurkevi I took care of the rebase onto the latest master. If you are adding more ragged functions and need to touch the ireduce kernel, you can find the kernels in src/backend/cuda/kernel/ireduce.cuh and the kernel wrappers in src/backend/cuda/kernel/ireduce.hpp. The backend source file is src/backend/cuda/ireduce.cpp, not ireduce.cu.

If you face any issues while editing or adding things to the kernels, please ping me. I can guide you through the nvrtc-related changes.

WilliamTambellini · 2020-03-16T20:16:24Z

Thank you @9prady9
@syurkevi I am only interested in the max reduction, as agreed with @umar456 when ordering the support. Could you please limit this PR to the max/raggedmax reduction?
Kind regards

umar456 requested changes Mar 10, 2020

9prady9 force-pushed the ragged_max branch from 4206fa3 to 855c7fd Compare March 16, 2020 12:27

9prady9 force-pushed the ragged_max branch from bd94a36 to e4511fc Compare March 16, 2020 13:07

syurkevi force-pushed the ragged_max branch 2 times, most recently from e502b36 to 6599dd2 Compare March 21, 2020 08:10

umar456 added this to the 3.8.0 milestone Mar 24, 2020

umar456 changed the title ~~Ragged reduction [WIP]~~ Ragged reduction Mar 27, 2020

syurkevi force-pushed the ragged_max branch from 1dc39cd to 1ba35fd Compare March 31, 2020 17:10

syurkevi added 6 commits March 31, 2020 13:51

initial ragged max api and cuda implementation

0806080

move ragged lengths into single ireduce kernel implementation

43090d3

adds opencl, cpu ragged max to ireduce

d87e5bc

fix issue with cuda bounds for higher dimensions, adds range based tests

194259b

opencl kernel updates for higher dimensions

a65d502

check out of bounds access in lengths array

6b395e5

syurkevi force-pushed the ragged_max branch 2 times, most recently from aa0a856 to 2307ccf Compare March 31, 2020 21:57

fix incorrect nullptr for empty buffer in cl backend, clang-format

e604ee6

syurkevi force-pushed the ragged_max branch from 2307ccf to e604ee6 Compare March 31, 2020 22:02

syurkevi added 2 commits April 1, 2020 13:35

update api

b93c7ef

remove old tests

df80fb3

syurkevi force-pushed the ragged_max branch from 4046e6b to df80fb3 Compare April 6, 2020 22:15

umar456 approved these changes Apr 7, 2020

9prady9 merged commit 0a66851 into arrayfire:master Apr 7, 2020

umar456 mentioned this pull request Apr 8, 2020

Ragged max reduction #2782

Closed

WilliamTambellini mentioned this pull request Mar 24, 2023

[Perf] Bad perf for ragged max #3382

Open
