Skip to content

GAPI Fluid: SIMD Multiply kernel. #21024

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 17, 2021

Conversation

anna-khakimova
Copy link
Member

@anna-khakimova anna-khakimova commented Nov 9, 2021

SIMD optimization for Fluid Multiply kernel.

Performance report:

FluidMulSIMD.xlsx

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

{
// like OpenCV: returns 0, if y=0
// like OpenCV: returns 0, if DST type=uchar/short/ushort and divider(y)=0
auto result = y? scale * x / y: 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can SCR2 be float?

Copy link
Member Author

@anna-khakimova anna-khakimova Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it can be. And next overload handles this case.

BINARY_(uchar, uchar, uchar, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale);
BINARY_(uchar, ushort, ushort, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale);
BINARY_(uchar, short, short, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale);
BINARY_(uchar, float, float, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it supposed to have SRC1 & SRC2 for the same type (i mean if it falls into the div/mul with SFINAE with SRC1 & SRC2 then do we need to satisfy static_check(std::is_same<SRC1, SRC2>))?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is already exist. Please take a look here

namespace gapi {
namespace fluid {

#define DIV_SIMD(SRC, DST) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like order is different with BINARY_
https://github.com/opencv/opencv/pull/21024/files#diff-fe0ac6b5a07c1bec59c3756100eb186c9117a82dd1d951ef6997b8423a89943dR787

What do you think to align: DST <-->SRC according to BINARY_ order or vise versa?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two first arguments of div_simd() have SRC type. Third argument has DST type. So for MACRO I should mention SRC type first and second should be DST.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BYNARY_ macro is working good, so in the bounds of this task, I can't change API of this MACRO. Otherwise I have to change its invocation for all kernels that use it.

Copy link
Contributor

@sivanov-work sivanov-work Nov 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point.
But what do you think to align argument positional agreement in newly added functions for mul/div in accommodation with the old approach with BINARY_?

UPDATED: Or it will affect template type auto-deduction?

CV_ALWAYS_INLINE
typename std::enable_if<(std::is_same<SRC, short>::value && std::is_same<DST, ushort>::value) ||
(std::is_same<SRC, ushort>::value && std::is_same<DST, ushort>::value) ||
(std::is_same<SRC, ushort>::value && std::is_same<DST, short>::value), int>::type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO: enable_if is good for simple condition check otherwise it confused
but as I noticed we need to pass type into v_load_f32 and others function, so we need to know actual type here...

IMHO again:
To avoid eyeballs scattering when reading this code i think it is possible to introduce MACRO at least

#define SRC_DST_SHORT_USHORT (std::is_same<SRC, >...)
#define DST_SHORT_USHORT (std::is_same ...)

...
template<...>
CV_ALWAYS_INLINE 
typename std::enable_if<SRC_DST_SHORT_USHORT, int >::type 
div_hal(...)
...
template<...>
CV_ALWAYS_INLINE 
typename std::enable_if<SRC_DST_SHORT_USHORT, int >::type 
mul_hal(...)

Feel free to ignore it

{
#if CV_SIMD
x = div_simd(in1, in2, out, length, scale);
#endif
for (; x < length; ++x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there are no loop-unrolling MACRO in opencv?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense here?

@dmatveev dmatveev self-assigned this Nov 15, 2021
@dmatveev dmatveev added this to the 4.5.5 milestone Nov 15, 2021
Copy link
Contributor

@dmatveev dmatveev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 thanks!

@alalek
Copy link
Member

alalek commented Nov 15, 2021

Need to rebase after "Div" PR merge (git rebase -i upstream/4.x and drop "div" commits).

@anna-khakimova
Copy link
Member Author

Need to rebase after "Div" PR merge (git rebase -i upstream/4.x and drop "div" commits).

@alalek Ok. Done. Please take a look.

@opencv-pushbot opencv-pushbot merged commit 4b6047e into opencv:4.x Nov 17, 2021
@alalek alalek mentioned this pull request Dec 30, 2021
@alalek alalek mentioned this pull request Feb 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants