Further optimize DNN for RISC-V Vector. #21086

hanliutong · 2021-11-19T07:54:39Z

This patch further optimizes DNN for RVV, building on my GSoC work. The previous version is #20521.

There are 3 changes in this patch.

1. Use vsetvl instead of a branch to handle the vector tail (the last few elements of each row, which cannot fill an entire vector register).

I wrote an example on Godbolt comparing vsetvl with an if branch. It shows that using vsetvl eliminates the conditional jump and introduces only one extra instruction (a sub).
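
The difference between the two tail-handling styles can be sketched in portable C++. This is a simplified model, not the patch's code and not the Godbolt example: `model_vsetvl` and `kVlMax` are hypothetical stand-ins for the vsetvl intrinsic and the hardware VLMAX, and the model always returns min(avl, VLMAX), which is a simplification of what real hardware may return. Both functions compute the same dot product; the second has no separate tail branch and pays only the `avl -= vl` subtraction per iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the hardware VLMAX (elements per vector register group).
constexpr std::size_t kVlMax = 8;

// Simplified model of vsetvl: how many elements to process in this iteration.
inline std::size_t model_vsetvl(std::size_t avl) { return std::min(avl, kVlMax); }

// Branch style: full-width iterations, then a conditional block for the tail.
float dot_with_branch(const float* a, const float* b, std::size_t n)
{
    float sum = 0.f;
    std::size_t i = 0;
    for (; i + kVlMax <= n; i += kVlMax)
        for (std::size_t e = 0; e < kVlMax; ++e)
            sum += a[i + e] * b[i + e];
    if (i < n)  // extra conditional for the elements that do not fill a register
        for (std::size_t e = i; e < n; ++e)
            sum += a[e] * b[e];
    return sum;
}

// vsetvl style: every iteration, including the last, asks for vl from avl.
float dot_with_vsetvl(const float* a, const float* b, std::size_t n)
{
    float sum = 0.f;
    std::size_t avl = n, vl = 0;  // avl: elements not yet processed
    for (std::size_t i = 0; i < n; i += vl, avl -= vl)
    {
        vl = model_vsetvl(avl);   // shrinks automatically on the final iteration
        for (std::size_t e = 0; e < vl; ++e)
            sum += a[i + e] * b[i + e];
    }
    return sum;
}
```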

2. Unify the names of the variables related to vl.

Previously, each function named these variables independently, which made the code hard to read. I renamed the vl-related variables according to one rule. Now, in all 4 functions:
- The following variables are all passed as the vl parameter of intrinsics, but each name has a distinct meaning:
  - vlm<LMUL>: the maximum value vl can take for a given LMUL. It is a constant.
  - vl: the number of elements processed in each inner-loop iteration; in the final iteration it covers the tail.
  - unroll_tail: the number of elements processed in each outer-loop iteration; in the final iteration it also covers a tail, but one caused by loop unrolling.
- CHANGE 1 introduces a new variable, avl, which holds the number of unprocessed elements and is passed as the argument to vsetvl.
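
How the four names relate in a loop nest can be sketched with a scalar model of a matrix-vector product. This is only an illustration under assumptions: `matvec_model` and `kUnroll` are hypothetical, the value of `vlm2` is made up (the real one would come from a vsetvlmax intrinsic), and the inner scalar loops stand in for vector instructions.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

constexpr int vlm2 = 8;     // modelled maximum vl for LMUL = 2 (assumed value)
constexpr int kUnroll = 4;  // rows handled per outer iteration (assumed value)

// dst[r] = dot(row r of mat, vec) for a row-major rows x cols matrix.
void matvec_model(const float* mat, const float* vec, int rows, int cols, float* dst)
{
    for (int i = 0; i < rows; i += kUnroll)
    {
        // Rows handled in this outer iteration; below kUnroll only in the last one.
        const int unroll_tail = std::min(kUnroll, rows - i);
        float acc[kUnroll] = {0.f, 0.f, 0.f, 0.f};

        int avl = cols, vl = 0;          // avl: elements of the row not yet processed
        for (int k = 0; k < cols; k += vl, avl -= vl)
        {
            vl = std::min(avl, vlm2);    // models vsetvl(avl)
            for (int r = 0; r < unroll_tail; ++r)
                for (int e = 0; e < vl; ++e)
                    acc[r] += mat[(i + r) * cols + k + e] * vec[k + e];
        }

        // Models a vector store whose vl argument is unroll_tail.
        for (int r = 0; r < unroll_tail; ++r)
            dst[i + r] = acc[r];
    }
}
```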

3. Update the way fastConv handles the matrix tail (the last few rows of the matrix, usually caused by loop unrolling; the vl for the matrix tail is called unroll_tail in CHANGE 2).

In the previous version, I used both vl and a mask for the matrix tail to handle different values of blocksize (here is the discussion at the time). However, masking is usually costly, and I found a way to handle the tail with vl alone. With it, neither a mask nor an additional branch is needed.
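
One way to see why vl alone can be enough: when the lanes to keep are exactly the first unroll_tail lanes, a store with a reduced vl writes the same elements as a masked store. The sketch below models both in scalar C++. It is an assumption about the general idea, not the fastConv code from this patch; `store_masked`, `store_vl` and `kLanes` are hypothetical names.

```cpp
#include <array>
#include <cassert>

constexpr int kLanes = 4;  // hypothetical lane count, one lane per unrolled row

// Models a masked store: every lane is visited and a mask selects the lanes written.
void store_masked(float* dst, const std::array<float, kLanes>& v,
                  const std::array<bool, kLanes>& mask)
{
    for (int l = 0; l < kLanes; ++l)
        if (mask[l])
            dst[l] = v[l];
}

// Models a store with reduced vl: only the first vl lanes are written, no mask needed.
void store_vl(float* dst, const std::array<float, kLanes>& v, int vl)
{
    for (int l = 0; l < vl; ++l)
        dst[l] = v[l];
}
```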

I have tested this patch on QEMU. The minimal DNN test data set gives the same result with the patch as on the master branch:

[==========] 1687 tests from 77 test cases ran. (73103 ms total)
[  PASSED  ] 1684 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] Test_ONNX_layers.Elementwise_not/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_ONNX_layers.Compare/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_ONNX_layers.CompareSameDims/0, where GetParam() = OCV/CPU

 3 FAILED TESTS

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

hanliutong · 2021-11-22T07:17:45Z

~~Full test cases show that there is a bug in fastDepthwiseConv. I'm working on it.~~
Fixed by 16682b7

Further optimize DNN for RISC-V Vector.
* Optimize DNN on RVV by using vsetvl.
* Rename vl.
* Update fastConv by using setvl instead of mask.
* Fix fastDepthwiseConv

hanliutong added 3 commits November 18, 2021 22:32

Optimize DNN on RVV by using vsetvl.

652f345

Rename vl.

625f46f

Update fastConv by using setvl instead of mask.

242d526

asmorkalov self-requested a review November 19, 2021 08:34

hanliutong added 2 commits November 22, 2021 16:07

Fix fastDepthwiseConv

16682b7

Merge branch '4.x' into rvv-dnn

6d7c703

asmorkalov approved these changes Dec 10, 2021

alalek assigned asmorkalov Dec 10, 2021

alalek merged commit 1599f9f into opencv:4.x Dec 10, 2021

hanliutong deleted the rvv-dnn branch December 13, 2021 01:44

alalek mentioned this pull request Dec 30, 2021

(5.x) Merge 4.x #21371

Merged

alalek mentioned this pull request Feb 22, 2022

(5.x) Merge 4.x #21651

Merged
