Further optimize DNN for RISC-V Vector. #21086

Merged
merged 5 commits into from
Dec 10, 2021

hanliutong
Contributor

This patch further optimizes DNN for RVV, building on my GSoC work. The previous version is #20521.

There are 3 changes in this patch.

  1. Use vsetvl instead of a branch to handle the vector tail (the last few elements of each row, which cannot fill an entire vector register).

    I wrote an example on Godbolt showing the difference between using vsetvl and using if: vsetvl eliminates the conditional jump and introduces only one extra statement (a sub).

  2. Unify the names of the variables related to vl.

    Previously, each function named these variables independently, which made the code hard to read. I renamed the vl-related variables according to a single rule. Now, in all 4 functions:

    • All of the following variables are passed as the vl parameter of intrinsics, but each name has a distinct meaning:
      vlm<LMUL> : The maximum value that vl can take for a given LMUL. It is a constant.
      vl : The number of elements processed in each inner-loop iteration; it also handles the tail in the final iteration.
      unroll_tail : The number of elements processed in each outer-loop iteration; it also handles the tail in the final iteration, but this tail is caused by loop unrolling.

    • CHANGE 1 also introduces a new variable called avl, which holds the number of unprocessed elements and is passed as the argument of vsetvl.

  3. Update the way fastConv handles the matrix tail (the last few rows of the matrix, usually caused by loop unrolling; the vl for the matrix tail is called unroll_tail in CHANGE 2).

    In the previous version, I used both vl and a mask for the matrix tail to handle the different sizes of blocksize, and here is the discussion at the time. However, masking is usually costly, and I found a way to handle the matrix tail using only vl. With that, no mask and no additional branch is needed.
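
The three changes can be illustrated with a portable C++ sketch. It is not the code from this patch and uses no RVV intrinsics: `sim_vsetvl`, `gemv_sketch` and the unroll factor `UNROLL` are hypothetical names introduced only for illustration, and `sim_vsetvl` approximates vsetvl as `min(avl, vlmax)`, which is a simplification of the real instruction. Scalar loops stand in for vector operations of length vl.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for vsetvl: the element count granted for the next
// vector operation. Simplified to min(avl, vlmax); real hardware is allowed
// a different split when vlmax < avl < 2 * vlmax.
static std::size_t sim_vsetvl(std::size_t avl, std::size_t vlmax) {
    return std::min(avl, vlmax);
}

// y = A * x for a row-major matrix A (rows x cols), processed UNROLL rows
// at a time. vlm plays the role of vlm<LMUL>: the constant maximum vl.
static std::vector<float> gemv_sketch(const std::vector<float>& A,
                                      const std::vector<float>& x,
                                      std::size_t rows, std::size_t cols,
                                      std::size_t vlm) {
    assert(vlm > 0);
    constexpr std::size_t UNROLL = 4;  // illustrative unroll factor (assumption)
    std::vector<float> y(rows, 0.f);

    for (std::size_t i = 0; i < rows; i += UNROLL) {
        // Rows handled in this outer iteration: UNROLL, or fewer in the
        // final iteration (the tail caused by loop unrolling).
        const std::size_t unroll_tail = sim_vsetvl(rows - i, UNROLL);
        float acc[UNROLL] = {0.f, 0.f, 0.f, 0.f};

        std::size_t avl = cols;  // elements of the row not yet processed
        std::size_t vl = 0;
        for (std::size_t k = 0; k < cols; k += vl, avl -= vl) {
            // No `if` for the vector tail: vl shrinks by itself in the last
            // iteration, at the cost of the `avl -= vl` subtraction.
            vl = sim_vsetvl(avl, vlm);
            for (std::size_t r = 0; r < unroll_tail; ++r)
                for (std::size_t e = 0; e < vl; ++e)  // one vector op of length vl
                    acc[r] += A[(i + r) * cols + k + e] * x[k + e];
        }

        // The store is limited by unroll_tail used as a length, not by a mask.
        for (std::size_t r = 0; r < unroll_tail; ++r)
            y[i + r] = acc[r];
    }
    return y;
}
```

For a 5 x 6 matrix with `vlm = 4`, the inner loop runs with vl = 4 and then vl = 2, and the outer loop runs with unroll_tail = 4 and then unroll_tail = 1, so both tails are covered without a branch or a mask.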

I have tested this patch on QEMU; the minimal DNN test data set shows the same results with the patch as on the master branch:

[==========] 1687 tests from 77 test cases ran. (73103 ms total)
[  PASSED  ] 1684 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] Test_ONNX_layers.Elementwise_not/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_ONNX_layers.Compare/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_ONNX_layers.CompareSameDims/0, where GetParam() = OCV/CPU

 3 FAILED TESTS

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov asmorkalov self-requested a review November 19, 2021 08:34
@hanliutong
Contributor Author

hanliutong commented Nov 22, 2021

Full test cases show that there is a bug in fastDepthwiseConv. I'm working on it.
Fixed by 16682b7

@alalek alalek merged commit 1599f9f into opencv:4.x Dec 10, 2021
@hanliutong hanliutong deleted the rvv-dnn branch December 13, 2021 01:44
@alalek alalek mentioned this pull request Dec 30, 2021
@alalek alalek mentioned this pull request Feb 22, 2022
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Further optimize DNN for RISC-V Vector.

* Optimize DNN on RVV by using vsetvl.

* Rename vl.

* Update fastConv by using setvl instead of mask.

* Fix fastDepthwiseConv