imgproc: Bilateral filter performance improvement #27433
base: 4.x
b250df5 to 3a86990
46c1fd3 to 74a0ad3
- Removed the use of an additional buffer.
- Improved performance by avoiding memory stores and loads.
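The commit message above describes dropping an intermediate buffer so that values are not stored and then reloaded. The following is a minimal sketch of that general idea only, not the code from this PR: `range_weight`, `filter_pixel_buffered`, and `filter_pixel_fused` are hypothetical names, and the Gaussian weight is a toy stand-in for whatever weighting the real kernel uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical toy range weight (a Gaussian on the intensity difference).
// An illustrative stand-in, not OpenCV's precomputed weight table.
static float range_weight(int diff, float sigma)
{
    return std::exp(-0.5f * static_cast<float>(diff * diff) / (sigma * sigma));
}

// Buffered variant: every weight is stored to a temporary array and then
// loaded again in a second loop.
// Assumes nbrs contains the center pixel, so the weight sum is at least 1.
float filter_pixel_buffered(const std::vector<unsigned char>& nbrs,
                            unsigned char center, float sigma)
{
    assert(!nbrs.empty() && sigma > 0.f);
    std::vector<float> w(nbrs.size());  // the additional buffer
    for (std::size_t i = 0; i < nbrs.size(); ++i)
        w[i] = range_weight(int(nbrs[i]) - int(center), sigma);

    float sum = 0.f, wsum = 0.f;
    for (std::size_t i = 0; i < nbrs.size(); ++i)
    {
        sum  += w[i] * static_cast<float>(nbrs[i]);
        wsum += w[i];
    }
    return sum / wsum;
}

// Fused variant: each weight is used as soon as it is computed, so it can
// stay in a register and no temporary array is needed.
// Same assumption about nbrs as above.
float filter_pixel_fused(const std::vector<unsigned char>& nbrs,
                         unsigned char center, float sigma)
{
    assert(!nbrs.empty() && sigma > 0.f);
    float sum = 0.f, wsum = 0.f;
    for (std::size_t i = 0; i < nbrs.size(); ++i)
    {
        const float w = range_weight(int(nbrs[i]) - int(center), sigma);
        sum  += w * static_cast<float>(nbrs[i]);
        wsum += w;
    }
    return sum / wsum;
}
```

Both functions perform the same arithmetic in the same order, so they should agree up to floating-point rounding; the fused form simply skips the round trip through memory. Whether that translates into a measurable gain depends on the data type, compiler, and target CPU.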
74a0ad3 to 5198e0f
e8fc8e0 to 85a3b0e
@asmorkalov Please let me know if any changes are required, thanks.
@fengyuentau @mshabunin could you take a look?
My perf stats:
i7-12700K:
Name of Test    i7-base    i7-patch    i7-patch vs i7-base (x-factor)
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 1) 0.217 0.109 2.00
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 5) 0.577 0.081 7.14
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 1) 0.281 0.289 0.97
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 5) 0.318 0.322 0.99
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 1) 0.211 0.226 0.93
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 5) 0.386 0.238 1.62
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 1) 0.691 0.724 0.95
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 5) 0.679 0.718 0.95
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 1) 0.538 0.168 3.21
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 5) 1.540 0.166 9.25
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 1) 0.492 0.520 0.95
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 5) 0.578 0.611 0.95
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 1) 0.408 0.436 0.94
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 5) 0.952 0.421 2.26
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 1) 1.021 1.053 0.97
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 5) 1.023 1.058 0.97
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 1) 1.669 0.538 3.10
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 5) 4.356 0.533 8.17
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 1) 2.249 2.219 1.01
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 5) 2.518 2.437 1.03
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 1) 1.087 1.078 1.01
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 5) 2.457 1.075 2.29
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 1) 7.366 8.181 0.90
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 5) 7.395 8.180 0.90
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 1) 4.873 1.112 4.38
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 5) 9.479 1.171 8.09
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 1) 5.369 5.276 1.02
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 5) 6.962 6.835 1.02
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 1) 2.013 2.022 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 5) 6.663 2.034 3.28
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 1) 9.325 9.236 1.01
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 5) 9.812 8.985 1.09
m2:
Name of Test    m2-base    m2-patch    m2-patch vs m2-base (x-factor)
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 1) 0.250 0.212 1.18
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 5) 0.240 0.223 1.08
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 1) 0.410 0.387 1.06
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 5) 0.402 0.407 0.99
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 1) 0.421 0.381 1.11
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 5) 0.420 0.377 1.11
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 1) 0.973 1.125 0.87
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 5) 0.961 0.984 0.98
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 1) 0.504 0.453 1.11
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 5) 0.498 0.448 1.11
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 1) 0.888 0.849 1.05
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 5) 0.889 0.863 1.03
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 1) 0.917 0.818 1.12
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 5) 0.905 0.820 1.10
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 1) 1.719 1.844 0.93
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 5) 1.717 1.826 0.94
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 1) 0.980 0.851 1.15
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 5) 0.977 0.858 1.14
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 1) 2.329 2.251 1.03
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 5) 2.344 2.251 1.04
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 1) 2.116 1.827 1.16
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 5) 2.015 1.829 1.10
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 1) 4.744 4.968 0.95
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 5) 4.769 4.940 0.97
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 1) 2.020 1.841 1.10
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 5) 2.077 1.831 1.13
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 1) 4.431 4.245 1.04
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 5) 4.406 4.208 1.05
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 1) 3.971 3.643 1.09
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 5) 4.016 3.684 1.09
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 1) 7.990 8.692 0.92
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 5) 7.919 8.646 0.92
K1 (RISC-V):
GCC
Name of Test    base-gcc    patch-gcc    patch-gcc vs base-gcc (x-factor)
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 1) 0.829 0.829 1.00
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 5) 0.832 0.840 0.99
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 1) 2.301 2.272 1.01
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 5) 2.227 2.223 1.00
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 1) 2.181 2.205 0.99
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 5) 2.214 2.240 0.99
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 1) 10.406 10.414 1.00
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 5) 10.439 10.447 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 1) 1.584 1.592 0.99
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 5) 1.580 1.586 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 1) 4.317 4.316 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 5) 4.327 4.337 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 1) 4.471 4.451 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 5) 4.623 4.628 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 1) 23.164 23.141 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 5) 23.283 23.268 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 1) 5.850 5.838 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 5) 5.885 5.872 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 1) 19.223 24.505 0.78
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 5) 19.261 24.559 0.78
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 1) 20.550 20.696 0.99
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 5) 21.531 21.531 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 1) 64.941 65.103 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 5) 64.396 64.549 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 1) 10.840 10.854 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 5) 10.769 10.791 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 1) 37.066 53.972 0.69
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 5) 37.288 53.857 0.69
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 1) 44.346 44.966 0.99
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 5) 46.038 46.246 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 1) 140.903 141.787 0.99
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 5) 140.954 141.242 1.00
Clang
Name of Test    base-clang    patch-clang    patch-clang vs base-clang (x-factor)
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 1) 0.794 0.789 1.01
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC1, 5) 0.789 0.794 0.99
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 1) 2.114 2.158 0.98
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC1, 5) 2.155 2.138 1.01
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 1) 2.008 2.022 0.99
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_8UC3, 5) 2.012 2.106 0.96
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 1) 10.111 10.110 1.00
BilateralFilter::TestBilateralFilter::(640x480, 3, CV_32FC3, 5) 10.098 10.177 0.99
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 1) 1.479 1.474 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC1, 5) 1.489 1.477 1.01
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 1) 4.053 4.026 1.01
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC1, 5) 4.079 4.054 1.01
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 1) 3.998 3.989 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_8UC3, 5) 4.003 4.228 0.95
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 1) 22.711 22.615 1.00
BilateralFilter::TestBilateralFilter::(640x480, 5, CV_32FC3, 5) 22.657 22.750 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 1) 5.492 5.486 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC1, 5) 5.490 5.509 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 1) 21.069 23.356 0.90
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC1, 5) 20.874 23.308 0.90
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 1) 19.265 19.432 0.99
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_8UC3, 5) 19.286 19.260 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 1) 62.823 62.852 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 3, CV_32FC3, 5) 62.446 62.767 0.99
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 1) 10.134 10.115 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC1, 5) 10.091 10.074 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 1) 41.929 51.640 0.81
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC1, 5) 42.109 51.780 0.81
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 1) 41.756 41.321 1.01
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_8UC3, 5) 41.532 41.543 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 1) 138.652 139.128 1.00
BilateralFilter::TestBilateralFilter::(1920x1080, 5, CV_32FC3, 5) 138.680 139.114 1.00
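The x-factor column in the tables above is consistent with the base time divided by the patched time, so values above 1 indicate a speedup and values below 1 a slowdown. A small sketch of how such rows could be screened for regressions follows; `PerfRow`, `x_factor`, `regressions`, and the threshold are hypothetical names chosen for illustration and are not part of OpenCV's perf tooling.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// One row of a perf comparison: test name plus base and patched timings.
struct PerfRow
{
    std::string name;
    double base;   // time measured on the base branch
    double patch;  // time measured with the patch applied
};

// x-factor: base time over patched time (> 1 means the patch is faster).
double x_factor(const PerfRow& r)
{
    assert(r.patch > 0.0);
    return r.base / r.patch;
}

// Return the rows whose x-factor falls below the given threshold.
// The threshold is an assumption; pick one that exceeds run-to-run noise.
std::vector<PerfRow> regressions(const std::vector<PerfRow>& rows,
                                 double threshold)
{
    std::vector<PerfRow> out;
    for (const PerfRow& r : rows)
        if (x_factor(r) < threshold)
            out.push_back(r);
    return out;
}
```

For example, feeding in the K1 GCC row for (1920x1080, 5, CV_32FC1, 1) with times 37.066 and 53.972 gives an x-factor of roughly 0.69, which a threshold of 0.95 would flag, while rows within about 1% of parity would pass.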
Hm, is IPP enabled in your Intel build? What is the build CPU baseline?
Baseline is
Added performance results on RISC-V.
@fengyuentau, @asmorkalov
Hello @madanm3 Thanks a lot for the contribution! Looks like the ball is in my court. I ran several experiments and got conflicting results, and I cannot reproduce a speedup similar to yours. Please give me some time to debug the issue or confirm my findings.
@asmorkalov, The performance gains I observed on AMD Genoa differ significantly from the i7-12700K results.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.