core: add ARM64 NEON support for cvRound in fast_math.hpp #27575

pratham-mcw · 2025-07-24T11:42:39Z

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake
This PR adds an ARM64 NEON intrinsics-based implementation of the cvRound function in fast_math.hpp for MSVC ARM64 targets (_MSC_VER && _M_ARM64).
Both the float and double overloads now use NEON intrinsics (vcvtn_s32_f32, vcvtn_s64_f64, etc.) for efficient and accurate rounding.
The implementation is consistent with the platform-specific handling already present for SSE2 (x64) and does not affect non-ARM64 builds.
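
For readers unfamiliar with these intrinsics, the following is a minimal sketch of round-to-nearest (ties to even) using the NEON convert instructions. It is not the code from this patch: the helper name round_nearest_even is hypothetical, the guard is widened to also cover __aarch64__ so the sketch builds with GCC/Clang, and the std::lrint fallback is an assumption added so it compiles on non-ARM64 hosts.

```cpp
#include <cmath>
#if defined(__aarch64__) || defined(_M_ARM64)
#include <arm_neon.h>
#endif

// Hypothetical helper (not an OpenCV API): round a float to the nearest int,
// with ties going to the even neighbour.
inline int round_nearest_even(float value)
{
#if defined(__aarch64__) || defined(_M_ARM64)
    float32x2_t v = vdup_n_f32(value);   // broadcast the scalar into a 2-lane vector
    int32x2_t r = vcvtn_s32_f32(v);      // convert with round-to-nearest, ties to even
    return vget_lane_s32(r, 0);          // extract lane 0
#else
    // Assumed fallback: lrint uses the current rounding mode
    // (round-to-nearest-even unless the program changed it).
    return static_cast<int>(std::lrint(value));
#endif
}

// Same idea for double, using the 64-bit convert and narrowing to int.
inline int round_nearest_even(double value)
{
#if defined(__aarch64__) || defined(_M_ARM64)
    float64x1_t v = vdup_n_f64(value);   // 1-lane double vector
    int64x1_t r = vcvtn_s64_f64(v);      // convert with round-to-nearest, ties to even
    return static_cast<int>(vget_lane_s64(r, 0));
#else
    return static_cast<int>(std::lrint(value));
#endif
}
```

With this sketch, halfway cases go to the even neighbour: 2.5 rounds to 2 and 3.5 rounds to 4, which is the same convention the SSE2 conversion instructions use under the default rounding mode.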

Performance improvements:

The optimization significantly improves the performance of cvRound on Windows ARM64 targets.
The table below shows timing comparisons before and after the optimization:

core: add ARM64 NEON support for cvRound in fast_math.hpp

6efca65

asmorkalov added the category: core label Jul 24, 2025

asmorkalov self-requested a review July 25, 2025 08:47

asmorkalov self-assigned this Jul 28, 2025

asmorkalov added this to the 4.13.0 milestone Jul 28, 2025

asmorkalov added the optimization label Jul 28, 2025

asmorkalov approved these changes Jul 28, 2025


asmorkalov merged commit 8e20ec2 into opencv:4.x Jul 28, 2025
54 of 55 checks passed

asmorkalov mentioned this pull request Jul 30, 2025

5.x merge 4.x #27604

Merged
