Enable SIMD_SCALABLE for exp and sqrt #26886
We have the v_exp universal intrinsic, introduced in #25881. I propose integrating it into core first.
By changing
So the main question is about the algorithmic part. Although the tests passed,
Regarding the algorithmic part:
@sk1er52, to give you credit for the work done, I recommend merging this PR with the v_exp universal intrinsics enabled. Let's keep this PR or a separate issue open for future discussion. The potential 2x performance improvement should be analyzed more precisely.
Please add a link to the article in the PR description.
My perf results for Muse Pi v 30 (GCC 14.2):
Enable SIMD_SCALABLE for exp and sqrt opencv#26886

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

```
CPU - Banana Pi k1, compiler - clang 18.1.4
```

```
Geometric mean (ms)

Name of Test                          baseline     hal      ui    hal vs baseline  ui vs baseline
                                                                    (x-factor)       (x-factor)
Exp::ExpFixture::(127x61, 32FC1)         0.358      --    0.033         --              10.70
Exp::ExpFixture::(640x480, 32FC1)       14.304      --    1.167         --              12.26
Exp::ExpFixture::(1280x720, 32FC1)      42.785      --    3.538         --              12.09
Exp::ExpFixture::(1920x1080, 32FC1)     96.206      --    7.927         --              12.14
Exp::ExpFixture::(127x61, 64FC1)         0.433   0.050    0.098        8.59              4.40
Exp::ExpFixture::(640x480, 64FC1)       17.315   1.935    3.813        8.95              4.54
Exp::ExpFixture::(1280x720, 64FC1)      52.181   5.877   11.519        8.88              4.53
Exp::ExpFixture::(1920x1080, 64FC1)    117.082  13.157   25.854        8.90              4.53
```

Additionally, this PR brings Sqrt optimization with UI:

```
Geometric mean (ms)

Name of Test                               baseline      ui    ui vs baseline
                                                                 (x-factor)
Sqrt::SqrtFixture::(127x61, 5, false)         0.111   0.027        4.11
Sqrt::SqrtFixture::(127x61, 6, false)         0.149   0.053        2.82
Sqrt::SqrtFixture::(640x480, 5, false)        4.374   0.967        4.52
Sqrt::SqrtFixture::(640x480, 6, false)        5.885   2.046        2.88
Sqrt::SqrtFixture::(1280x720, 5, false)      12.960   2.915        4.45
Sqrt::SqrtFixture::(1280x720, 6, false)      17.648   6.107        2.89
Sqrt::SqrtFixture::(1920x1080, 5, false)     29.178   6.524        4.47
Sqrt::SqrtFixture::(1920x1080, 6, false)     39.709  13.670        2.90
```
### Reference

Muller, J.-M. Elementary Functions: Algorithms and Implementation. 2nd ed. Boston: Birkhäuser, 2006.
https://www.springer.com/gp/book/9780817643720
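
For readers unfamiliar with how exp is usually evaluated in software, here is a minimal scalar sketch of the textbook range-reduction approach covered in references such as the one above: write x = k·ln2 + r with |r| ≤ ln2/2, approximate exp(r) with a short polynomial, then scale by 2^k. This is not the code from this PR or from `v_exp`. The function name `exp_sketch`, the degree-11 Taylor polynomial, and the input range check are assumptions chosen purely for illustration. A production version would typically use a minimax polynomial and a split (high/low) ln2 constant for better accuracy.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper for illustration only; not an OpenCV function.
// Computes exp(x) via range reduction: x = k*ln2 + r, exp(x) = 2^k * exp(r).
static double exp_sketch(double x)
{
    // Assumption: restrict inputs so that 2^k stays inside the double range.
    assert(std::isfinite(x) && std::fabs(x) < 700.0);

    const double LN2 = 0.69314718055994530942;

    // k is the integer nearest to x / ln2, so |r| <= ln2 / 2 (about 0.347).
    const int k = static_cast<int>(std::nearbyint(x / LN2));
    const double r = x - k * LN2;

    // Degree-11 Taylor polynomial for exp(r), evaluated in nested (Horner-like) form:
    // 1 + r/1 * (1 + r/2 * (1 + ... * (1 + r/11))).
    double p = 1.0;
    for (int i = 11; i >= 1; --i)
        p = 1.0 + p * r / i;

    // Scale by 2^k; ldexp only adjusts the exponent, so this step is exact
    // as long as the result is a normal double.
    return std::ldexp(p, k);
}
```

For moderate inputs, `exp_sketch(x)` should agree closely with `std::exp(x)`. In a vectorized version, the same three steps (reduce, polynomial, scale) would be applied to a whole register of lanes at once, which is where speedups such as those in the tables above typically come from.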