Add support for v_sin and v_cos (Sine and Cosine) #25892

WanliZhong · 2024-07-09T18:23:59Z (edited by vpisarev)

This PR aims to implement `v_sincos(v_float16 x)`, `v_sincos(v_float32 x)` and `v_sincos(v_float64 x)`.
Merged after #25891 and #26023

NOTE:
The patch also changes the already added `v_exp`, `v_log` and `v_erf` to pass parameters by reference instead of by value, to match the API of other universal intrinsics.
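A minimal sketch of the calling convention the NOTE describes: inputs are taken by const reference rather than by value. Everything here is hypothetical and not OpenCV code. `VecSketch` stands in for `v_float32`/`v_float64`, and `v_sincos_sketch`/`v_exp_sketch` are made-up names. The form with two output references for sine and cosine is an assumption, because the description above only shows the input parameter. The lanes use `std::sin`/`std::cos`/`std::exp`, so only the signatures are illustrated, not the approximation itself.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a universal-intrinsics register type such as
// v_float32 or v_float64: N lanes of element type T. Not an OpenCV type.
template <typename T, std::size_t N>
struct VecSketch {
    T lane[N];
};

// Input by const reference; both results written through non-const
// references. The two-output form is an assumption for illustration.
template <typename T, std::size_t N>
inline void v_sincos_sketch(const VecSketch<T, N>& x,
                            VecSketch<T, N>& s, VecSketch<T, N>& c)
{
    for (std::size_t i = 0; i < N; ++i) {
        s.lane[i] = std::sin(x.lane[i]);
        c.lane[i] = std::cos(x.lane[i]);
    }
}

// Single-result function (in the style of v_exp) after the change:
// const-reference input instead of pass-by-value, result returned by value.
template <typename T, std::size_t N>
inline VecSketch<T, N> v_exp_sketch(const VecSketch<T, N>& x)
{
    VecSketch<T, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r.lane[i] = std::exp(x.lane[i]);
    return r;
}
```

The same template serves float and double lanes, which mirrors having one `v_sincos` overload per precision.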

TODO:

- [x] double and half float precision
- [x] tests for them
- [x] doc to explain the implementation

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2024-10-07T05:21:55Z

modules/core/include/opencv2/core/hal/intrin_math.hpp

```diff
@@ -405,6 +405,248 @@ inline _TpVec64F v_log_default_64f(const _TpVec64F &x) {
 }
 //! @}

+//! @name Sine and Cosine
+//! @{
+template<typename _TpVec16F, typename _TpVec16S>
```


It would be great to add a reference to the algorithm source/description and, maybe, how the coefficients are generated.

I can't find the original algorithm. The approximate implementations are all the same, and they refer to the following notice (which already exists in intrin_math.hpp):

```
/* Universal Intrinsics implementation of sin, cos, exp and log
   Inspired by Intel Approximate Math library, and based on the
   corresponding algorithms of the cephes math library */
/* Copyright (C) 2010,2011 RJVB - extensions */
/* Copyright (C) 2011 Julien Pommier
```

The following explanation is my own interpretation. The basic idea is to use periodicity and trigonometric identities to reduce the input to a small range, and then evaluate a Taylor-series polynomial on the reduced value. The coefficients are nearly identical to those of the Taylor series, with some adjustments.
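The range reduction and polynomial evaluation described above can be sketched for a single `float` lane. This is a hypothetical scalar illustration, not the `v_sincos` code itself. It uses the Cephes `sinf`/`cosf` constants (the notice above credits Cephes, via Julien Pommier's implementation), and whether intrin_math.hpp uses exactly these values is an assumption. `sincos_sketch` is a made-up name. The input limit in the `assert` is also an assumption, chosen to keep the integer octant index small.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar sketch of a Cephes/Pommier-style float sincos.
inline void sincos_sketch(float x, float& s, float& c)
{
    // Assumed limit: keeps the octant index j small; larger inputs would
    // need a more careful range reduction.
    assert(std::fabs(x) < 8192.0f);

    const float FOPI = 1.27323954473516f;          // 4 / pi
    const float DP1 = 0.78515625f;                 // pi/4 = DP1 + DP2 + DP3,
    const float DP2 = 2.4187564849853515625e-4f;   // split so that y * DP1
    const float DP3 = 3.77489497744594108e-8f;     // is exact for small y

    // Symmetry: sin is odd and cos is even, so work with |x|.
    const float sign = (x < 0.0f) ? -1.0f : 1.0f;
    const float ax = std::fabs(x);

    // Periodicity: pick an even multiple j of pi/4 so that the reduced
    // argument r = ax - j*pi/4 lies in roughly [-pi/4, pi/4].
    int j = static_cast<int>(ax * FOPI);
    j = (j + 1) & ~1;
    const float y = static_cast<float>(j);
    const float r = ((ax - y * DP1) - y * DP2) - y * DP3;
    const float z = r * r;

    // Short polynomials on the reduced range. The coefficients are close to
    // the Taylor terms 1/24, -1/720, 1/40320 (cos) and -1/6, 1/120, -1/5040 (sin).
    const float pc = ((2.443315711809948e-5f * z - 1.388731625493765e-3f) * z
                      + 4.166664568298827e-2f) * z * z - 0.5f * z + 1.0f;
    const float ps = ((-1.9515295891e-4f * z + 8.3321608736e-3f) * z
                      - 1.6666654611e-1f) * z * r + r;

    // Identities: ax = q*pi/2 + r with q = (j/2) mod 4.
    float sv = ps, cv = pc;                 // q = 0: sin r, cos r
    switch ((j >> 1) & 3) {
        case 1: sv = pc;  cv = -ps; break;  // sin(pi/2+r) = cos r, cos(pi/2+r) = -sin r
        case 2: sv = -ps; cv = -pc; break;  // a shift by pi flips both signs
        case 3: sv = -pc; cv = ps;  break;  // sin(3pi/2+r) = -cos r, cos(3pi/2+r) = sin r
        default: break;
    }
    s = sign * sv;
    c = cv;
}
```

Comparing the sine coefficients with the Taylor series illustrates the "nearly identical, with some adjustments" point: 1.6666654611e-1 versus 1/6 ≈ 1.6666667e-1, 8.3321608736e-3 versus 1/120 ≈ 8.3333333e-3, and 1.9515295891e-4 versus 1/5040 ≈ 1.9841270e-4. The small shifts presumably come from a minimax-style fit, which lowers the maximum error over the whole reduced interval rather than only near zero.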

asmorkalov

👍

Add support for v_sin and v_cos (Sine and Cosine) opencv#25892

This PR aims to implement `v_sincos(v_float16 x)`, `v_sincos(v_float32 x)` and `v_sincos(v_float64 x)`.
Merged after opencv#25891 and opencv#26023

**NOTE:**
Also, the patch changes already added `v_exp`, `v_log` and `v_erf` to pass parameters by reference instead of by value, to match API of other universal intrinsics.

TODO:
- [x] double and half float precision
- [x] tests for them
- [x] doc to explain the implementation

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

[HAL RVV] unify and impl polar_to_cart | add perf test #26999

### Summary

1. Implement through the existing `cv_hal_polarToCart32f` and `cv_hal_polarToCart64f` interfaces.
2. Add `polarToCart` performance tests.
3. Make `cv::polarToCart` use CALL_HAL in the same way as `cv::cartToPolar`.
4. To achieve the 3rd point, the original implementation was moved, and some modifications were made.

Tested through:

```sh
opencv_test_core --gtest_filter="*PolarToCart*:*Core_CartPolar_reverse*"
opencv_perf_core --gtest_filter="*PolarToCart*" --perf_min_samples=300 --perf_force_samples=300
```

### HAL performance test

***UPDATE***: The current implementation no longer depends on vlen.

**NOTE**: Due to the 4th point in the summary above, the `scalar` and `ui` tests are based on the modified code of this PR. The impact of this patch on `scalar` and `ui` is evaluated in the next section, `Effect of Point 4`.

Vlen 256 (Muse Pi):

```
Name of Test                                          scalar       ui      rvv  ui vs scalar rvv vs scalar
                                                                                  (x-factor)    (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.315    0.110    0.034          2.85          9.34
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.423    0.163    0.045          2.59          9.34
PolarToCart::PolarToCartFixture::(640x480, 32FC1)     13.695    4.325    1.278          3.17         10.71
PolarToCart::PolarToCartFixture::(640x480, 64FC1)     17.719    7.118    2.105          2.49          8.42
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)    40.678   13.114    3.977          3.10         10.23
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)    53.124   21.298    6.519          2.49          8.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)   95.158   29.465    8.894          3.23         10.70
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  119.262   47.743   14.129          2.50          8.44
```

### Effect of Point 4

To make `cv::polarToCart` behave the same as `cv::cartToPolar`, the implementation detail of the former has been moved to the latter's location (from `mathfuncs.cpp` to `mathfuncs_core.simd.hpp`).
#### Reason for Changes:

This function works as follows: $y = \text{mag} \times \sin(\text{angle})$ and $x = \text{mag} \times \cos(\text{angle})$. The original implementation first calculates the values of $\sin$ and $\cos$, stores the results in the output buffers $x$ and $y$, and then multiplies them by $\text{mag}$. However, when the function is used as an in-place operation (one of the output buffers is also an input buffer), the original implementation allocates an extra buffer to store the $\sin$ and $\cos$ values, in case the $\text{mag}$ value gets overwritten. This extra buffer allocation prevents `cv::polarToCart` from functioning in the same way as `cv::cartToPolar`. Therefore, the multiplication is now performed immediately, without storing intermediate values.

Since the original implementation also had AVX2 optimizations, I have applied the same optimizations to the AVX2 version of this implementation.

***UPDATE***: UI uses `v_sincos` from #25892 now. The original implementation has AVX2 optimizations but is much slower than the current UI, so it has been removed; the AVX2 perf test is below. The scalar implementation isn't changed because it's faster than using UI's method.

#### Test Result

The `scalar` and `ui` tests were run on a Muse Pi, and the AVX2 test was run on an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz.
`scalar` test:

```
Name of Test                                            orig       pr    pr vs orig
                                                                         (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.333    0.294          1.13
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.385    0.403          0.96
PolarToCart::PolarToCartFixture::(640x480, 32FC1)     14.749   12.343          1.19
PolarToCart::PolarToCartFixture::(640x480, 64FC1)     19.419   16.743          1.16
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)    44.155   37.822          1.17
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)    62.108   50.358          1.23
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)   99.011   85.769          1.15
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  127.740  112.874          1.13
```

`ui` test:

```
Name of Test                                            orig       pr    pr vs orig
                                                                         (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.306    0.110          2.77
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.455    0.163          2.79
PolarToCart::PolarToCartFixture::(640x480, 32FC1)     13.381    4.325          3.09
PolarToCart::PolarToCartFixture::(640x480, 64FC1)     21.851    7.118          3.07
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)    39.975   13.114          3.05
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)    67.006   21.298          3.15
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)   90.362   29.465          3.07
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)  129.637   47.743          2.72
```

AVX2 test:

```
Name of Test                                            orig       pr    pr vs orig
                                                                         (x-factor)
PolarToCart::PolarToCartFixture::(127x61, 32FC1)       0.019    0.009          2.11
PolarToCart::PolarToCartFixture::(127x61, 64FC1)       0.022    0.013          1.74
PolarToCart::PolarToCartFixture::(640x480, 32FC1)      0.788    0.355          2.22
PolarToCart::PolarToCartFixture::(640x480, 64FC1)      1.102    0.618          1.78
PolarToCart::PolarToCartFixture::(1280x720, 32FC1)     2.383    1.042          2.29
PolarToCart::PolarToCartFixture::(1280x720, 64FC1)     3.758    2.316          1.62
PolarToCart::PolarToCartFixture::(1920x1080, 32FC1)    5.577    2.559          2.18
PolarToCart::PolarToCartFixture::(1920x1080, 64FC1)    9.710    6.424          1.51
```

A slight performance loss occurs because the check for whether $mag$ is nullptr is performed on every calculation instead of once per batch. This is done to reuse the existing `SinCos_32f` function.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
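The in-place handling described in #26999 (compute sin and cos per element and multiply by mag immediately, so an output buffer that aliases `mag` or `angle` is only written after that element's inputs have been read) can be sketched in scalar C++. This is a hypothetical illustration, not OpenCV's implementation; `polar_to_cart_sketch` is a made-up name. Treating a null `mag` as all-ones magnitudes is an assumption modeled on the per-element nullptr check mentioned above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical scalar sketch of an in-place-safe polar-to-Cartesian step:
//   x[i] = mag[i] * cos(angle[i]),  y[i] = mag[i] * sin(angle[i]).
// x or y may alias mag or angle. If mag is nullptr, every magnitude is
// treated as 1 (an assumption for this sketch).
template <typename T>
void polar_to_cart_sketch(const T* mag, const T* angle, T* x, T* y,
                          std::size_t n, bool angle_in_degrees)
{
    assert(angle != nullptr && x != nullptr && y != nullptr);
    const T to_rad = angle_in_degrees
                         ? static_cast<T>(3.14159265358979323846 / 180.0)
                         : static_cast<T>(1);
    for (std::size_t i = 0; i < n; ++i) {
        // Read every input of element i before writing any output of
        // element i, so no temporary sin/cos buffer is needed.
        const T a = angle[i] * to_rad;
        const T m = mag ? mag[i] : static_cast<T>(1);  // nullptr checked per element
        const T s = std::sin(a);
        const T c = std::cos(a);
        // Multiply immediately instead of storing sin/cos first.
        x[i] = m * c;
        y[i] = m * s;
    }
}
```

Usage example: calling `polar_to_cart_sketch(mag, ang, mag, ys, 3, true)` writes the x components over the `mag` array itself. Because each element's magnitude is read before it is overwritten, this works without the extra buffer the original implementation needed.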

WanliZhong added this to the 4.11.0 milestone Jul 9, 2024

WanliZhong added bug category: core optimization feature and removed bug category: core labels Jul 9, 2024

WanliZhong force-pushed the v_sincos branch from 55a2265 to e6294e6 on August 12, 2024 07:49

WanliZhong mentioned this pull request Aug 13, 2024

New CPU HAL for OpenCV 5.0 #25019

Open

WanliZhong added 3 commits October 5, 2024 01:30

Add support for v_sin and v_cos (Sine and Cosine)

09156c6

solve the conflicts

791942e

pass parameters by reference instead of by value

47ec0aa

WanliZhong force-pushed the v_sincos branch from 8f1162f to 47ec0aa on October 5, 2024 14:27

WanliZhong added 3 commits October 5, 2024 22:32

remove some blank spaces

12ae64e

use lf for double value in output

fade4c9

fix the bug in fp16

8f6b45a

WanliZhong marked this pull request as ready for review October 5, 2024 15:41

WanliZhong requested review from asmorkalov and vpisarev October 5, 2024 15:41

refine test

7b4192b

asmorkalov reviewed Oct 7, 2024


asmorkalov self-assigned this Oct 10, 2024

asmorkalov approved these changes Oct 10, 2024


asmorkalov merged commit 687e37e into opencv:4.x Oct 10, 2024
29 of 30 checks passed

asmorkalov mentioned this pull request Oct 23, 2024

5.x merge 4.x #26358

Merged

fengyuentau mentioned this pull request Mar 10, 2025

[HAL RVV] unify and impl polar_to_cart | add perf test #26999

Merged

6 tasks
