CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
Optimize some function with lasx. #23929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I accidentally deleted the last pr. I resubmitted this and made some changes. |
Test results are packed in the zip file above. Please check @asmorkalov . |
@fengyuentau Accuracy tests look good. Could you run performance test before the patch and after the patch and check regressions.
|
Please do not merge this PR yet, I will submit an optimization patch for lsx soon. |
@asmorkalov I submitted two more patches. The first patch fixes a bug in the opencv_perf_gapi test, but there are some questions. This manual of "https://linux.die.net/man/3/lrintf" describes that If x is a NaN or an infinity, or the rounded value is too large to be stored in a long (long long in the case of the ll* functions) then a domain error occurs, and the return value is unspecified. However, a specific reference value is given in the test case. |
cmake/checks/cpu_lsx.cpp
Outdated
#if defined(__loongarch_sx) | ||
# include <lsxintrin.h> | ||
# define CV_LSX 1 | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is build check. No need to introduce if defined
guards. In case if the check build fails (e.g. no include, no function, etc) the check fails and variable is set to false in CMake script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. I have remove it.
@@ -390,9 +391,12 @@ elseif(RISCV) | |||
set(CPU_BASELINE "DETECT" CACHE STRING "${HELP_CPU_BASELINE}") | |||
|
|||
elseif(LOONGARCH64) | |||
ocv_update(CPU_LASX_TEST_FILE "${OpenCV_SOURCE_DIR}/cmake/checks/cpu_lsx.cpp") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPU_LSX_TEST_FILE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. This is a spelling error and I have corrected it.
ocv_update(CPU_LASX_FLAGS_ON "-mlasx") | ||
set(CPU_BASELINE "LSX" CACHE STRING "${HELP_CPU_BASELINE}") | ||
set(CPU_BASELINE "LASX" CACHE STRING "${HELP_CPU_BASELINE}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't make sense to set the same variable twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your review. This is indeed a problem that will cause only lsx to be turned on and lasx to not be turned on. I will find a suitable way to handle it, and then modify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the way to enable LASX by default by determining the CPU_BASELINE_DISABLE value.
08f90ca
to
555cdff
Compare
Hello @CNClareChen , let me know if this patch is ready for testing. |
Yes, this patch can be tested now. Thanks! |
Attachment result is tested on Loongson 3A5000 (2.5GHz) . @fengyuentau @asmorkalov @CNClareChen |
Sorry for the late response. I will test the patch on my side tomorrow. |
Results of accuracy and performance are attached here: logs-555cdff.zip. Tests are done on our Loongson server with dual 3C5000LL CPUs, which has 32 cores @ 2.2GHz. @asmorkalov Please take a look. |
@asmorkalov Hi, is there any additional information required to merge the pr ? |
@fengyuentau Thanks a lot for the benchmark!
Example of volatile behavior:
|
@asmorkalov Thanks for your reply. In the scecond patch I submitted, in order to fix the failure of the functional test when the input data is infinite, some instructions were added to determine whether the input data is infinite, which caused the ROUND opreation performance to decrease. This is unavoiable. |
This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform. Signed-off-by: Hao Chen <chenhao@loongson.cn>
This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 633603ms to 630280ms on the 3A5000 platform. Signed-off-by: Hao Chen <chenhao@loongson.cn>
Some failed tests are observed, could you find out why? IIRC, it is related to ffmpeg?
Also attaching performance logs logs-58421e5.zip. @asmorkalov Could you give comments on this? |
Thinks for your reply! The highgui test requires creating a window and drawing graphics on it. So do not use SSH to directly access the testing machine for testing, Instead, use "ssh -X" to remotely access the testing machine for testing. |
@asmorkalov Do you have any questions or suggestions about this PR? |
@CNClareChen, we observe some serious performance regressions with your patch applied. We found that one of the reason (probably the only reason) is that you inserted some extra checks into cvRound() function. Please, fix those regressions (remove checks) and your patch can be merged then |
@vpisarev Thanks for your reply! If the check is deleted, the opencv_perf_gapi case will fail. The reason is that when the input value is infinite, __lasx_xvftint_w_s retunrn INIT_MAX, while the test case requires cvRound to return a negative value. Do you mean that I should ignore this situation. |
|
@CNClareChen, I agree with @opencv-alalek. The test should be changed not to operate on numbers like INT_MAX. |
The performance of this patch on 3A5000 platform is as follows: before: opencv_perf_core 2503 tests from 58 test cases ran. (442405 ms total) after: opencv_perf_core 2503 tests from 58 test cases ran. (212597 ms total) Signed-off-by: Hao Chen <chenhao@loongson.cn>
@vpisarev @opencv-alalek Thanks for your reply! I have removed the checks in cvRound() function and resubmitted it. |
Attaching performance logs is tested on Loongson 3A5000 (2.5GHz) . @asmorkalov @vpisarev @opencv-alalek |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@fengyuentau Do you have any objections? I propose to merge PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please merge 👍
* Optimize some function with lasx. Optimize some function with lasx. opencv#23929 This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
* Optimize some function with lasx. Optimize some function with lasx. opencv#23929 This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
* Optimize some function with lasx. Optimize some function with lasx. opencv#23929 This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
This patch optimizes some lasx functions and reduces the runtime of opencv_test_core from 662,238ms to 633603ms on the 3A5000 platform.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.