CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
core: vectorize cv::normalize / cv::norm #26885
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hm.. The patch introduces significant regression for L1 and L2 on ARMv8 (Tested with Jetson Orin):
|
@asmorkalov I do not have access to orin now, but can you apply the following changes and run the test instead of the OCL ones? diff --git a/modules/core/perf/perf_norm.cpp b/modules/core/perf/perf_norm.cpp
index 07f989f21c..e364192bad 100644
--- a/modules/core/perf/perf_norm.cpp
+++ b/modules/core/perf/perf_norm.cpp
@@ -14,7 +14,7 @@ typedef perf::TestBaseWithParam<Size_MatType_NormType_t> Size_MatType_NormType;
PERF_TEST_P(Size_MatType_NormType, norm,
testing::Combine(
testing::Values(TYPICAL_MAT_SIZES),
- testing::Values(TYPICAL_MAT_TYPES),
+ testing::Values(CV_8UC1, CV_8SC1, CV_16UC1, CV_16SC1, CV_32SC1, CV_32FC1, CV_64FC1),
testing::Values((int)NORM_INF, (int)NORM_L1, (int)NORM_L2)
)
) I also observed some performance regressions on my host with i7-12700k cpu. |
Found that if cpu baseline is sse, there are performance regressions, mostly on 8-bit, 16-bit kernels. If cpu baseline is set to AVX/AVX2, this PR makes sense with some performance improvement. |
The fix does not resole ARM regression: |
Regression on RISC-V for some reason: https://github.com/opencv/opencv/actions/runs/13302059428/job/37145128708?pr=26885#step:16:715
Tested on my SpaceMIT MUSE Pi (K1) with GCC, it was green. |
This comment was marked as outdated.
This comment was marked as outdated.
…cy issue only on ci
modules/core/src/norm.simd.hpp
Outdated
struct NormL2_SIMD<double, double> { | ||
double operator() (const double* src, int n) const { | ||
int j = 0; | ||
double s = 0.f; | ||
#if CV_RVV // This is introduced to workaround the accuracy issue on ci | ||
s = normL2_rvv(src, n, j); | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is introduced to workaround failed tests on ci of rvv node.
The RVV CI node is getting weird some accuracy issues on the normL2 with double.
But it works fine on real hardware. Could it be some qemu bugs?
modules/core/src/norm.simd.hpp
Outdated
struct NormL1_SIMD<double, double> { | ||
double operator() (const double* src, int n) const { | ||
int j = 0; | ||
double s = 0.f; | ||
#if CV_RVV // This is introduced to workaround the accuracy issue on ci | ||
s = normL1_rvv(src, n, j); | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is introduced to workaround failed tests on ci of rvv node.
The RVV CI node is getting weird some accuracy issues on the normL1 with double.
But it works fine on real hardware. Could it be some qemu bugs?
@asmorkalov Hello, I have fixed performance on Orin with |
@asmorkalov Let me know if there is any blocker of merging this PR. |
The last version is much better for ARMv8. Jetson Orin does not show regressions. |
@asmorkalov By the way, CI node "PR:4.x / Android-Test / BuildAndTest (pull_request)" seems to be broken. Should we disable this for now since you do not seem to have time to fix it? |
Jetson tk1 (armv7) result. There are several regressions for FP32. I do not think it's critical. |
core: vectorize cv::normalize / cv::norm opencv#26885 Checklist: | | normInf | normL1 | normL2 | | ---- | ------- | ------ | ------ | | bool | - | - | - | | 8u | √ | √ | √ | | 8s | √ | √ | √ | | 16u | √ | √ | √ | | 16s | √ | √ | √ | | 16f | - | - | - | | 16bf | - | - | - | | 32u | - | - | - | | 32s | √ | √ | √ | | 32f | √ | √ | √ | | 64u | - | - | - | | 64s | - | - | - | | 64f | √ | √ | √ | *: Vectorization of data type bool, 16f, 16bf, 32u, 64u and 64s needs to be done on 5.x. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [ ] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake
Checklist:
*: Vectorization of data type bool, 16f, 16bf, 32u, 64u and 64s needs to be done on 5.x.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.