3rdparty: NDSRVP - Part 1.5: New Interfaces #25786
modules/imgproc/src/imgwarp.cpp
Outdated
@@ -2307,6 +2307,7 @@ class WarpAffineInvoker :
opt_SSE4_1::WarpAffineInvoker_Blockline_SSE41(adelta + x, bdelta + x, xy, X0, Y0, bw);
else
#endif
if( cv_hal_warpAffineBlocklineNN(adelta + x, bdelta + x, xy, X0, Y0, bw) != CV_HAL_ERROR_OK )
Please extract the whole block to some cv::hal:: function. Define it here: https://github.com/opencv/opencv/blob/4.x/modules/imgproc/include/opencv2/imgproc/hal/hal.hpp Implementation can reside somewhere in this file (imgwarp.cpp).
This new function should first try external HAL function (using CALL_HAL macro), then try AVX, SSE, LASX, universal intrinsics, then fallback implementation.
Then cv::warpAffine should call this new cv::hal:: function for CPU processing.
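The layering described above could look roughly like the following. This is a minimal, self-contained sketch and not the code from this PR: `warpAffineBlocklineNN`, `BlocklineNNHook`, the status constants, and `AB_BITS = 10` are assumptions made for illustration, and the SIMD branches are only marked by a comment.

```cpp
#include <cassert>
#include <climits>

// Status codes in the spirit of the HAL convention (values assumed for this sketch).
enum { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

// Assumption: number of fractional bits in the fixed-point coordinates.
const int AB_BITS = 10;

// Hypothetical signature of an external (vendor) HAL hook.
typedef int (*BlocklineNNHook)(const int* adelta, const int* bdelta,
                               short* xy, int X0, int Y0, int bw);

static short saturateToShort(int v)
{
    return (short)(v < SHRT_MIN ? SHRT_MIN : (v > SHRT_MAX ? SHRT_MAX : v));
}

// Scalar fallback: nearest-neighbour source coordinates for one block line.
static void blocklineNN_fallback(const int* adelta, const int* bdelta,
                                 short* xy, int X0, int Y0, int bw)
{
    for (int x = 0; x < bw; x++)
    {
        int X = (X0 + adelta[x]) >> AB_BITS;
        int Y = (Y0 + bdelta[x]) >> AB_BITS;
        xy[2 * x]     = saturateToShort(X);
        xy[2 * x + 1] = saturateToShort(Y);
    }
}

// Hypothetical hal-style entry point: external hook first, fallback last.
void warpAffineBlocklineNN(BlocklineNNHook hook,
                           const int* adelta, const int* bdelta,
                           short* xy, int X0, int Y0, int bw)
{
    // 1. External HAL: done if the hook reports success.
    if (hook && hook(adelta, bdelta, xy, X0, Y0, bw) == HAL_OK)
        return;

    // 2. AVX / SSE / LASX / universal-intrinsics branches would be tried here.

    // 3. Portable fallback.
    blocklineNN_fallback(adelta, bdelta, xy, X0, Y0, bw);
}

// Example hooks: one that declines the call, one that handles it.
int hookNotImplemented(const int*, const int*, short*, int, int, int)
{
    return HAL_NOT_IMPLEMENTED;
}

int hookFillMinusOne(const int*, const int*, short* xy, int, int, int bw)
{
    for (int i = 0; i < 2 * bw; i++)
        xy[i] = -1;
    return HAL_OK;
}

// Small usage check: a declining hook must give the same result as no hook.
void exampleUsage()
{
    int adelta[2] = {0, 1024}, bdelta[2] = {0, 1024};
    short a[4], b[4];
    warpAffineBlocklineNN(0, adelta, bdelta, a, 0, 0, 2);
    warpAffineBlocklineNN(hookNotImplemented, adelta, bdelta, b, 0, 0, 2);
    for (int i = 0; i < 4; i++)
        assert(a[i] == b[i]);
}
```

With this shape, a caller such as cv::warpAffine only needs the single entry point, and each platform-specific branch stays behind it.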
There are some functions implemented this way, e.g. cv::hal::normHamming (opencv/modules/core/src/norm.cpp, lines 53 to 100 in 8d935e2):
int normHamming(const uchar* a, int n, int cellSize)
{
    int output;
    CALL_HAL_RET(normHamming8u, cv_hal_normHamming8u, output, a, n, cellSize);
    if( cellSize == 1 )
        return normHamming(a, n);
    const uchar* tab = 0;
    if( cellSize == 2 )
        tab = popCountTable2;
    else if( cellSize == 4 )
        tab = popCountTable4;
    else
        return -1;
    int i = 0;
    int result = 0;
#if (CV_SIMD || CV_SIMD_SCALABLE)
    v_uint64 t = vx_setzero_u64();
    if ( cellSize == 2)
    {
        v_uint16 mask = v_reinterpret_as_u16(vx_setall_u8(0x55));
        for(; i <= n - VTraits<v_uint8>::vlanes(); i += VTraits<v_uint8>::vlanes())
        {
            v_uint16 a0 = v_reinterpret_as_u16(vx_load(a + i));
            t = v_add(t, v_popcount(v_reinterpret_as_u64(v_and(v_or(a0, v_shr<1>(a0)), mask))));
        }
    }
    else // cellSize == 4
    {
        v_uint16 mask = v_reinterpret_as_u16(vx_setall_u8(0x11));
        for(; i <= n - VTraits<v_uint8>::vlanes(); i += VTraits<v_uint8>::vlanes())
        {
            v_uint16 a0 = v_reinterpret_as_u16(vx_load(a + i));
            v_uint16 a1 = v_or(a0, v_shr<2>(a0));
            t = v_add(t, v_popcount(v_reinterpret_as_u64(v_and(v_or(a1, v_shr<1>(a1)), mask))));
        }
    }
    result += (int)v_reduce_sum(t);
    vx_cleanup();
#elif CV_ENABLE_UNROLLED
    for( ; i <= n - 4; i += 4 )
        result += tab[a[i]] + tab[a[i+1]] + tab[a[i+2]] + tab[a[i+3]];
#endif
    for( ; i < n; i++ )
        result += tab[a[i]];
    return result;
}
Four new functions have been extracted. Further accuracy checks might be needed for the other related platforms (AVX2, LASX, etc.).
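One way such an accuracy check could be set up is to run a platform-specific path and the scalar reference on the same inputs and count differing outputs. The sketch below is only an illustration under stated assumptions: all names are hypothetical, `AB_BITS = 10` is assumed, and a hand-unrolled loop stands in for a real AVX2 or LASX kernel.

```cpp
#include <climits>
#include <cstdlib>
#include <vector>

// Assumption: number of fractional bits in the fixed-point coordinates.
const int AB_BITS = 10;

static short saturateToShort(int v)
{
    return (short)(v < SHRT_MIN ? SHRT_MIN : (v > SHRT_MAX ? SHRT_MAX : v));
}

// Scalar reference for one nearest-neighbour block line.
static void blocklineNN_ref(const int* adelta, const int* bdelta,
                            short* xy, int X0, int Y0, int bw)
{
    for (int x = 0; x < bw; x++)
    {
        xy[2 * x]     = saturateToShort((X0 + adelta[x]) >> AB_BITS);
        xy[2 * x + 1] = saturateToShort((Y0 + bdelta[x]) >> AB_BITS);
    }
}

// Stand-in for an optimized kernel: two pixels per iteration plus a scalar tail.
static void blocklineNN_unrolled(const int* adelta, const int* bdelta,
                                 short* xy, int X0, int Y0, int bw)
{
    int x = 0;
    for (; x <= bw - 2; x += 2)
    {
        xy[2 * x]     = saturateToShort((X0 + adelta[x]) >> AB_BITS);
        xy[2 * x + 1] = saturateToShort((Y0 + bdelta[x]) >> AB_BITS);
        xy[2 * x + 2] = saturateToShort((X0 + adelta[x + 1]) >> AB_BITS);
        xy[2 * x + 3] = saturateToShort((Y0 + bdelta[x + 1]) >> AB_BITS);
    }
    for (; x < bw; x++)
    {
        xy[2 * x]     = saturateToShort((X0 + adelta[x]) >> AB_BITS);
        xy[2 * x + 1] = saturateToShort((Y0 + bdelta[x]) >> AB_BITS);
    }
}

// Runs both paths on pseudo-random deltas and returns the number of
// output elements that differ (0 means the paths agree exactly).
int countMismatches(int bw, int X0, int Y0, unsigned seed)
{
    std::vector<int> adelta(bw), bdelta(bw);
    std::srand(seed);
    for (int i = 0; i < bw; i++)
    {
        adelta[i] = std::rand() % (1 << 20);
        bdelta[i] = std::rand() % (1 << 20);
    }

    std::vector<short> expected(2 * bw), actual(2 * bw);
    blocklineNN_ref(adelta.data(), bdelta.data(), expected.data(), X0, Y0, bw);
    blocklineNN_unrolled(adelta.data(), bdelta.data(), actual.data(), X0, Y0, bw);

    int bad = 0;
    for (int i = 0; i < 2 * bw; i++)
        if (expected[i] != actual[i])
            bad++;
    return bad;
}
```

Odd block widths are worth including in such a check, since they exercise the scalar tail after the unrolled or vectorized loop.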
@Junyan721113 friendly reminder.
cc @vpisarev
22d10fe to f3729de
e4d8dd2 to 7a0336d
@fengyuentau Please pay attention to the changes.
Okay, it seems the changes do not modify the core method of warpAffine and warpPerspective. The new kernel is written fully with universal intrinsics (some parts use NEON intrinsics for the best performance). Could this be merged soon? Otherwise it can lead to merge conflicts.
7a0336d to 35463e0
@asmorkalov is there any other change that needs to be made?
Summary
Previous context
From PR #25167:
Part 1.5: New Interfaces (Ready for PR)
- cv::ndsrvp::warpAffine & cv::ndsrvp::warpPerspective into ...Blockline & ...BlocklineNN
- cv::ndsrvp::remap via new cv_hal_remap32f interface
Note that the remap function called by warpAffine and warpPerspective does not use the HAL interface cv_hal_remap32f.
Performance tests
Remap
Geometric mean (ms)
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.