CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
GAPI Fluid: SIMD AVX2 Resize F32C1. #21728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
009e3bb
to
c1938d8
Compare
c1938d8
to
c0e49b5
Compare
c0e49b5
to
8f920d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think last loop can be optimized in single memcpy
|
||
#if defined __GNUC__ | ||
# pragma GCC diagnostic push | ||
# pragma GCC diagnostic ignored "-Wstrict-overflow" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we need to mask it here? does it hide some sort of error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a legacy carried over from OpenVINO. This warning suppression was added to OpenVINO without my participation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to remove it and try build with opencv CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
Running CI checks were successful.
{ | ||
GAPI_DbgAssert(xRatioEq1 && yRatioEq1); | ||
int length = inSz.width; // == outSz.width | ||
for (int line = 0; line < lpi; ++line) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still memory allocated continuously (i believe) is it possible to make copy whole matrix block ( which is linear data in memory) using single one memcpy
call instead of wasting CPU cycles on loop iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (when tests are passed)
if (inSz.width >= nlanes && outSz.width >= nlanes) | ||
{ | ||
avx2::calcRowLinear32FC1Impl(reinterpret_cast<float**>(dst), | ||
reinterpret_cast<const float**>(src0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reinterpret cast is normally a red flag, can this be avoided? can avx2::calcRowLinear32FC1Impl
also accept a float*[]
type instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Removed.
@alalek Could you please take a look this PR and merge if it looks good for you? |
…vx_simd GAPI Fluid: SIMD AVX2 Resize F32C1. * GAPI Fluid: Resize F32C1 scalar. * Final version * GAPI Fluid: SIMD AVX2 for Resize F32C1. * Applied comments. * Deleted warning suppression. * Applied comments.
SIMD AVX2 for Resize F32C1.
Performance:
ResizeF32C1_SIMD_AVX.xlsx