CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
GAPI Fluid: SIMD for AddC kernel. #21119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cb58db3
to
d5554e2
Compare
d5554e2
to
fc59736
Compare
fc59736
to
57a9eed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in overall LGTM.
Could you please clarify about: Why 1, 2 & 4 falls into the same case
and Warning?Exception?
static void initScratch(const GMatDesc&, const GScalarDesc&, int, Buffer& scratch) | ||
{ | ||
#if CV_SIMD | ||
// Max value of v_float32::nlanes = 16 in AVX512 ISA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to launch this library compiled with CV_SIMD on a machine without SIMD support?
I mean: do we need runtime check on supporting SIMD on executable machine ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if CV_SIMD
is the compilation macro. When CV_SIMD
is false this code is truncated in compilation time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question is different.
There are BUILD machine and EXECUTION machine. The build machine may turn on SIMD or other instructions sets through MACRO. In this case the appropriate code burn out binary (#if CV_SIMD) and we will have buflen = 16
.
Let's assume that OPENCV was built on machine with CV_SIMD (code is exist) but executes on machine with CPU which doesn't support SIMD (code exist but cannot be executed) - so we still have 16
here, but actual earlier dispatching work out in assumption that there are no SIMD support and expect that it should have 4
due to CPU doesn't support SIMD
To resolve this situation we usually use TWO kind of check
#if EXTENSION_EXIST // compile code ON
if( extension_supported()) // test is supported
{
// supported
}
else
{
// compiled, but not supported
}
#else
// not compiled, but not supported
Is this situation applicable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, no errors or problems will arise. The buffer will be larger, but it will not cause any unwanted side effects.
constexpr int nlanes = v_float32::nlanes; | ||
constexpr int chan = 3; | ||
constexpr int lanes = chan * nlanes; | ||
|
||
int x = 0; | ||
for (;;) | ||
{ | ||
for (; x <= length - lanes; x += lanes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like this copy-paste part with lanes calculation & internal for-loop could be incorporated in initial dispatcher addc_simd_c3(const SRC in[], const float scalar[], DST out[], const int length)
then
this *_impl
would just preform intrisict calculation only without this dupliacted pre- and post- operations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
case 3: \ | ||
return addc_simd_c3(in, scalar, out, length); \ | ||
default: \ | ||
break; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning/Exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks!
37c7008
to
535133c
Compare
@alalek Could you please merge the PR? |
* GAPI Fluid: SIMD for AddC kernel * Final version * Applied comments.
SIMD for GAPI Fluid AddC kernel.
Performance report:
FluidAddCSIMD.xlsx