CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
GAPI Fluid: SIMD Multiply kernel. #21024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
{ | ||
// like OpenCV: returns 0, if y=0 | ||
// like OpenCV: returns 0, if DST type=uchar/short/ushort and divider(y)=0 | ||
auto result = y? scale * x / y: 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can SCR2 be float
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it can be. And next overload handles this case.
BINARY_(uchar, uchar, uchar, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); | ||
BINARY_(uchar, ushort, ushort, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); | ||
BINARY_(uchar, short, short, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); | ||
BINARY_(uchar, float, float, run_arithm, dst, src1, src2, ARITHM_MULTIPLY, scale); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it supposed to have SRC1 & SRC2 for the same type (i mean if it falls into the div/mul with SFINAE with SRC1 & SRC2 then do we need to satisfy static_check(std::is_same<SRC1, SRC2>)
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is already exist. Please take a look here
namespace gapi { | ||
namespace fluid { | ||
|
||
#define DIV_SIMD(SRC, DST) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like order is different with BINARY_
https://github.com/opencv/opencv/pull/21024/files#diff-fe0ac6b5a07c1bec59c3756100eb186c9117a82dd1d951ef6997b8423a89943dR787
What do you think to align: DST <-->SRC according to BINARY_ order or vise versa?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two first arguments of div_simd() have SRC type. Third argument has DST type. So for MACRO I should mention SRC type first and second should be DST.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BYNARY_ macro is working good, so in the bounds of this task, I can't change API of this MACRO. Otherwise I have to change its invocation for all kernels that use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see your point.
But what do you think to align argument positional agreement in newly added functions for mul/div in accommodation with the old approach with BINARY_?
UPDATED: Or it will affect template type auto-deduction?
CV_ALWAYS_INLINE | ||
typename std::enable_if<(std::is_same<SRC, short>::value && std::is_same<DST, ushort>::value) || | ||
(std::is_same<SRC, ushort>::value && std::is_same<DST, ushort>::value) || | ||
(std::is_same<SRC, ushort>::value && std::is_same<DST, short>::value), int>::type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO: enable_if
is good for simple condition check otherwise it confused
but as I noticed we need to pass type into v_load_f32
and others function, so we need to know actual type here...
IMHO again:
To avoid eyeballs scattering when reading this code i think it is possible to introduce MACRO at least
#define SRC_DST_SHORT_USHORT (std::is_same<SRC, >...)
#define DST_SHORT_USHORT (std::is_same ...)
...
template<...>
CV_ALWAYS_INLINE
typename std::enable_if<SRC_DST_SHORT_USHORT, int >::type
div_hal(...)
...
template<...>
CV_ALWAYS_INLINE
typename std::enable_if<SRC_DST_SHORT_USHORT, int >::type
mul_hal(...)
Feel free to ignore it
{ | ||
#if CV_SIMD | ||
x = div_simd(in1, in2, out, length, scale); | ||
#endif | ||
for (; x < length; ++x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there are no loop-unrolling MACRO in opencv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 thanks!
Need to rebase after "Div" PR merge ( |
ecc0595
to
5ad0b60
Compare
5ad0b60
to
c47673b
Compare
@alalek Ok. Done. Please take a look. |
SIMD optimization for Fluid Multiply kernel.
Performance report:
FluidMulSIMD.xlsx