CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
GAPI Fluid: SIMD for MulC kernel. #21177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks similar with prevs PR, LGTM
cv::Size sz; | ||
MatType type = -1; | ||
int dtype = -1; | ||
double scale = 1.0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scale is not configurable param?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While bug "Fluid: Add scaling support to MulC kernel." hasn't fixed yet, the MulC kernel doesn't support scaling, so it makes no sense to configure it via test parameters.
float* sc = scratch.OutLine<float>(); | ||
|
||
for (int i = 0; i < scratch.length(); ++i) | ||
sc[i] = static_cast<float>(_scalar[i % chan]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks similar... can all the same similar places aggregated into single inline function like as load_scalar_from_scratch
or something more meaningful
|
||
initMatsRandU(type, sz, dtype, false); | ||
|
||
// OpenCV code /////////////////////////////////////////////////////////// | ||
cv::multiply(in_mat1, sc, out_mat_ocv, 1, dtype); | ||
cv::multiply(in_mat1, sc, out_mat_ocv, scale, dtype); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scale is double, but in implementation functions it is float: is there any compile wraning about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we leave the scale
of the double type, then we will have to convert all data to the double type. Only 2 elements will fit in a 128-bit vector, so we will have to do 2 times more iterations, which include load / store
and many other high-latency operations. We will significantly reduce performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIMDs for the Mul and the Div kernels also were implemented (by me and before by Evgeny Latkin) with this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, if i confused: i meant interface part
not implementation: does cv::multiply
accept double (otherwise warning must happen) and how where it convert into float in gapi kernels?
If yes, then there is a some confusion between interface and it's implementation. But from other hand this situation doesn't affect scale
so much, because its probably expected x2,x4,x8 etc
@@ -138,6 +138,33 @@ SUBC_SIMD(float, float) | |||
|
|||
#undef SUBC_SIMD | |||
|
|||
#define MULC_SIMD(SRC, DST) \ | |||
int mulc_simd(const SRC in[], const float scalar[], DST out[], \ | |||
const int length, const int chan, const float scale) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float here, but UT has double - but maybe it is minor question...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we leave the scale of the double type, then we will have to convert all data to the double type. Only 2 elements will fit in a 128-bit vector, so we will have to do 2 times more iterations, which include load / store
and many other high-latency operations. We will significantly reduce performance.
case 2: \ | ||
case 4: \ | ||
{ \ | ||
if (std::fabs(scale - 1.0f) <= FLT_EPSILON) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you put comment please about what happened here?
if scale ~ 1, then we use scalar version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If scale = 1.0, we go to the branch without scaling, i.e. a*scalar
only.
309a221
to
0cc6ca0
Compare
0cc6ca0
to
fcc8a67
Compare
There are several GPU tests failed:
"Custom Win" by default doesn't run OpenCL testing. |
I've configured my windows the same as mentioned "Custom Win" CI check and run the test, but unfortunately I didn't manage to reproduce this failures. I can delivery fix for this failure in separate PR. Could you please run this test on new PR? |
* GAPI Fluid: SIMD for MulC kernel. * Changes for MulDouble kernel.
SIMD for GAPI Fluid MulC kernel.
Performance report:
FluidMulCSIMD.xlsx