GAPI Fluid: SIMD for SubC kernel. #21158

anna-khakimova · 2021-11-30T12:27:05Z

SIMD for GAPI Fluid SubC kernel.

Performance report:

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f
build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f
Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0
buildworker:Custom Win=windows-3
test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java
buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*
CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

sivanov-work

LGTM: added minor comments

sivanov-work · 2021-12-01T08:58:50Z

modules/gapi/src/backends/fluid/gfluidcore.cpp

+            float* sc = scratch.OutLine<float>();
+
+            for (int i = 0; i < scratch.length(); ++i)
+                sc[i] = static_cast<float>(_scalar[i % chan]);


Naive: till scratch size is known at compile phase - is it possible to make unrolling loop for that determined size?
In case of compiler didn't make unroll it automatically...

It is not entirely clear yet. Could you explain in more detail?

Sure:

Scratch buffer size depends from MACRO definition here https://github.com/opencv/opencv/pull/21158/files#diff-fe0ac6b5a07c1bec59c3756100eb186c9117a82dd1d951ef6997b8423a89943dR1505

we have constant buffer size here for CV_SIMD turn on/off case... So it is possible to introduce MACRO branch here by CV_SIMD turn on/off and make manual loop unrolling in a way

#ifdef CV_SIMD cs[0] = _scalar[0]; cs[1] = _scalar[1]; ...<16 + 2 times> #else

But from the other hand, C++ compiler can do this work of manual loop unrolling in more optimized way... But I do not known much details how is it applicable here: would C++ compiler make unrolling or wouldn't in this case - if scratch.length() has runtime (but not compile time) value.

So my point here: We can try to simply replace scratch.length() from dynamic parameter to constant parameter from MACRO where scratch buffer is determined and believe that C++ compiler would make loop unrolling or straight way vectorization here

P.S. this comment may be (or may not be ) applicable if this code is placed on critical performance path ONLY

It seems to me there are no sense to do this for current case, because this loop work just once when function is called for the first line of the handling picture. And the compiler usually does it well itself.

sivanov-work · 2021-12-01T09:00:33Z

modules/gapi/src/backends/fluid/gfluidcore_func.simd.hpp

@@ -81,6 +81,9 @@ MUL_SIMD(float, float)

 #undef MUL_SIMD

+struct add_op {};


simple find on a page doesn't return to me anything about using this structure: where is it used?

Ok. Removed.

sivanov-work · 2021-12-01T09:03:42Z

modules/gapi/src/backends/fluid/gfluidcore.cpp

+    {
+#if CV_SIMD
+        // 512 bits / 32 bits = 16 elements of float32 can contain a AVX 512 SIMD vector.
+        constexpr int maxNlanes = 16;


is it the same initialization which used for AddC? if yes - then maybe we need single common function to reuse in both?

sivanov-work · 2021-12-01T09:07:04Z

modules/gapi/src/backends/fluid/gfluidcore_func.simd.hpp

 CV_ALWAYS_INLINE
 typename std::enable_if<(std::is_same<DST, ushort>::value ||
                         std::is_same<DST, short>::value), void>::type
-addc_simd_common_impl(const SRC* inx, DST* outx, const v_float32& sc, const int nlanes)
+arithmOpScalar_simd_common_impl(op_tag t, const SRC* inx, DST* outx, const v_float32& sc, const int nlanes)


👍🏻
Now code for sum + sub is unified?

what do you think to rename op for something more less domain specific -> operation or op/operation_dispatcher ? too much short-named variables here and functions name stays hidden beside them

anna-khakimova · 2021-12-01T16:33:29Z

@alalek CI checks passed, PR was approved. Could you please merge it?

* GAPI Fluid: SIMD for SubC kernel. * Applied comments

anna-khakimova force-pushed the ak/simd_subC branch 2 times, most recently from 5abee9b to 86f21d4 Compare November 30, 2021 12:43

GAPI Fluid: SIMD for SubC kernel.

d26e6da

anna-khakimova force-pushed the ak/simd_subC branch from 86f21d4 to d26e6da Compare November 30, 2021 13:17

anna-khakimova added category: g-api / gapi optimization labels Nov 30, 2021

anna-khakimova requested review from sivanov-work and rgarnov December 1, 2021 08:49

anna-khakimova assigned sivanov-work Dec 1, 2021

anna-khakimova requested a review from alalek December 1, 2021 08:57

sivanov-work approved these changes Dec 1, 2021

View reviewed changes

Applied comments

2343e52

alalek merged commit 369b260 into opencv:4.x Dec 1, 2021

alalek mentioned this pull request Dec 30, 2021

(5.x) Merge 4.x #21371

Merged

alalek mentioned this pull request Feb 22, 2022

(5.x) Merge 4.x #21651

Merged

a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023

Merge pull request opencv#21158 from anna-khakimova:ak/simd_subC

51c8903

* GAPI Fluid: SIMD for SubC kernel. * Applied comments

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

GAPI Fluid: SIMD for SubC kernel. #21158

GAPI Fluid: SIMD for SubC kernel. #21158

Uh oh!

anna-khakimova commented Nov 30, 2021 •

edited

Loading

Uh oh!

sivanov-work left a comment

Uh oh!

sivanov-work Dec 1, 2021

Uh oh!

anna-khakimova Dec 1, 2021

Uh oh!

sivanov-work Dec 1, 2021

Uh oh!

anna-khakimova Dec 1, 2021

Uh oh!

sivanov-work Dec 1, 2021

Uh oh!

anna-khakimova Dec 1, 2021

Uh oh!

sivanov-work Dec 1, 2021

Uh oh!

anna-khakimova Dec 1, 2021

Uh oh!

sivanov-work Dec 1, 2021

Uh oh!

sivanov-work Dec 1, 2021

Uh oh!

anna-khakimova Dec 1, 2021

Uh oh!

anna-khakimova commented Dec 1, 2021 •

edited

Loading

Uh oh!

Uh oh!

		@@ -81,6 +81,9 @@ MUL_SIMD(float, float)

		#undef MUL_SIMD

		struct add_op {};

Uh oh!

GAPI Fluid: SIMD for SubC kernel. #21158

GAPI Fluid: SIMD for SubC kernel. #21158

Uh oh!

Conversation

anna-khakimova commented Nov 30, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sivanov-work left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

anna-khakimova commented Dec 1, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

anna-khakimova commented Nov 30, 2021 •

edited

Loading

anna-khakimova commented Dec 1, 2021 •

edited

Loading