GAPI Fluid: SIMD for MulC kernel. #21177

anna-khakimova · 2021-12-02T13:11:37Z

SIMD for GAPI Fluid MulC kernel.

Performance report:

force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f
build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f
Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0
buildworker:Custom Win=windows-3
test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java
buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*
CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

sivanov-work

Looks similar to the previous PRs, LGTM.

sivanov-work · 2021-12-03T07:49:42Z

modules/gapi/perf/common/gapi_core_perf_tests_inl.hpp

+    cv::Size sz;
+    MatType type = -1;
+    int dtype = -1;
+    double scale = 1.0;


Is scale not a configurable parameter?

Until the bug "Fluid: Add scaling support to MulC kernel." is fixed, the MulC kernel doesn't support scaling, so it makes no sense to configure it via test parameters.

sivanov-work · 2021-12-03T07:53:14Z

modules/gapi/src/backends/fluid/gfluidcore.cpp

+            float* sc = scratch.OutLine<float>();
+
+            for (int i = 0; i < scratch.length(); ++i)
+                sc[i] = static_cast<float>(_scalar[i % chan]);


This looks familiar... Could all such similar places be aggregated into a single inline function, named load_scalar_from_scratch or something more meaningful?
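A minimal sketch of what such a shared helper might look like. The name `fill_scalar_scratch`, its signature, and the use of `std::vector<float>` in place of the Fluid scratch buffer are assumptions for illustration only; the PR itself writes through `scratch.OutLine<float>()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the PR): fills a scratch line with the
// per-channel scalar values repeated cyclically and narrowed to float.
// `scalar` stands in for the 4-element scalar, `chan` for the channel count.
inline void fill_scalar_scratch(const double scalar[4], int chan,
                                std::vector<float>& scratch)
{
    assert(chan > 0 && chan <= 4);
    const std::size_t nchan = static_cast<std::size_t>(chan);
    for (std::size_t i = 0; i < scratch.size(); ++i)
        scratch[i] = static_cast<float>(scalar[i % nchan]);
}
```

Each kernel that currently repeats the loop could then call the helper once when initializing its scratch buffer, instead of duplicating the cast-and-modulo logic.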

sivanov-work · 2021-12-03T07:54:14Z

modules/gapi/perf/common/gapi_core_perf_tests_inl.hpp


    initMatsRandU(type, sz, dtype, false);

    // OpenCV code ///////////////////////////////////////////////////////////
-    cv::multiply(in_mat1, sc, out_mat_ocv, 1, dtype);
+    cv::multiply(in_mat1, sc, out_mat_ocv, scale, dtype);


scale is double, but in the implementation functions it is float: is there any compile warning about it?

If we keep the scale as double, we will have to convert all data to double. Only 2 double elements fit in a 128-bit vector (versus 4 floats), so we would need twice as many iterations, each including load/store and many other high-latency operations. This would significantly reduce performance.

The SIMD implementations for the Mul and Div kernels were also written with this approach (by me, and earlier by Evgeny Latkin).
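The lane arithmetic behind this argument can be sketched as follows. The helper names `lanes` and `vector_iterations` are hypothetical, and the numbers assume the usual 4-byte `float` and 8-byte `double`.

```cpp
#include <cstddef>

// Number of elements of type T that fit into one SIMD register of the given width.
template <typename T>
constexpr std::size_t lanes(std::size_t register_bits)
{
    return register_bits / (8 * sizeof(T));
}

// Number of full vector iterations needed to process `length` elements.
template <typename T>
constexpr std::size_t vector_iterations(std::size_t length, std::size_t register_bits)
{
    return length / lanes<T>(register_bits);
}

// Assumes sizeof(float) == 4 and sizeof(double) == 8.
static_assert(lanes<float>(128) == 4, "four floats per 128-bit register");
static_assert(lanes<double>(128) == 2, "two doubles per 128-bit register");
```

For a 1920-element row this gives 480 iterations in float versus 960 in double, which is the factor of two mentioned above.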

Sorry if I was unclear: I meant the interface, not the implementation. Does cv::multiply accept double (otherwise a warning should appear), and where is it converted to float in the G-API kernels?
If so, there is some confusion between the interface and its implementation. On the other hand, this doesn't affect scale much, because the expected values are probably x2, x4, x8, etc.
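The remark about x2, x4, x8 can be checked in isolation: powers of two are exactly representable in float, so narrowing them from double loses nothing, while a value such as 0.1 does change. The function name below is hypothetical and the snippet says nothing about where G-API actually performs the conversion; it only illustrates the precision point, assuming IEEE 754 float and double.

```cpp
// Returns true when narrowing `scale` to float and widening it back is lossless.
// The explicit static_cast is also what avoids an implicit-narrowing warning.
constexpr bool survives_float_roundtrip(double scale)
{
    return static_cast<double>(static_cast<float>(scale)) == scale;
}

static_assert(survives_float_roundtrip(2.0), "x2 is exact in float");
static_assert(survives_float_roundtrip(4.0), "x4 is exact in float");
static_assert(survives_float_roundtrip(8.0), "x8 is exact in float");
```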

sivanov-work · 2021-12-03T07:55:22Z

modules/gapi/src/backends/fluid/gfluidcore_func.dispatch.cpp

@@ -138,6 +138,33 @@ SUBC_SIMD(float, float)

 #undef SUBC_SIMD

+#define MULC_SIMD(SRC, DST)                                               \
+int mulc_simd(const SRC in[], const float scalar[], DST out[],            \
+              const int length, const int chan, const float scale)        \


float here, but the UT uses double; maybe it is a minor question...

If we keep the scale as double, we will have to convert all data to double. Only 2 double elements fit in a 128-bit vector (versus 4 floats), so we would need twice as many iterations, each including load/store and many other high-latency operations. This would significantly reduce performance.

sivanov-work · 2021-12-03T08:03:56Z

modules/gapi/src/backends/fluid/gfluidcore_func.simd.hpp

+    case 2:                                                                    \
+    case 4:                                                                    \
+    {                                                                          \
+        if (std::fabs(scale - 1.0f) <= FLT_EPSILON)                            \


Could you please add a comment about what happens here?
If scale ~ 1, do we use the scalar version?

If scale = 1.0, we go to the branch without scaling, i.e. a*scalar only.
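A scalar sketch of the two branches, for readers who want to see the dispatch without the SIMD macros. `mulc_reference` is a hypothetical name, the per-channel indexing `scalar[i % chan]` and the order of multiplication in the scaled branch are assumptions, and none of this is the PR's vectorized code.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical scalar reference for the branch selection discussed above.
inline void mulc_reference(const float in[], const float scalar[], float out[],
                           int length, int chan, float scale)
{
    assert(chan > 0);
    if (std::fabs(scale - 1.0f) <= FLT_EPSILON)
    {
        // scale is (numerically) 1: skip the extra multiplication, a*scalar only.
        for (int i = 0; i < length; ++i)
            out[i] = in[i] * scalar[i % chan];
    }
    else
    {
        // General case: apply the scale factor as well.
        for (int i = 0; i < length; ++i)
            out[i] = scale * in[i] * scalar[i % chan];
    }
}
```

Note that the comparison against `FLT_EPSILON` selects between two multiplication paths; it is not a fallback from SIMD to scalar code.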

alalek · 2021-12-03T14:09:01Z

Several GPU tests failed:

[  PASSED  ] 9369 tests.
[  FAILED  ] 27 tests, listed below:
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/3, where GetParam() = (compare_f, 128x128, 8UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/7, where GetParam() = (compare_f, 128x128, 8UC3, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/11, where GetParam() = (compare_f, 128x128, 16UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/15, where GetParam() = (compare_f, 128x128, 16SC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/16, where GetParam() = (compare_f, 128x128, 32FC1, -1, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/19, where GetParam() = (compare_f, 128x128, 32FC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/23, where GetParam() = (compare_f, 640x480, 8UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/27, where GetParam() = (compare_f, 640x480, 8UC3, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/31, where GetParam() = (compare_f, 640x480, 16UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/35, where GetParam() = (compare_f, 640x480, 16SC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/36, where GetParam() = (compare_f, 640x480, 32FC1, -1, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/38, where GetParam() = (compare_f, 640x480, 32FC1, 2, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/39, where GetParam() = (compare_f, 640x480, 32FC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/43, where GetParam() = (compare_f, 1280x720, 8UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/47, where GetParam() = (compare_f, 1280x720, 8UC3, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/51, where GetParam() = (compare_f, 1280x720, 16UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/55, where GetParam() = (compare_f, 1280x720, 16SC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/56, where GetParam() = (compare_f, 1280x720, 32FC1, -1, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/58, where GetParam() = (compare_f, 1280x720, 32FC1, 2, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/59, where GetParam() = (compare_f, 1280x720, 32FC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/63, where GetParam() = (compare_f, 1920x1080, 8UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/67, where GetParam() = (compare_f, 1920x1080, 8UC3, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/71, where GetParam() = (compare_f, 1920x1080, 16UC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/75, where GetParam() = (compare_f, 1920x1080, 16SC1, 5, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/76, where GetParam() = (compare_f, 1920x1080, 32FC1, -1, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/78, where GetParam() = (compare_f, 1920x1080, 32FC1, 2, { gapi.kernel_package })
[  FAILED  ] MulDoublePerfTestGPU/MulDoublePerfTest.TestPerformance/79, where GetParam() = (compare_f, 1920x1080, 32FC1, 5, { gapi.kernel_package })

"Custom Win" by default doesn't run OpenCL testing.
BTW, windows-3 has Rocket Lake CPU with AVX512 support (i7-11700K).

anna-khakimova · 2021-12-06T11:38:28Z

Several GPU tests failed:

"Custom Win" by default doesn't run OpenCL testing. BTW, windows-3 has Rocket Lake CPU with AVX512 support (i7-11700K).

I configured my Windows machine the same way as the "Custom Win" CI check mentioned above and ran the tests, but unfortunately I couldn't reproduce these failures. I can deliver a fix for them in a separate PR. Could you please run this test on the new PR?

* GAPI Fluid: SIMD for MulC kernel. * Changes for MulDouble kernel.

GAPI Fluid: SIMD for MulC kernel.

5ea73d4

anna-khakimova added optimization category: g-api / gapi labels Dec 2, 2021

anna-khakimova requested review from alalek, rgarnov and sivanov-work December 2, 2021 13:11

sivanov-work approved these changes Dec 3, 2021

anna-khakimova force-pushed the ak/simd_mulc branch from 309a221 to 0cc6ca0 Compare December 3, 2021 10:15

dmatveev added this to the 4.5.5 milestone Dec 3, 2021

dmatveev self-assigned this Dec 3, 2021

dmatveev approved these changes Dec 3, 2021

Changes for MulDouble kernel.

fcc8a67

anna-khakimova force-pushed the ak/simd_mulc branch from 0cc6ca0 to fcc8a67 Compare December 3, 2021 10:38

alalek merged commit c391080 into opencv:4.x Dec 3, 2021

alalek mentioned this pull request Dec 30, 2021

(5.x) Merge 4.x #21371

Merged

alalek mentioned this pull request Feb 22, 2022

(5.x) Merge 4.x #21651

Merged

a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023

Merge pull request opencv#21177 from anna-khakimova:ak/simd_mulc

2f3ab1b

* GAPI Fluid: SIMD for MulC kernel. * Changes for MulDouble kernel.
