Fast gemm for einsum #24509

Abdurrahheem · 2023-11-07T17:34:49Z

This PR adds performance tests for the Einsum layer with FastGemm. The results of the performance tests on different inputs are listed below.

Notation:

WX: windows10_x64
MX: macos_x64
MA: macos_arm64
UX: ubuntu_x64
UA: ubuntu_arm64

All timings are in milliseconds (ms).
Gemm is the backend used for matrix multiplication.

Benchmarks (an arrow with a percentage marks the inference speedup relative to Einsum with gemm):

Equation	Inputs Mat Dims	UX (ms)	UA (ms)	MX (ms)	MA (ms)	WX (ms)
"ij, jk -> ik"	[2, 3], [3,2]	0.04 ± 0.00	-	-	-	-
"ij, jk -> ik"	[20, 30], [30,20]	0.07 ± 0.00	-	-	-	-
"ij, jk -> ik"	[113, 127], [127,113]	1.17 ± 0.02 ↓ ~ 48%	-	-	-	-
"imkj, injs -> imnks"	[1, 4, 7, 9], [1, 5, 9, 8]	0.10 ± 0.00	-	-	-	-
"imkj, injs -> imnks"	[1, 4, 70, 90], [1, 5, 90, 80]	5.75 ± 0.10 ↓ ~ 37%	-	-	-	-
"imkj, injs -> imnks"	[1, 4, 73, 91], [1, 5, 91, 57]	5.58 ± 0.12 ↓ ~ 48%	-	-	-	-
"ij -> i"	[30, 40]	0.03 ± 0.00	-	-	-	-
"ij -> i"	[113, 374]	0.13 ± 0.00	-	-	-	-
"...ij -> ...i"	[30, 40]	0.03 ± 0.00	-	-	-	-
"...ij -> ...i"	[113, 374]	0.13 ± 0.00	-	-	-	-
"...ij, ...jk -> ...ik"	[40, 50], [50,80]	0.26 ± 0.00	-	-	-	-
"...ij, ...jk -> ...ik"	[47, 51], [51, 83]	0.28 ± 0.01	-	-	-	-
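
For readers less familiar with the equation strings in the table, here is a minimal sketch of what two of them compute, written in plain C++ without OpenCV. The `Matrix` helper and the function names are hypothetical and exist only for illustration; the actual layer dispatches the contraction to a gemm backend rather than to naive loops like these.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix in a flat vector (illustrative helper, not an OpenCV type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c, float fill = 0.0f)
        : rows(r), cols(c), data(r * c, fill) {}
    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// "ij, jk -> ik": the shared index j is summed out, i.e. a matrix product.
Matrix einsum_ij_jk_ik(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);  // inner dimensions must agree
    Matrix out(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            for (std::size_t k = 0; k < b.cols; ++k)
                out.at(i, k) += a.at(i, j) * b.at(j, k);
    return out;
}

// "ij -> i": index j is summed out, leaving one sum per row.
std::vector<float> einsum_ij_i(const Matrix& a) {
    std::vector<float> out(a.rows, 0.0f);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            out[i] += a.at(i, j);
    return out;
}
```

Only the equations with two operands involve a matrix product, which is consistent with the speedup arrows appearing only on those rows.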

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2023-11-07T19:03:16Z

modules/dnn/src/layers/einsum_layer.cpp

+    FastGemmOpt opt;
+    opt.init();


I propose making it a layer-wide member instead of initializing it on every call.
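
A minimal sketch of this suggestion, assuming a stand-in options type: `BackendOpt` and `EinsumLikeLayer` are hypothetical names used only for illustration and do not reproduce the real `FastGemmOpt` or the Einsum layer class. The point is that the options object becomes a member, `init()` runs once in the constructor, and every forward call reuses the result.

```cpp
#include <cassert>

// Hypothetical stand-in for a backend options object such as FastGemmOpt.
// The real OpenCV type and its fields are not reproduced here.
struct BackendOpt {
    bool initialized = false;
    static int init_calls;  // counts how often detection ran (illustration only)
    void init() {
        initialized = true;
        ++init_calls;
    }
};
int BackendOpt::init_calls = 0;

// Hypothetical layer: detection runs once at construction, not per call.
class EinsumLikeLayer {
public:
    EinsumLikeLayer() { opt_.init(); }

    void forward() {
        assert(opt_.initialized);  // options are ready before any multiplication
        // ... each matrix multiplication would reuse opt_ here ...
        ++forward_calls_;
    }

    int forwardCalls() const { return forward_calls_; }

private:
    BackendOpt opt_;          // layer-wide, shared by all forward calls
    int forward_calls_ = 0;
};
```

With this layout, calling `forward()` any number of times leaves `BackendOpt::init_calls` at one per constructed layer.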

fengyuentau · 2023-11-08T02:47:13Z

How do I understand your collected results? Is it faster or slower compared with cv::gemm?

Abdurrahheem · 2023-11-08T07:58:05Z

How do I understand your collected results? Is it faster or slower compared with cv::gemm?

Sorry, I have added a description. Inference is faster with FastGemm.

asmorkalov · 2023-11-09T07:41:00Z

armv7 neon (jetson tk1):

Geometric mean (ms)
                                                       Name of Test                                                        4.x-baseline-1 4.x-fastgemm-1 4.x-fastgemm-1
                                                                                                                                                               vs      
                                                                                                                                                         4.x-baseline-1
                                                                                                                                                           (x-factor)  
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                 0.027          0.027           1.03     
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                               0.120          0.120           1.00     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{40, 50}, {50, 80}}               0.459          0.269           1.71     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{47, 51}, {51, 83}}               0.523          0.292           1.79     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                       0.027          0.026           1.03     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                                     0.121          0.120           1.01     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{2, 3}, {3, 2}}                            0.058          0.053           1.08     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{20, 30}, {30, 20}}                        0.135          0.119           1.14     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{113, 127}, {127, 113}}                    3.635          2.044           1.78     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 7, 9}, {1, 5, 9, 8}}         0.118          0.125           0.95     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 70, 90}, {1, 5, 90, 80}}     30.297         10.084          3.00     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 73, 91}, {1, 5, 91, 57}}     23.522         8.031           2.93

asmorkalov · 2023-11-09T07:59:04Z

x86 without AVX2:

Geometric mean (ms)
                                                       Name of Test                                                        4.x-baseline-1 4.x-fastgemm-1 4.x-fastgemm-1
                                                                                                                                                               vs      
                                                                                                                                                         4.x-baseline-1
                                                                                                                                                           (x-factor)  
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                 0.007          0.007           0.96     
einsum::Layer_Einsum::Eqiation=...ij -> ...i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                               0.043          0.044           1.00     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{40, 50}, {50, 80}}               0.254          0.153           1.66     
einsum::Layer_Einsum::Eqiation=...ij, ...jk -> ...ik, InputSize=2, OutputSize=1, InputShape={{47, 51}, {51, 83}}               0.289          0.163           1.77     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{30, 40}}                                       0.007          0.007           0.97     
einsum::Layer_Einsum::Eqiation=ij -> i, InputSize=1, OutputSize=1, InputShape={{113, 374}}                                     0.042          0.044           0.97     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{2, 3}, {3, 2}}                            0.007          0.007           1.10     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{20, 30}, {30, 20}}                        0.038          0.030           1.28     
einsum::Layer_Einsum::Eqiation=ij, jk -> ik, InputSize=2, OutputSize=1, InputShape={{113, 127}, {127, 113}}                    0.760          0.630           1.21     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 7, 9}, {1, 5, 9, 8}}         0.050          0.044           1.14     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 70, 90}, {1, 5, 90, 80}}     4.641          3.224           1.44     
einsum::Layer_Einsum::Eqiation=imkj, injs -> imnks, InputSize=2, OutputSize=1, InputShape={{1, 4, 73, 91}, {1, 5, 91, 57}}     3.735          2.685           1.39
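
The x-factor column in the two tables above appears to be the baseline time divided by the fastgemm time, so values above 1 mean fastgemm is faster. The sketch below shows that relation and the equivalent percentage reduction in run time; the function names are hypothetical, and feeding it the rounded table values reproduces the printed x-factors only approximately.

```cpp
#include <cassert>
#include <cmath>

// x-factor as it appears in the tables: baseline time / new time.
double xFactor(double baseline_ms, double new_ms) {
    assert(baseline_ms > 0.0 && new_ms > 0.0);
    return baseline_ms / new_ms;
}

// Relative reduction in run time, in percent of the baseline.
double timeReductionPercent(double baseline_ms, double new_ms) {
    assert(baseline_ms > 0.0 && new_ms > 0.0);
    return (1.0 - new_ms / baseline_ms) * 100.0;
}

// Rounds to two decimals, matching the precision of the x-factor column.
double roundTo2(double v) {
    return std::round(v * 100.0) / 100.0;
}
```

For the armv7 row with shapes {113, 127} and {127, 113}, `xFactor(3.635, 2.044)` rounds to 1.78, which corresponds to roughly 44% less time per run.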

asmorkalov · 2023-11-09T08:00:23Z

So the fastGemm integration totally makes sense. I propose extracting the platform detection and running it once in the layer constructor. Everything else looks good to me.

Abdurrahheem · 2023-11-13T07:51:30Z

So the fastGemm integration totally makes sense. I propose extracting the platform detection and running it once in the layer constructor. Everything else looks good to me.

In my last commit I fixed the test issues mentioned by @dkurt and fixed the platform detection. Should I collect results with the new platform detection, or can we merge the PR without them? In my opinion the results will not differ much, so we can merge the PR.

asmorkalov · 2023-11-13T08:47:14Z

@dkurt, do you have any other remarks? If not, I propose merging the PR after the constructor fix.

asmorkalov

👍

dkurt · 2023-11-13T12:37:36Z

modules/dnn/src/layers/einsum_layer.cpp

+    for (size_t i = 0; i < output.size(); i++) {
+        Mat output_slice = output_buffer.row(i);
+        output[i].copyTo(output_slice);
+    }


I tried using concat, but for some reason inference fails. Could you show the usage you had in mind?

What is the error message?

Sorry, I meant vconcat:

    // ...
    output.emplace_back(tmp_output.reshape(1, 1));
    // ...
    Mat output_buffer;
    cv::vconcat(output, output_buffer);
    int outputDim[] = {static_cast<int>(output.size()), M, N};
    output_buffer = output_buffer.reshape(1, 3, &outputDim[0]);
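
The idea behind the vconcat suggestion can be sketched without OpenCV: each M x N result is flattened to a single row, the rows are stacked one after another, and the contiguous buffer is then read as a [batch, M, N] tensor. The `Stacked` type and `stackSlices` function below are hypothetical stand-ins for `cv::vconcat` followed by `reshape`, not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Contiguous buffer interpreted as a [batch, M, N] tensor (illustrative type).
struct Stacked {
    std::size_t batch, M, N;
    std::vector<float> data;
    float at(std::size_t b, std::size_t m, std::size_t n) const {
        return data[(b * M + m) * N + n];
    }
};

// Appends each flattened M*N slice as one "row", then views the result as 3D.
Stacked stackSlices(const std::vector<std::vector<float>>& slices,
                    std::size_t M, std::size_t N) {
    Stacked out{slices.size(), M, N, {}};
    out.data.reserve(slices.size() * M * N);
    for (const auto& s : slices) {
        assert(s.size() == M * N);  // every slice must hold exactly M*N values
        out.data.insert(out.data.end(), s.begin(), s.end());
    }
    return out;
}
```

Because the slices are stored back to back, no per-element copy into a preallocated row is needed, which is the same effect the vconcat version achieves in one call.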

dkurt · 2023-11-16T11:15:13Z

@Abdurrahheem, the branch was pushed to origin by mistake: https://github.com/opencv/opencv/tree/ash/dev_einsum_fast_gemm

Please run this locally:

git remote set-url --push origin ""

This prevents pushing to OpenCV, because the push URL for origin is then empty:

$ git remote -v
dkurt   https://github.com/dkurt/opencv (fetch)
dkurt   https://github.com/dkurt/opencv (push)
origin  https://github.com/opencv/opencv (fetch)
origin   (push)

Abdurrahheem added the category: dnn label Nov 7, 2023

Abdurrahheem requested review from asmorkalov, fengyuentau and dkurt November 7, 2023 17:34

Abdurrahheem self-assigned this Nov 7, 2023

asmorkalov reviewed Nov 7, 2023

asmorkalov added this to the 4.9.0 milestone Nov 8, 2023

dkurt reviewed Nov 8, 2023

replaced gemm with fast gemm

b470932

Abdurrahheem force-pushed the ash/dev_einsum_fast_gemm branch from 82f1c13 to b470932 Compare November 8, 2023 17:43

dkurt reviewed Nov 9, 2023

backend initialization for gemm in constructor and

a4cb7cd

fixes to performace test

Abdurrahheem marked this pull request as ready for review November 13, 2023 08:41

asmorkalov reviewed Nov 13, 2023

fengyuentau reviewed Nov 13, 2023

Code review fixes.

33932b3

asmorkalov approved these changes Nov 13, 2023

dkurt reviewed Nov 13, 2023

dkurt reviewed Nov 13, 2023

dkurt reviewed Nov 13, 2023

Abdurrahheem added 2 commits November 13, 2023 20:43

fixed to comments

43881c6

Merge branch 'ash/dev_einsum_fast_gemm' of github.com:Abdurrahheem/op…

fa89c55

…encv into ash/dev_einsum_fast_gemm

fengyuentau approved these changes Nov 16, 2023

asmorkalov merged commit 8c10545 into opencv:4.x Nov 16, 2023

asmorkalov mentioned this pull request Jan 19, 2024

5.x merge 4.x #24862

Merged
