Fast gemm for einsum #24509
FastGemmOpt opt;
opt.init();
I propose to make it layer-wide, so that it is not initialized each time.
How do I understand your collected results? Is it faster or slower compared with
Sorry, I added a description. Inference is faster with FastGemm.
82f1c13 to b470932
armv7 neon (jetson tk1):
x86 without AVX2:
So fastGemm integration totally makes sense. I propose extracting the platform detection and running it once in the layer constructor. Everything else looks good to me.
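The suggestion to run platform detection once in the layer constructor could look roughly like the sketch below. It is not the PR's code: `FastGemmOptSketch`, `EinsumLayerSketch`, the `use_simd` flag and the `init_calls` counter are hypothetical stand-ins, and the compile-time macro check only imitates whatever detection the real `FastGemmOpt::init()` performs.

```cpp
#include <cassert>

// Hypothetical stand-in for the PR's FastGemmOpt; the real struct and its
// fields are not shown in this thread. init() models a one-off platform probe.
struct FastGemmOptSketch {
    bool use_simd = false;
    static int init_calls;  // counts probes, for demonstration only

    void init() {
        ++init_calls;
#if defined(__AVX2__) || defined(__ARM_NEON)
        use_simd = true;   // assumed fast path
#else
        use_simd = false;  // assumed generic fallback
#endif
    }
};
int FastGemmOptSketch::init_calls = 0;

// Hypothetical layer: the probe runs once in the constructor and the cached
// result is reused by every forward() call.
class EinsumLayerSketch {
public:
    EinsumLayerSketch() { opt_.init(); }

    // Placeholder for the real computation; reports which path would be taken.
    bool forward() const { return opt_.use_simd; }

private:
    FastGemmOptSketch opt_;  // layer-wide options, initialized exactly once
};
```

With this layout, constructing one `EinsumLayerSketch` and calling `forward()` any number of times leaves `FastGemmOptSketch::init_calls` at 1, whereas creating and initializing the options inside `forward()` would probe the platform on every call.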
fixes to performance test
In my last comment I have fixed the test issues mentioned by @dkurt and fixed the platform detection. Should I collect results with the new platform detection, or can we merge the PR without them? In my opinion the results will not differ much, so we can merge the PR.
@dkurt Do you have other remarks? If not, I propose to merge the PR after the constructor fix.
👍
for (size_t i = 0; i < output.size(); i++) {
    Mat output_slice = output_buffer.row(i);
    output[i].copyTo(output_slice);
}
hconcat?
I tried using concat, but for some reason inference fails. Could you show the usage you had in mind?
What is the error message?
Excuse me, vconcat for sure:
// ...
output.emplace_back(tmp_output.reshape(1, 1));
// ...
Mat output_buffer;
cv::vconcat(output, output_buffer);
int outputDim[] = {static_cast<int>(output.size()), M, N};
output_buffer = output_buffer.reshape(1, 3, &outputDim[0]);
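To make the memory layout behind this suggestion concrete, here is a library-free sketch of the same idea: each M x N slice is flattened to one row, the rows are stacked, and the stacked buffer is then read as B x M x N. It deliberately avoids OpenCV so it can stand alone; `stackSlices` and `offset3d` are hypothetical helper names, and row-major storage is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flatten-and-stack: every slice holds M*N values in row-major order and is
// appended as one row of a B x (M*N) buffer. Reading that buffer as
// B x M x N afterwards requires no data movement.
std::vector<float> stackSlices(const std::vector<std::vector<float>>& slices,
                               std::size_t M, std::size_t N) {
    std::vector<float> packed;
    packed.reserve(slices.size() * M * N);
    for (const auto& s : slices) {
        assert(s.size() == M * N);  // each slice must be a full M x N block
        packed.insert(packed.end(), s.begin(), s.end());
    }
    return packed;
}

// Offset of element (b, m, n) when the packed buffer is viewed as B x M x N.
std::size_t offset3d(std::size_t b, std::size_t m, std::size_t n,
                     std::size_t M, std::size_t N) {
    return (b * M + m) * N + n;
}
```

For two 2 x 3 slices holding 1..6 and 7..12, element (1, 0, 2) of the stacked view sits at offset 8 and holds 9. This is why stacking flattened slices vertically and then reshaping to three dimensions can replace the per-row copy loop.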
…encv into ash/dev_einsum_fast_gemm
@Abdurrahheem, the branch was pushed to origin by mistake: https://github.com/opencv/opencv/tree/ash/dev_einsum_fast_gemm Please do the following locally:
To avoid pushing to OpenCV:
…_gemm

Fast gemm for einsum opencv#24509

## This PR adds performance tests for Einsum Layer with FastGemm. See below results of performance test on different inputs

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
This PR adds performance tests for the Einsum layer with FastGemm. See the performance test results for different inputs below.
Notation:
All data are in ms (milliseconds).
Gemm is the backend for matrix multiplication.
Benchmarks: (an arrow indicates an increase in inference speed compared to einsum with gemm)
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.