Vulkan backend for NaryEltwiseLayer in DNN module #24768
Hi @Haosonn, thanks for your contribution!
Yes. In my previous Vulkan patch, I focused only on integrated graphics. Our Vulkan backend still needs a lot of optimization. In my opinion, the first priority is supporting more layers, so that we could reduce the number of calling
It's hard to do so: we cannot predict whether the layer after NaryEltwiseLayer is supported by Vulkan. Other frameworks use faster transfer strategies. MNN's Vulkan backend, for example, has two different implementations, VkBuffer and VkImage, and the VkImage one is much faster at transferring data between GPU and CPU.
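To make the transfer cost concrete, here is a toy model of the trade-off discussed above. It is only a sketch, not OpenCV's scheduling logic: `LayerInfo`, `countTransfersPerLayer`, and `countTransfersResident` are hypothetical names, and the model assumes the input starts on the host and the output must end on the host.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layer description: only whether the GPU backend can run it.
struct LayerInfo {
    bool supportsGpu;
};

// Strategy 1 (assumed): every GPU layer uploads its input and copies its
// output back to the host, so each supported layer costs two transfers.
std::size_t countTransfersPerLayer(const std::vector<LayerInfo>& layers) {
    std::size_t transfers = 0;
    for (const LayerInfo& layer : layers) {
        if (layer.supportsGpu) {
            transfers += 2;
        }
    }
    return transfers;
}

// Strategy 2 (assumed): data stays where it is until a layer needs it on the
// other side, so a transfer happens only when support changes between layers.
std::size_t countTransfersResident(const std::vector<LayerInfo>& layers) {
    std::size_t transfers = 0;
    bool onGpu = false;  // the network input starts in host memory
    for (const LayerInfo& layer : layers) {
        if (layer.supportsGpu != onGpu) {
            ++transfers;
            onGpu = layer.supportsGpu;
        }
    }
    if (onGpu) {
        ++transfers;  // final copy of the network output back to the host
    }
    return transfers;
}
```

In this model, three consecutive supported layers cost six transfers with per-layer copies but only two when data stays resident. A single unsupported layer in the middle raises the resident count to four, which is why broader layer coverage matters.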
@zihaomu Please review this PR as well.
4ae98b5 to 836f0d1
Several tests failed:
Also see https://pullrequest.opencv.org/buildbot/builders/precommit_linux64/builds/105934/steps/test_objdetect/logs/stdio, which looks like memory issues.
@Haosonn @fengyuentau please rebase and fix conflicts.
add several test cases Update Update Update Update Update
add a preheat calculation
& uncomment some operators in OpNary constructor
@zihaomu @fengyuentau Could you take a look again?
LGTM. Thanks for the contribution!
Thanks for your contribution!
Vulkan backend for NaryEltwiseLayer in DNN module opencv#24768

We improve Vulkan backend for ``NaryEltwiseLayer`` in DNN module by:

- add a basic framework for Vulkan backend in ``NaryEltwiseLayer``
- add a compute shader for binary forwarding (an imitation of what has been done in native OpenCV backend including broadcasting and eltwise-operation)
- typo fixed:
  - Wrong info output in ``context.cpp``

Currently, our implementation (or all layers supporting Vulkan backend) runs pretty slow on discrete GPUs basically due to IO cost in function ``copyToHost``, and we are going to fix that by

- find out the best ``VkMemoryProperty`` for various discrete GPUs
- prevent ``copyToHost`` in middle layers during forwarding, (i.e keep data in GPU memory)

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake

Co-authored-by: IskXCr <IskXCr@outlook.com>
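For readers unfamiliar with the broadcasting step mentioned above, the following is a minimal CPU-side sketch of a broadcast binary elementwise operation. It is not the PR's compute shader or OpenCV's native implementation: `padShape`, `broadcastStrides`, and `naryBinaryForward` are hypothetical names, and the sketch assumes NumPy-style broadcasting over row-major float buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

using Shape = std::vector<std::size_t>;

// Left-pad a shape with 1s so both operands have the same rank.
static Shape padShape(const Shape& s, std::size_t rank) {
    Shape out(rank, 1);
    std::copy(s.begin(), s.end(), out.begin() + (rank - s.size()));
    return out;
}

// Row-major strides in elements; size-1 dims get stride 0 so that the same
// element is reused along a broadcast axis.
static Shape broadcastStrides(const Shape& s) {
    Shape strides(s.size(), 0);
    std::size_t step = 1;
    for (std::size_t i = s.size(); i-- > 0;) {
        strides[i] = (s[i] == 1) ? 0 : step;
        step *= s[i];
    }
    return strides;
}

// Apply `op` elementwise to `a` and `b`, broadcasting size-1 dimensions.
// The resulting shape is written to `outShape`.
std::vector<float> naryBinaryForward(const std::vector<float>& a, const Shape& shapeA,
                                     const std::vector<float>& b, const Shape& shapeB,
                                     Shape& outShape,
                                     const std::function<float(float, float)>& op) {
    const std::size_t rank = std::max(shapeA.size(), shapeB.size());
    const Shape sa = padShape(shapeA, rank);
    const Shape sb = padShape(shapeB, rank);

    outShape.assign(rank, 1);
    std::size_t total = 1;
    for (std::size_t i = 0; i < rank; ++i) {
        if (sa[i] != sb[i] && sa[i] != 1 && sb[i] != 1) {
            throw std::invalid_argument("shapes are not broadcastable");
        }
        outShape[i] = std::max(sa[i], sb[i]);
        total *= outShape[i];
    }

    const Shape stA = broadcastStrides(sa);
    const Shape stB = broadcastStrides(sb);
    std::vector<float> out(total);
    for (std::size_t idx = 0; idx < total; ++idx) {
        // Decompose the flat output index into coordinates, last axis first,
        // and accumulate the matching offsets into each input.
        std::size_t rem = idx, offA = 0, offB = 0;
        for (std::size_t d = rank; d-- > 0;) {
            const std::size_t coord = rem % outShape[d];
            rem /= outShape[d];
            offA += coord * stA[d];
            offB += coord * stB[d];
        }
        out[idx] = op(a[offA], b[offB]);
    }
    return out;
}
```

Adding a `{3}` vector to a `{2, 3}` matrix with this function reuses the vector for each row. A GPU version would typically assign one output index per shader invocation and perform the same index decomposition inside the shader.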
This patch causes FP16 test failures: #24954
I see performance degradation for this test case with 1/2/4 threads (no threading in implementation anyway) on 12700K:
To reviewers: PRs with optimization or other non-trivial implementation changes should have attached performance reports.
Pow is not supported yet in the Vulkan backend, so I guess something else happened?
The regression is on CPU, not Vulkan.
It looks even weirder to me that this patch made very limited changes to the CPU implementation yet affected CPU performance, and only for Pow. Let me investigate it.
Update: Oh, I see, use @opencv-alalek Do you know how to force opencv_perf_* to run 100 samples? I found they can run anywhere from 10 to 100 samples, which may lead to some mistakes.
@fengyuentau, there is
TEST_CYCLE_N(100)
{
    …
}
Or you may use
Sorry, I missed the thing that you already found