Upgrade the Vulkan header file from version 1.0 to 1.2 to support the fp16 and int8 data formats.
Carefully optimize the convolution and GEMM layers. ResNet50 inference with the Vulkan backend speeds up from 170 ms to 36 ms.
Remove support for some layers, such as pooling, permute, LRN, and ReLU. Supporting these layers slows down DNN inference because their kernels are not well optimized. I think this task should be left for the next step; GSoC students could take on some of the work.
Optimize DNN Vulkan backend
Merge with: opencv/ci-gha-workflow#95.
My purposes for this PR:
Upgrade the Vulkan header file from version 1.0 to 1.2 to support the fp16 and int8 data formats.
Vulkan CI result can be found at this PR.
We have only optimized for integrated GPUs; discrete GPUs, such as Nvidia GPUs, will run relatively slowly.
There are two CIs:
TODO List:
Performance Test
NOTE: Currently this PR is only optimized for integrated graphics; it will run very slowly on discrete graphics such as Nvidia GPUs.
Test on Apple M1 chip.
Patch performance:
Since the old Vulkan kernels were almost entirely unoptimized, they run very slowly.
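For a rough sense of scale, the ResNet50 timings quoted in this PR (170 ms before, 36 ms after) can be turned into a speedup factor. The snippet below is only a sketch of that arithmetic; the helper names `speedup` and `latency_reduction_pct` are made up for illustration and are not part of OpenCV.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the new timing is than the old one."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("timings must be positive")
    return before_ms / after_ms


def latency_reduction_pct(before_ms: float, after_ms: float) -> float:
    """Return the relative latency reduction as a percentage of the old timing."""
    if before_ms <= 0 or after_ms <= 0:
        raise ValueError("timings must be positive")
    return 100.0 * (before_ms - after_ms) / before_ms


# ResNet50 timings quoted in this PR (Vulkan backend, before and after the patch).
old_ms, new_ms = 170.0, 36.0

print(f"speedup: {speedup(old_ms, new_ms):.2f}x")                          # speedup: 4.72x
print(f"latency reduction: {latency_reduction_pct(old_ms, new_ms):.1f}%")  # latency reduction: 78.8%
```

These numbers apply only to the configuration the PR reports; as noted above, discrete GPUs are expected to behave differently.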
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.