DNN: Accelerating convolution #21910
Conversation
modules/dnn/src/layers/convolution_arm/conv_2d_3x3s1_winograd.hpp (outdated; review comments resolved)
The branch was force-pushed from 4204f89 to e85f3d5, and later from 3281b3c to 67d4dc2.
```cpp
@@ -42,6 +42,23 @@
#include "opencv2/core/hal/intrin.hpp"

#ifndef FAST_CONV_PRAM
```
since you've added separate fast_convolution.simd.hpp, why not move all the new convolution kernels there? Please, move conv_block and depthWiseBlock there. At once, please, normalize the naming. If you use lowercase names with underscores for conv_block, use the same style for depthwise_block. Or make everything mixed case: convBlock, depthWiseBlock
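For illustration, a minimal sketch of how the two kernels could be declared consistently in fast_convolution.simd.hpp; the parameter lists below are placeholders, not the actual signatures from the patch:

```cpp
// Hypothetical declarations showing one consistent, mixed-case naming style.
// The parameter lists are placeholders and not the real signatures.
void convBlock(int np, const float* a, const float* b, float* c, int ldc, bool initC);
void depthWiseBlock(const float* inptr, float* outptr, const float* weights, float biasval,
                    int Hi, int Wi, int H0, int W0, int stride_h, int stride_w);
```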
Fixed.
```cpp
    }
}

void runFastConv2d(InputArray _input, OutputArray _output,
```
You have two functions with very similar names: doConvolution and runFastConv2d. What's the difference between 'do' and 'run'? Could you rename one of the functions to make the distinction clearer?
fixed.
```cpp
int Kstripes = Kg_nblocks*stripes_per_sample;
int nsubtasks = N*ngroups*Kstripes;

float* inpbuf_all = (float *)fastMalloc(inputbufsize * sizeof(float ));
```
In the Ficus engine I had to use C, because this is the native output language of the Ficus compiler. In C++ code, please never use plain "malloc" or its alternatives. Use std::vector<> instead.
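For illustration, a minimal sketch of the suggested replacement; the names inpbuf_all and inputbufsize come from the hunk above, while the wrapper function is invented to keep the snippet self-contained:

```cpp
#include <vector>

// Sketch: let an RAII container own the scratch buffer instead of fastMalloc/fastFree.
void exampleScratchBuffer(size_t inputbufsize)  // hypothetical wrapper, for illustration only
{
    std::vector<float> inpbuf_all(inputbufsize, 0.f);  // freed automatically, exception-safe
    float* inpbuf_ptr = inpbuf_all.data();             // raw pointer for APIs expecting float*
    (void)inpbuf_ptr;
    // cv::AutoBuffer<float> would be another common OpenCV choice for temporary buffers.
}
```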
Thanks for the code review, I will fix these issues in the next update.
Hi @vpisarev, the code has been updated, and I have left replies to some comments.
```cpp
@@ -71,6 +71,8 @@ using namespace cv::dnn::ocl4dnn;
using namespace cv::dnn::cuda4dnn;
#endif

#include "./fast_convolution/fast_convolution.hpp"
```
The relative prefix "./" should not be used.
fixed.
```diff
@@ -253,11 +255,13 @@ class ConvolutionLayerImpl CV_FINAL : public BaseConvolutionLayerImpl
 {
 public:
     enum { VEC_ALIGN = 8, DFT_TYPE = CV_32F };
-    Mat weightsMat;
+    Mat weightsMat, fastWeights;
```
It is better to put this near fastConv2dImpl. Also, it makes sense to add documentation about its layout and the difference from weightsMat.
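As an illustration only, a hypothetical documentation comment of the kind being requested; the packed layout described below is an assumption, not taken from the actual patch:

```cpp
#include <opencv2/core.hpp>

struct ConvolutionLayerMembersSketch   // hypothetical container, for illustration only
{
    cv::Mat weightsMat;   // original weights reshaped to 2D: one row per output channel
    cv::Mat fastWeights;  // assumed description: the same weights repacked once at init
                          // into the blocked layout the fast convolution kernels expect,
                          // kept alongside weightsMat (hence the doubled weight storage)
};
```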
Done.
```cpp
/*
    This file is a part of ficus language project.
    See ficus/LICENSE for the licensing terms
*/
// This file is modified from the ficus (https://github.com/vpisarev/ficus/blob/master/lib/NN/OpConv.fx)
```
@vpisarev Please verify, as this integration contradicts the conventions for 3rdparty original files and/or 3rdparty adopted files.
Thanks for the code review. Any advice on this? I don't know how to modify it.
@alalek, could you please explain your comment? What's the contradiction? I can confirm that the code has been borrowed from Ficus, licensed under Apache 2 license.
I expect a header similar to modules/dnn/src/layers/fast_convolution/winograd_3x3s1_f63.cpp, where we have the OpenCV header on top and then the original license header of the adapted code.
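For illustration, a sketch of the header layout being described; the exact OpenCV boilerplate wording should be copied from an existing file rather than from here:

```cpp
// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.

// This file is modified from ficus (https://github.com/vpisarev/ficus/blob/master/lib/NN/OpConv.fx).
// Here is the original license:
/*
    This file is a part of ficus language project.
    See ficus/LICENSE for the licensing terms
*/
```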
Done.
```cpp
CV_LOG_WARNING(NULL, "Runing at unoptimized code. The combination of FAST_CONV_MR and/or FAST_CONV_NR "
                     "is not supported in SIMD128 branch.");
```
ISA-targeted/SIMD code should not emit warnings (or in general call any other non-optimized functions).
@zihaomu, it should be a compile-time error. If the user changes FAST_CONV_MR/FAST_CONV_NR, he/she should also modify the optimized loop or explicitly disable it and switch to the C implementation.
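A minimal sketch of such a compile-time check; the concrete values 4 and 24 are placeholders, not necessarily the ones used by the patch:

```cpp
// Hypothetical guard near the SIMD128 kernel: fail the build instead of warning at run time.
#if FAST_CONV_MR != 4 || FAST_CONV_NR != 24   // placeholder values
#error "SIMD128 conv kernel is written for FAST_CONV_MR == 4 and FAST_CONV_NR == 24; update the kernel or switch to the C implementation"
#endif
```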
Ok, I will update it later.
```cpp
@@ -71,6 +71,8 @@ using namespace cv::dnn::ocl4dnn;
using namespace cv::dnn::cuda4dnn;
#endif

#include "fast_convolution/fast_convolution.hpp"
```
Need to take a look at:
- test failures in the Linux Debug configuration
- test failures in the Linux AVX2 configuration (-DCPU_BASELINE=AVX2)
- looks like the unconditional doubling of weights storage requires more memory, and several Win32 tests started to fail with an OOM message. @vpisarev
Thanks for the code review. The failures in Linux Debug and Linux AVX2 only occur in the quantized models. Since the parameters of the int8 layers rely on the output of the fp32 models, we can adjust the test thresholds to resolve this in a short time.
For Win32, I'm looking for a way around it.
For Win32, I have changed the memory limitation of FasterRCNN_vgg16 from 1GB to 2GB. FasterRCNN_vgg16 has a large memory requirement (412 MB needed on one FC layer). The new patch needs to pack the weightMats in advance, in the initialization phase, so for Conv we need twice as much memory to store the weights.
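As a rough, back-of-the-envelope illustration of why the weight storage doubles (the layer shape below is an assumption chosen to roughly match the 412 MB figure, not read from the model):

```cpp
#include <cstddef>
#include <cstdio>

int main()
{
    // Hypothetical VGG16-like FC layer: 25088 inputs x 4096 outputs, float32 weights.
    const std::size_t inCn = 25088, outCn = 4096;
    const std::size_t weightBytes = inCn * outCn * sizeof(float); // ~411 MB for the original weights
    const std::size_t packedBytes = weightBytes;                  // a pre-packed copy is the same size
    std::printf("weights: %zu MB, weights + packed copy: %zu MB\n",
                weightBytes / 1000000, (weightBytes + packedBytes) / 1000000);
    return 0;
}
```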
modules/dnn/test/test_backends.cpp (outdated)
```diff
@@ -545,7 +545,7 @@ TEST_P(DNNTestNetwork, FastNeuralStyle_eccv16)
     Mat img = imread(findDataFile("dnn/googlenet_1.png"));
     Mat inp = blobFromImage(img, 1.0, Size(320, 240), Scalar(103.939, 116.779, 123.68), false, false);
     // Output image has values in range [-143.526, 148.539].
-    float l1 = 4e-5, lInf = 2e-3;
+    float l1 = 5e-5, lInf = 2e-3;
```
Please use x2-x5 values for test tolerance checks:
- 4e-5 => 1e-4 instead of 5e-5
- Below: 1e-5 => 1e-4 instead of 1.01e-5
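Applied to the hunk above, the suggestion would read:

```cpp
float l1 = 1e-4, lInf = 2e-3;  // 4e-5 raised by x2.5, following the x2-x5 guideline
```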
After merging this PR, the Android build fails: https://github.com/opencv/ci-gha-workflow/runs/7157789387?check_suite_focus=true#step:8:1639
Thanks for that, I'm looking for a solution; it's estimated to take a day or two to resolve.
DNN: Accelerating convolution

* Fast Conv for ARM, X86 and universal intrinsics.
* Improve code style.
* Fix errors.
* Improve the license header.
* Optimize memory allocation and adjust the thresholds.
* Change the FasterRCNN_vgg16 memory limit to 2GB.
The goal of this proposal is to speed up the convolution layer, and thereby the overall inference speed of the dnn module.
Performance Test on ARM (Apple M1 chip, 8 threads)
On the ARM platform, it achieves a 2.5X speed-up on ResNet50 and a 1.7X speed-up on MobileNetV2.
Performance Test on X86 (AMD 5600X, 12 threads)
On the X86 platform, it is about 15% faster.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.
WIP