Faster implementation of blobFromImages for cpu nchw output #26127
@alexlyulkov please add a performance test.
@fengyuentau, after you finish with C3 optimization in warping functions and before you move to bicubic case optimization, may I ask you to take a look at it? We need to compare the speed of this implementation with the existing one in the 4.x and 5.x branches.
@vpisarev @asmorkalov This patch generally brings better performance on both the 4.x and 5.x branches, although I only tested on my MacBook Air with M1. See below for detailed performance testing results. Code for performance testing: fengyuentau@b01f28c
Geometric mean (ms)

| Name of Test | base-4x | patch-4x | patch-4x vs base-4x (x-factor) |
|---|---|---|---|
| HWC_TO_NCHW::Utils_blobFromImage::{ 32, 32 } | 0.005 | 0.001 | 5.00 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 64, 64 } | 0.013 | 0.001 | 8.94 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 128, 128 } | 0.052 | 0.009 | 5.53 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 256, 256 } | 0.205 | 0.037 | 5.58 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 512, 512 } | 0.935 | 0.274 | 3.42 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 1024, 1024 } | 3.246 | 0.671 | 4.84 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 2048, 2048 } | 15.888 | 5.352 | 2.97 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 32, 32 } | 0.068 | 0.011 | 6.41 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 64, 64 } | 0.212 | 0.032 | 6.68 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 128, 128 } | 0.921 | 0.261 | 3.53 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 256, 256 } | 4.046 | 1.315 | 3.08 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 512, 512 } | 16.397 | 5.695 | 2.88 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 1024, 1024 } | 64.182 | 21.845 | 2.94 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 2048, 2048 } | 255.997 | 86.815 | 2.95 |
Geometric mean (ms)

| Name of Test | base-5x | patch-5x | patch-5x vs base-5x (x-factor) |
|---|---|---|---|
| HWC_TO_NCHW::Utils_blobFromImage::{ 32, 32 } | 0.005 | 0.001 | 5.17 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 64, 64 } | 0.013 | 0.001 | 8.68 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 128, 128 } | 0.050 | 0.009 | 5.34 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 256, 256 } | 0.189 | 0.036 | 5.19 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 512, 512 } | 0.910 | 0.433 | 2.10 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 1024, 1024 } | 3.239 | 0.663 | 4.88 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 2048, 2048 } | 15.499 | 9.550 | 1.62 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 32, 32 } | 0.067 | 0.011 | 5.85 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 64, 64 } | 0.207 | 0.035 | 5.98 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 128, 128 } | 0.902 | 0.450 | 2.00 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 256, 256 } | 3.893 | 2.279 | 1.71 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 512, 512 } | 15.899 | 9.360 | 1.70 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 1024, 1024 } | 61.762 | 23.486 | 2.63 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 2048, 2048 } | 249.740 | 94.881 | 2.63 |
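For readers checking the numbers: the x-factor column appears to be the base time divided by the patched time (for example, 15.888 / 5.352 is about 2.97). The following is a minimal sketch of that arithmetic, assuming the "Geometric mean (ms)" figure is the geometric mean of repeated timing samples. The helper names `geometricMean` and `xFactor` are hypothetical and not part of OpenCV or its perf tooling. Because the table shows rounded times, recomputed ratios can differ slightly from the printed x-factors.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of timing samples in milliseconds.
// Assumes a non-empty vector of strictly positive values.
double geometricMean(const std::vector<double>& samplesMs)
{
    double logSum = 0.0;
    for (double s : samplesMs)
        logSum += std::log(s);  // sum logs instead of multiplying, to avoid overflow
    return std::exp(logSum / static_cast<double>(samplesMs.size()));
}

// Speed-up of the patched build relative to the baseline (assumed definition).
// Values above 1.0 mean the patch is faster.
double xFactor(double baseMs, double patchMs)
{
    return baseMs / patchMs;
}
```

For instance, `xFactor(15.888, 5.352)` gives roughly 2.97, which matches the 2048x2048 `blobFromImage` row of the 4.x table.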
BTW, I need to do the following changes so as to fix compile errors.
@fengyuentau Could you push your commit to Alex's branch? He is on vacation now.
Signed-off-by: Yuantao Feng <yuantao.feng@opencv.org.cn>
Faster implementation of blobFromImages for cpu nchw output opencv#26127

Faster implementation of blobFromImage and blobFromImages for HWC cv::Mat images -> NCHW cv::Mat case

Running time on my pc in ms:

**blobFromImage**
```
image size      old       new       speed-up
32x32x3         0.008     0.002     4.0x
64x64x3         0.021     0.009     2.3x
128x128x3       0.164     0.037     4.4x
256x256x3       0.728     0.158     4.6x
512x512x3       3.310     0.628     5.2x
1024x1024x3     14.503    3.124     4.6x
2048x2048x3     61.647    28.049    2.2x
```

**blobFromImages**
```
image size         old         new        speed-up
16x32x32x3         0.122       0.041      3.0x
16x64x64x3         0.790       0.165      4.8x
16x128x128x3       3.313       0.652      5.1x
16x256x256x3       13.495      3.127      4.3x
16x512x512x3       58.795      28.127     2.1x
16x1024x1024x3     251.135     121.955    2.1x
16x2048x2048x3     1023.570    487.188    2.1x
```

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

Update window_cocoa.mm
Faster implementation of blobFromImage and blobFromImages for the HWC cv::Mat images -> NCHW cv::Mat case
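To make the layout change concrete, here is a minimal sketch of an HWC to NCHW conversion in plain C++. It uses `std::vector` buffers rather than `cv::Mat` so that it stays self-contained. The helper names `hwcToChw` and `imagesToNchw` are hypothetical, and this is a straightforward reference loop, not the optimized code from this PR. It assumes 8-bit interleaved input, float output, and that all images in a batch share the same size.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert one interleaved HWC image (e.g. BGRBGR...) into planar CHW floats.
// Hypothetical helper for illustration; not an OpenCV function.
std::vector<float> hwcToChw(const std::vector<std::uint8_t>& src,
                            std::size_t height, std::size_t width,
                            std::size_t channels)
{
    const std::size_t plane = height * width;  // number of pixels per channel plane
    std::vector<float> dst(channels * plane);
    for (std::size_t i = 0; i < plane; ++i)            // pixel index, row-major
        for (std::size_t c = 0; c < channels; ++c)     // channel within the pixel
            dst[c * plane + i] = static_cast<float>(src[i * channels + c]);
    return dst;
}

// Stack N same-sized HWC images into one contiguous NCHW buffer.
std::vector<float> imagesToNchw(const std::vector<std::vector<std::uint8_t>>& images,
                                std::size_t height, std::size_t width,
                                std::size_t channels)
{
    std::vector<float> blob;
    blob.reserve(images.size() * channels * height * width);
    for (const auto& img : images) {
        const std::vector<float> chw = hwcToChw(img, height, width, channels);
        blob.insert(blob.end(), chw.begin(), chw.end());  // append image n's planes
    }
    return blob;
}
```

For a 1x2 image with three channels stored as `{1, 2, 3, 4, 5, 6}`, `hwcToChw` returns `{1, 4, 2, 5, 3, 6}`: each channel's values become contiguous.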
Running time on my pc in ms:
blobFromImage
blobFromImages
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.