dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630
double nstripes = getNumThreads();
parallel_for_(Range(0, nplanes), worker, nstripes);
nstripes = getNumThreads();
This should not be used.
It was already discussed several months ago, e.g. in #23047.
Thank you for the review, but take it easy: this PR is still a draft. I still remember our discussion.
Changed. Performance results are also updated.
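For readers following the thread, here is a minimal sketch of the general idea behind the review comment: derive the stripe count from the amount of work instead of from the thread count. This is not the code that was changed in this PR. The helper name `stripesForWork` and the default grain size of 65536 elements are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper, not part of OpenCV: picks a stripe count for a
// parallel_for_-style call from the total number of elements to process.
// minWorkPerStripe is an assumed tuning constant, not a value from the PR.
inline double stripesForWork(std::size_t totalElements,
                             std::size_t minWorkPerStripe = 1 << 16)
{
    if (totalElements == 0 || minWorkPerStripe == 0)
        return 1.0;  // nothing to split, fall back to a single stripe

    // One stripe per minWorkPerStripe elements, but never fewer than one.
    const double n = static_cast<double>(totalElements) /
                     static_cast<double>(minWorkPerStripe);
    return std::max(1.0, n);
}
```

With such a helper, the call from the diff could look like `parallel_for_(Range(0, nplanes), worker, stripesForWork(total))`, where `total` stands for the number of output elements. Small inputs then end up in a single stripe, and large inputs are split in proportion to their size, independent of how many threads are configured.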
My results with Jetson TK1 (ARMv7 + NEON):
My results for Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz (no AVX2):
Thank you @asmorkalov for adding more performance results :)
Any review comments?
The patch leads to a significant performance degradation in OpenCL pipelines, e.g.:
I used an NVIDIA GF 1080 for the benchmark. It looks like the patch prevents some graph fusion or another inference optimization.
OK, I will take a look at the problem.
4be1a1f to f3adabe
@asmorkalov The performance "degradation" is due to a very out-of-date code base (>450 commits behind 4.x). I have updated the code base. Performance tests (on Intel UHD 770) look fine on my side. Feel free to retest on your side. On the positive side, those commits brought a large performance boost (OCL is ~4x faster and CPU is ~1.3x faster). Maybe I can add the OCL backend for this layer later :)
perf-dnn.zip
I also tried a Xiaomi Mi 10 phone. The results are volatile (maybe due to power management), but I do not see a significant performance gain, except for NCHW_C_sum and NCHW_NCHW_pow.
It is tuned to use multi-threading only if the input scale is large enough. Traditional convolutional nets do not have such a large input scale for elementwise layers.
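A rough sketch of what such size gating can look like, under stated assumptions: the planner below keeps small inputs in a single range (serial execution) and only splits the index space once the element count reaches a threshold. `planRanges`, `IndexRange`, and the 65536-element threshold are hypothetical names and values for illustration; the actual cut-off used by the layer is not stated in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [begin, end).
using IndexRange = std::pair<std::size_t, std::size_t>;

// Hypothetical planner, not OpenCV code: returns the ranges that would be
// handed to worker threads. Inputs below parallelThreshold (an assumed
// value) stay in one range, so they would run serially.
inline std::vector<IndexRange> planRanges(std::size_t total,
                                          std::size_t maxWorkers,
                                          std::size_t parallelThreshold = 1 << 16)
{
    std::vector<IndexRange> ranges;
    if (total == 0)
        return ranges;  // no work, no ranges

    if (total < parallelThreshold || maxWorkers <= 1) {
        ranges.emplace_back(std::size_t{0}, total);  // small input: single range
        return ranges;
    }

    // Large input: split into at most maxWorkers nearly equal chunks.
    const std::size_t chunk = (total + maxWorkers - 1) / maxWorkers;
    for (std::size_t begin = 0; begin < total; begin += chunk)
        ranges.emplace_back(begin, std::min(begin + chunk, total));
    return ranges;
}
```

This matches the observation above: elementwise layers in typical convolutional nets would fall under the threshold and see no change, while large inputs get split across workers.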
…_thread
dnn: merge #25630 to 5.x #25900
Sync changes from #25630 to 5.x.
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
This PR introduces the following changes:
Performance
i7-12700K, RAM 64GB, Ubuntu 22.04
Apple M1, RAM 16GB, macOS 14.4.1
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.