CARVIEW |
Select Language
HTTP/2 200
date: Sun, 27 Jul 2025 08:34:14 GMT
content-type: text/html; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, X-Requested-With,Accept-Encoding, Accept, X-Requested-With
x-repository-download: git clone https://github.com/opencv/opencv.git
etag: W/"771e18630e4100b5c54db6712c9eae86"
cache-control: max-age=0, private, must-revalidate
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 0
referrer-policy: no-referrer-when-downgrade
content-security-policy: default-src 'none'; base-uri 'self'; child-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/; connect-src 'self' uploads.github.com www.githubstatus.com collector.github.com raw.githubusercontent.com api.github.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com *.rel.tunnels.api.visualstudio.com wss://*.rel.tunnels.api.visualstudio.com objects-origin.githubusercontent.com copilot-proxy.githubusercontent.com proxy.individual.githubcopilot.com proxy.business.githubcopilot.com proxy.enterprise.githubcopilot.com *.actions.githubusercontent.com wss://*.actions.githubusercontent.com productionresultssa0.blob.core.windows.net/ productionresultssa1.blob.core.windows.net/ productionresultssa2.blob.core.windows.net/ productionresultssa3.blob.core.windows.net/ productionresultssa4.blob.core.windows.net/ productionresultssa5.blob.core.windows.net/ productionresultssa6.blob.core.windows.net/ productionresultssa7.blob.core.windows.net/ productionresultssa8.blob.core.windows.net/ productionresultssa9.blob.core.windows.net/ productionresultssa10.blob.core.windows.net/ productionresultssa11.blob.core.windows.net/ productionresultssa12.blob.core.windows.net/ productionresultssa13.blob.core.windows.net/ productionresultssa14.blob.core.windows.net/ productionresultssa15.blob.core.windows.net/ productionresultssa16.blob.core.windows.net/ productionresultssa17.blob.core.windows.net/ productionresultssa18.blob.core.windows.net/ productionresultssa19.blob.core.windows.net/ github-production-repository-image-32fea6.s3.amazonaws.com github-production-release-asset-2e65be.s3.amazonaws.com insights.github.com wss://alive.github.com api.githubcopilot.com api.individual.githubcopilot.com api.business.githubcopilot.com api.enterprise.githubcopilot.com; font-src github.githubassets.com; form-action 'self' github.com gist.github.com copilot-workspace.githubnext.com objects-origin.githubusercontent.com; frame-ancestors 'none'; frame-src viewscreen.githubusercontent.com notebooks.githubusercontent.com; img-src 'self' data: blob: github.githubassets.com media.githubusercontent.com camo.githubusercontent.com identicons.github.com avatars.githubusercontent.com private-avatars.githubusercontent.com github-cloud.s3.amazonaws.com objects.githubusercontent.com release-assets.githubusercontent.com secured-user-images.githubusercontent.com/ user-images.githubusercontent.com/ private-user-images.githubusercontent.com opengraph.githubassets.com copilotprodattachments.blob.core.windows.net/github-production-copilot-attachments/ github-production-user-asset-6210df.s3.amazonaws.com customer-stories-feed.github.com spotlights-feed.github.com objects-origin.githubusercontent.com *.githubusercontent.com; manifest-src 'self'; media-src github.com user-images.githubusercontent.com/ secured-user-images.githubusercontent.com/ private-user-images.githubusercontent.com github-production-user-asset-6210df.s3.amazonaws.com gist.github.com; script-src github.githubassets.com; style-src 'unsafe-inline' github.githubassets.com; upgrade-insecure-requests; worker-src github.githubassets.com github.com/assets-cdn/worker/ github.com/assets/ gist.github.com/assets-cdn/worker/
server: github.com
content-encoding: gzip
accept-ranges: bytes
set-cookie: _gh_sess=tkycULi%2BmpYeCBEMQPz6Alu8wzSIbsKhnMK9GT%2BMbSfk2zFaSeXqzZBikuAwR5Mda%2FUq5d3aYpDhjOgYijFgx%2BBW5pIFHSMDBrEBF79HvqaYjdw%2FmY05ihjzicSEGJTnxqyBxMxt98tfrPQ6xE29KpJ2QjxsN3tsVccfR6PTwCoa76MPmc3EIO3HrZ%2BCZ693CJS%2F17YdDR5VRzsOWTDsYiiMrqE%2FMoV3DLMk67F6NxG%2FNVrAJzU8VE0yZEyu904rWFrczN%2BAuZZV2UanHTkI%2Fw%3D%3D--K3CECvkTl%2B%2FtUkLj--M6PXN%2FwOU8j%2Ft6G3arNICQ%3D%3D; Path=/; HttpOnly; Secure; SameSite=Lax
set-cookie: _octo=GH1.1.424570941.1753605254; Path=/; Domain=github.com; Expires=Mon, 27 Jul 2026 08:34:14 GMT; Secure; SameSite=Lax
set-cookie: logged_in=no; Path=/; Domain=github.com; Expires=Mon, 27 Jul 2026 08:34:14 GMT; HttpOnly; Secure; SameSite=Lax
x-github-request-id: D648:3ED739:C4DFD6:1054E74:6885E485
Merge pull request #25630 from fengyuentau:nary-multi-thread · opencv/opencv@a7fd944 · GitHub
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda/eltwise_ops.cu
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda/functors.hpp
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda4dnn/kernels/eltwise_ops.hpp
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda4dnn/primitives/eltwise.hpp
Skip to content
Navigation Menu
{{ message }}
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
Commit a7fd944
authored
Merge pull request #25630 from fengyuentau:nary-multi-thread
dnn: parallelize nary elementwise forward implementation & enable related conformance tests #25630
This PR introduces the following changes:
- [x] Parallelize binary forward impl
- [x] Parallelize ternary forward impl (Where)
- [x] Parallelize nary (Operator that can take >=1 operands)
- [x] Enable conformance tests if workable
## Performance
### i7-12700K, RAM 64GB, Ubuntu 22.04
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.x64.0606 core.x64.0606 core.x64.0606
vs
opencv
perf
core.x64.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 16.116 11.161 1.44
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 17.469 11.446 1.53
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 17.531 11.469 1.53
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 28.653 13.682 2.09
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 21.899 13.422 1.63
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 21.738 13.185 1.65
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 16.172 11.473 1.41
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 16.309 11.565 1.41
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 16.166 11.454 1.41
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 16.157 11.443 1.41
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 163.459 15.234 10.73
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 10.880 10.868 1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 10.947 11.058 0.99
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 10.948 10.910 1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 10.874 10.871 1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 10.971 10.920 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 17.546 11.462 1.53
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 16.175 11.475 1.41
NHWC_C::Layer_NaryEltwise::OCV/CPU 11.339 11.333 1.00
NHWC_H::Layer_NaryEltwise::OCV/CPU 16.154 11.102 1.46
```
### Apple M1, RAM 16GB, macOS 14.4.1
```
Geometric mean (ms)
Name of Test opencv opencv opencv
perf perf perf
core.m1.0606 core.m1.0606.patch core.m1.0606.patch
vs
opencv
perf
core.m1.0606
(x-factor)
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 28.418 3.768 7.54
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 6.942 5.679 1.22
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 5.822 5.653 1.03
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 5.751 5.628 1.02
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 5.797 5.599 1.04
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 7.272 5.578 1.30
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 5.777 5.562 1.04
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 5.819 5.559 1.05
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 5.830 5.574 1.05
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 5.759 5.567 1.03
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 342.260 74.655 4.58
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 8.338 8.280 1.01
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 8.359 8.309 1.01
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 8.412 8.295 1.01
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 8.380 8.297 1.01
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 8.356 8.323 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 6.818 5.561 1.23
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 5.805 5.570 1.04
NHWC_C::Layer_NaryEltwise::OCV/CPU 3.834 4.817 0.80
NHWC_H::Layer_NaryEltwise::OCV/CPU 28.402 3.771 7.53
```
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake1 parent a8d1373 commit a7fd944Copy full SHA for a7fd944
File tree
Expand file treeCollapse file tree
14 files changed
+563
-404
lines changedFilter options
- modules/dnn
- src
- cuda4dnn
- kernels
- primitives
- cuda
- layers
- onnx
- test
Expand file treeCollapse file tree
14 files changed
+563
-404
lines changedmodules/dnn/src/cuda/eltwise_ops.cu
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda/eltwise_ops.cu+7Lines changed: 7 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
350 | 350 |
| |
351 | 351 |
| |
352 | 352 |
| |
| 353 | + | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
| 357 | + | |
353 | 358 |
| |
354 | 359 |
| |
355 | 360 |
| |
| |||
360 | 365 |
| |
361 | 366 |
| |
362 | 367 |
| |
| 368 | + | |
363 | 369 |
| |
364 | 370 |
| |
365 | 371 |
| |
| |||
370 | 376 |
| |
371 | 377 |
| |
372 | 378 |
| |
| 379 | + | |
373 | 380 |
| |
374 | 381 |
|
modules/dnn/src/cuda/functors.hpp
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda/functors.hpp+15Lines changed: 15 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
833 | 833 |
| |
834 | 834 |
| |
835 | 835 |
| |
| 836 | + | |
| 837 | + | |
| 838 | + | |
| 839 | + | |
| 840 | + | |
| 841 | + | |
| 842 | + | |
| 843 | + | |
| 844 | + | |
| 845 | + | |
| 846 | + | |
| 847 | + | |
| 848 | + | |
| 849 | + | |
| 850 | + | |
836 | 851 |
| |
837 | 852 |
| |
838 | 853 |
|
modules/dnn/src/cuda4dnn/kernels/eltwise_ops.hpp
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda4dnn/kernels/eltwise_ops.hpp+3Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
39 | 39 |
| |
40 | 40 |
| |
41 | 41 |
| |
| 42 | + | |
| 43 | + | |
| 44 | + | |
42 | 45 |
| |
43 | 46 |
| |
44 | 47 |
|
modules/dnn/src/cuda4dnn/primitives/eltwise.hpp
Copy file name to clipboardExpand all lines: modules/dnn/src/cuda4dnn/primitives/eltwise.hpp+8-3Lines changed: 8 additions & 3 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
30 | 30 |
| |
31 | 31 |
| |
32 | 32 |
| |
| 33 | + | |
33 | 34 |
| |
34 | 35 |
| |
35 | 36 |
| |
| |||
62 | 63 |
| |
63 | 64 |
| |
64 | 65 |
| |
65 |
| - | |
66 | 66 |
| |
67 | 67 |
| |
68 | 68 |
| |
| |||
94 | 94 |
| |
95 | 95 |
| |
96 | 96 |
| |
| 97 | + | |
97 | 98 |
| |
98 |
| - | |
99 |
| - | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
100 | 104 |
| |
101 | 105 |
| |
102 | 106 |
| |
| |||
128 | 132 |
| |
129 | 133 |
| |
130 | 134 |
| |
| 135 | + | |
131 | 136 |
| |
132 | 137 |
| |
133 | 138 |
| |
|
You can’t perform that action at this time.
0 commit comments