Inspired by #24786. This PR keeps the fusion of NaryEltwise and Concat while addressing the data-missing problem by supporting broadcasting when a.rank() != b.rank().
It's not true. There are some minor differences in the results between the CPU and CUDA backends, which is OK I think, but the differences are much bigger for the CUDA_FP16 target. I guess we lose some accuracy in Sigmoid and similar ops. This needs an in-depth investigation.
This is because the failed tests have inputs of shape [1] (1-d Mat). The CUDA backend has asserts checking rank >= 2, so these tests cannot run on the CUDA backend without bypassing those asserts.
They worked previously only because they were not actually testing the CUDA backend: if two inputs have different ranks, the layer falls back to the CPU implementation, so these cases test nothing CUDA-specific. See the fallback below (Line 804-805):
```cpp
auto input_0_shape = inputs[0].dynamicCast<CUDABackendWrapper>()->getShape();
for (int i = 1; i < inputs.size(); i++)
{
    auto input_i_shape = inputs[i].dynamicCast<CUDABackendWrapper>()->getShape();
    if (input_0_shape.size() != input_i_shape.size())
        return Ptr<BackendNode>();
    // check if the shape can be supported by `eltwise_ops.cu`, or return the default BackendNode
    for (int j = 0; j < input_0_shape.size(); j++)
        if (input_0_shape[j] != input_i_shape[j] &&
            input_0_shape[j] != 1 && input_i_shape[j] != 1)
            return Ptr<BackendNode>();
}
```
With that being said, I propose disabling these tests specifically for the CUDA backend. @asmorkalov What do you think? @WanliZhong Please join this discussion as well.
Or we could keep falling back to CPU when the number of dimensions is 1.
That does not work, because the 1-d Mat is actually produced inside the CUDA backend's broadcasting implementation. Let me find another solution.
Resolves #23977
Resolves #24606
Resolves #24635
Resolves #24721
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.