first proposal of cv::remap with relative displacement field (#24603) #24621

chacha21 · 2023-11-30T09:26:46Z

Implements #24603

Currently, remap() is applied as dst(x, y) <- src(mapX(x, y), mapY(x, y)). This means that the maps must be filled with absolute coordinates.

However, if one wants to remap something according to a displacement field ("warp"), the operation should be dst(x, y) <- src(x+displacementX(x, y), y+displacementY(x, y)).

It is trivial to build an absolute mapping from a displacement field, but doing so adds undesirable CPU and memory overhead.
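To make the two formulas concrete, here is a minimal sketch of both modes using only the standard library. It is not the OpenCV implementation: `Image`, `remapNearest`, `toAbsoluteMap` and `roundCoord` are hypothetical names invented for this illustration, sampling is nearest-neighbour only, and out-of-range samples are simply left at 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Minimal single-channel float image, row-major. Hypothetical type, not cv::Mat.
struct Image {
    int width, height;
    std::vector<float> data;
    Image(int w, int h) : width(w), height(h), data(static_cast<std::size_t>(w) * h, 0.f) {}
    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
};

// Round to the nearest pixel. In exact arithmetic, floor(v + 0.5) commutes
// with adding an integer, which is what makes the two modes comparable.
inline int roundCoord(float v) { return static_cast<int>(std::floor(v + 0.5f)); }

// relative == false: dst(x, y) <- src(mapX(x, y), mapY(x, y))
// relative == true : dst(x, y) <- src(x + mapX(x, y), y + mapY(x, y))
Image remapNearest(const Image& src, const Image& mapX, const Image& mapY, bool relative)
{
    assert(mapX.width == mapY.width && mapX.height == mapY.height);
    Image dst(mapX.width, mapX.height);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            // The pixel index is added at the last moment, just before the fetch.
            const int sx = roundCoord(mapX.at(x, y)) + (relative ? x : 0);
            const int sy = roundCoord(mapY.at(x, y)) + (relative ? y : 0);
            if (sx >= 0 && sx < src.width && sy >= 0 && sy < src.height)
                dst.at(x, y) = src.at(sx, sy);  // otherwise keep the 0 border
        }
    }
    return dst;
}

// The "trivial" preprocessing step: absolute = pixel index + displacement.
// It needs one extra pass and one extra buffer per map.
Image toAbsoluteMap(const Image& displacement, bool isXMap)
{
    Image absolute(displacement.width, displacement.height);
    for (int y = 0; y < displacement.height; ++y)
        for (int x = 0; x < displacement.width; ++x)
            absolute.at(x, y) = displacement.at(x, y) + static_cast<float>(isXMap ? x : y);
    return absolute;
}
```

Calling `remapNearest(src, dx, dy, true)` should give the same result as building both absolute maps with `toAbsoluteMap` and calling `remapNearest(..., false)`, without the two temporary maps.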

This PR implements the feature as an experimental option, through the optional flag WARP_RELATIVE_MAP that can be ORed with the interpolation mode.

Since the xy maps might be const, there is no attempt to add the coordinate offset to those maps; everything is postponed and done on the fly in the very last coordinate computation before fetching src. Interestingly, this leaves cv::convertMaps() unchanged, since the fractional part used for interpolation does not depend on the integer coordinate offset.
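The remark about the fractional part can be checked with a small fixed-point sketch. The names `FixedCoord`, `toFixed` and `addPixelIndex` are hypothetical, and the choice of 5 fractional bits is an assumption made for the illustration rather than something stated in this PR.

```cpp
#include <cassert>
#include <cmath>

// Assumption for this sketch: 5 fractional bits, i.e. 32 sub-pixel positions.
constexpr int kFracBits = 5;
constexpr int kFracScale = 1 << kFracBits;

// A coordinate split into a whole-pixel part and a sub-pixel index.
struct FixedCoord {
    int integer;   // whole-pixel part (may be negative)
    int fraction;  // sub-pixel index in [0, kFracScale)
};

// Quantize a floating-point coordinate to fixed point.
inline FixedCoord toFixed(double v)
{
    const int q = static_cast<int>(std::floor(v * kFracScale + 0.5));
    // Non-negative remainder, so the split also works for negative values.
    const int fraction = ((q % kFracScale) + kFracScale) % kFracScale;
    assert(fraction >= 0 && fraction < kFracScale);
    FixedCoord c = { (q - fraction) / kFracScale, fraction };
    return c;
}

// Relative mode: the displacement is converted once, and the integer pixel
// index is added only to the integer part at fetch time.
inline FixedCoord addPixelIndex(const FixedCoord& displacement, int pixel)
{
    FixedCoord c = { displacement.integer + pixel, displacement.fraction };
    return c;
}
```

For a displacement of -0.33 at pixel 7, `toFixed(7 - 0.33)` and `addPixelIndex(toFixed(-0.33), 7)` both yield integer part 6 and sub-pixel index 21, so the conversion step never needs to know the pixel index.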

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

Restored a disabled parallel_for that was used for debugging. Added a check to avoid OpenVX when WARP_RELATIVE_MAP is used, since there is no implementation.

Locally on my machine, there is no longer a performance regression for cv::remap() without WARP_RELATIVE_MAP. Using WARP_RELATIVE_MAP always performs better than manually preprocessing the maps from displacement fields to absolute coordinates.

added missing Neon in place bin_op implementation

…nto remap_relative

This reverts commit 32ebbf1.

operator += is not supported as wide as SSE

avoid operator += for wide intrinsics

use v_add instead of operator+

asmorkalov · 2024-01-12T11:10:02Z

@vpisarev Friendly reminder.

asmorkalov · 2024-02-05T12:43:00Z

@vpisarev Friendly reminder.

asmorkalov

Thanks a lot for the great job! The proposed option is really very useful. I apologize for the long delay caused by the release and some 5.0 preparation activities. General proposals:

Extend the accuracy tests. The current test does not cover all touched cases, and the test scenario looks very simple. Since the absolute->relative displacement conversion is very simple, the tests can be expanded easily.
Add several corner cases, e.g. an absolute coordinate close to the limits of the type range.
There are no performance tests. I propose to patch the existing one as in the first item.

asmorkalov · 2024-02-05T12:50:55Z

modules/imgproc/src/imgwarp.cpp

 {
    Size ssize = _src.size(), dsize = _dst.size();
+    const Point offset = _offset;


No need to create a local variable.

What about performance? Don't you think that keeping a reference will prevent the compiler from optimizing access to offset.x|y in the inner loops?

Good question. We need a performance test for it ;)

I ran the performance test in the PR and do not see a visible effect from the additional constant inside the implementation. I propose removing it.

asmorkalov · 2024-02-05T12:51:27Z

modules/imgproc/src/imgwarp.cpp

-                    const ushort* FXY, const void* _wtab, int width ) const
+                    const ushort* FXY, const void* _wtab, int width, const Point& _offset ) const
    {
        int cn = _src.channels(), x = 0, sstep = (int)_src.step;
+        Point rel_offset = _offset;


no need to create local variable.

asmorkalov · 2024-02-05T12:54:03Z

modules/imgproc/src/imgwarp.cpp

 {
    typedef typename CastOp::rtype T;
    typedef typename CastOp::type1 WT;
    Size ssize = _src.size(), dsize = _dst.size();
+    const Point offset = _offset;


No need for a local variable.

asmorkalov · 2024-02-05T12:55:32Z

modules/imgproc/src/imgwarp.cpp

 {
    typedef typename CastOp::rtype T;
    typedef typename CastOp::type1 WT;
    Size ssize = _src.size(), dsize = _dst.size();
+    const Point offset = _offset;


No need for a local variable.

asmorkalov · 2024-02-05T13:09:26Z

modules/imgproc/test/test_imgwarp.cpp

+    cv::Mat mapRelativeX32F(size, CV_32FC1);
+    mapRelativeX32F.setTo(cv::Scalar::all(-0.33));
+
+    cv::Mat mapRelativeY32F(size, CV_32FC1);
+    mapRelativeY32F.setTo(cv::Scalar::all(-0.33));


I propose to use different values for x and y to highlight x<->y swaps in code and other related offset issues.
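The reason behind this suggestion can be shown with a tiny sketch: when both displacements are equal, an implementation that accidentally swaps x and y produces exactly the same coordinates as a correct one. `fetchCoord`, `fetchCoordSwapped` and `sameCoord` are hypothetical helpers written for this illustration, and the swapped version is deliberately wrong.

```cpp
// Source coordinate computed for one destination pixel.
struct Coord {
    float x, y;
};

// Correct relative-mode coordinate: each axis gets its own displacement.
inline Coord fetchCoord(int x, int y, float dx, float dy)
{
    Coord c = { x + dx, y + dy };
    return c;
}

// Deliberately buggy variant: the x and y displacements are swapped.
inline Coord fetchCoordSwapped(int x, int y, float dx, float dy)
{
    Coord c = { x + dy, y + dx };
    return c;
}

inline bool sameCoord(const Coord& a, const Coord& b)
{
    return a.x == b.x && a.y == b.y;
}
```

With `dx == dy == -0.33f` the two functions agree everywhere, so a test using those values cannot catch the swap. With, say, `dx = -0.33f` and `dy = 0.75f` (values chosen arbitrarily here) they disagree at every pixel.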

- fixed doc to add INTER_NEAREST_EXACT as not supported
- use a build constant instead of an argument for the OpenCL kernel
- add comment to explain why some probably useless local variables are used
- extend tests (the covered test cases were previously just copied from the original remap tests)

chacha21 · 2024-02-06T10:10:56Z

A few comments after the last commit :

Do we keep the "WARP_RELATIVE_MAP" flag? Is there a better strategy to enable that code? Is the name OK? (I have not mentioned it in the doc yet.)
About the corner case "absolute coordinate is close to type range": using some saturate_add would be great, but AFAIK only OpenCL provides such an operator so far.
About the "probably useless local variables": their usage will be determined by perf tests, but...
...I don't know how to write and run perf tests. I have never been able to do that on my development machine with Windows 7 and a broken Python.
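For the corner case mentioned above, this is roughly what a saturating add could look like for 16-bit map coordinates. `saturatingAdd` is a hypothetical helper written for this illustration, not an existing OpenCV function, and the 16-bit element type is an assumption about the fixed-point map representation.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: adds a pixel offset to a 16-bit map coordinate and
// clamps the result to the representable range instead of wrapping around.
inline std::int16_t saturatingAdd(std::int16_t coordinate, int offset)
{
    const long long lo = std::numeric_limits<std::int16_t>::min();
    const long long hi = std::numeric_limits<std::int16_t>::max();
    // Widen first so that the intermediate sum cannot overflow.
    const long long sum = static_cast<long long>(coordinate) + offset;
    const long long clamped = sum < lo ? lo : (sum > hi ? hi : sum);
    assert(clamped >= lo && clamped <= hi);
    return static_cast<std::int16_t>(clamped);
}
```

A stored coordinate of 32760 with a pixel offset of 100 then stays at 32767 instead of wrapping to a negative value, which would otherwise fetch from the wrong side of the image.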

asmorkalov · 2024-02-08T10:31:19Z

@chacha21 Thanks a lot! I added a performance test for the new case. Also, please take a look at the CI issues, e.g.

/Users/opencv-cn/GHA-OCV-1/_work/opencv/opencv/opencv/modules/imgproc/src/imgwarp.cpp:1353:10: warning: private field 'isRelative' is not used [-Wunused-private-field]

asmorkalov · 2024-02-08T10:32:40Z

do we keep the "WARP_RELATIVE_MAP" flag ? Is there a better strategy to enable that code ? Is the name OK ? (I have not mentioned it in the doc yet)

Looks good to me.

- fixed typo in comment - removed dead code - added WARP_RELATIVE_MAP to doc

…nto remap_relative

vpisarev · 2024-02-09T00:03:34Z

@chacha21, thank you for the contribution!

I like how the OpenCL part is implemented. It is conditionally compiled code, so it does not affect the performance of the standard case. But I don't like that the CPU version has extra conditions inside the innermost loops, as well as extra registers needed to hold the pixel grid coordinates. In subsequent versions of OpenCV we would like to optimize remap further, not slow it down. We want to keep it clean and avoid any unnecessary overhead.

Here is the proposed solution. If "relative" flag is set, a tile of map(s) should be copied to a temporary buffer (probably stack-allocated buffer) and augmented there prior to calling the remap kernels instead of doing it in the remap kernels themselves. Since the offsets for x and y are integers, such method is compatible with both floating-point and the fixed-point representations of the maps.
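The preprocessing idea described above can be sketched as follows. This is only an illustration under stated assumptions: `augmentTile` is a hypothetical name, the maps are plain row-major float vectors rather than cv::Mat, and the caller is assumed to pick the tile size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies the tile [x0, x0 + tileW) x [y0, y0 + tileH) of
// two displacement maps into temporary buffers and adds the pixel coordinates,
// so that an unchanged absolute-coordinate kernel can consume the result.
void augmentTile(const std::vector<float>& dispX, const std::vector<float>& dispY,
                 int mapWidth, int x0, int y0, int tileW, int tileH,
                 std::vector<float>& tileX, std::vector<float>& tileY)
{
    assert(dispX.size() == dispY.size());
    assert(x0 >= 0 && y0 >= 0 && x0 + tileW <= mapWidth);
    tileX.resize(static_cast<std::size_t>(tileW) * tileH);
    tileY.resize(static_cast<std::size_t>(tileW) * tileH);
    for (int ty = 0; ty < tileH; ++ty) {
        for (int tx = 0; tx < tileW; ++tx) {
            const std::size_t srcIdx = static_cast<std::size_t>(y0 + ty) * mapWidth + (x0 + tx);
            const std::size_t dstIdx = static_cast<std::size_t>(ty) * tileW + tx;
            assert(srcIdx < dispX.size());
            // absolute coordinate = pixel index + relative displacement
            tileX[dstIdx] = dispX[srcIdx] + static_cast<float>(x0 + tx);
            tileY[dstIdx] = dispY[srcIdx] + static_cast<float>(y0 + ty);
        }
    }
}
```

The trade-off of this design is that the interpolation kernels stay untouched, at the price of one extra copy per tile before every kernel call.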

opencv-alalek · 2024-02-09T02:13:28Z

But I don't like that in CPU version there are extra conditions inside the innermost loops

There is no such problem because there is template<bool isRelative> in the most critical paths.

to optimize remap further, not to slow it down

Just need to provide performance report for such modifications.
All PRs that provide an optimization or modify the implementation should include that report.

chacha21 · 2024-02-09T06:40:38Z

But I don't like that in CPU version there are extra conditions inside the innermost loops

There is no such problem because there is template<bool isRelative> in the most critical paths.

Exactly. My first commit on this PR used a local bool isRelative, and I switched to the template version afterwards to preserve performance.
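The pattern being discussed, a `template<bool isRelative>` kernel selected once outside the loop, can be sketched like this. `computeFetchX` and `computeFetchXDispatch` are hypothetical names for this illustration and do not correspond to functions in imgwarp.cpp.

```cpp
#include <cassert>
#include <cmath>

// The mode is a template parameter, so inside each instantiation the
// condition below is a compile-time constant and the compiler can drop
// the unused branch from the inner loop.
template <bool isRelative>
void computeFetchX(const float* mapRow, int width, int* out)
{
    assert(width >= 0);
    for (int x = 0; x < width; ++x) {
        const int offset = isRelative ? x : 0;
        out[x] = static_cast<int>(std::floor(mapRow[x] + 0.5f)) + offset;
    }
}

// The runtime flag is tested once per call, not once per pixel.
inline void computeFetchXDispatch(const float* mapRow, int width, int* out, bool relative)
{
    if (relative)
        computeFetchX<true>(mapRow, width, out);
    else
        computeFetchX<false>(mapRow, width, out);
}
```

The cost of this approach is code size: every templated kernel exists twice in the binary, which is the duplication concern raised elsewhere in this thread.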

to optimize remap further, not to slow it down

Just need to provide performance report for such modifications. All PRs that provide an optimization or modify the implementation should include that report.

I will try to run the test and report ASAP (I am not at all familiar with the procedure).

vpisarev · 2024-02-09T12:39:57Z

@chacha21, yes, I'm probably wrong about performance - I still see some unconditional things like vector registers holding the pixel coordinates. Maybe compiler will optimize it out, but maybe not. Besides, it basically duplicates all the remap kernels for such a very rarely used feature. I'd still suggest to do it externally by copying each tile of maps into a temporary buffer and augmenting it there. The kernels then will stay unchanged.

vpisarev

please, do mapx & mapy augmentation as a separate preprocessing step (probably, tile-by-tile, to achieve cache and thread locality), not inside interpolation kernels

chacha21 · 2024-02-10T17:52:57Z

@vpisarev
I understand, but I am not sure that tiling maps to create "augmented" copies will be better.
As far as I understand, the best place would be in an alternative RemapInvoker (perhaps a template<bool isRelative> RemapInvoker) to get rid of the template of the remap kernels.
So the tiling would occur in the virtual void RemapInvoker::operator(), something like that :

Actual :

        int x, y, x1, y1;
        const int buf_size = 1 << 14;
        int brows0 = std::min(128, dst->rows), map_depth = m1->depth();
        int bcols0 = std::min(buf_size/brows0, dst->cols);
        brows0 = std::min(buf_size/bcols0, dst->rows);
        Mat _bufxy(brows0, bcols0, CV_16SC2), _bufa;
        if( !nnfunc )
            _bufa.create(brows0, bcols0, CV_16UC1);

modified :

        int x, y, x1, y1;
        const int buf_size = 1 << 14;
        int brows0 = std::min(128, dst->rows), map_depth = m1->depth();
        int bcols0 = std::min(buf_size/brows0, dst->cols);
        brows0 = std::min(buf_size/bcols0, dst->rows);
        Mat m1AugmentedTile(brows0 , bcols0, m1->type());
        Mat m2AugmentedTile(m2->empty() ? 0 : brows0 , m2->empty() ? 0 : bcols0, m2->type());
        fillByAddingRelativeOffset(m1AugmentedTile, m1);
        fillByAddingRelativeOffset(m2AugmentedTile, m2);
        //then in the code below, use m1AugmentedTile and m2AugmentedTile instead of m1, m2
        Mat _bufxy(brows0, bcols0, CV_16SC2), _bufa;
        if( !nnfunc )
            _bufa.create(brows0, bcols0, CV_16UC1);

As the first step, I just tried to observe the overhead of the allocation of m1AugmentedTile and m2AugmentedTile, with their content copied from m1 and m2 without modification.
brows0 x bcols0 seems a little large for an AutoBuffer, so I relied on a Mat.

And the timing is not good.

original implementation :
1000 x cv::remap(1280x1024) => 1282.797557 ms
1000 x cv::remap(1280x1024) + WARP_RELATIVE_MAP => 1490.316975 ms (~+15%; you're right, it is not negligible)

When allocating m1AugmentedTile/m2AugmentedTile in RemapInvoker::operator() :
1000 x cv::remap(1280x1024) => 1934.893129 ms

The overhead of merely allocating and filling m1AugmentedTile/m2AugmentedTile is already larger than that of the current WARP_RELATIVE_MAP proposal.

So I have to admit that WARP_RELATIVE_MAP is not free. But my idea was that applying relative offsets on the fly would still be cheaper than creating absolute maps from relative maps before calling remap(). I think that still holds.

What do you think ? Should I try with a stack allocation for m1AugmentedTile/m2AugmentedTile or is it a bad idea ?

asmorkalov

👍

the purpose of the variable was to bring locality but did not show measurable performance improvement

vpisarev · 2024-02-27T17:22:16Z

ok, since @asmorkalov could not reproduce any regressions on his machines, let's merge it in!

first proposal of cv::remap with relative displacement field Relates to [#24621](opencv/opencv#24621), [#24603](opencv/opencv#24603) CUDA implementation of the feature ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [X] I agree to contribute to the project under Apache 2 License. - [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [X] The PR is proposed to the proper branch - [X] There is a reference to the original bug report and related work - [X] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [ ] The feature is well documented and sample code can be built with the project CMake

chacha21 mentioned this pull request Nov 30, 2023

first proposal of cv::remap with relative displacement field (#24621,… opencv/opencv_contrib#3600

Merged

6 tasks

chacha21 added 5 commits November 30, 2023 11:04

fixed warnings

312fe49

fixes

f6da583

restored a disabled parallel_for that was used for debugging added a check to avoid openvx when WARP_RELATIVE_MAP is used, since there is no implementation

fixed compilation on some architectures

4845db7

Merge remote-tracking branch 'upstream/4.x' into remap_relative

72f03d9

asmorkalov added feature category: imgproc labels Dec 1, 2023

asmorkalov requested a review from vpisarev December 1, 2023 06:21

chacha21 added 2 commits December 1, 2023 07:55

fixed ARM build

32ebbf1

added missing Neon in place bin_op implementation

Merge branch 'remap_relative' of https://github.com/chacha21/opencv i…

f0d5a57

…nto remap_relative

asmorkalov added this to the 4.10.0 milestone Dec 1, 2023

chacha21 added 4 commits December 1, 2023 12:37

Revert "fixed ARM build"

b3fe19d

This reverts commit 32ebbf1.

fixed ARM64 build

308fb34

operator += is not supported as wide as SSE

working on ARM64 build

2e2f024

avoid operator += for wide intrinsics

work on ARM64 build

6871bcf

use v_add instead of operator+

asmorkalov requested changes Feb 5, 2024

asmorkalov self-assigned this Feb 5, 2024

chacha21 added 2 commits February 6, 2024 07:44

Merge branch '4.x' into remap_relative

c88f285

Added performance test for new WARP_RELATIVE_MAP in remap.

5af47f5

asmorkalov reviewed Feb 8, 2024

chacha21 added 2 commits February 8, 2024 12:36

modificatiosn ass suggested by review

c37eccb

- fixed typo in comment - removed dead code - added WARP_RELATIVE_MAP to doc

Merge branch 'remap_relative' of https://github.com/chacha21/opencv i…

663bd77

…nto remap_relative

chacha21 added 2 commits February 8, 2024 13:43

fixed warning

11ff66e

Merge branch '4.x' into remap_relative

29daa50

vpisarev requested changes Feb 9, 2024

asmorkalov approved these changes Feb 27, 2024

removed useless temporary variable

3ff6f8b

the purpose of the variable was to bring locality but did not show measurable performance improvement

vpisarev self-requested a review February 27, 2024 17:21

vpisarev approved these changes Feb 27, 2024

asmorkalov merged commit 5e5a035 into opencv:4.x Feb 28, 2024

asmorkalov mentioned this pull request Feb 28, 2024

5.x merge 4.x #25119

Merged

cudawarped mentioned this pull request Oct 16, 2024

Arm64 compiles OpenCV GPU version，show：error: ‘WARP_RELATIVE_MAP’ was not declared #26312

Closed

4 tasks
