CARVIEW |
Navigation Menu
-
Notifications
You must be signed in to change notification settings - Fork 5.8k
Rework RLOF by using HAL universal instructions #2476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Update master branch
Update master branch
* SSE2 now implements no paralellized _blendv functiona and allows to compile with SSE2 instructions
* bugfix blendv functions
…v_epi function to fix compiler error
…en compiling with less than SSE 4.1 support
…/opencv_contrib into optimize_performance_rlof
* first tests are OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job! Well done!
Please take a look on some comments below.
* @param val If true M-estimator is used. If false least-square estimator is used. | ||
* @see setNormSigma0, setNormSigma1 | ||
*/ | ||
CV_WRAP void useMEstimator(bool val); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
setUseMEstimator ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes mayby setUseMEstimator is more consistent.
@@ -216,7 +224,7 @@ class CV_EXPORTS_W RLOFOpticalFlowParameter{ | |||
* For the RLOF configuration see optflow::RLOFOpticalFlowParameter for further details. | |||
* Parameters have been described in @cite Senst2012 @cite Senst2013 @cite Senst2014 and @cite Senst2016. | |||
* | |||
* @note SIMD parallelization is only available when compiling with SSE4.1. If the grid size is set to (1,1) and the | |||
* @note Full SIMD parallelization is only available when compiling with SSE4.1. If the grid size is set to (1,1) and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
parallelization => optimization ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thank you. Forgot to remove this comment as it is obsolete with the HAL instructions. Comment have been removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is still correct.
-DCPU_BASELINE=SSE4_1
or higher should be used to generate optimized code.
Currently runtime dispatching is not enabled for this code (.simd.hpp
files).
It is better to do this as a separate patch after collecting of performance data with different CPU_BASELINE
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With "SIMD parallelization" my intend was to say that the code block with the RLOF_SSE define will not be used. Now with the HAL instructions the CV_SIMD128 block will be used in most cases, even if some instructions woun't be dispatched. Therefore I would like to remove this comment.
/* | ||
*/ | ||
for( y = 0; y < _winSize.height; y++ ) | ||
#ifdef CV_SIMD128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#if CV_SIMD128
(-Wundef
from GCC would catch if we miss required headers)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced all #ifdef CV_SIMD128 with #if CV_SIMD128
|
||
for( ; x <= winSize.width*cn; x += 8, dIptr += 8*2 ) | ||
#ifdef CV_SIMD128 | ||
for(int x = 0 ; x <= winSize.width*cn; x += 8, dIptr += 8*2 ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this loop iterates correctly?
- processed all elements (step is 8). Do we have a "tail"?
- it doesn't access elements outside of allocated buffers. Condition should look like:
x <= fullWidth - elementsPerIteration
There is full generic code pattern which using SIMD or other unrolling approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be a tail, e.g. if winSize.width = 9. When I change x <= winSize.width*cn
to x <= winSize.width*cn-8
some elements are not taken into account. In order to avoid access elements outside the winBuffer. This buffer has been allocated with a width computed by int winMaskwidth = roundUp(winSize.width, 16);
(e.g. see line 72 plk_invoker.hpp)
* rename useMEstimator to setUseMEstimator
W_BITS1 - 5); | ||
|
||
int diff = J_val + illValue; | ||
int abss = (diff < 0) ? -diff : diff; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
abss
Looks like this variable is not used here (see "Linux Debug" compiler warning). Is it expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Removed line 2218 and 2219.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next is "J_val":
berlof_invoker.hpp:2213:37: warning: unused variable 'J_val'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I am trying to setup my visual studio, so that it shows me such errors too. Hopfully this was the last unused value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
berlof_invoker.hpp:1971:40: warning: unused variable 'W_BITS1'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well done! Thank you for contribution 👍
All implementations of the RLOF code (RLOF, BERLOF, BEPLK, etc. ...) that used SIMD instructions have been reworked and are now based on HAL universal instructions.