Optimization based on RISC-V P Packed SIMD Extension v0.5.2 #24556
@mshabunin Is it possible to add the P extension to the QEMU configuration on CI? It should help a lot.
@Junyan721113, thank you for the contribution! This is a useful effort. In the long term, however, it will be extremely difficult for our small team to maintain 1000 different branches of the same code. We sometimes do it for critical paths in critical modules, such as deep learning convolution, but for general-purpose functions, using platform-specific intrinsics is too much. Please consider implementing a universal intrinsics backend instead: https://github.com/opencv/opencv/tree/4.x/modules/core/include/opencv2/core/hal. In that case, many hundreds of optimized loops in OpenCV can immediately make use of these instructions. Many other backends rely on 128-bit extensions, whereas the P extension is 64-bit, as far as I know. One solution could be to use a pair of registers to emulate a 128-bit SIMD register.
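To make the register-pair idea concrete, here is a minimal portable sketch, not the actual universal intrinsics backend. The type and function names (`v_uint8x16_emul`, `add8_swar`, and so on) are hypothetical, and the packed add is written as plain SWAR arithmetic on `uint64_t` so it compiles anywhere; on real hardware each half would presumably map to a single packed 8-bit add.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 128-bit vector of 16 unsigned 8-bit lanes, stored as two
// 64-bit halves (what a register pair would hold on a 64-bit SIMD core).
struct v_uint8x16_emul {
    uint64_t lo;
    uint64_t hi;
};

// Wrapping add of eight packed 8-bit lanes inside one 64-bit word (SWAR).
// Assumption: a real backend would replace this with one packed-add intrinsic.
static inline uint64_t add8_swar(uint64_t a, uint64_t b) {
    const uint64_t low7 = 0x7f7f7f7f7f7f7f7fULL;
    const uint64_t msb  = 0x8080808080808080ULL;
    // Add the low 7 bits of every lane, then restore the top bit of each
    // lane with XOR so no carry crosses a lane boundary.
    return ((a & low7) + (b & low7)) ^ ((a ^ b) & msb);
}

static inline v_uint8x16_emul v_load_emul(const uint8_t* p) {
    v_uint8x16_emul v;
    std::memcpy(&v.lo, p, 8);
    std::memcpy(&v.hi, p + 8, 8);
    return v;
}

static inline void v_store_emul(uint8_t* p, const v_uint8x16_emul& v) {
    std::memcpy(p, &v.lo, 8);
    std::memcpy(p + 8, &v.hi, 8);
}

// The 128-bit operation is simply the 64-bit operation applied to both halves.
static inline v_uint8x16_emul v_add_wrap_emul(const v_uint8x16_emul& a,
                                              const v_uint8x16_emul& b) {
    return { add8_swar(a.lo, b.lo), add8_swar(a.hi, b.hi) };
}

// Adds two byte arrays 16 lanes at a time; n must be a multiple of 16.
void add_u8_wrap(const uint8_t* a, const uint8_t* b, uint8_t* dst, int n) {
    assert(n % 16 == 0);
    for (int i = 0; i < n; i += 16)
        v_store_emul(dst + i, v_add_wrap_emul(v_load_emul(a + i), v_load_emul(b + i)));
}
```

The point of the sketch is only that every lane-wise operation splits cleanly into two independent 64-bit operations; operations that move data across lanes (shuffles, reductions, widening) would need extra work when emulated this way.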
I have several questions, concerns and suggestions. Lower level or technical:
Higher level or more strategic questions and proposals:
Thank you for your guidance! Most of the current optimizations for P extensions are where other platform-specific optimizations already exist (such as int8layers/layers_common.simd.hpp). I would like to know exactly what parts of the code "critical paths in critical modules" refer to, so that P extensions can be optimized in other ways if Universal Intrinsics is not possible.
However, I'm sorry to say that I'm currently having trouble implementing Universal Intrinsics with the P extension for the following reasons:
This is my fault. RVP v0.5.2 should use
I'm sorry, but Andes toolchain uses
As a test outside of this PR, A 3rdparty component called
T-Head DSP implementation does not support
Supporting only v0.5.2 might be the best solution of this PR.
Communication has been made with Andes; a development board will soon be available for performance tests.
I'm sorry, but currently I don't know of any plans for Andes to add support to mainline.
I suggest simplifying the CPU-feature part: instead of adding RVP052 as a separate CPU feature, let's use a custom macro defined in the cmake toolchain file, as is done in platforms/linux/riscv64-071-gcc.toolchain.cmake.
Basically you have to revert all core modifications and add some macro definition to riscv64-andes-gcc.toolchain.cmake (e.g. -D__riscv_andes_rvp052, or maybe there is one built into the compiler already?). Then use a plain #ifdef guard for optimized code sections.
The tricky part is the dispatched fastConv, fastDepthwiseConv and fastGEMM - I suggest adding new files conv_depthwise.rvp052.cpp/.hpp with your implementation and including/calling it if that macro is enabled.
Probably some additional cmake variable should be set in the toolchain file, so that dnn/CMakeLists.txt would know when to add the new rvp052.cpp files to the build (or it can just be guarded by the same macro and added to the build unconditionally).
cc @opencv-alalek , what do you think?
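A rough sketch of what the macro-guarded approach could look like follows. It assumes the toolchain file passes `-D__riscv_andes_rvp052` (the name floated above; the real macro may differ), and the `generic`/`rvp052` namespaces and `absdiff_u8` function are hypothetical stand-ins, not OpenCV code. The guarded body is a placeholder that forwards to the scalar loop, since the actual intrinsics are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

namespace generic {
// Portable scalar reference implementation, always compiled.
void absdiff_u8(const uint8_t* a, const uint8_t* b, uint8_t* dst, int n) {
    for (int i = 0; i < n; ++i)
        dst[i] = a[i] > b[i] ? uint8_t(a[i] - b[i]) : uint8_t(b[i] - a[i]);
}
} // namespace generic

#ifdef __riscv_andes_rvp052
namespace rvp052 {
// Placeholder: a real implementation would use the toolchain's packed-SIMD
// intrinsics here. It forwards to the scalar loop so the sketch stays buildable.
void absdiff_u8(const uint8_t* a, const uint8_t* b, uint8_t* dst, int n) {
    generic::absdiff_u8(a, b, dst, n);
}
} // namespace rvp052
#endif

// Single entry point: the choice is made at compile time, so there is no
// runtime CPU-feature check and no dispatch table.
void absdiff_u8(const uint8_t* a, const uint8_t* b, uint8_t* dst, int n) {
    assert(n >= 0);
#ifdef __riscv_andes_rvp052
    rvp052::absdiff_u8(a, b, dst, n);
#else
    generic::absdiff_u8(a, b, dst, n);
#endif
}
```

With this layout a build using any other toolchain never sees the optimized code, which is the "less invasive" property the suggestion is after.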
CPU features use common principles for detection, control, compilation, execution and diagnostics.
Could we reuse generic RISC-V toolchains (with appropriate CPU_BASELINE/CPU_DISPATCH CMake parameters)?
Yes, in general I agree, but in this specific case - limited HW availability, specialized toolchain, non-ratified extension which is not available in generic toolchains - it looks more like RVV 0.7.1. Also, there is no actual runtime check for this extension, so dispatched implementations do not make sense; in this PR dispatching was implemented only because of DNN module specifics (no
So, IMHO, an experimental, less invasive approach similar to early RVV 0.7.1 would fit better than generalized P-extension support. Later, when various implementations converge to some stable form and the extension is supported upstream, we will implement it as a full-fledged CPU feature.
Files with
As for macros, there are 2 macros called
Meanwhile, I wonder if it is acceptable to implement all these 3 convolution functions inside one
In total, is the following code acceptable?

// modules/core/include/opencv2/core/cv_cpu_dispatch.h
#if defined(__riscv) && defined(__riscv_dsp) && defined(__ANDES)
# include <nds_intrinsic.h>
# define CV_RVP052 1
#endif

// modules/dnn/src/int8layers/layers_common.simd.hpp
#include "layers_common.dispatch.hpp"

// modules/dnn/src/int8layers/layers_common.dispatch.cpp
namespace cv {
namespace dnn {
namespace opt_RVP052 {
#if CV_RVP052
//RVP Optimizations

// modules/dnn/src/int8layers/convolution_layer.cpp
#if CV_RVP052
if(isConv2D)
opt_RVP052::fastDepthwiseConv(wptr, kernel_h, kernel_w,
stride_h, stride_w, dilation_h, dilation_w, pad_t, pad_l,
biasptr, multptr, inptr_, height, width, outptr_, out_d, outH, outW, inpZp, outZp);
else
I suggest renaming the files to something like layers_rvp052.cpp/.hpp to avoid confusion with .dispatch files in other modules, because they usually serve a different purpose.
Disable the whole .cpp body if the macro is not defined or is false, and include the .hpp file into layers_common.hpp with the same macro condition.
Done
@@ -969,6 +971,13 @@ class ConvolutionLayerInt8Impl CV_FINAL : public BaseConvolutionLayerInt8Impl
stride_h, stride_w, dilation_h, dilation_w, pad_t, pad_l,
biasptr, multptr, inptr_, height, width, outptr_, out_d, outH, outW, inpZp, outZp);
else
#endif
#if CV_RVP052
if(useRVP052)
useRVP052 is always the same as CV_RVP052 and does not have an external interface, so I suggest removing the boolean flag completely, here and in other files.
In fully_connected_layer.cpp this is absolutely right. But in convolution_layer.cpp, useRVP052 is not always the same as CV_RVP052, because line 769
p.useRVP052 = CV_RVP052 && isConv2D;
introduces a small difference.
So replacing this boolean flag with isConv2D might be better.
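For illustration, the flag-free version could be shaped roughly like the sketch below: the compile-time macro removes the branch entirely when the extension is unavailable, and the only runtime condition left is `isConv2D`. `ConvPath` and `chooseDepthwisePath` are hypothetical names used only to make the control flow testable; they are not part of the PR.

```cpp
// Default to 0 so the sketch builds without the Andes toolchain.
#ifndef CV_RVP052
#define CV_RVP052 0
#endif

// Hypothetical tag identifying which code path would run.
enum class ConvPath { RVP052, Generic };

// Mirrors the "#if CV_RVP052 / if (isConv2D) ... else / #endif" shape from the
// diff, without a separate useRVP052 member: the macro is the compile-time
// part, isConv2D is the runtime part.
ConvPath chooseDepthwisePath(bool isConv2D) {
#if CV_RVP052
    if (isConv2D)
        return ConvPath::RVP052;
#endif
    (void)isConv2D;  // unused when CV_RVP052 is 0
    return ConvPath::Generic;
}
```

When `CV_RVP052` is 0 the function always reports the generic path, which matches the observation that the flag carried no information beyond the macro and `isConv2D`.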
I suggest moving these changes to the dnn module, maybe to int8layers/layers_common.hpp?
In layers_rvp052.cpp, including layers_common.hpp to get CV_RVP052 could cause a HAVE_OPENCL-related build failure as follows:
In file included from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/./layers_common.hpp:17,
from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/layers_rvp052.cpp:5:
/home/junyan/opencv_rvp/modules/dnn/src/int8layers/./../ocl4dnn/include/ocl4dnn.hpp:196:9: error: 'ocl' does not name a type; did you mean 'ogl'?
196 | ocl::Program compileKernel();
| ^~~
| ogl
So maybe moving them into layers_rvp052.hpp is better.
Modifications in this file will not be necessary.
Done
Development boards for accuracy and performance tests have been set up; results will come out soon.
3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167

# Summary

### Previous context

From PR #24556:

>> * As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the [Carotene](https://github.com/opencv/opencv/tree/4.x/3rdparty/carotene) library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
>> Reference documentation is here:
>>
>> * https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html
>> * https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html
>> * https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html
>> * Carotene library is turned on here: https://github.com/opencv/opencv/blob/8bbf08f0de9c387c12afefdb05af7780d989e4c3/CMakeLists.txt#L906-L911

> As a test outside of this PR, a 3rdparty component called ndsrvp was created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
> All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations.
> This HAL mechanism is quite suitable for rvp optimizations; all the non-dnn code is expected to be moved into ndsrvp soon.
### Progress

#### Part 1 (This PR)

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
  - [x] Element-wise add and subtract
  - [x] Element-wise minimum or maximum
  - [x] Element-wise absolute difference
  - [x] Bitwise logical operations
  - [x] Element-wise compare
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - [x] Integral
  - [x] Threshold
  - [x] WarpAffine
  - [x] WarpPerspective
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)

#### Part 2 (Next PR)

**Rough Estimate. Todo List May Change.**

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - smaller remap HAL interface
  - AdaptiveThreshold
  - BoxFilter
  - Canny
  - Convert
  - Filter
  - GaussianBlur
  - MedianBlur
  - Morph
  - Pyrdown
  - Resize
  - Scharr
  - SepFilter
  - Sobel
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)
  - FAST

### Performance Tests

The optimization does not contain floating point operations.
**Absolute Difference**

Geometric mean (ms)

|Name of Test|opencv perf core Absdiff|opencv perf core Absdiff|opencv perf core Absdiff vs opencv perf core Absdiff (x-factor)|
|---|:-:|:-:|:-:|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)|23.104|5.972|3.87|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)|39.500|40.830|0.97|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)|69.155|15.051|4.59|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)|118.715|120.509|0.99|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)|93.001|19.770|4.70|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)|161.136|160.791|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)|69.211|15.140|4.57|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)|118.762|119.263|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)|212.414|44.692|4.75|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)|367.512|366.569|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)|285.337|59.708|4.78|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)|490.395|491.118|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)|158.827|33.462|4.75|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)|273.503|273.668|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)|484.175|100.520|4.82|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)|828.758|829.689|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)|648.592|137.195|4.73|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)|1116.755|1109.587|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)|648.715|134.875|4.81|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)|1115.939|1113.818|1.00|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)|1944.791|413.420|4.70|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)|3354.193|3324.672|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)|2594.585|553.486|4.69|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)|4473.543|4438.453|1.01|

**Bitwise Operation**

Geometric mean (ms)

|Name of Test|opencv perf core Bit|opencv perf core Bit|opencv perf core Bit vs opencv perf core Bit (x-factor)|
|---|:-:|:-:|:-:|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)|22.542|4.971|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)|90.210|19.917|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)|68.429|15.037|4.55|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)|280.168|59.239|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)|90.565|19.735|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)|374.695|79.257|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)|67.824|14.873|4.56|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)|279.514|59.232|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)|208.337|44.234|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)|851.211|182.522|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)|279.529|59.095|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)|1132.065|244.877|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)|155.685|33.078|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)|635.253|137.482|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)|474.494|100.166|4.74|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)|1907.340|412.841|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)|635.538|134.544|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)|2552.666|556.397|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)|634.736|136.355|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)|2548.283|561.827|4.54|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)|1911.454|421.571|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)|7663.803|1677.289|4.57|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)|2543.983|562.780|4.52|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)|10211.693|2237.393|4.56|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)|22.341|4.811|4.64|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)|89.975|19.288|4.66|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)|67.237|14.643|4.59|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)|276.324|58.609|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)|89.587|19.554|4.58|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)|370.986|77.136|4.81|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)|67.227|14.541|4.62|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)|276.357|58.076|4.76|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)|206.752|43.376|4.77|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)|841.638|177.787|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)|276.773|57.784|4.79|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)|1127.740|237.472|4.75|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)|153.808|32.531|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)|627.765|129.990|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)|469.799|98.249|4.78|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)|1893.591|403.694|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)|627.724|129.962|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)|2529.967|540.744|4.68|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)|628.089|130.277|4.82|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)|2521.817|540.146|4.67|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)|1905.004|404.704|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)|7567.971|1627.898|4.65|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)|2531.476|540.181|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)|10075.594|2181.654|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)|22.566|5.076|4.45|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)|90.391|19.928|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)|67.758|14.740|4.60|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)|279.253|59.844|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)|90.296|19.802|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)|373.972|79.815|4.69|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)|67.815|14.865|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)|279.398|60.054|4.65|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)|208.643|45.043|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)|850.042|180.985|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)|279.363|60.385|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)|1134.858|243.062|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)|155.212|33.155|4.68|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)|634.985|134.911|4.71|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)|474.648|100.407|4.73|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)|1912.049|414.184|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)|635.252|132.587|4.79|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)|2544.471|560.737|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)|634.574|134.966|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)|2545.129|561.498|4.53|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)|1910.900|419.365|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)|7662.603|1685.812|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)|2548.971|560.787|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)|10201.407|2237.552|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)|22.718|4.961|4.58|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)|91.496|19.831|4.61|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)|67.910|15.151|4.48|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)|279.612|59.792|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)|91.073|19.853|4.59|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)|374.641|79.155|4.73|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)|67.704|15.008|4.51|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)|279.229|60.088|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)|208.156|44.426|4.69|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)|849.501|180.848|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)|279.642|59.728|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)|1129.826|242.880|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)|155.585|33.354|4.66|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)|634.090|134.995|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)|474.931|99.598|4.77|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)|1910.519|413.138|4.62|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)|635.026|135.155|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)|2560.167|560.838|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)|634.893|134.883|4.71|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)|2548.166|560.831|4.54|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)|1911.392|419.816|4.55|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)|7646.634|1677.988|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)|2560.637|560.805|4.57|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)|10227.044|2249.458|4.55|

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Summary
Provides OpenCV optimizations for the RISC-V P extension (v0.5.2).
The writer of the code and the author of the PR is an intern at ISCAS (Institute of Software, Chinese Academy of Sciences).
List of RVP optimizations
Correctness validation (QEMU)
opencv_test_dnn_rvp Consistent with control (before adding RVP optimization)
opencv_test_imgproc_rvp Consistent with controls
opencv_test_features2d_rvp Consistent with controls
Q&A
Why RVP?
As a lightweight extension, the P extension has some potential for use in the embedded domain.
Why v0.5.2?
Although RVP is not frozen, Andes has extensive plans based on version 0.5.2, much as T-Head does with RVV071.
Why not Universal Intrinsics?
RVP052 has no floating-point arithmetic and only supports parallel arithmetic up to 64 bits wide, which makes it poorly suited to implementing Universal Intrinsics; most of its optimizations therefore follow existing function-specific optimizations.
How to perform tests?
The correctness tests are as follows. (Due to hardware issues, performance test results are not available at this time.)
Environment
Toolchain
nds-gnu-toolchain
build_linux_toolchain.sh
TARGET=riscv64-linux PREFIX=/opt/andes ARCH=rv64imafdcxandes ABI=lp64d CPU=andes-25-series XLEN=64 BUILD=`pwd`/build-nds64le-linux-glibc-v5d
Qemu
qemu
Build
Related Tests
dnn module test
imgproc module test
features2d module test
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.