OpenCV RISC V
RISC-V is an Instruction Set Architecture (ISA) that is gaining popularity as an alternative to traditional ISAs such as x86/x86_64 and ARM/AArch64. It is covered by an open-source license, which allows for royalty-free usage by both hardware and software providers.
Besides the base integer instruction set, RISC-V processors can implement various architecture extensions, for example F for single-precision floating point support or M for integer multiplication and division. There is also an extension for vector operations (aka SIMD - single instruction multiple data) - V (RVV), which is beneficial for high-performance computing applications like image processing, machine learning and deep learning. OpenCV can leverage this extension to achieve significant performance improvements across many algorithms. Analogs of the V extension on other platforms are SSE/AVX for x86_64 and NEON/SVE for ARM/AArch64.
The major difference between the V extension and other popular SIMD extensions is its non-fixed vector length: while SSE instructions operate on 128-bit registers and AVX2 on 256-bit registers, instructions in the V extension can operate on whatever register width the actual hardware provides. This kind of SIMD specification is also called Scalable SIMD. A similar approach is used by SVE (Scalable Vector Extension) on ARM platforms.
In this document we will focus mainly on RVV extension usage in general and in OpenCV specifically.
Links:
- Wikipedia / RISC-V
- RISC-V / about
- RISC-V / Vector Extension spec v1.0
- RISC-V / Vector Extension spec v0.7.1
- ARM - What is SVE
Historically, the first version of the V specification implemented in hardware was v0.7.1. It differs from the finalized v1.0 in several ways. Below is a list of devices that are known to support the RVV extension and have recently been used for testing OpenCV optimizations:
- RVV 1.0
  - CanMV K230
  - Banana Pi BPI-F3
  - Muse Pi
  - LicheePi 3A
- RVV 0.7.1
  - LicheePi 4A
In order to use the V extension, one should use a Linux system with a kernel built with RVV support. Often this is the kernel provided by the SoC/core manufacturer or a mainline kernel with the corresponding patches. To check for V extension support, run the following command and check that the isa line contains the letter v after the rv64 base specifier:
cat /proc/cpuinfo
Example output:
...
isa : rv64imafdcvu
...             ^
                here
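The check described above can be sketched as a small shell helper. This is a minimal sketch, not an official tool: the function name `has_rvv` is hypothetical, and it assumes the isa string has the form shown in the example output (base specifier, single-letter extensions, then optional multi-letter extensions separated by underscores). Note that the base specifier `rv64` itself contains the letter v, so it has to be stripped before searching.

```shell
# has_rvv ISA_STRING (hypothetical helper)
# Succeeds if the single-letter extension part of the isa string contains "v".
has_rvv() {
    isa=$1
    isa=${isa#rv32}      # drop the base specifier, which itself contains a "v"
    isa=${isa#rv64}
    isa=${isa%%_*}       # drop multi-letter extensions that follow an underscore
    case "$isa" in
        *v*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Possible usage on a RISC-V board (assumes an "isa" line in /proc/cpuinfo):
#   has_rvv "$(awk -F': *' '/^isa/ {print $2; exit}' /proc/cpuinfo)" && echo "V extension reported"
```

The helper only looks at what the kernel reports; it does not distinguish between RVV 0.7.1 and RVV 1.0.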
While it is possible to write vectorized code using RVV assembly, C/C++ libraries and applications often use vector intrinsics - a set of types and functions built into the compiler, which correspond to machine instructions.
Usually software for RISC-V is built on regular Linux or Windows platforms using cross-compilation. Cross-compiling toolchains include a compiler and the other libraries and tools required for development. The following toolchains are known to include intrinsics for the RVV extension:
- Mainline compilers (RVV 1.0)
  - GCC 13-14 (https://github.com/riscv-collab/riscv-gnu-toolchain) - uses recent intrinsics specification, supports v1.0 of vector extension
  - LLVM/Clang 17-20 (https://github.com/llvm/llvm-project) - uses recent intrinsics specification, supports v1.0 of vector extension
- XuanTie compiler (RVV 0.7.1 and RVV 1.0)
  - xuantie-gnu-toolchain 2.x is based on GCC 10 (https://github.com/XUANTIE-RV/xuantie-gnu-toolchain, see "Releases")
Links:
See https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm
git clone --depth 1 https://github.com/llvm/llvm-project
cd llvm-project
cmake -S llvm -B build -G Ninja
cmake --build build --target install
Note: use the -DCMAKE_INSTALL_PREFIX CMake option to change the install location
# get 'riscv-collab' repository
git clone --depth=1 https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
git submodule init && git submodule update
# update 'gcc' subfolder
GCC_TAG=releases/gcc-14.2.0
git -C gcc remote update
git -C gcc fetch origin ${GCC_TAG}
git -C gcc checkout ${GCC_TAG}
# build
./configure --enable-linux
CPUNUM=4
make -j${CPUNUM} linux
make -j${CPUNUM} build-qemu
Note: use the --prefix configure option to change the install location
Note: you can also update the qemu subfolder to a specific version, or omit the build-qemu command if you don't need it
# get repository
git clone https://github.com/XUANTIE-RV/xuantie-gnu-toolchain
cd xuantie-gnu-toolchain
# build
./configure --enable-linux
make -j8 linux
make -j8 build-qemu
Note: use the --prefix configure option to change the install location
Note: omit QEMU build if you don't need it
In order to test applications without hardware, one can use emulation software, for example QEMU. Often the QEMU emulator is included in the toolchain package or can be built together with the compiler. QEMU has two operating modes: full system emulation and user-mode emulation. In the first case the user has to describe and prepare a full virtual system with disks and other peripherals, install an operating system and then work with it like with a standalone machine - boot, install software, interact. In the second case the user can run their RISC-V application straight away and QEMU will proxy system calls to the host OS (Linux). This mode is best suited to application development and debugging, so we will review it in more detail.
QEMU user-mode applications have names like qemu-arm, qemu-aarch64, qemu-riscv64 - they do not contain the word system. In order to launch an application using QEMU, one should pass its command line to the QEMU program like this:
qemu-riscv64 <qemu options> ./my-app <app arguments>
The following QEMU options are the most important for running a RISC-V application:
- -cpu <model> - select the CPU model and features. For example, the QEMU provided by T-Head allows selecting a specific core to emulate: c906 and c906fdv (with f, d and v extensions), c908 and c908v, c910 and c910v. Generic RISC-V emulation also allows extension selection: -cpu rv64,v=true,vext_spec=v1.0 will enable RVV v1.0.
- -L <path> - set the path where the LD interpreter will be rooted. Usually this folder is part of the toolchain distribution, so this parameter might look like this: -L <toolchain root>/sysroot.
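Putting the two options above together, the following is a minimal sketch of a helper that assembles a user-mode QEMU command line for generic RVV 1.0 emulation. The function name `rvv_qemu_cmd` and the example sysroot path are hypothetical; the QEMU options themselves are the ones described above.

```shell
# rvv_qemu_cmd SYSROOT APP [ARGS...] (hypothetical helper)
# Prints a qemu-riscv64 command line with RVV 1.0 enabled on the generic rv64 CPU.
rvv_qemu_cmd() {
    sysroot=$1
    shift
    printf 'qemu-riscv64 -cpu rv64,v=true,vext_spec=v1.0 -L %s' "$sysroot"
    for arg in "$@"; do
        printf ' %s' "$arg"      # application path followed by its arguments
    done
    printf '\n'
}

# Possible usage (the sysroot location is an assumption):
#   rvv_qemu_cmd /opt/riscv/sysroot ./my-app --some-arg
```

Printing the command instead of running it makes it easy to review before execution; arguments containing spaces would need extra quoting.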
Other options might be useful for debugging and fine-tuning:
- -help - show all options and their descriptions
- -cpu help - show all supported CPU models
- -E <var>=<value> - set environment variables
- -g <port> - wait for GDB connection on the selected port
Usually the most convenient way is to debug a user application remotely, because either the target system does not have a debugger, or the one it has in its packages does not match the compiler used for the build (e.g. does not support RVV). The remote debugging process with GDB is as follows:
1. Build your application with debugging information enabled: use the -g -Og compiler options, or the -DCMAKE_BUILD_TYPE=Debug CMake option in the case of OpenCV.
2. On the remote machine, run your application with gdbserver (the port can be chosen arbitrarily, e.g. 1234):
   gdbserver :<port> ./my-app <args>
   The program will start and pause immediately, waiting for a remote connection.
3. On the host machine, run your application using the GDB from the toolchain:
   <toolchain root>/bin/riscv64-unknown-linux-gnu-gdb ./my-app <args>
   The program will be loaded and GDB will wait for further instructions.
4. Set up the remote connection on the host machine using the GDB command target remote <address>:<port>, where address is your remote machine's IP address or hostname and port is the same as in step 2.
5. Debug the application from the host as usual, e.g. enter the continue command to continue execution until it crashes and examine the program state afterwards.
A similar procedure can be used with QEMU emulation: set the -g <port> option to start the server (step 2) and connect using target remote :<port> on the GDB side (step 4).
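The connection step can also be given on the GDB command line. The following is a minimal sketch of a helper that prints such a command; the function name `gdb_attach_cmd` is hypothetical, while `-ex` is the standard GDB option for executing a command at startup. An empty host yields `target remote :<port>`, which matches the QEMU case on the same machine.

```shell
# gdb_attach_cmd GDB APP PORT [HOST] (hypothetical helper)
# Prints a GDB command line that loads APP and connects to HOST:PORT on startup.
gdb_attach_cmd() {
    gdb=$1
    app=$2
    port=$3
    host=${4:-}          # empty host means "this machine", e.g. a local QEMU started with -g
    printf '%s -ex "target remote %s:%s" %s\n' "$gdb" "$host" "$port" "$app"
}

# Possible usage with QEMU started as: qemu-riscv64 -g 1234 <other options> ./my-app
#   gdb_attach_cmd riscv64-unknown-linux-gnu-gdb ./my-app 1234
```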
OpenCV has supported the RISC-V platform since 2020, and this support grows and improves each year. Major contributions have been made by T-Head (平头哥半导体有限公司) (intrin_rvv071.hpp) and by the Chinese Academy of Sciences (intrin_rvv.hpp, intrin_rvv_scalable.hpp).
Note: in the latest OpenCV versions the intrin_rvv.hpp implementation has been removed
Most CPU optimizations in OpenCV are achieved through the use of Universal Intrinsics, which act as wrappers over platform-specific SIMD compiler intrinsics. Currently, OpenCV supports implementations for SSE/AVX/NEON/RVV/VSX/MSA/LSX/WASM intrinsics.
Historically, the Universal Intrinsics have undergone three generations:
- Fixed-size intrinsics: types indicate both element size and count, e.g. v_int8x16 - a vector with 16 8-bit elements (128-bit registers).
- Wide intrinsics: types indicate element size only; element count is selected at compile time, e.g. v_int8 - a vector with VTraits<v_int8>::nlanes 8-bit elements, where nlanes can be 16, 32 or 64 depending on the vector register size (128, 256 or 512 bits).
- Scalable intrinsics: types indicate element size only; element count is selected at run time depending on the platform where the code is executed, e.g. v_int8 - a vector with VTraits<v_int8>::vlanes() 8-bit elements.
The scalable intrinsics implementation matches the RVV sizeless vector paradigm well. During the summers of 2023 and 2024, as part of the OpenCV Summer of Code project, the library was refactored by @hanliutong using a semi-automated approach to support scalable intrinsics in most areas.
There are two implementations of universal intrinsics for RISC-V RVV in OpenCV (files located in modules/core/include/opencv2/core/hal):
- intrin_rvv_scalable.hpp - uses the latest RVV intrinsics, limited to RVV v1.0. Can be built with recent versions of the GCC and LLVM toolchains. It is the first real implementation of Scalable Universal Intrinsics. Can be run on the T-Head QEMU (c908v), mainline QEMU with RVV 1.0 enabled, or hardware supporting RVV 1.0.
- intrin_rvv071.hpp - soon after the OpenCV 4.9.0 release this implementation was reworked by T-Head to support the modern intrinsics dialect, toolchains (2.6.x - 2.8.x) and both the 0.7.1 and 1.0 RVV versions (see PR#24841). It uses a fixed vector length (128 bit) by setting the compiler option -mrvv-vector-bits=128. At the time of writing this implementation might be less efficient than the previous one, but further improvements in this area should be possible.
We will describe both build variants, one for each Universal Intrinsics implementation listed above.
We assume the following directory structure:
<root>/
- opencv/ - OpenCV repository
- opencv_extra/ - OpenCV Extra repository with testdata (not required for build)
- build/ - build location (empty)
Prerequisites are:
- cmake
- ninja-build (Makefiles generator can be used as well)
- python3 (?)
- selected RISC-V toolchain installed somewhere (TOOLCHAIN_ROOT)
We will use static builds (BUILD_SHARED_LIBS option) for deployment convenience, but dynamic builds can be used as well. We also disable OpenCL during builds (WITH_OPENCL option) to avoid loading an experimental OpenCL runtime during OpenCV test execution, which reduces testing time and irrelevant failures; this option is not necessary for regular use.
The main difference between the build variants is the <toolchain>.cmake file being used and some specific options.
Use mainline GCC or LLVM toolchains.
Build with GCC:
cd build
PATH=${TOOLCHAIN_ROOT}/bin:${PATH} \
cmake -GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_OPENCL=OFF \
-DCMAKE_TOOLCHAIN_FILE=../opencv/platforms/linux/riscv64-gcc.toolchain.cmake \
-DRISCV_RVV_SCALABLE=ON \
../opencv
ninja
Build with LLVM (also requires GCC for standard libraries):
cd build
cmake -GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_OPENCL=OFF \
-DCMAKE_TOOLCHAIN_FILE=../opencv/platforms/linux/riscv64-clang.toolchain.cmake \
-DRISCV_CLANG_BUILD_ROOT=${LLVM_TOOLCHAIN_ROOT} \
-DRISCV_GCC_INSTALL_ROOT=${GCC_TOOLCHAIN_ROOT} \
-DRISCV_RVV_SCALABLE=ON \
../opencv
ninja
Run OpenCV core test using regular QEMU:
cd build
OPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
${QEMU_DIR}/bin/qemu-riscv64 \
-L ${TOOLCHAIN_ROOT}/sysroot \
-cpu rv64,v=true,vext_spec=v1.0 \
./bin/opencv_test_core
Note: OpenCV uses flexible CPU feature detection during the configuration process, so if the compiler does not support RVV optimizations, they will be turned off and the build will proceed. To avoid this behavior and fail the build when the compiler does not support RVV, add the following option: -DCPU_BASELINE_REQUIRE=RVV
Note: a CMake option can be used to change the default CPU option used for RVV detection: -DCMAKE_CXX_FLAGS="-march=rv64gcv1p0". It can be useful if you want to enable extra RISC-V features or your compiler requires a specific feature description syntax.
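The two notes above can be combined into one configuration step. The following is a minimal sketch of a helper that prints CMake arguments for the GCC build variant with RVV made mandatory and an optional -march value. The function name `rvv_cmake_args` is hypothetical; all CMake options are the ones used earlier in this section.

```shell
# rvv_cmake_args SRC_DIR [MARCH] (hypothetical helper)
# Prints CMake arguments, one per line, for a static RVV build that fails
# at configuration time if the compiler cannot build RVV code.
rvv_cmake_args() {
    src=$1
    march=${2:-}
    printf '%s\n' \
        -GNinja \
        -DCMAKE_BUILD_TYPE=Release \
        -DBUILD_SHARED_LIBS=OFF \
        -DWITH_OPENCL=OFF \
        "-DCMAKE_TOOLCHAIN_FILE=$src/platforms/linux/riscv64-gcc.toolchain.cmake" \
        -DRISCV_RVV_SCALABLE=ON \
        -DCPU_BASELINE_REQUIRE=RVV
    if [ -n "$march" ]; then
        # Optional override of the CPU option used for RVV detection
        printf '%s\n' "-DCMAKE_CXX_FLAGS=-march=$march"
    fi
    printf '%s\n' "$src"
}

# Possible usage from the build directory (relies on word splitting,
# so it assumes paths without spaces):
#   PATH=${TOOLCHAIN_ROOT}/bin:${PATH} cmake $(rvv_cmake_args ../opencv rv64gcv1p0)
```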
OpenCV > 4.9.0
Use T-Head 2.x toolchain.
Build:
cd build
PATH=${TOOLCHAIN_ROOT}/bin:${PATH} \
cmake -GNinja \
-DCMAKE_BUILD_TYPE=Release \
-DBUILD_SHARED_LIBS=OFF \
-DWITH_OPENCL=OFF \
-DCMAKE_TOOLCHAIN_FILE=../opencv/platforms/linux/riscv64-071-gcc.toolchain.cmake \
-DCORE=C910V \
../opencv
ninja
Run OpenCV core test using T-Head QEMU (select CPU model corresponding to the build CPU option):
cd build
OPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
${QEMU_DIR}/bin/qemu-riscv64 \
-L ${TOOLCHAIN_ROOT}/sysroot \
-cpu c910v \
./bin/opencv_test_core
Note: see all supported CPU models accepted by the -DCORE= option in platforms/linux/riscv64-071-gcc.toolchain.cmake. The RVV version and its availability depend on the selected CPU model.
© Copyright 2019-2025, OpenCV team