OpenCV RISC V

RISC-V introduction

RISC-V is an Instruction Set Architecture (ISA) that is gaining popularity as an alternative to traditional ISAs such as x86/x86_64 and ARM/AArch64. It is covered by an open-source license, which allows for royalty-free usage by both hardware and software providers.

Besides the base integer instruction set, RISC-V processors can implement various architecture extensions, for example F for single-precision floating-point support or M for integer multiplication and division. There is also an extension for vector operations (also known as SIMD, single instruction multiple data): V (RVV). It is beneficial for high-performance computing applications such as image processing, machine learning and deep learning, and OpenCV can leverage it to achieve significant performance improvements across many algorithms. Analogs of the V extension on other platforms are SSE/AVX for x86_64 and NEON/SVE for ARM/AArch64.

The major difference between the V extension and other popular SIMD extensions is its non-fixed vector length: while SSE instructions operate on 128-bit registers and AVX2 on 256-bit registers, instructions in the V extension can operate on whatever register width the actual hardware provides. This kind of SIMD specification is also called Scalable SIMD. A similar approach is used by SVE (Scalable Vector Extension) on ARM platforms.

In this document we will focus mainly on the usage of the RVV extension, both in general and in OpenCV specifically.

RISC-V hardware with vector support

Historically, the first version of the V specification implemented in hardware was v0.7.1. It differs from the finalized v1.0 in several ways. Below is a list of devices that are known to support the RVV extension and have recently been used for testing OpenCV optimizations:

RVV 1.0
- CanMV K230
- Banana Pi BPI-F3
- Muse Pi
- LicheePi 3A
RVV 0.7.1
- LicheePi 4A

RISC-V software with vector support

In order to use the V extension, one needs a Linux system whose kernel is built with RVV support. Often this is the kernel provided by the SoC/core manufacturer, or a mainline kernel with the corresponding patches. To check for V extension support, run the following command and check that the isa line contains the letter v after the rv64 base specifier:

cat /proc/cpuinfo

Example output:

...
isa             : rv64imafdcvu
...                         ^
                          here
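
The manual check above can be scripted. The following is a minimal sketch, assuming the isa line has the form shown in the example output; the function name isa_has_v is hypothetical. It removes the rv32/rv64 base specifier and any multi-letter extensions after the first underscore, so that a letter v inside a name such as _zve32x is not mistaken for the V extension.

```shell
# Return success (0) if the given ISA string advertises the single-letter
# V extension, e.g. "rv64imafdcvu". The function name is illustrative only.
isa_has_v() {
    isa=$1
    # Drop multi-letter extensions that follow the first underscore
    # (for example "_zicsr_zifencei"); they may contain the letter "v".
    isa=${isa%%_*}
    # Drop the base specifier so that the "v" of "rv64" is not matched.
    case "$isa" in
        rv32*) isa=${isa#rv32} ;;
        rv64*) isa=${isa#rv64} ;;
        *) return 1 ;;
    esac
    case "$isa" in
        *v*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a Linux system: inspect the first "isa" line of /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    isa_line=$(sed -n 's/^isa[[:space:]]*:[[:space:]]*//p' /proc/cpuinfo | head -n 1)
    if isa_has_v "$isa_line"; then
        echo "V extension reported: $isa_line"
    else
        echo "V extension not reported"
    fi
fi
```

On a machine that is not RISC-V, /proc/cpuinfo has no isa line, so the sketch simply reports that the extension was not found.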

Development for RISC-V with vector support

While it is possible to write vectorized code in RVV assembly, C/C++ libraries and applications often use vector intrinsics: a set of types and functions built into the compiler that correspond to machine instructions.

Software for RISC-V is usually built on regular Linux or Windows platforms using cross-compilation. Cross-compiling toolchains include the compiler and the other libraries and tools required for development. The following toolchains are known to include intrinsics for the RVV extension:

Mainline compilers (RVV 1.0)
- GCC 13-14 (https://github.com/riscv-collab/riscv-gnu-toolchain) - uses recent intrinsics specification, supports v1.0 of vector extension
- LLVM/Clang 17-20 (https://github.com/llvm/llvm-project) - uses recent intrinsics specification, supports v1.0 of vector extension
XuanTie compiler (RVV 0.7.1 and RVV 1.0)
- xuantie-gnu-toolchain 2.x is based on GCC 10 (https://github.com/XUANTIE-RV/xuantie-gnu-toolchain, see "Releases")

Links:

RISC-V RVV intrinsics specification

Building LLVM/Clang toolchain

See https://llvm.org/docs/GettingStarted.html#getting-the-source-code-and-building-llvm

git clone --depth 1 https://github.com/llvm/llvm-project
cd llvm-project
cmake -S llvm -B build -G Ninja
cmake --build build --target install

Note: use -DCMAKE_INSTALL_PREFIX CMake option to change install location
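
The configure step above uses default settings. As a hedged sketch, a configure command with an explicit install location and a few commonly used LLVM CMake options could look like the one printed below. The helper name llvm_configure_cmd and the default prefix are hypothetical; the choice of options (a Release build, the clang project enabled, only the RISCV backend) is an assumption about what is needed here and is not taken from the original instructions.

```shell
# Print a configure command for LLVM/Clang with an explicit install prefix.
# llvm_configure_cmd and the default prefix are illustrative assumptions.
llvm_configure_cmd() {
    prefix=${1:-$HOME/llvm-riscv}
    # LLVM_ENABLE_PROJECTS adds clang to the build,
    # LLVM_TARGETS_TO_BUILD limits code generation backends to RISC-V.
    echo "cmake -S llvm -B build -G Ninja" \
        "-DCMAKE_BUILD_TYPE=Release" \
        "-DLLVM_ENABLE_PROJECTS=clang" \
        "-DLLVM_TARGETS_TO_BUILD=RISCV" \
        "-DCMAKE_INSTALL_PREFIX=$prefix"
}

# Show the command for a chosen install location; run it from llvm-project.
llvm_configure_cmd /opt/llvm-riscv
```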

Building GCC toolchain for RISC-V with RVV support

# get 'riscv-collab' repository
git clone --depth=1 https://github.com/riscv-collab/riscv-gnu-toolchain
cd riscv-gnu-toolchain
git submodule init && git submodule update
# update 'gcc' subfolder
GCC_TAG=releases/gcc-14.2.0
git -C gcc remote update
git -C gcc fetch origin ${GCC_TAG}
git -C gcc checkout ${GCC_TAG}
# build
./configure --enable-linux
CPUNUM=4
make -j${CPUNUM} linux
make -j${CPUNUM} build-qemu

Note: use --prefix configure option to change install location

Note: you can also update qemu subfolder to specific version or omit build-qemu command if you don't need it

Building XuanTie toolchain

# get repository
git clone https://github.com/XUANTIE-RV/xuantie-gnu-toolchain
cd xuantie-gnu-toolchain
# build
./configure --enable-linux
make -j8 linux
make -j8 build-qemu

Note: use --prefix configure option to change install location

Note: omit QEMU build if you don't need it

Emulating RISC-V RVV hardware

In order to test applications without hardware, one can use emulation software, for example QEMU. The QEMU emulator is often included in the toolchain package or can be built together with the compiler. QEMU has two operating modes: full system emulation and user-mode emulation. In the first case the user has to describe and prepare a full virtual system with disks and other peripherals, install an operating system, and then work with it as with a standalone machine: boot, install software, interact. In the second case the user can run a RISC-V application right away, and QEMU will proxy system calls to the host OS (Linux). This mode is the most suitable for application development and debugging, so we will review it in more detail.

QEMU user-mode programs have names like qemu-arm, qemu-aarch64, qemu-riscv64; unlike the full system emulators, their names do not contain the word system. To launch an application under QEMU, pass its command line to the QEMU program like this:

qemu-riscv64 <qemu options> ./my-app <app arguments>

The following QEMU options are the most important for running a RISC-V application:

-cpu <model> - select the CPU model and features. For example, the QEMU provided by T-Head allows selecting a specific core to emulate: c906 and c906fdv (with the f, d and v extensions), c908 and c908v, c910 and c910v. Generic RISC-V emulation also allows selecting extensions: -cpu rv64,v=true,vext_spec=v1.0 enables RVV v1.0.
-L <path> - set the path prefix under which the ELF interpreter (dynamic linker) is looked up. Usually this folder is part of the toolchain distribution, so the parameter might look like this: -L <toolchain root>/sysroot.

Other options might be useful for debugging and fine tuning:

-help - show all options and their descriptions
-cpu list - show all supported CPU models
-E <var>=<value> - set environment variables
-g <port> - wait for GDB connection on selected port
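
The options above can be collected in a small wrapper. This is a minimal sketch, assuming mainline QEMU with RVV 1.0 emulation and a toolchain whose sysroot is located at <toolchain root>/sysroot. The function name run_rvv and the variables QEMU_BIN, TOOLCHAIN_ROOT and DRY_RUN are hypothetical placeholders, as is the /opt/riscv default.

```shell
# Run an application under QEMU user-mode emulation with RVV 1.0 enabled.
# run_rvv, QEMU_BIN, TOOLCHAIN_ROOT and DRY_RUN are illustrative names.
run_rvv() {
    # Prepend QEMU and its options to the application command line.
    set -- "${QEMU_BIN:-qemu-riscv64}" \
        -cpu rv64,v=true,vext_spec=v1.0 \
        -L "${TOOLCHAIN_ROOT:-/opt/riscv}/sysroot" \
        "$@"
    if [ -n "${DRY_RUN:-}" ]; then
        # Only show what would be executed.
        echo "$*"
    else
        "$@"
    fi
}

# Print the resulting command line without starting QEMU.
(DRY_RUN=1; run_rvv ./my-app --help)
```

Without DRY_RUN set, the same call starts the application under QEMU, with the application arguments passed through unchanged.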

Debugging RISC-V applications

Usually the most convenient way is to debug a user application remotely, because either the target system does not have a debugger, or the one available in its packages does not match the compiler used for the build (e.g. does not support RVV). The remote debugging process with GDB is as follows:

1. Build your application with debugging information enabled: use the -g -Og compiler options, or the -DCMAKE_BUILD_TYPE=Debug CMake option in the case of OpenCV.
2. On the remote machine, run your application under gdbserver (the port can be chosen arbitrarily, e.g. 1234):
```
gdbserver :<port> ./my-app <args>
```
The program will start and pause immediately, waiting for a remote connection.
3. On the host machine, load your application in the GDB from the toolchain (application arguments are passed on the gdbserver side only):
```
<toolchain root>/bin/riscv64-unknown-linux-gnu-gdb ./my-app
```
The program will be loaded and GDB will wait for further instructions.
4. Set up the remote connection on the host machine using the GDB command target remote <address>:<port>, where address is the IP address or hostname of your remote machine and port is the same as in step 2.
5. Debug the application from the host as usual, e.g. enter the continue command to continue execution until it crashes, and examine the program state afterwards.

A similar procedure can be used with QEMU emulation: set the -g <port> option to start the server (step 2) and connect using target remote :<port> on the GDB side (step 4).
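
For the QEMU variant, the two sides of the session can be sketched as follows. This is an illustration under assumptions: port 1234, a GDB named riscv64-unknown-linux-gnu-gdb under the toolchain root, and the generic RVV 1.0 CPU model. The function names and the PORT and TOOLCHAIN_ROOT variables are hypothetical. The functions only print the commands, which are then run in two separate terminals.

```shell
# Print the two commands of a QEMU-based remote debugging session.
# Function names, the default port and the default paths are illustrative.
PORT=${PORT:-1234}
TOOLCHAIN_ROOT=${TOOLCHAIN_ROOT:-/opt/riscv}

# Terminal 1: QEMU loads the application and waits for GDB on $PORT.
debug_server_cmd() {
    echo "qemu-riscv64 -g $PORT -L $TOOLCHAIN_ROOT/sysroot -cpu rv64,v=true,vext_spec=v1.0 $*"
}

# Terminal 2: GDB from the toolchain loads symbols and connects to QEMU.
debug_client_cmd() {
    echo "$TOOLCHAIN_ROOT/bin/riscv64-unknown-linux-gnu-gdb -ex 'target remote :$PORT' $1"
}

debug_server_cmd ./my-app
debug_client_cmd ./my-app
```

The -ex option makes GDB execute the given command right after startup, which replaces typing target remote by hand.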

RISC-V support in OpenCV

OpenCV has supported the RISC-V platform since 2020, and the support grows and improves each year. Major contributions have been made by T-Head (平头哥半导体有限公司) (intrin_rvv071.hpp) and by the Chinese Academy of Sciences (intrin_rvv.hpp, intrin_rvv_scalable.hpp).

Note: in the latest OpenCV versions intrin_rvv.hpp implementation has been removed

Universal intrinsics

Most CPU optimizations in OpenCV are achieved through the use of Universal Intrinsics, which act as wrappers over platform-specific SIMD compiler intrinsics. Currently, OpenCV supports implementations for SSE/AVX/NEON/RVV/VSX/MSA/LSX/WASM intrinsics.

Historically, the Universal Intrinsics have undergone three generations:

Fixed size intrinsics: types indicate both element size and count, e.g. v_int8x16 is a vector of 16 8-bit elements (128-bit registers).
Wide intrinsics: types indicate element size only, and the element count is selected at compile time, e.g. v_int8 is a vector of VTraits<v_int8>::nlanes 8-bit elements, where nlanes can be 16, 32 or 64 depending on the vector register size (128, 256 or 512 bits).
Scalable intrinsics: types indicate element size only, and the element count is selected at run time depending on the platform where the code is executed, e.g. v_int8 is a vector of VTraits<v_int8>::vlanes() 8-bit elements.

The scalable intrinsics implementation matches the RVV sizeless vector paradigm well. During the summers of 2023 and 2024, as part of an OpenCV Summer of Code project, the library was refactored by @hanliutong using a semi-automated approach to support scalable intrinsics in most areas.

Universal intrinsics for RISC-V RVV

There are 2 implementations of universal intrinsics for RISC-V RVV in OpenCV (files located in modules/core/include/opencv2/core/hal):

intrin_rvv_scalable.hpp - uses the latest RVV intrinsics and is limited to RVV v1.0. It can be built with recent versions of the GCC and LLVM toolchains. It is the first real implementation of Scalable Universal Intrinsics. It can be run on the T-Head QEMU (c908v), on mainline QEMU with RVV 1.0 enabled, or on hardware supporting RVV 1.0.
intrin_rvv071.hpp - soon after the OpenCV 4.9.0 release this implementation was reworked by T-Head to support the modern intrinsics dialect, newer toolchains (2.6.x - 2.8.x) and both the 0.7.1 and 1.0 RVV versions (see PR#24841). It uses a fixed vector length (128 bits), set with the compiler option -mrvv-vector-bits=128. At the time of writing this implementation might show worse efficiency than the previous one, but further improvements in this area should be possible.

Building OpenCV with RVV support

We will describe both build variants, one for each Universal Intrinsics implementation listed above.

We assume the following directory structure:

<root>/
  - opencv/ - OpenCV repository
  - opencv_extra/ - OpenCV Extra repository with testdata (not required for build)
  - build/ - build location (empty)

Prerequisites are:

cmake
ninja-build (Makefiles generator can be used as well)
python3 (?)
selected RISC-V toolchain installed somewhere (TOOLCHAIN_ROOT)

We will use static builds (BUILD_SHARED_LIBS option) for deployment convenience, but dynamic builds can be used as well. We also disable OpenCL during builds (WITH_OPENCL option) so that an experimental OpenCL runtime is not loaded during OpenCV test execution, which reduces testing time and irrelevant failures. This option is not necessary for regular use.

The main differences between the build variants are the <toolchain>.cmake file being used and some variant-specific options.

Build for new RVV 1.0 intrinsics

Use mainline GCC or LLVM toolchains.

Build with GCC:

cd build
PATH=${TOOLCHAIN_ROOT}/bin:${PATH} \
cmake -GNinja \
   -DCMAKE_BUILD_TYPE=Release \
   -DBUILD_SHARED_LIBS=OFF \
   -DWITH_OPENCL=OFF \
   -DCMAKE_TOOLCHAIN_FILE=../opencv/platforms/linux/riscv64-gcc.toolchain.cmake \
   -DRISCV_RVV_SCALABLE=ON \
   ../opencv
ninja

Build with LLVM (also requires GCC for standard libraries):

cd build
cmake -GNinja \
   -DCMAKE_BUILD_TYPE=Release \
   -DBUILD_SHARED_LIBS=OFF \
   -DWITH_OPENCL=OFF \
   -DCMAKE_TOOLCHAIN_FILE=../opencv/platforms/linux/riscv64-clang.toolchain.cmake \
   -DRISCV_CLANG_BUILD_ROOT=${LLVM_TOOLCHAIN_ROOT} \
   -DRISCV_GCC_INSTALL_ROOT=${GCC_TOOLCHAIN_ROOT} \
   -DRISCV_RVV_SCALABLE=ON \
   ../opencv
ninja

Run OpenCV core test using regular QEMU:

cd build
OPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
${QEMU_DIR}/bin/qemu-riscv64 \
   -L ${TOOLCHAIN_ROOT}/sysroot \
   -cpu rv64,v=true,vext_spec=v1.0 \
   ./bin/opencv_test_core

Note: OpenCV uses flexible CPU feature detection during the configuration process, so if the compiler does not support RVV, the optimizations will be turned off and the build will proceed. To avoid this behavior and make the build fail when the compiler does not support RVV, add the following option: -DCPU_BASELINE_REQUIRE=RVV

Note: a CMake option can be used to change the default CPU option used for RVV detection: -DCMAKE_CXX_FLAGS="-march=rv64gcv1p0". It can be useful if you want to enable extra RISC-V features or your compiler requires a specific feature description syntax.
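
Before configuring OpenCV it can be useful to check whether the compiler accepts a given -march value and enables the vector extension for it. The sketch below relies on the __riscv_vector macro, which RISC-V compilers predefine when the V extension is enabled. The helper name compiler_has_rvv, the default compiler name and the default -march value are assumptions made for illustration; this check is not the detection logic OpenCV itself uses.

```shell
# Succeed if the given compiler defines __riscv_vector for the given -march.
# compiler_has_rvv is an illustrative helper, not part of OpenCV.
compiler_has_rvv() {
    cc=$1
    march=${2:-rv64gcv}
    # -dM -E dumps the predefined macros for an empty C input on stdin.
    "$cc" -march="$march" -dM -E -x c - </dev/null 2>/dev/null \
        | grep -q '__riscv_vector'
}

CC=${CC:-riscv64-unknown-linux-gnu-gcc}
if command -v "$CC" >/dev/null 2>&1 && compiler_has_rvv "$CC"; then
    echo "$CC reports RVV support"
else
    echo "$CC not found or does not report RVV support"
fi
```

A second argument selects another architecture string, for example compiler_has_rvv "$CC" rv64gcv1p0.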

Build intrin_rvv071, RVV 0.7.1 and 1.0, T-Head toolchain 2.x

OpenCV > 4.9.0

Use T-Head 2.x toolchain.

Build:

cd build
PATH=${TOOLCHAIN_ROOT}/bin:${PATH} \
cmake -GNinja \
   -DCMAKE_BUILD_TYPE=Release \
   -DBUILD_SHARED_LIBS=OFF \
   -DWITH_OPENCL=OFF \
   -DCMAKE_TOOLCHAIN_FILE=../opencv/platforms/linux/riscv64-071-gcc.toolchain.cmake \
   -DCORE=C910V \
   ../opencv
ninja

Run the OpenCV core test using the T-Head QEMU (select the CPU model corresponding to the CORE option used for the build):

cd build
OPENCV_TEST_DATA_PATH=../opencv_extra/testdata \
${QEMU_DIR}/bin/qemu-riscv64 \
   -L ${TOOLCHAIN_ROOT}/sysroot \
   -cpu c910v \
   ./bin/opencv_test_core

Note: see the CPU models accepted by the -DCORE= option in platforms/linux/riscv64-071-gcc.toolchain.cmake. The RVV version and its availability depend on the selected CPU model.
