CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
RISC-V/AArch64: disable CPU features detection #25901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
set(CPU_DISPATCH "" CACHE STRING "${HELP_CPU_DISPATCH}") | ||
set(CPU_BASELINE "DETECT" CACHE STRING "${HELP_CPU_BASELINE}") | ||
|
||
if(NOT CPU_BASELINE STREQUAL "DETECT") | ||
message(FATAL_ERROR "RISC-V only supports CPU_BASELINE=DETECT, use CMAKE_CXX_FLAGS or cmake toolchain file options to change platform") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to revert changes in this files and use:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main idea here is to avoid modifying arch compiler flags at all. If I understand correctly, changes in #25930 will still try to append custom flags and run compiler checks.
For example, with that patch, if we compile OpenCV with -DCMAKE_CXX_FLAGS="-mcpu=cortex-a72" -DCPU_BASELINE=NEON_FP16
options (cortex-a72
is armv8-a CPU), then our scripts will silently append -march=armv8.2-a+fp16
and make this build incompatible with explicitly requested CPU.
This behavior can cause compatibility, performance issues and compilation errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid modifying arch compiler flags at all
For dispatched optimization we still need to modify compiler flags anyway.
make this build incompatible with explicitly requested CPU
If compiler doesn't fail on try_compile()
in that case then it could be workaronded through -DCPU_NEON_FP16_FLAGS_ON=
CMake override.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides of DETECT, empty or other values of CPU_BASELINE are used for limiting of used features in OpenCV without touching compiler flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For dispatched optimization we still need to modify compiler flags anyway.
Yes, perhaps users will be required to manually provide flags for dispatched files via CPU_XXX_FLAGS_ON
variables. Examples with our cmake-toolchain file from this PR:
- Build with RVV as a baseline:
-DENABLE_RVV=ON
- Build without RVV in a baseline, but with dispatched RVV files:
-DCPU_DISPATCH=RVV -DCPU_RVV_FLAGS_ON=-march=rv64gcv
empty or other values of CPU_BASELINE are used for limiting of used features
Doesn't CPU_BASELINE_DISABLE
do this? I tried to use it with this PR and enabled RVV compiler flags: -DENABLE_RVV=ON -DCPU_BASELINE_DISABLE=RVV
, but I'm not sure if correct macros are being set in cv_cpu_config.h:
// OpenCV CPU baseline features
#define CV_CPU_COMPILE_RVV 1
#define CV_CPU_BASELINE_COMPILE_RVV 1
#define CV_CPU_BASELINE_FEATURES 0 \
, CV_CPU_RVV \
// OpenCV supported CPU dispatched features
#define CV_CPU_DISPATCH_FEATURES 0 \
This comment was marked as spam.
This comment was marked as spam.
Sorry, something went wrong.
platforms/linux/arm.toolchain.cmake
Outdated
@@ -57,6 +57,16 @@ if(NOT DEFINED CMAKE_CXX_FLAGS) | |||
|
|||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fdata-sections -Wa,--noexecstack -fsigned-char -Wno-psabi") | |||
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fdata-sections -Wa,--noexecstack -fsigned-char -Wno-psabi") | |||
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64") | |||
# see https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html | |||
if(ENABLE_BF16) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ENABLE_BF16
Toolchain files have issue with cached variables (not available to them - just add message()
),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've checked this variable, it seem to be working. I build with -DENABLE_BF16=ON
and got -march=armv8.6-a
in the compiler string. We use this approach in XuanTie cmake toolchain: https://github.com/opencv/opencv/blob/4.x/platforms/linux/riscv64-071-gcc.toolchain.cmake (CORE
variable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just add message()
and run.
Also add message() for modified
CMAKE_CXX_FLAGSbecause you got the
else()` branch in case of undefined value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I remembered, it was an issue with CMAKE_CXX_FLAGS. As I understand, modern cmake (3.7+) have respective _INIT
variables for toolchain files specifically (https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS_INIT.html). In new commit I've updated some of our toolchains to use them too.
platforms/linux/arm.toolchain.cmake
Outdated
# see https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html | ||
if(ENABLE_BF16) | ||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=armv8.6-a") | ||
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -march=armv8.6-a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Toolchain files are not used for native compilation.
We need a robust solution for such cases too, e.g. through OPENCV_EXTRA_*
flags (currently they are applied after the feature checks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For native compilation we assume that toolchain have architecture flags set by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example on Armbian 12 (OrangePi) I have:
g++ -march=native -Q --target-help | grep march
-march= armv8.2-a+crypto+fp16+rcpc+dotprod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-march=native
It is not a default actually.
"Native" case is already properly handled by the current scripts: https://github.com/opencv/opencv/blob/4.10.0/cmake/OpenCVCompilerOptimizations.cmake#L167
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For native compilation we can:
- leave it to user - set
CMAKE_CXX_FLAGS
or write your own cmake toolchain file - make our own cmake toolchain files similar to those used for cross-compilation (feature controlling flags should be the same)
BTW, we don't handle -mcpu=native
(equivalent to -march=native -mtune=native
if I understand correctly). Example from Armbian:
orangepi@orangepi5:~$ gcc -Q --help=target | egrep 'mtune|mcpu|march'
-march= armv8-a
-mcpu= generic
-mtune= generic
orangepi@orangepi5:~$ gcc -march=native -Q --help=target | egrep 'mtune|mcpu|march'
-march= armv8.2-a+crypto+fp16+rcpc+dotprod
-mcpu= generic
-mtune= generic
orangepi@orangepi5:~$ gcc -mcpu=native -Q --help=target | egrep 'mtune|mcpu|march'
-march= armv8.2-a
-mcpu= cortex-a76.cortex-a55+crypto
-mtune= cortex-a76.cortex-a55
cf9abe2
to
dd797be
Compare
platforms/linux/arm.toolchain.cmake
Outdated
elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL aarch64) | ||
set(ARM_LINKER_FLAGS "-Wl,--no-undefined -Wl,--gc-sections -Wl,-z,noexecstack -Wl,-z,relro -Wl,-z,now") | ||
# == Compiler flags | ||
set(common_cc_opt "-fdata-sections -Wa,--noexecstack -fsigned-char -Wno-psabi") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because cmake version on buildbot is too old (3.5.1) for this feature (CMAKE_<LANG>_FLAGS_INIT
). I propose raising minimum cmake version to 3.7.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My experiments has shown that unfortunately FLAGS_INIT
vars do not work with regular FLAGS
variables as I expected. They are not appended to initial value, but override it:
cmake \
-DCMAKE_TOOLCHAIN_FILE=/work/opencv/platforms/linux/riscv64-clang.toolchain.cmake \
../opencv
# result: -march=rv64gc ...
# this is flag from our toolchain file
cmake \
-DCMAKE_TOOLCHAIN_FILE=/work/opencv/platforms/linux/riscv64-clang.toolchain.cmake \
-DCMAKE_CXX_FLAGS="-Werror" \
../opencv
# result: -Werror ...
# no -march flags from toolchain file
Correct way to add custom flags to cross-compilation in this case will be via
CMAKE_<LANG>_FLAGS_<CONFIG>
variable, or<LANG>FLAGS
environment variable, or- OpenCV-specific variables
OPENCV_EXTRA_<LANG>_FLAGS
, or - custom toolchain-file.
I believe this behavior is acceptable because this is how cmake works.
9e9783a
to
d96068e
Compare
set(CPU_BASELINE "DETECT" CACHE STRING "${HELP_CPU_BASELINE}") | ||
set(CPU_DISPATCH "" CACHE STRING "${HELP_CPU_DISPATCH}") | ||
if(NOT CPU_BASELINE STREQUAL "DETECT") | ||
message(WARNING "ARM64 only supports CPU_BASELINE=DETECT, use CMAKE_CXX_FLAGS or cmake toolchain file options to change platform") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPU_BASELINE=NATIVE
should be supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
f5b0682
to
025dfc4
Compare
# ok | ||
elseif(NOT CPU_BASELINE STREQUAL "DETECT") | ||
message(WARNING "Platform ${CMAKE_SYSTEM_PROCESSOR} only supports CPU_BASELINE=DETECT|NATIVE, use CMAKE_CXX_FLAGS or cmake toolchain file options to modify compiler options") | ||
set(CPU_BASELINE "DETECT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still against forcing of DETECT
. It could be default, but it should not be forced.
We should not block user to disable some OpenCV intrinsics, e.g. for debug/testing purposes. Even if they are "enabled in compiler".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added support of an empty CPU_BASELINE
.
Currently this configuration is broken due to incorrect macros being used somewhere in DNN inline optimizations: compiler has feature enabled and some macros are enabled too, but some other macros from generated cv_cpu_config.hpp are disabled, thus causing missing symbols.
/usr/bin/ld: modules/dnn/CMakeFiles/opencv_dnn.dir/src/layers/cpu_kernels/conv_winograd_f63.cpp.o: in function `cv::dnn::runWinograd63(cv::_InputArray const&, cv::_InputArray const&, cv::_OutputArray const&, cv::Ptr<cv::dnn::FastConv> const&, int, float, float, cv::dnn::dnn4_v20240521::ActivationLayer*, bool)::{lambda(cv::Range const&)#1}::operator()(cv::Range const&) const':
/work/opencv/modules/dnn/src/layers/cpu_kernels/conv_winograd_f63.cpp:205: undefined reference to `cv::dnn::winofunc_BtXB_8x8_F32(float const*, int, float*, int, int, int)'
/usr/bin/ld: modules/dnn/CMakeFiles/opencv_dnn.dir/src/layers/cpu_kernels/conv_winograd_f63.cpp.o: in function `cv::dnn::runWinograd63(cv::_InputArray const&, cv::_InputArray const&, cv::_OutputArray const&, cv::Ptr<cv::dnn::FastConv> const&, int, float, float, cv::dnn::dnn4_v20240521::ActivationLayer*, bool)::{lambda(cv::Range const&)#2}::operator()(cv::Range const&) const':
/work/opencv/modules/dnn/src/layers/cpu_kernels/conv_winograd_f63.cpp:302: undefined reference to `cv::dnn::winofunc_accum_F32(float const*, float const*, float*, int, int, int, int, int, int)'
/usr/bin/ld: /work/opencv/modules/dnn/src/layers/cpu_kernels/conv_winograd_f63.cpp:368: undefined reference to `cv::dnn::winofunc_AtXA_8x8_F32(float const*, int, float*, int, float*, int, float, float, float, bool)'
collect2: error: ld returned 1 exit status
It is not related to this PR and we'll need to resolve this issue separately, my quick fix attempt has failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just "empty" handling is not enough.
We should not block user to disable some OpenCV intrinsics, e.g. for debug/testing purposes. Even if they are "enabled in compiler".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
User can disable OpenCV intrinsics on affected platforms in two ways:
- set
-DCPU_BASELINE=
- disable all intrinsics - set
-DCPU_BASELINE_DISABLE=RVV
- disable selected categories
Example:
cmake \
-DRISCV_CLANG_BUILD_ROOT=/chains/llvm-main -DRISCV_GCC_INSTALL_ROOT=/chains/riscv-gcc-14.1.0 -DCMAKE_TOOLCHAIN_FILE=/opencv/platforms/linux/riscv64-clang.toolchain.cmake -DBUILD_SHARED_LIBS=OFF -DWITH_OPENCL=OFF \
-DENABLE_RVV=ON \ # choose RVV flags in the toolchain
-DCPU_BASELINE= \ # disable all intrinsics
-DBUILD_opencv_dnn=OFF \ # know issue
-DCMAKE_BUILD_TYPE=Release \
../opencv
CMake output:
...
-- CPU/HW features:
-- Baseline:
...
-- C++ flags (Release): -march=rv64gcv --gcc-toolchain=/chains/riscv-gcc-14.1.0 -w -fsigned-char -W -Wall -Wreturn-type -Wnon-virtual-dtor -Waddress -Wsequence-point -Wformat -Wformat-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winconsistent-missing-override -Wno-delete-non-virtual-dtor -Wno-unnamed-type-template-args -Wno-comment -Wno-deprecated-enum-enum-conversion -Wno-deprecated-anon-enum-enum-conversion -fdiagnostics-show-option -pthread -Qunused-arguments -ffunction-sections -fdata-sections -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG -DNDEBUG
...
Generated cv_cpu_config.h:
// OpenCV CPU baseline features
#define CV_CPU_BASELINE_FEATURES 0 \
// OpenCV supported CPU dispatched features
#define CV_CPU_DISPATCH_FEATURES 0 \
If we don't change CPU_BASELINE
(leave it DETECT
), then OpenCV intrinsics will be enabled, cv_cpu_config.h:
// OpenCV CPU baseline features
#define CV_CPU_COMPILE_RVV 1
#define CV_CPU_BASELINE_COMPILE_RVV 1
#define CV_CPU_BASELINE_FEATURES 0 \
, CV_CPU_RVV \
// OpenCV supported CPU dispatched features
#define CV_CPU_DISPATCH_FEATURES 0 \
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets remove set(CPU_BASELINE "DETECT")
.
Warning message is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
ocv_update(CPU_FP16_IMPLIES "NEON") | ||
ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON") | ||
ocv_update(CPU_NEON_FP16_IMPLIES "NEON") | ||
ocv_update(CPU_NEON_BF16_IMPLIES "NEON") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we lost _IMPLIES
configuration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had concerns that it might affect compilation flags for dispatched or baseline files. I'll try to restore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Restored.
CMakeLists.txt
Outdated
# Use same flags for native AArch64 and RISC-V compilation as for cross-compile (Linux) | ||
if((CV_GCC OR CV_CLANG) AND UNIX AND NOT APPLE AND NOT CMAKE_CROSSCOMPILING AND NOT CMAKE_TOOLCHAIN_FILE) | ||
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64") # Maybe use AARCH64 variable? | ||
include(${CMAKE_CURRENT_LIST_DIR}/platforms/linux/flags-aarch64.cmake) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already something for platforms: https://github.com/opencv/opencv/tree/4.x/cmake/platforms
We could add similar for processors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can move this specific block to Linux.cmake
if it has all used variables being set at the time of importing (e.g. CMAKE_CROSSCOMPILING
). I'll try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
platforms/linux/flags-aarch64.cmake
Outdated
if(ENABLE_BF16) | ||
set(OCV_FLAGS "-march=armv8.6-a") | ||
elseif(ENABLE_FP16 OR ENABLE_DOTPROD) | ||
set(OCV_FLAGS "-march=armv8.4-a") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Google uses -march=armv8.2-a+bf16
for dispatched code:
https://github.com/google/XNNPACK/blob/f416ab3d2781c8eb53eaa999239138296146b5e0/CMakeLists.txt#L836
Why do we need 8.4
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this table from GCC documentation to establish baselines with assumption that actual hardware also lines up with it and there are no platforms which have dotprod
on top of armv8.2-a
, plus an assumption that compiler can optimize code better for such "baseline" arch.
arch value Architecture Includes by default
‘armv8-a’ Armv8-A ‘+fp’, ‘+simd’
‘armv8.1-a’ Armv8.1-A ‘armv8-a’, ‘+crc’, ‘+lse’, ‘+rdma’
‘armv8.2-a’ Armv8.2-A ‘armv8.1-a’
‘armv8.3-a’ Armv8.3-A ‘armv8.2-a’, ‘+pauth’
‘armv8.4-a’ Armv8.4-A ‘armv8.3-a’, ‘+flagm’, ‘+fp16fml’, ‘+dotprod’
‘armv8.5-a’ Armv8.5-A ‘armv8.4-a’, ‘+sb’, ‘+ssbs’, ‘+predres’
‘armv8.6-a’ Armv8.6-A ‘armv8.5-a’, ‘+bf16’, ‘+i8mm’
Should I change it back to armv8.2-a+...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to use armv8.2-a+...
combination with feature flags. It became slightly more complicated, but I believe it's OK. Please let me know if we need to revert it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also added to the toolchain ability to disable NEON for AArch64 by explicitly setting ENABLE_NEON
to OFF
.
CMakeLists.txt
Outdated
# Use same flags for native AArch64 and RISC-V compilation as for cross-compile (Linux) | ||
if((CV_GCC OR CV_CLANG) AND UNIX AND NOT APPLE AND NOT CMAKE_CROSSCOMPILING AND NOT CMAKE_TOOLCHAIN_FILE) | ||
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "aarch64") # Maybe use AARCH64 variable? | ||
include(${CMAKE_CURRENT_LIST_DIR}/platforms/linux/flags-aarch64.cmake) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
platforms/linux
OSX/iOS?
android?
other OS?
We should not depend on "linux" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This block's purpose is to use the same flags for native Linux compilation as for cross-compilation with our toolchain files.
- It is not applicable to Android, because it uses own toolchain files from NDK and doesn't have native compilation (or does it?).
- Also not applicable to OSX/iOS because we rely on CMake/XCode to set platform flags for specific target.
- Not so sure about Windows, but it was not handled specifically by OpenCVCompilerOptimizations.cmake (we have only GCC/Clang flags there for AArch64).
- Other platforms are not officially supported by us and I think we should recommend users to use there own cmake-toolchain files for these cases.
Should I move required cmake version raise to 3.7 together with |
I think we can stay with CMake 3.7 as minimal requirement. Ubuntu 18.04 provides 3.10. Older distributions may provide older versions, but it's easy to update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Purpose of this PR is declared as support for various user-defined compiler flags.
But current version contains several restrictions on that.
# ok | ||
elseif(NOT CPU_BASELINE STREQUAL "DETECT") | ||
message(WARNING "Platform ${CMAKE_SYSTEM_PROCESSOR} only supports CPU_BASELINE=DETECT|NATIVE, use CMAKE_CXX_FLAGS or cmake toolchain file options to modify compiler options") | ||
set(CPU_BASELINE "DETECT") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just "empty" handling is not enough.
We should not block user to disable some OpenCV intrinsics, e.g. for debug/testing purposes. Even if they are "enabled in compiler".
CMakeLists.txt
Outdated
@@ -684,6 +680,19 @@ if(NOT IOS AND NOT XROS) | |||
include(cmake/OpenCVDetectPython.cmake) | |||
endif() | |||
|
|||
# Use same flags for native AArch64 and RISC-V compilation as for cross-compile (Linux) | |||
if((CV_GCC OR CV_CLANG) AND UNIX AND NOT APPLE AND NOT CMAKE_CROSSCOMPILING AND NOT CMAKE_TOOLCHAIN_FILE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets move out that from the root CMake file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to cmake/platforms/OpenCV-Linux.cmake
…nd native compilation
7073af3
to
4bfbcb6
Compare
cmake/platforms/OpenCV-Linux.cmake
Outdated
@@ -1 +1,10 @@ | |||
# empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets remove this comment as file is not "intentionally empty" anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
RISC-V/AArch64: disable CPU features detection opencv#25901 This PR is the first step in fixing current issues with NEON/RVV, FP16, BF16 and other CPU features on AArch64 and RISC-V platforms. On AArch64 and RISC-V platforms we usually have the platform set by default in the toolchain when we compile it or in the cmake toolchain file or in CMAKE_CXX_FLAGS by user. Then, there are two ways to set platform options: a) "-mcpu=<some_cpu>" ; b) "-march=<arch description>" (e.g. "rv64gcv"). Furthermore, there are no similar "levels" of optimizations as for x86_64, instead we have features (RVV, FP16,...) which can be enabled or disabled. So, for example, if a user has "rv64gc" set by the toolchain and we want to enable RVV. Then we need to somehow parse their current feature set and append "v" (vector optimizations) to this string. This task is quite hard and the whole procedure is prone to errors. I propose to use "CPU_BASELINE=DETECT" by default on AArch64 and RISC-V platforms. And somehow remove other features or make them read-only/detect-only, so that OpenCV wouldn't add any extra "-march" flags to the default configuration. We would rely only on the flags provided by the compiler and cmake toolchain file. We can have some predefined configurations in our cmake toolchain files. Changes made by this PR: - `CMakeLists.txt`: - use `CMAKE_CROSSCOMPILING` instead of `CMAKE_TOOLCHAIN_FILE` to detect cross-compilation. This might be useful in cases of native compilation with a toolchain file - removed obsolete variables `ENABLE_NEON` and `ENABLE_VFPV3`, the first one have been turned ON by default on AArch64 platform which caused setting `CPU_BASELINE=NEON` - raise minimum cmake version allowed to 3.7 to allow using `CMAKE_CXX_FLAGS_INIT` in toolchain files - added separate files with arch flags for native compilation on AArch64 and RISC-V, these files will be used in our toolchain files and in regular cmake - use `DETECT` as default value for `CPU_BASELINE` also allow `NATIVE`, warn user if other values were used (only for AArch64 and RISC-V) - for each feature listed in `CPU_DISPATCH` check if corresponding `CPU_${opt}_FLAGS_ON` has been provided, warn user if it is empty (only for AArch64 and RISC-V) - use `CPU_BASELINE_DISABLE` variable to actually turn off macros responsible for corresponding features even if they are enabled by compiler - removed Aarch64 feature merge procedure (it didn't support `-mcpu` and built-in `-march`) - reworked AArch64 and two RISC-V cmake toolchain files (does not affect Android/OSX/iOS/Win): - use `CMAKE_CXX_FLAGS_INIT` to set compiler flags - use variables `ENABLE_BF16`, `ENABLE_DOTPROD`, `ENABLE_RVV`, `ENABLE_FP16` to control `-march` - AArch64: removed other compiler and linker flags - `-fdata-sections`, `-fsigned-char`, `-Wl,--no-undefined`, `-Wl,--gc-sections` - already set by OpenCV - `-Wa,--noexecstack`, `-Wl,-z,noexecstack`, `-Wl,-z,relro`, `-Wl,-z,now` - can be enabled by OpenCV via `ENABLE_HARDENING` - `-Wno-psabi` - this option used to disable some warnings on older ARM platforms, shouldn't harm - ARM: removed same common flags as for AArch64, but left `-mthumb` and `--fix-cortex-a8`, `-z nocopyreloc`
This PR is the first step in fixing current issues with NEON/RVV, FP16, BF16 and other CPU features on AArch64 and RISC-V platforms.
On AArch64 and RISC-V platforms we usually have the platform set by default in the toolchain when we compile it or in the cmake toolchain file or in CMAKE_CXX_FLAGS by user. Then, there are two ways to set platform options: a) "-mcpu=<some_cpu>" ; b) "-march=" (e.g. "rv64gcv"). Furthermore, there are no similar "levels" of optimizations as for x86_64, instead we have features (RVV, FP16,...) which can be enabled or disabled. So, for example, if a user has "rv64gc" set by the toolchain and we want to enable RVV. Then we need to somehow parse their current feature set and append "v" (vector optimizations) to this string. This task is quite hard and the whole procedure is prone to errors.
I propose to use "CPU_BASELINE=DETECT" by default on AArch64 and RISC-V platforms. And somehow remove other features or make them read-only/detect-only, so that OpenCV wouldn't add any extra "-march" flags to the default configuration. We would rely only on the flags provided by the compiler and cmake toolchain file. We can have some predefined configurations in our cmake toolchain files.
Changes made by this PR:
CMakeLists.txt
:CMAKE_CROSSCOMPILING
instead ofCMAKE_TOOLCHAIN_FILE
to detect cross-compilation. This might be useful in cases of native compilation with a toolchain fileENABLE_NEON
andENABLE_VFPV3
, the first one have been turned ON by default on AArch64 platform which caused settingCPU_BASELINE=NEON
CMAKE_CXX_FLAGS_INIT
in toolchain filesDETECT
as default value forCPU_BASELINE
also allowNATIVE
, warn user if other values were used (only for AArch64 and RISC-V)CPU_DISPATCH
check if correspondingCPU_${opt}_FLAGS_ON
has been provided, warn user if it is empty (only for AArch64 and RISC-V)CPU_BASELINE_DISABLE
variable to actually turn off macros responsible for corresponding features even if they are enabled by compiler-mcpu
and built-in-march
)CMAKE_CXX_FLAGS_INIT
to set compiler flagsENABLE_BF16
,ENABLE_DOTPROD
,ENABLE_RVV
,ENABLE_FP16
to control-march
-fdata-sections
,-fsigned-char
,-Wl,--no-undefined
,-Wl,--gc-sections
- already set by OpenCV-Wa,--noexecstack
,-Wl,-z,noexecstack
,-Wl,-z,relro
,-Wl,-z,now
- can be enabled by OpenCV viaENABLE_HARDENING
-Wno-psabi
- this option used to disable some warnings on older ARM platforms, shouldn't harm-mthumb
and--fix-cortex-a8
,-z nocopyreloc