Introduction
The core module is the most crucial module in OpenCV: all other modules depend on it. It must form a solid foundation for a future-proof OpenCV.
The desirable key properties of the module are:
- enough essential functionality for other modules so that they don't have to reimplement it.
- basic data structures and infrastructure to be used jointly by other modules, to connect them together and provide smooth data flow. By infrastructure we mean memory management, parallel computing framework, HAL, error handling, logging & tracing, basic I/O etc.
- highly efficient with small overhead.
- multi-level API to enable both high-level pipeline-style use and more advanced use where efficient parallel kernels in other modules can reuse low-level primitives from the core module.
- compact size and small footprint (including low compile-time overhead)
A bit of history, and why the core module should be somewhat like Python's numpy
OpenCV's core module in its current form was created around 2009, when cv::Mat, a multi-dimensional dense array, was introduced as a complete replacement for CvMat, CvMatND and IplImage. The whole OpenCV API was rebuilt (before 2009 it was a C API) around this cv::Mat and a few other basic structures like std::vector<> (to handle point clouds etc.). The idea of combining image, matrix and multi-dimensional array (tensor) in one type was borrowed from Matlab, where toolboxes, including the image processing toolbox, the basic linear algebra toolbox, Jean-Yves Bouguet's camera calibration toolbox etc., all happily use Matlab matrices, so it is super-easy to create pipelines that use algorithms from different areas.
Python's famous numerical extension numpy borrowed the same idea and also implemented a ubiquitous matrix/array type called ndarray. On top of numpy, bigger packages such as scipy and scikit-learn have been developed. An efficient and yet quite comprehensive set of basic operations, extended by the derived packages, mostly eliminated the problem of the very low speed of hand-written Python code (because all the kernels in numpy are implemented in efficient C or Fortran). That suddenly made Python a sound replacement for Fortran & Matlab in the new century.
With the rise of deep learning the idea has been greatly extended. An efficient, GPU-accelerated, comprehensive set of operations (very similar to numpy's) that can be put together into graphs, combined with automatic differentiation tools, formed the foundation of modern deep learning technology. If one looks at PyTorch, TensorFlow, JAX, the ONNX specification etc., one will find many similarities with numpy. In particular, many ONNX operations follow numpy quite closely and use numpy to illustrate those operations. Of course, there are some deep-learning-specific operations like Convolution, SoftMax, Dropout or Attention, but most ONNX operations have numpy counterparts.
The Python community (since all the aforementioned frameworks, except for OpenCV, mainly use Python) noticed this close resemblance between the many array processing frameworks and decided to introduce the so-called Python array API standard. It is clearly an emerging standard: its API lacks some important numpy functionality and some important PyTorch/ONNX operations, and it lacks the notion of an external accelerator (like a GPU or NPU) to which the user may want to transfer an array/tensor, perform a set of operations there, and transfer the results back. This is crucial functionality for deep learning frameworks, and for OpenCV, its deep learning module and its GPU-accelerated image processing functionality. So the standard will definitely evolve, but it makes sense for us in OpenCV 5 to comply with it, more or less, even now. Besides implementing the already specified API, it is an opportunity for us to offer the community extra kernels that are important for computer vision and image processing use cases.
The list of functions to implement/improve in Core module in OpenCV 5.0
Basically, OpenCV's core module should implement a big subset of "Python array API standard" with certain extensions that we consider useful.
The list of functions below has been directly copied from https://data-apis.org/array-api/latest/API_specification/index.html. Probably, the following content should be presented in a table.
- Unary/binary arithmetic, math and logic operations. We have implementations of many of these operations already (sometimes under slightly different names), but we need to support broadcasting for binary operations.

```
abs                 // cv::absdiff with 0 as the second parameter
acos
acosh               // via cv::log()
add                 // cv::add
asin
asinh               // via cv::log()
atan
atan2               // cv::polarToCart
atanh               // via cv::log()
bitwise_and         // cv::bitwise_and
bitwise_left_shift
bitwise_invert      // cv::bitwise_not
bitwise_or          // cv::bitwise_or
bitwise_right_shift
bitwise_xor         // cv::bitwise_xor
ceil
conj
cos                 // cv::cartToPolar
cosh                // via cv::exp()
divide              // cv::divide
equal               // cv::compare(..., CMP_EQ)
exp                 // cv::exp
expm1
floor
floor_divide
greater             // cv::compare(..., CMP_GT)
greater_equal       // cv::compare(..., CMP_GE)
imag
isfinite
isinf
isnan
less                // cv::compare(..., CMP_LT)
less_equal          // cv::compare(..., CMP_LE)
log                 // cv::log
log1p
log2
log10
logaddexp
logical_and
logical_not
logical_or
logical_xor
multiply            // cv::multiply
negative
not_equal           // cv::compare(..., CMP_NE)
positive            // copyTo()
pow                 // cv::pow
real
remainder
round               // convertTo()
sign
sin                 // only cartToPolar
sinh                // via cv::exp()
square              // via cv::multiply()
sqrt                // cv::sqrt()
subtract            // cv::subtract()
tan
tanh                // no direct function; can be computed via cv::exp()
trunc
```
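To make the broadcasting requirement concrete, here is a minimal sketch in numpy (the reference model for the array API standard) of the behaviour that core's binary operations would need, plus the tanh-via-exp trick noted in the list above. This is an illustration of the intended semantics, not OpenCV code.

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # shape (2, 3)
b = np.array([10, 20, 30], dtype=np.float32)       # shape (3,), broadcast over rows

c = np.add(a, b)                 # what cv::add would do with broadcasting support
print(c.tolist())                # [[10.0, 21.0, 32.0], [13.0, 24.0, 35.0]]

# tanh via exp, as noted above: tanh(x) = (e^(2x) - 1) / (e^(2x) + 1)
e = np.exp(2 * a)
print(bool(np.allclose((e - 1) / (e + 1), np.tanh(a))))  # True
```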
- Linear algebra functions. Same situation.

```
matmul              // cv::gemm()
matrix_transpose    // cv::transpose()
tensordot
vecdot
```
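A sketch of the intended semantics in numpy notation (not OpenCV API): matmul matches cv::gemm() on 2D inputs, tensordot generalizes it to contraction over chosen axes, and vecdot is a dot product along the last axis.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

C = np.matmul(A, B)               # 2D case, same result as cv::gemm()
C2 = np.tensordot(A, B, axes=1)   # contraction over one axis: equals matmul here
print(bool(np.allclose(C, C2)))   # True

x = np.array([1.0, 2.0, 3.0])
print(float(np.dot(x, x)))        # 14.0 - vecdot semantics along the last axis
```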
- Array permutation functions. Same situation:

```
broadcast_arrays
broadcast_to
concat              // in OpenCV we have 2D hconcat and vconcat
expand_dims
flip                // 2D only for now
permute_dims        // in cv::dnn we have a general Transpose; in core we have 2D transpose
reshape
roll
squeeze
stack
```
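Again as a numpy sketch of the target semantics (not OpenCV code): these are the ND generalizations of what core's 2D flip/transpose/hconcat already do.

```python
import numpy as np

x = np.zeros((2, 3), dtype=np.float32)

y = np.expand_dims(x, axis=0)        # (1, 2, 3): insert a length-1 axis
z = np.squeeze(y, axis=0)            # (2, 3): remove it again
b = np.broadcast_to(np.arange(3), (2, 3))          # stretch shape (3,) to (2, 3)
t = np.transpose(np.zeros((2, 3, 4)), (2, 0, 1))   # permute_dims: ND transpose -> (4, 2, 3)
s = np.stack([x, x], axis=0)                       # (2, 2, 3): ND generalization of vconcat

print(y.shape, z.shape, b.shape, t.shape, s.shape)
```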
- Statistical functions. Same situation:

```
max                 // minMaxIdx() computes min, max and their indices
mean                // cv::mean
min                 // via minMaxIdx()
prod
std                 // cv::meanStdDev() computes both mean and standard deviation
sum                 // cv::sum
var                 // via cv::meanStdDev()
```
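The ND extension point here is per-axis reduction, which the array API requires and the current 2D core functions don't cover; a numpy sketch of the semantics (not OpenCV code):

```python
import numpy as np

m = np.arange(6.0).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]

print(m.sum(axis=0).tolist())             # [3.0, 5.0, 7.0] - per-column reduction
print(float(m.max()), float(m.min()))     # 5.0 0.0 - what minMaxIdx() returns in one call
mu, sd = m.mean(), m.std()                # cv::meanStdDev() computes both at once
print(bool(np.isclose(sd * sd, m.var())))  # True - var is just std squared
```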
- Misc functions from several other groups. Mostly implemented as well in one form or another:

```
// searching functions
argmax
argmin
nonzero             // ~ cv::countNonZero()
where               // element-wise ternary operator ?:

// set functions
unique_all
unique_counts
unique_inverse
unique_values

// sorting
argsort             // called cv::sortIdx() in OpenCV
sort                // cv::sort()

// utility functions
all                 // as cv::countNonZero(m) == m.total()
any                 // cv::hasNonZero

// initialization functions
arange
asarray             // many non-Mat arrays can be converted to cv::Mat using getMat();
                    // in Python bindings Mat is implicitly constructed from ndarray and vice versa
empty               // Mat()
empty_like
eye                 // Mat::eye()
from_dlpack
full
full_like
linspace
meshgrid
ones                // Mat::ones()
ones_like
tril
triu
zeros               // Mat::zeros()
zeros_like
```
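For the searching and set functions, a short numpy illustration of the expected behaviour (again, the semantic target, not OpenCV code):

```python
import numpy as np

x = np.array([3, 0, 5, 0, 5])

print(int(np.argmax(x)))                 # 2 - index of the first maximum
print(np.nonzero(x)[0].tolist())         # [0, 2, 4] - cf. cv::countNonZero, which only counts
print(np.where(x > 0, x, -1).tolist())   # [3, -1, 5, -1, 5] - the element-wise ?: operator
vals, counts = np.unique(x, return_counts=True)
print(vals.tolist(), counts.tolist())    # [0, 3, 5] [2, 1, 2]
```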
- Some useful extra operations not included into the "Python array API standard", but included into numpy and/or the ONNX specification:

```
einsum              // already implemented in cv::dnn
einops.*            // a family of operations from the excellent einops package:
                    // https://github.com/arogozhnikov/einops
reduce(..., sum|min|max|avg|...)
                    // in core we already have 2D reduce();
                    // need to extend it to ND, as in ONNX or cv::dnn
```
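As a quick numpy sketch of why einsum and ND reduce are worth having: einsum subsumes matmul, transpose and many reductions in one notation, and ND reduce is what ONNX's ReduceSum family does.

```python
import numpy as np

A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)

C = np.einsum('ik,kj->ij', A, B)     # einsum expressing matrix multiplication
print(bool(np.allclose(C, A @ B)))   # True

T = np.arange(24.0).reshape(2, 3, 4)
R = np.einsum('ijk->ik', T)          # ND reduce: sum over axis 1, like ONNX ReduceSum
print(R.shape, bool(np.allclose(R, T.sum(axis=1))))  # (2, 4) True
```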
Also, in core we already have a bunch of functions that are implemented in numpy but missing from the standard, like various matrix decomposition and backward substitution algorithms (LU, Cholesky, SVD, QR), FFT etc.
As you can see, many of the operations are already implemented in the core or dnn modules.
What needs to be done basically:
- implement the rest of the API (mostly element-wise operations)
- support parallel implementations to take Amdahl's law into account (i.e. even a cheap operation with a single-thread implementation may become a bottleneck in a data processing pipeline on a multi-core machine if all the other operations in the pipeline are parallel)
- support broadcasting in binary operations. We partially support broadcasting in core (`A op A` and `A op scalar`) and fully support broadcasting in dnn; we need to merge the dnn implementation into core.
- support multi-dimensional arrays. Most element-wise operations already support multi-dimensional arrays, but reduce(), flip(), transpose() and a few others still don't.
- support FP16 and BF16 where possible (partly done in core already). These two types have become the main types for heavy data processing nowadays. Thankfully, even architectures without full support for FP16 and BF16 arithmetic still provide instructions to efficiently convert FP16 and BF16 to/from FP32; for example, Intel/AMD AVX2 includes such instructions. On ARMv8.2, as well as on many modern GPUs, FP16 arithmetic is supported natively, which could give us ~2x acceleration in pipelines that do many arithmetic and matrix operations and/or transfer a lot of data.
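The broadcasting rule to be merged from dnn can be sketched in a few lines; this is a hypothetical illustration of the numpy-style rule (align shapes from the trailing axis, a dimension of 1 stretches to match the other operand), not the actual dnn code:

```python
def broadcast_shape(a: tuple, b: tuple) -> tuple:
    """Compute the numpy-style broadcast result shape of shapes a and b."""
    # Left-pad the shorter shape with 1s, then compare axis by axis
    a = (1,) * (len(b) - len(a)) + a
    b = (1,) * (len(a) - len(b)) + b
    out = []
    for x, y in zip(a, b):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(max(x, y))
    return tuple(out)

print(broadcast_shape((2, 3), (3,)))       # (2, 3) - generalizes the `A op scalar` case
print(broadcast_shape((4, 1, 5), (3, 1)))  # (4, 3, 5)
```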
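The convert-to-FP32-and-back pattern for architectures without native FP16 arithmetic can be seen with numpy's float16 (a sketch of the data flow, not OpenCV code):

```python
import numpy as np

x32 = np.float32(3.14159265)
x16 = x32.astype(np.float16)             # narrowing: mantissa rounds to 10 bits
print(float(x16))                        # 3.140625 - about 3 decimal digits survive
print(float(x16.astype(np.float32)))     # 3.140625 - widening back to FP32 is lossless
print(np.dtype(np.float16).itemsize)     # 2 bytes vs 4 for FP32: half the memory traffic
```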
So, once again, why is this important, besides being able to declare that we 'sort of implemented' the emerging standard?
- Efficient, high-quality implementations of basic array processing functions will let us reduce code duplication, reuse those functions in the dnn module, and perhaps implement higher-level image processing algorithms in imgproc, photo and other modules more efficiently.
- We can introduce a more or less future-proof HAL for vendors who would like to accelerate OpenCV 5+. They will see that we ask for the same API (at least at the semantic level) as the whole Python + numpy + PyTorch + ... community, which is a huge number of people and many companies.
- The goal for OpenCV 5 is to introduce not just the new CPU HAL, but also a non-CPU HAL. All the above-mentioned functions should be able to use such a HAL. Then all the functionality built on top of this basic API (which we and the community will gradually extend) will automatically run on GPUs or other HAL-supporting accelerators. See the dedicated feature requests (New CPU HAL for OpenCV 5.0 #25019, Introducing non-CPU HAL for OpenCV 5+ #25025) where this HAL is described.