tsimd - Fundamental C++ SIMD types for Intel CPUs (sse to avx512)
This library is header-only. Which implementation is used depends on the Intel
ISA flags enabled for the translation unit that includes it (e.g. -mavx
with gcc or clang).
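As a rough illustration of how a header-only library can react to per-translation-unit ISA flags, the sketch below picks a default float lane count from the macros that gcc, clang, and MSVC predefine when the corresponding instruction sets are enabled. The names `default_float_width` and `default_float_bits` are hypothetical and are not part of tsimd; this is not necessarily how tsimd itself makes the choice.

```cpp
#include <cstddef>

// Hypothetical helper, not tsimd's API: choose a float lane count from the
// ISA macros predefined by the compiler for this translation unit.
#if defined(__AVX512F__)
constexpr std::size_t default_float_width = 16;  // 512-bit registers
#elif defined(__AVX__)
constexpr std::size_t default_float_width = 8;   // 256-bit registers
#elif defined(__SSE__) || defined(_M_X64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
constexpr std::size_t default_float_width = 4;   // 128-bit registers
#else
constexpr std::size_t default_float_width = 1;   // scalar fallback
#endif

// Register size in bits implied by the chosen lane count.
constexpr std::size_t default_float_bits =
    default_float_width * sizeof(float) * 8;
```

Because the choice is made by the preprocessor, two translation units compiled with different flags (say -msse4.2 and -mavx2) would see different widths from the same header.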
TODOs (contributions welcome!)
unsigned integer pack<> types
support for other CPU ISAs
Build Requirements
Using tsimd
C++11 compiler
(unofficial list of compilers, not all are tested)
GCC >= 4.8.1
clang >= 3.4
ICC >= 16
Visual Studio 2015 (64-bit target)
Building tsimd's examples/benchmarks/tests and installing from source
cmake >= 3.2
Library layout and usage
The library is logically composed of 3 different components:
The pack<T, W> class, which is a logical SIMD register.
Functions that load packs from, and store packs to, larger arrays.
Operators and functions that manipulate packs.
While no formal documentation exists yet, users are encouraged to
see what type aliases are defined in tsimd/detail/pack.h, as well as what
operators and functions are available in tsimd/detail/operators/ and
tsimd/detail/functions/ respectively. Generally speaking, each header
found in detail/ encapsulates exactly one type, operator, or function to
aid discovery.
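To make the three components above concrete, here is a minimal stand-in written with plain loops instead of intrinsics. It is only a sketch: everything lives in a hypothetical `sketch` namespace, and it does not reflect how tsimd implements `pack`, `load`, `store`, or its operators. It only mirrors the call shapes used in this README.

```cpp
#include <array>
#include <cstddef>

// Illustrative stand-in only: NOT tsimd's implementation.
namespace sketch {

// 1) A logical SIMD register: W lanes of type T.
template <typename T, std::size_t W>
struct pack {
  static constexpr std::size_t static_size = W;
  std::array<T, W> lanes;
};

// 2) Load a pack from, and store a pack to, a larger array.
template <typename PackT, typename T>
PackT load(const T* src) {
  PackT p{};
  for (std::size_t i = 0; i < PackT::static_size; ++i) p.lanes[i] = src[i];
  return p;
}

template <typename T, std::size_t W>
void store(const pack<T, W>& p, T* dst) {
  for (std::size_t i = 0; i < W; ++i) dst[i] = p.lanes[i];
}

// 3) Operators that manipulate packs lane by lane.
template <typename T, std::size_t W>
pack<T, W> operator+(const pack<T, W>& a, const pack<T, W>& b) {
  pack<T, W> r{};
  for (std::size_t i = 0; i < W; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
  return r;
}

// Scalar-times-pack: the scalar is applied to every lane.
template <typename T, std::size_t W>
pack<T, W> operator*(T s, const pack<T, W>& a) {
  pack<T, W> r{};
  for (std::size_t i = 0; i < W; ++i) r.lanes[i] = s * a.lanes[i];
  return r;
}

}  // namespace sketch
```

With this stand-in, an expression such as `2.0f * sketch::load<P>(x) + sketch::load<P>(y)` (where `P` is `sketch::pack<float, 4>`) computes four results at once from the caller's point of view, which is the programming model a real SIMD-backed pack provides.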
Example
SAXPY
Consider the following function (kernel), which takes values from two input
arrays and stores the results in an output array.
// NOTE: n is the length of all 3 arrays
void saxpy(float a, int n, float x[], float y[], float out[])
{
  for (int i = 0; i < n; ++i) {
    const float xi = x[i];
    const float yi = y[i];
    const float result = a * xi + yi;
    out[i] = result;
  }
}
This kernel applies the exact same formula to every element in the
data. SIMD instructions let us reduce the total number of iterations by a
factor equal to the number of elements that fit in one of the CPU's SIMD
registers. We do this by using tsimd types instead of built-in types.
// NOTE: n is the length of all 3 arrays
void saxpy_tsimd(float a, int n, float x[], float y[], float out[])
{
  using namespace tsimd;

  for (int i = 0; i < n; i += vfloat::static_size) {
    const vfloat xi = load<vfloat>(&x[i]);
    const vfloat yi = load<vfloat>(&y[i]);
    const vfloat result = a * xi + yi; // same formula!
    store(result, &out[i]);
  }
}
The advantage of this version over one that uses a specific SIMD width (say
vfloat4 or vfloat8) is that the kernel function will be "widened"
to the best available width based on how it gets compiled. In other words:
4-wide for SSE, 8-wide for AVX/AVX2, and 16-wide for AVX512.
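One practical consequence of a width that changes with compile flags: the loop in saxpy_tsimd steps by vfloat::static_size, so as written it appears to assume that n is a multiple of the pack width. A common way to lift that assumption is to process full blocks first and finish with a scalar remainder loop. The sketch below shows the pattern in plain C++ so that it stands alone; `saxpy_blocked` and the template parameter `W` (standing in for vfloat::static_size) are hypothetical names, and the inner lane loop marks where the pack load, multiply-add, and store would go.

```cpp
#include <cstddef>

// Hypothetical variant of the SAXPY kernel with an explicit remainder loop.
// W plays the role of vfloat::static_size (4, 8, or 16 depending on flags).
template <std::size_t W>
void saxpy_blocked(float a, std::size_t n, const float* x, const float* y,
                   float* out) {
  static_assert(W > 0, "block width must be positive");

  // Largest multiple of W that does not exceed n.
  const std::size_t n_full = n - (n % W);

  std::size_t i = 0;

  // Main loop: one full block of W elements per iteration. With a SIMD pack
  // type, this inner loop would be a single load / multiply-add / store.
  for (; i < n_full; i += W) {
    for (std::size_t lane = 0; lane < W; ++lane) {
      out[i + lane] = a * x[i + lane] + y[i + lane];
    }
  }

  // Remainder loop: the last n % W elements, one at a time.
  for (; i < n; ++i) {
    out[i] = a * x[i] + y[i];
  }
}
```

Since the remainder never exceeds W - 1 elements, the scalar tail costs little, and the same source stays correct whichever width the kernel is compiled for.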