Loopy: Transformation-Based Generation of High-Performance CPU/GPU Code
Loopy lets you easily generate the tedious, complicated code that is necessary
to get good performance out of GPUs and multi-core CPUs.
Loopy's core idea is that a computation should be described simply and then
transformed into a version that gets high performance. This transformation
takes place under user control, from within Python.
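Here is a rough illustration of this idea. The sketch below is written in plain Python and does not use Loopy's API. It shows a simple loop and a transformed version of it. In the transformed version, the loop over `i` is split into an outer loop and an inner loop of fixed length, and a guard handles the final, partial chunk. The function names and the chunk length are hypothetical. In Loopy itself, you request a transformation of this kind with functions such as `lp.split_iname`, and Loopy generates the resulting code for you.

```python
# Illustration only: plain Python standing in for generated code.
# Neither function is part of Loopy's API; both names are hypothetical.


def scale_reference(a):
    """The simple description of the computation: out[i] = 2*a[i]."""
    return [2 * x for x in a]


def scale_split(a, inner_length=4):
    """The same computation after splitting i into (i_outer, i_inner).

    i = i_outer * inner_length + i_inner. The inner loop has a fixed trip
    count, which is what a code generator could unroll or map onto vector
    lanes. The guard `i < n` handles the last chunk when n is not a
    multiple of inner_length.
    """
    n = len(a)
    out = [0] * n
    n_outer = -(-n // inner_length)  # ceiling division: number of chunks
    for i_outer in range(n_outer):
        for i_inner in range(inner_length):
            i = i_outer * inner_length + i_inner
            if i < n:  # boundary guard for the final, partial chunk
                out[i] = 2 * a[i]
    return out


a = list(range(10))
# The transformed loop computes exactly what the simple description says.
assert scale_split(a) == scale_reference(a)
```

The transformed version changes only how the iterations are organized. The computed result is the same, which is the property a transformation-based approach relies on.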
Loopy can capture the following types of optimizations:
Vector and multi-core parallelism in the OpenCL/CUDA model
Data layout transformations (structure of arrays to array of structures)
Loop unrolling
Loop tiling with efficient handling of boundary cases
Prefetching/copy optimizations
Instruction level parallelism
and many more!
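At its core, a data layout transformation is a remapping of array indices. The sketch below uses plain Python, not Loopy's API, and its function names are hypothetical. It shows the index mapping behind converting a structure of arrays into an array of structures: field `c` of record `i` moves to flat index `i * n_fields + c`.

```python
# Illustration only: hypothetical helpers, not part of Loopy.


def soa_to_aos(fields):
    """Interleave equal-length per-field arrays into one flat array.

    Field c of record i is stored at index i * n_fields + c, so all fields
    of one record end up next to each other.
    """
    n_fields = len(fields)
    n = len(fields[0]) if fields else 0
    if any(len(f) != n for f in fields):
        raise ValueError("all fields must have the same length")
    aos = [None] * (n * n_fields)
    for i in range(n):
        for c in range(n_fields):
            aos[i * n_fields + c] = fields[c][i]
    return aos


def aos_to_soa(aos, n_fields):
    """Inverse mapping: take every n_fields-th element, starting at offset c."""
    return [aos[c::n_fields] for c in range(n_fields)]


x, y, z = [1, 2, 3], [4, 5, 6], [7, 8, 9]
aos = soa_to_aos([x, y, z])
assert aos == [1, 4, 7, 2, 5, 8, 3, 6, 9]
assert aos_to_soa(aos, 3) == [x, y, z]
```

Which layout performs better depends on how a kernel accesses the data. In Loopy's approach, a change like this is meant to be applied as a transformation, not written into the computation by hand.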
Loopy targets array-type computations, such as the following:
dense linear algebra,
convolutions,
n-body interactions,
PDE solvers, such as finite element, finite difference, and Fast-Multipole-type computations.
It is not (and does not want to be) a general-purpose programming language.
Loopy is licensed under the liberal MIT license and is free for commercial, academic,
and private use. Loopy and all of its dependencies can be installed automatically
from the Python Package Index with:
pip install loopy
In addition, Loopy is compatible with and enhances pyopencl.