FoldsCUDA.jl provides a Transducers.jl-compatible fold (reduce) implemented using CUDA.jl. This brings the transducers and reducing function combinators implemented in Transducers.jl to the GPU. Furthermore, using FLoops.jl, you can write parallel for loops that run on the GPU.
## API
FoldsCUDA exports `CUDAEx`, a parallel loop executor. It can be used with the parallel for loop created with `FLoops.@floop`, the Base-like high-level parallel API in Folds.jl, and the extensible transducers provided by Transducers.jl.
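For instance, the Base-like API in Folds.jl takes the executor as its last argument. A minimal sketch (requires a CUDA-capable GPU; `Folds.sum` and `ThreadedEx` come from Folds.jl and Transducers.jl, as documented there):

```julia
using FoldsCUDA, CUDA, Folds

xs = CUDA.rand(10^6)

# Same high-level call; only the executor argument selects where it runs.
s_gpu = Folds.sum(xs, CUDAEx())      # reduce on the GPU
s_cpu = Folds.sum(collect(xs))       # default (threaded) reduce on the CPU
```

Because the executor is just an argument, the same fold can be moved between CPU threads and the GPU without rewriting the loop body.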
## Examples
### `findmax` using FLoops.jl
You can pass the CUDA executor `FoldsCUDA.CUDAEx()` to `@floop` to run a parallel for loop on the GPU:
```julia
julia> using FoldsCUDA, CUDA, FLoops

julia> using GPUArrays: @allowscalar

julia> xs = CUDA.rand(10^8);

julia> @allowscalar xs[100] = 2;

julia> @allowscalar xs[200] = 2;

julia> @floop CUDAEx() for (x, i) in zip(xs, eachindex(xs))
           @reduce() do (imax = -1; i), (xmax = -Inf32; x)
               if xmax < x
                   xmax = x
                   imax = i
               end
           end
       end

julia> xmax
2.0f0

julia> imax  # the *first* position for the largest value
100
```
### `extrema` using `Transducers.TeeRF`
```julia
julia> using Transducers, Folds

julia> @allowscalar xs[300] = -0.5;

julia> Folds.reduce(TeeRF(min, max), xs, CUDAEx())
(-0.5f0, 2.0f0)

julia> Folds.reduce(TeeRF(min, max), (2x for x in xs), CUDAEx())  # iterator comprehension works
(-1.0f0, 4.0f0)

julia> Folds.reduce(TeeRF(min, max), Map(x -> 2x)(xs), CUDAEx())  # equivalent, using a transducer
(-1.0f0, 4.0f0)
```