webgpu-dawn: Haskell bindings to WebGPU Dawn for GPU computing and graphics
This package provides Haskell bindings to Google's Dawn WebGPU implementation, enabling GPU computing and graphics programming from Haskell. It wraps the gpu.cpp library, which provides a high-level C++ interface to Dawn.
Flags
Manual Flags
| Name | Description | Default |
|---|---|---|
| glfw | Enable GLFW support for windowed graphics applications | Enabled |
Use `-f <flag>` to enable a flag, or `-f -<flag>` to disable that flag.
Downloads
- webgpu-dawn-0.1.1.0.tar.gz (Cabal source package)
- Package description (as included in the package)
| Versions | 0.1.0.0, 0.1.1.0 |
|---|---|
| Dependencies | aeson (>=2.0 && <2.3), base (>=4.14 && <5), base64-bytestring (>=1.0 && <1.3), binary (>=0.8 && <0.9), bytestring (>=0.10 && <0.13), clock (>=0.8 && <0.9), containers (>=0.6 && <0.8), filepath (>=1.4 && <1.6), mtl (>=2.2 && <2.4), stm (>=2.5 && <2.6), text (>=1.2 && <2.1), transformers (>=0.5 && <0.7), unordered-containers (>=0.2.14 && <0.3), vector (>=0.12 && <0.14), webgpu-dawn |
| License | MIT |
| Author | Junji Hashimoto |
| Maintainer | junji.hashimoto@gmail.com |
| Uploaded | by junjihashimoto at 2025-12-30T08:54:13Z |
| Category | Graphics, GPU |
| Home page | https://github.com/junjihashimoto/webgpu-dawn |
| Source repo | head: git clone https://github.com/junjihashimoto/webgpu-dawn |
| Executables | bench-async-matmul, bench-optimized-matmul, bench-subgroup-matmul, bench-linear, async-pipeline-demo, chrome-tracing-demo, high-level-api, struct-field-offset, particle-system, kernel-fusion, layout-demo, struct-generics-dsl, vector-add-dsl, matmul-subgroup-dsl, shared-memory-reduction |
| Downloads | 10 total (10 in the last 30 days) |
| Rating | (no votes yet) |
| Status | Docs uploaded by user; build status unknown (no reports yet) |
Readme for webgpu-dawn-0.1.1.0
webgpu-dawn
High-level, type-safe Haskell bindings to Google's Dawn WebGPU implementation.
This library enables portable GPU computing with a production-ready DSL designed for high-throughput inference (e.g., LLMs), targeting 300 tokens per second (TPS).
⚡ Core Design Principles
To achieve high performance and type safety, this library adheres to the following strict patterns:
- Type-Safe Monadic DSL: No raw strings. We use `ShaderM` for composability and type safety.
- Natural Math & HOAS: Standard operators (`+`, `*`) and Higher-Order Abstract Syntax (HOAS) for loops (`loop ... $ \i -> ...`).
- Profile-Driven: Performance tuning is based on Roofline Analysis.
- Async Execution: Prefer `AsyncPipeline` to hide CPU latency and maximize GPU occupancy.
- Hardware Acceleration: Mandatory use of Subgroup Operations and F16 precision for heavy compute (MatMul/Reduction).
🏎️ Performance & Profiling
We utilize a Profile-Driven Development (PDD) workflow to maximize throughput.
1. Standard Benchmarks & Roofline Analysis
Run the optimized benchmark to determine TFLOPS and check the Roofline classification (Compute vs Memory Bound).
```bash
# Run 2D Block-Tiling MatMul Benchmark (FP32)
cabal run bench-optimized-matmul -- --size 4096 --iters 50
```
Output Example:
```
[Compute] 137.4 GFLOPs
[Memory] 201.3 MB
[Status] COMPUTE BOUND (limited by GPU FLOPs)
[Hint] Use F16 and Subgroup Operations to break the roofline.
```
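For a 4096 x 4096 matmul these figures are consistent with the standard cost model: roughly 2·N³ ≈ 137.4 GFLOP of arithmetic and 3·N²·4 bytes ≈ 201.3 MB of FP32 traffic for the three matrices, i.e. an arithmetic intensity of about 683 FLOP/byte, which is well into the compute-bound region of the roofline on typical GPUs.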
2. Visual Profiling (Chrome Tracing)
Generate a trace file to visualize CPU/GPU overlap and kernel duration.
```bash
cabal run bench-optimized-matmul -- --size 4096 --trace
```
- Load: Open `chrome://tracing` or ui.perfetto.dev
- Analyze: Import `trace.json` to identify gaps between kernel executions (CPU overhead).
3. Debugging
Use the GPU printf-style debug buffer to inspect values inside kernels.
```haskell
-- In DSL:
debugPrintF "intermediate_val" val
```
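As a minimal sketch of how this fits into a kernel (reusing the Core DSL primitives from the Quick Start below, and assuming `debugPrintF` takes a label plus an expression, as in the snippet above):

```haskell
-- Hedged sketch: inspect an intermediate value on every loop iteration.
loop 0 1024 1 $ \i -> do
  val <- readBuffer input i
  debugPrintF "intermediate_val" val      -- written to the GPU debug buffer
  writeBuffer output i (val * litF16 2.0)
```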
🚀 Quick Start
1. High-Level API (Data Parallelism)
Zero boilerplate. Ideal for simple map/reduce tasks.
```haskell
import WGSL.API
import qualified Data.Vector.Storable as V

main :: IO ()
main = withContext $ \ctx -> do
  input  <- toGPU ctx (V.fromList [1..100] :: V.Vector Float)
  result <- gpuMap (\x -> x * 2.0 + 1.0) input
  out    <- fromGPU' result
  print out
```
2. Core DSL (Explicit Control)
Required for tuning Shared Memory, Subgroups, and F16.
```haskell
import WGSL.DSL

shader :: ShaderM ()
shader = do
  input  <- declareInputBuffer  "in"  (TArray 1024 TF16)
  output <- declareOutputBuffer "out" (TArray 1024 TF16)
  -- HOAS Loop: Use lambda argument 'i', NOT string "i"
  loop 0 1024 1 $ \i -> do
    val <- readBuffer input i
    -- f16 literals for 2x throughput
    let res = val * litF16 2.0 + litF16 1.0
    writeBuffer output i res
```
📚 DSL Syntax Cheatsheet
Types & Literals
| Haskell Type | WGSL Type | Literal Constructor | Note |
|---|---|---|---|
| `Exp F32` | `f32` | `litF32 1.0` or `1.0` | Standard float |
| `Exp F16` | `f16` | `litF16 1.0` | Half precision (Fast!) |
| `Exp I32` | `i32` | `litI32 1` or `1` | Signed int |
| `Exp U32` | `u32` | `litU32 1` | Unsigned int |
| `Exp Bool_` | `bool` | `litBool True` | Boolean |
Casting Helpers: `i32(e)`, `u32(e)`, `f32(e)`, `f16(e)`
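As an illustrative sketch only (assuming these helpers are exposed as ordinary Haskell functions of the same names and that the loop index is an integer expression), a cast can be combined with literals like this:

```haskell
-- Hedged sketch: cast the integer loop index to f32 before float arithmetic.
-- Assumes an "out" buffer declared with TF32 elements, as in the Core DSL example.
loop 0 1024 1 $ \i -> do
  let x = f32 i * litF32 0.5 + litF32 1.0
  writeBuffer output i x
```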
Control Flow (HOAS)
```haskell
-- For Loop
loop start end step $ \i -> do ...

-- If Statement
if_ (val > 10.0)
  (do ... {- then block -} ...)
  (do ... {- else block -} ...)

-- Barrier
barrier  -- workgroupBarrier()
```
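Putting these together, a hedged sketch of a clamp kernel built only from the primitives shown above (buffer declarations as in the Core DSL example):

```haskell
-- Hedged sketch: clamp each element to 10.0 using loop + if_.
clampShader :: ShaderM ()
clampShader = do
  input  <- declareInputBuffer  "in"  (TArray 1024 TF32)
  output <- declareOutputBuffer "out" (TArray 1024 TF32)
  loop 0 1024 1 $ \i -> do
    val <- readBuffer input i
    if_ (val > litF32 10.0)
      (writeBuffer output i (litF32 10.0))  -- then: clamp to the ceiling
      (writeBuffer output i val)            -- else: pass through unchanged
```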
🧩 Kernel Fusion
For maximum performance, fuse multiple operations (Load -> Calc -> Store) into a single kernel to reduce global memory traffic.
```haskell
import WGSL.Kernel

-- Fuse: Load -> Process -> Store
let pipeline = loadK inBuf >>> mapK (* 2.0) >>> mapK relu >>> storeK outBuf

-- Execute inside shader
unKernel pipeline i
```
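A hedged sketch of a complete fused shader built from the same pieces (assuming the fusion combinators compose with the Core DSL buffers as shown, that `relu` is provided by the library, and importing `(>>>)` from `Control.Arrow` if `WGSL.Kernel` does not re-export it):

```haskell
import Control.Arrow ((>>>))
import WGSL.DSL
import WGSL.Kernel

-- Hedged sketch: one fused load -> scale -> ReLU -> store per index,
-- with no intermediate global-memory buffers.
fusedShader :: ShaderM ()
fusedShader = do
  inBuf  <- declareInputBuffer  "in"  (TArray 1024 TF32)
  outBuf <- declareOutputBuffer "out" (TArray 1024 TF32)
  let pipeline = loadK inBuf >>> mapK (* 2.0) >>> mapK relu >>> storeK outBuf
  loop 0 1024 1 $ \i -> unKernel pipeline i
```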
📚 Architecture & Modules
Execution Model (Latency Hiding)
To maximize GPU occupancy, encoding is separated from submission.
- `WGSL.Async.Pipeline`: Use for main loops. Allows the CPU to encode Token N+1 while the GPU processes Token N.
- `WGSL.Execute`: Low-level synchronous execution (primarily for debugging).
Module Guide
| Feature | Module | Description |
|---|---|---|
| Subgroup Ops | `WGSL.DSL` | `subgroupMatrixLoad`, `mma`, `subgroupMatrixStore` |
| F16 Math | `WGSL.DSL` | `litF16`, `vec4<f16>` for 2x throughput |
| Structs | `WGSL.Struct` | Generic derivation for std430 layout compliance |
| Analysis | `WGSL.Analyze` | Roofline analysis logic |
📦 Installation
Pre-built Dawn binaries are downloaded automatically during installation.
```bash
cabal install webgpu-dawn
```
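If windowed graphics are not needed, the `glfw` flag listed above can be disabled at install time; a sketch of the invocation, per the flag syntax noted in the Flags section:

```bash
# Disable the glfw flag for a compute-only build (see the Flags table above).
cabal install webgpu-dawn -f -glfw
```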
License
MIT License - see LICENSE file for details.
Acknowledgments
- Dawn (Google): Core WebGPU runtime.
- gpu.cpp (Answer.AI): High-level C++ API wrapper inspiration.
- GLFW: Window management.
Contact
Maintainer: Junji Hashimoto junji.hashimoto@gmail.com