Matmul benchmark — cljrs vs numpy / jax / pytorch

Square float32 matrix multiplication: C = A · B with A, B ∈ ℝ^(N×N). We build random matrices via (ml/randn N N), then time (ml/matmul A B) and (ml/matmul-gpu A B) with performance.now() around the wasm REPL call. Each configuration runs 3 times; we report the median. GFLOPS = 2·N³ / time-in-seconds / 10⁹ (counting one multiply and one add per inner-product step).
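The harness logic above (median of 3 runs, then the GFLOPS formula) can be sketched in Python; `run_matmul` here is a hypothetical stand-in for the wasm REPL call that the page actually times:

```python
import time

def gflops(n: int, seconds: float) -> float:
    # GFLOPS = 2·N³ / time-in-seconds / 1e9
    return 2 * n**3 / seconds / 1e9

def bench(n: int, run_matmul, runs: int = 3) -> tuple[float, float]:
    # run_matmul(n) stands in for evaluating (ml/matmul A B) in the wasm REPL
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        run_matmul(n)
        times.append(time.perf_counter() - t0)
    med = sorted(times)[runs // 2]  # median of an odd number of runs
    return med * 1000, gflops(n, med)  # (ms, GFLOPS)
```

The median rather than the minimum or mean is what the page reports; it damps one-off pauses (GC, tab throttling) without rewarding a single lucky run.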

The reference columns are conservative numbers from published f32 matmul benchmarks on a typical mid-grade machine (Apple M2, 8-core, single-threaded numpy 1.26 + OpenBLAS; JAX 0.4 with XLA JIT on the same CPU). They are not measured live in your browser — they're a fixed yardstick so you can read the cljrs columns in context. Your machine, your browser's wasm SIMD support, and thermal state will all move the cljrs numbers around.

N | cljrs CPU ms (median) | cljrs CPU GFLOPS | cljrs GPU ms (median) | cljrs GPU GFLOPS | numpy ref GFLOPS | JAX ref GFLOPS | PyTorch ref GFLOPS

(The table is populated live — click Run benchmark to start.)

Chart

Bars are GFLOPS (higher is better). The cljrs bars appear once a run finishes.

Reference baselines

For an honest comparison you should run those same benchmarks on your machine: the cljrs numbers above are measured live, the reference numbers are not. We'll publish a per-machine script in a follow-up.
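Until that script ships, a minimal numpy-only sketch of the kind of local check meant here might look like the following (the jax and pytorch variants would follow the same pattern; this is an illustration, not the promised script):

```python
import time
import numpy as np

def local_baseline(n: int, runs: int = 3) -> float:
    """Median-of-runs GFLOPS for an N×N float32 matmul with numpy."""
    a = np.random.randn(n, n).astype(np.float32)
    b = np.random.randn(n, n).astype(np.float32)
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        _ = a @ b  # dispatches to the BLAS numpy was built against
        times.append(time.perf_counter() - t0)
    sec = sorted(times)[runs // 2]  # median, as in the page's harness
    return 2 * n**3 / sec / 1e9

if __name__ == "__main__":
    for n in (256, 512, 1024):
        print(f"N={n}: {local_baseline(n):.1f} GFLOPS")
```

Note that numbers from a script like this reflect your BLAS build and thread count, so pin those (e.g. set OMP_NUM_THREADS=1) before comparing against single-threaded references.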

Notes on the cljrs path