Optimizing a Rust GPU matmul kernel
I read the excellent post "Optimizing a WebGPU Matmul Kernel for 1TFLOP+ Performance" by Zach Nussbaum and thought it might be fun to reimplement it with Rust GPU.
We'll follow Zach's original post closely, comparing and contrasting Rust with the WGSL and TypeScript used in his post.
At the end, I'll show some unique benefits of using Rust on the GPU.