Building high-performance, reproducible kernels for AMD ROCm just got a lot easier.
I've put together a guide on building, testing, and sharing ROCm-compatible kernels using the Hugging Face kernel-builder and kernels libraries, so you can focus on optimizing performance rather than spending time on setup.
Learn how to:
- Use Nix for reproducible builds
- Integrate kernels as native PyTorch operators
- Share your kernels on the Hub for anyone to use with kernels.get_kernel()
We use the award-winning RadeonFlow GEMM kernel as a practical example.
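For a rough idea of what the last step looks like from the consumer side, here is a minimal sketch of loading a shared kernel from the Hub with kernels.get_kernel(). The repo id below is a hypothetical placeholder, not the actual location of the RadeonFlow GEMM kernel, and the helper name load_hub_kernel is only for illustration; the functions a loaded kernel exposes depend on how that kernel was built.

```python
# Hypothetical placeholder: swap in the Hub repo id of a real published kernel.
REPO_ID = "your-username/your-rocm-kernel"


def load_hub_kernel(repo_id: str):
    """Fetch a prebuilt kernel from the Hugging Face Hub and return it.

    The import is done lazily so this file can be imported even when the
    `kernels` package (pip install kernels) is not installed yet.
    """
    from kernels import get_kernel

    # get_kernel downloads a build matching the local environment
    # and returns a module-like object exposing the kernel's functions.
    return get_kernel(repo_id)
```

Calling load_hub_kernel(REPO_ID) on a machine with a supported GPU and a real repo id should return an object whose attributes you can inspect with dir() to see which operators the kernel provides.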