Porting CUDA FFT to Mojo: Achieving Bit-Exact Precision

Fri, 17 Oct 2025 00:00:00 +0000

Porting a CUDA Fast Fourier Transform (FFT) implementation to Mojo for the LeetGPU Fast Fourier Transform challenge raised an unexpected problem: making Mojo's sine and cosine results match CUDA's sinf()/cosf() functions bit for bit. Solving it required analyzing PTX assembly, testing across platforms, and ultimately moving to Float64 precision to get deterministic results.
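
To make the Float64 idea concrete, here is a minimal Python sketch (not the author's CUDA or Mojo code) that computes FFT twiddle factors $w_k = e^{-2\pi i k/N}$ two ways. The first path rounds every step of the angle computation to single precision before taking the sine and cosine. The second path does all of the trigonometry in double precision and rounds only the final value to float32. Python has no native float32, so the hypothetical helper `to_f32` emulates it by round-tripping through `struct`. The sketch also assumes the host's double-precision `math.sin`/`math.cos` stand in for a correctly rounded single-precision sine and cosine, which CUDA's sinf()/cosf() are not guaranteed to be.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def twiddles_f32_angle(n: int) -> list[tuple[float, float]]:
    """Hypothetical single-precision path: the angle is built with float32 arithmetic."""
    two_pi = to_f32(2.0 * math.pi)
    out = []
    for k in range(n // 2):
        # Each multiply/divide is rounded to float32, so the angle carries
        # single-precision error before sin/cos ever see it.
        angle = to_f32(to_f32(-two_pi * to_f32(k)) / to_f32(n))
        # Idealization: a correctly rounded float32 cos/sin of the float32 angle.
        out.append((to_f32(math.cos(angle)), to_f32(math.sin(angle))))
    return out


def twiddles_f64(n: int) -> list[tuple[float, float]]:
    """Hypothetical double-precision path: only the final twiddle is rounded."""
    out = []
    for k in range(n // 2):
        angle = -2.0 * math.pi * k / n
        out.append((to_f32(math.cos(angle)), to_f32(math.sin(angle))))
    return out


def max_twiddle_gap(n: int) -> float:
    """Largest component-wise difference between the two paths (needs n >= 2)."""
    a, b = twiddles_f32_angle(n), twiddles_f64(n)
    return max(
        max(abs(ar - br), abs(ai - bi)) for (ar, ai), (br, bi) in zip(a, b)
    )


if __name__ == "__main__":
    n = 4096
    print(f"max twiddle gap for N={n}: {max_twiddle_gap(n):.3e}")
```

For $N = 4$ and $k = 1$, the single-precision angle is $-\mathrm{fl}(2\pi)/4$ rather than $-\pi/2$, so its cosine comes out near $-4.4 \times 10^{-8}$ instead of essentially zero. Gaps like this are tiny per twiddle, and whether they grow into a visible error depends on $N$ and on how a kernel evaluates sine and cosine. Treat the sketch as an illustration of where single-precision angle rounding enters, not as a reproduction of the mismatch described below.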

Challenge Constraints

$N$ range: $1 \leq N \leq 262{,}144 = 2^{18}$, with $N$ a power of 2
Data type: all values are 32-bit floating-point numbers
Accuracy requirements: absolute error $\leq 10^{-3}$, relative error $\leq 10^{-3}$
Array format: input and output arrays have length $2N$ (interleaved real and imaginary parts)
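
For readers new to the interleaved layout, here is a minimal Python sketch of an iterative radix-2 Cooley-Tukey FFT that reads and writes a $2N$-length array of the form `[re0, im0, re1, im1, ...]`. It is not the author's CUDA or Mojo kernel. It uses Python's double-precision `complex` type and `cmath.exp` for twiddles, assumes the forward convention $X_k = \sum_{t} x_t e^{-2\pi i k t / N}$, and the function name `fft_interleaved` is hypothetical.

```python
import cmath
import math


def fft_interleaved(data: list[float]) -> list[float]:
    """Forward radix-2 FFT on an interleaved [re0, im0, re1, im1, ...] list of length 2N."""
    n = len(data) // 2
    if n == 0 or len(data) != 2 * n or n & (n - 1):
        raise ValueError("expected 2N values with N a power of two")

    # Unpack interleaved (real, imaginary) pairs into complex numbers.
    x = [complex(data[2 * i], data[2 * i + 1]) for i in range(n)]

    # Bit-reversal permutation so the butterflies below can work in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    # Butterfly stages: the transform size doubles each stage (2, 4, ..., N).
    size = 2
    while size <= n:
        half = size // 2
        for k in range(half):
            # Twiddle factor exp(-2*pi*i*k/size), computed in double precision.
            w = cmath.exp(-2j * math.pi * k / size)
            for start in range(0, n, size):
                a = x[start + k]
                b = x[start + k + half] * w
                x[start + k] = a + b
                x[start + k + half] = a - b
        size *= 2

    # Re-interleave the real and imaginary parts into a 2N-length list.
    out: list[float] = []
    for z in x:
        out.extend((z.real, z.imag))
    return out
```

For example, `fft_interleaved([1.0, 0.0, 2.0, 0.0])` transforms the two-point signal $(1, 2)$ into $(3, -1)$, returned as `[3.0, 0.0, -1.0, 0.0]`.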

Initial Problem: Accuracy Mismatch

The initial Mojo FFT implementation failed the correctness tests, with a maximum absolute difference of 0.023 from the reference CUDA implementation. That is about 23 times the $10^{-3}$ absolute-error budget, so it was clearly unacceptable.
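
The comparison behind a number like 0.023 can be sketched in Python as follows. The exact rule the LeetGPU checker uses to combine the absolute and relative limits is not stated here, so `within_tolerance` assumes an `allclose`-style test, $|a - e| \leq \text{atol} + \text{rtol}\,|e|$. Both function names are hypothetical.

```python
def max_abs_diff(actual: list[float], expected: list[float]) -> float:
    """Largest element-wise absolute difference between two equal-length arrays."""
    if len(actual) != len(expected):
        raise ValueError("length mismatch")
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


def within_tolerance(
    actual: list[float],
    expected: list[float],
    atol: float = 1e-3,
    rtol: float = 1e-3,
) -> bool:
    """Assumed allclose-style rule: |a - e| <= atol + rtol * |e| for every element."""
    if len(actual) != len(expected):
        return False
    # A NaN in `actual` makes the comparison False, so it fails automatically.
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))
```

Under that assumed rule, an element off by 0.023 passes only where the expected magnitude is at least 22, because $10^{-3} + 10^{-3}\,|e| \geq 0.023$ requires $|e| \geq 22$.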

Numerical Computing on Matt Suiche