<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Learning Journey on Matt Suiche</title><link>https://www.msuiche.com/categories/learning-journey/</link><description>Recent content in Learning Journey on Matt Suiche</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Tue, 23 Sep 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://www.msuiche.com/categories/learning-journey/index.xml" rel="self" type="application/rss+xml"/><item><title>Gluon: When Triton Isn't Low-Level Enough</title><link>https://www.msuiche.com/posts/gluon-when-triton-isnt-low-level-enough/</link><pubDate>Tue, 23 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.msuiche.com/posts/gluon-when-triton-isnt-low-level-enough/</guid><description>&lt;h1 id="my-journey-from-pytorch-to-gluon"&gt;My Journey from PyTorch to Gluon &lt;a href="#my-journey-from-pytorch-to-gluon" class="anchor"&gt;🔗&lt;/a&gt;&lt;/h1&gt;&lt;p&gt;After spending the last month diving into PyTorch, learning Triton, understanding CUDA, and even peeking at PTX/SASS assembly, I&amp;rsquo;ve come to a surprising realization: I&amp;rsquo;ve yet to meet anyone who&amp;rsquo;s actually writing &lt;a href="https://siboehm.com/articles/22/CUDA-MMM" target="_blank" rel="noopener"&gt;raw CUDA code in production anymore&lt;/a&gt;. Everyone I&amp;rsquo;ve talked to &amp;ndash; from ML engineers at startups to researchers at big tech companies &amp;ndash; seems to have converged on Triton as their go-to solution for custom GPU kernels. And honestly? The &lt;a href="https://www.gpumode.com/v2/leaderboard/496?tab=rankings" target="_blank" rel="noopener"&gt;fused-kernel performance they&amp;rsquo;re getting is impressive enough&lt;/a&gt; that I understand why.&lt;/p&gt;</description></item></channel></rss>