<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Learning Journey on Matt Suiche</title><link>https://www.msuiche.com/categories/learning-journey/</link><description>Recent content in Learning Journey on Matt Suiche</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Tue, 23 Sep 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://www.msuiche.com/categories/learning-journey/index.xml" rel="self" type="application/rss+xml"/><item><title>Gluon: When Triton Isn't Low-Level Enough</title><link>https://www.msuiche.com/posts/gluon-when-triton-isnt-low-level-enough/</link><pubDate>Tue, 23 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.msuiche.com/posts/gluon-when-triton-isnt-low-level-enough/</guid><description>&lt;h1 id="my-journey-from-pytorch-to-gluon"&gt;My Journey from PyTorch to Gluon &lt;a href="#my-journey-from-pytorch-to-gluon" class="anchor"&gt;🔗&lt;/a&gt;&lt;/h1&gt;&lt;p&gt;After spending the last month diving into PyTorch, learning Triton, understanding CUDA, and even peeking at PTX/SASS assembly, I&amp;rsquo;ve come to a surprising realization: I&amp;rsquo;ve yet to meet anyone who&amp;rsquo;s actually writing &lt;a href="https://siboehm.com/articles/22/CUDA-MMM" target="_blank" rel="noopener"&gt;raw CUDA code in production anymore&lt;/a&gt;. Everyone I&amp;rsquo;ve talked to &amp;ndash; from ML engineers at startups to researchers at big tech companies &amp;ndash; seems to have converged on Triton as their go-to solution for custom GPU kernels. And honestly? The &lt;a href="https://www.gpumode.com/v2/leaderboard/496?tab=rankings" target="_blank" rel="noopener"&gt;fused-kernel performance they&amp;rsquo;re getting is impressive enough&lt;/a&gt; that I understand why.&lt;/p&gt;</description></item></channel></rss>