<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Software Engineering on Matt Suiche</title><link>https://www.msuiche.com/categories/software-engineering/</link><description>Recent content in Software Engineering on Matt Suiche</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 28 Sep 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://www.msuiche.com/categories/software-engineering/index.xml" rel="self" type="application/rss+xml"/><item><title>Multi-GPU Programming with AMD's Iris Framework for Triton</title><link>https://www.msuiche.com/posts/multi-gpu-programming-with-amds-iris-framework-for-triton/</link><pubDate>Sun, 28 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.msuiche.com/posts/multi-gpu-programming-with-amds-iris-framework-for-triton/</guid><description>&lt;p&gt;GPU supply constraints are creating infrastructure bottlenecks, making multi-GPU programming, particularly vendor-agnostic implementations, essential. In their &lt;a href="https://www.youtube.com/watch?v=H2bzSn5ZPks" target="_blank" rel="noopener"&gt;GPU Mode presentation&lt;/a&gt;, AMD Research engineers Muhammad Awad, Muhammad Osama, and Brandon Potter introduced Iris, a Python library that enables fine-grained multi-GPU programming in Triton. As with my previous &lt;a href="https://www.msuiche.com/posts/gluon-when-triton-isnt-low-level-enough/"&gt;Gluon blogpost&lt;/a&gt;, this post captures my understanding and interpretation of their work, serving as both technical documentation and a personal reference for this emerging multi-GPU programming paradigm.&lt;/p&gt;
&lt;h2 id="technical-problem"&gt;Technical Problem &lt;a href="#technical-problem" class="anchor"&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Current multi-GPU programming relies on the bulk synchronous parallel (BSP) model through libraries like NCCL. This model enforces sequential phases:&lt;/p&gt;</description></item></channel></rss>