Local Models Within Reach: Everything That Changed in Eight Months

Sun, 05 Apr 2026 00:00:00 +0000

Eight months ago I published Building Agents for Small Language Models, a set of hard-won notes from shipping agents on 270M–32B parameter models. At the time, running useful local models meant embracing constraints: small context windows, CPU-only fallbacks, broken UTF-8 streams, and reasoning that fell apart past two steps.

I stand by that post. But the ground has shifted fast. What was a set of careful workarounds in August 2025 is starting to look like the default architecture for a large class of workloads. Local models are no longer the constrained sibling of cloud APIs — for many agent use cases, they are the better answer. Here is what has changed.

MoE on Matt Suiche