<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Performance on Matt Suiche</title><link>https://www.msuiche.com/categories/performance/</link><description>Recent content in Performance on Matt Suiche</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Wed, 15 Oct 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://www.msuiche.com/categories/performance/index.xml" rel="self" type="application/rss+xml"/><item><title>RustBPE: High-Performance BPE Tokenizer Training in Rust</title><link>https://www.msuiche.com/posts/rustbpe-high-performance-bpe-tokenizer-training-in-rust/</link><pubDate>Wed, 15 Oct 2025 00:00:00 +0000</pubDate><guid>https://www.msuiche.com/posts/rustbpe-high-performance-bpe-tokenizer-training-in-rust/</guid><description>&lt;h2 id="introduction"&gt;Introduction &lt;a href="#introduction" class="anchor"&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Byte Pair Encoding (BPE) tokenization is used throughout modern language models, but efficient training implementations are scarce. OpenAI&amp;rsquo;s &lt;code&gt;tiktoken&lt;/code&gt; handles inference well, while HuggingFace&amp;rsquo;s &lt;code&gt;tokenizers&lt;/code&gt; supports training but carries significant complexity and overhead. &lt;strong&gt;RustBPE&lt;/strong&gt; is a Rust implementation that provides training support with better performance.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;RustBPE was developed by Andrej Karpathy&lt;/strong&gt; as part of the &lt;a href="https://github.com/karpathy/nanochat/tree/master/rustbpe" target="_blank" rel="noopener"&gt;nanochat project&lt;/a&gt;. This analysis covers the RustBPE implementation, including its architecture, performance characteristics, and Python integration.&lt;/p&gt;
&lt;p&gt;For those interested in understanding BPE implementation from first principles, &lt;a href="https://sebastianraschka.com/blog/2025/bpe-from-scratch.html" target="_blank" rel="noopener"&gt;Sebastian Raschka provides an excellent deep-dive into implementing BPE from scratch&lt;/a&gt; in his blog post, and the topic is also covered in his book &amp;ldquo;Build a Large Language Model (From Scratch)&amp;rdquo;. His work offers invaluable insights into the algorithmic foundations that underpin implementations like RustBPE.&lt;/p&gt;</description></item><item><title>Optimizing AlphaFold's Triangle Multiplicative Update: A First Look at GPU Performance Engineering</title><link>https://www.msuiche.com/posts/optimizing-alphafolds-triangle-multiplicative-update-a-first-look-at-gpu-performance-engineering/</link><pubDate>Tue, 30 Sep 2025 00:00:00 +0000</pubDate><guid>https://www.msuiche.com/posts/optimizing-alphafolds-triangle-multiplicative-update-a-first-look-at-gpu-performance-engineering/</guid><description>&lt;h2 id="background"&gt;Background &lt;a href="#background" class="anchor"&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;I recently encountered the &lt;a href="https://www.gpumode.com/v2/leaderboard/496?tab=submission" target="_blank" rel="noopener"&gt;GPU MODE TriMul challenge&lt;/a&gt; while exploring GPU optimization. Coming from a systems engineering background without prior PyTorch or Triton experience, I saw the challenge as an opportunity to learn GPU performance engineering through a practical problem.&lt;/p&gt;
&lt;p&gt;The Triangle Multiplicative Update (TriMul) is a core operation in AlphaFold2 and AlphaFold3, the protein structure prediction systems that earned the 2024 Nobel Prize in Chemistry. The operation&amp;rsquo;s O(n³) complexity creates severe performance bottlenecks in production, forcing AlphaFold3 to use batch size 1 during training despite having under 1B parameters. This makes the optimization problem both practically relevant and technically challenging.&lt;/p&gt;</description></item></channel></rss>