Written by Twinkle.
The two earlier posts (Part I and Part II) were about how the thing got written: an NT-shaped kernel in Rust that Fable 5 took from an empty directory to a booting system in thirty-eight minutes, then grew over the following days into something that loads real Windows drivers and runs real Microsoft console binaries. Those posts were the origin story.
This is a different series. It is about the artifact itself: nanokrnl, the kernel, and nanox, the emulator we wrote to run it in a browser. No AI-process narrative here, just the systems. This first entry answers a small question that turns out to be a good one. When you open nanokrnl.ai and the machine reaches a C:\ prompt, what actually happened, and how much memory did it take?
The short version: the running operating system, at the prompt, occupies about four megabytes. The emulator that runs it is sixty-five kilobytes of WebAssembly. Both of those numbers are worth sitting with, so let us build up to them.
nanox: a 64-bit emulator, because the off-the-shelf ones do not fit
You cannot talk about booting nanokrnl in a browser without talking about what runs it, because that was the first hard problem and it dictated everything else.
nanokrnl is a 64-bit kernel. It runs in x86-64 long mode: 4-level paging, syscall/sysret, swapgs, a local APIC, interrupts delivered through a real IDT. It is not a toy that runs in 32-bit protected mode. That single fact eliminated the obvious browser options.
v86 is the emulator everyone reaches for first. It is small, it runs entirely in the browser, and it boots Linux and Windows 2000 there. It also does not work here, and I mean that literally. Run our kernel image under v86 headless in Node and it panics with Unimplemented: #GP handler (cpu.rs:846) before a single byte of serial output appears. v86 is a 32-bit-era emulator. Its long-mode support is incomplete to the point that it cannot deliver a general-protection fault through the IDT, and a 64-bit kernel dies on its first fault, before it can even print that it is alive. For a 64-bit kernel, v86 is a non-starter, and no amount of shim code changes that: the gap is in the CPU core.
qemu-wasm is the other end of the spectrum. It is QEMU compiled to WebAssembly through emscripten, so it emulates a full PC faithfully and would boot our image unchanged. But it is roughly a 46 MB artifact, and it needs threads, SharedArrayBuffer, and the COOP/COEP cross-origin isolation headers to run. That is a heavy dependency footprint for “see a kernel boot in a tab”, and the header requirement alone makes it awkward to host on a static site.
So we wrote nanox. It is a bespoke x86-64 emulator in Rust that compiles to a single WebAssembly module of 67,074 bytes, about sixty-five kilobytes. No threads. No SharedArrayBuffer. No cross-origin headers. It drops onto any static host.
The design choice that keeps it small is that nanox does not emulate a PC from the reset vector. There is no BIOS, no real-mode bring-up, no A20 gate, no protected-mode trampoline. nanox boots the kernel directly in long mode: it builds the page tables, the GDT and TSS, the IDT, and the control registers that a 64-bit kernel expects to already exist, applies the bootloader_api handoff structure, and jumps to _start. It implements exactly enough of the architecture to run this kernel and the real binaries on top of it:
- 4-level paging, with 2 MiB and 1 GiB large-page short-circuits (more on that below),
- syscall/sysret and swapgs,
- the local APIC (timer and inter-processor interrupts),
- interrupt and fault delivery through the guest IDT,
- a 16550 UART and a PS/2 controller,
- and a small 9P transport device, which is how the browser page now serves files into the kernel (a later post in this series).
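To make "boots the kernel directly in long mode" concrete, here is a minimal sketch of the register state an emulator could hand to a 64-bit entry point. The bit positions are the architectural ones; the `BootState` struct and `long_mode_entry` function are hypothetical names for illustration, not nanox's actual API, and whether nanox or the kernel turns on the syscall and no-execute bits is an assumption here.

```rust
// Architectural bit positions for x86-64. The struct and function below are
// hypothetical illustrations, not nanox's real interface.
const CR0_PE: u64 = 1 << 0; // protected mode enable
const CR0_PG: u64 = 1 << 31; // paging enable
const CR4_PAE: u64 = 1 << 5; // physical address extension, required for long mode
const EFER_SCE: u64 = 1 << 0; // syscall/sysret enable (assumed set at boot)
const EFER_LME: u64 = 1 << 8; // long mode enable
const EFER_LMA: u64 = 1 << 10; // long mode active
const EFER_NXE: u64 = 1 << 11; // no-execute enable (assumed set at boot)

#[derive(Debug, Default, Clone, PartialEq)]
pub struct BootState {
    pub cr0: u64,
    pub cr3: u64,
    pub cr4: u64,
    pub efer: u64,
    pub rip: u64,
    pub rsp: u64,
}

/// Build the state a kernel sees if it is entered with paging and long mode
/// already on, skipping real mode and protected mode entirely.
pub fn long_mode_entry(pml4_phys: u64, entry: u64, stack_top: u64) -> BootState {
    BootState {
        cr0: CR0_PE | CR0_PG,
        cr3: pml4_phys & !0xFFF, // the top-level page table is 4 KiB aligned
        cr4: CR4_PAE,
        efer: EFER_SCE | EFER_LME | EFER_LMA | EFER_NXE,
        rip: entry,
        rsp: stack_top,
    }
}
```

A real direct boot also has to populate the GDT, TSS, and IDT and fill in the handoff structure; the sketch covers only the mode bits, which are what let the emulator skip the reset-vector path.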
“Exactly enough” is a dangerous phrase for an emulator, because the cost of a missing or wrong instruction is a silent divergence, not a compile error. So nanox is validated by differential testing against real oracles: iced-x86 as a decode-length oracle, Unicorn (QEMU’s CPU core) as a semantics oracle over random states, and a lockstep harness that replays the genuine execution of a real program instruction by instruction and diffs the post-state against Unicorn. That harness earns its keep. While we were wiring up the file-serving feature for this project, it revealed that nanox was missing BSWAP, a completely standard instruction that the optimizer only emits for a few idioms and that had simply never appeared in the instruction stream until then. That is a good story on its own, and it is the subject of a later post about how you find the instruction you forgot to implement.
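The lockstep idea can be sketched in a few lines. This is an illustration of the technique only: the `Cpu` trait, the `Toy` machine, and its two-instruction "ISA" are hypothetical stand-ins, not the nanox harness or the Unicorn API. The toy models a forgotten BSWAP as a silent no-op, which is an assumption made so the divergence is visible in the register diff.

```rust
/// Minimal interface a CPU model needs for lockstep comparison (hypothetical).
pub trait Cpu {
    fn step(&mut self);
    fn regs(&self) -> [u64; 4];
}

/// The first point at which the candidate disagrees with the oracle.
#[derive(Debug, PartialEq)]
pub struct Divergence {
    pub step: usize,
    pub reg: usize,
    pub expected: u64,
    pub actual: u64,
}

/// Step both models one instruction at a time and diff the post-state.
pub fn lockstep<A: Cpu, B: Cpu>(oracle: &mut A, candidate: &mut B, max_steps: usize) -> Option<Divergence> {
    for step in 0..max_steps {
        oracle.step();
        candidate.step();
        let (want, got) = (oracle.regs(), candidate.regs());
        for (reg, (&expected, &actual)) in want.iter().zip(got.iter()).enumerate() {
            if expected != actual {
                return Some(Divergence { step, reg, expected, actual });
            }
        }
    }
    None
}

/// Toy instruction set, just enough to show a missing instruction being caught.
#[derive(Clone, Copy)]
pub enum Op {
    Inc,
    Bswap32,
}

pub struct Toy {
    pub pc: usize,
    pub r: [u64; 4],
    pub prog: Vec<Op>,
    pub has_bswap: bool,
}

impl Cpu for Toy {
    fn step(&mut self) {
        // Past the end of the program: nothing left to execute.
        let Some(&op) = self.prog.get(self.pc) else { return };
        match op {
            Op::Inc => self.r[0] = self.r[0].wrapping_add(1),
            // 32-bit BSWAP reverses the low four bytes and zero-extends the result.
            Op::Bswap32 if self.has_bswap => self.r[0] = (self.r[0] as u32).swap_bytes() as u64,
            Op::Bswap32 => {} // the forgotten instruction, modeled as a no-op
        }
        self.pc += 1;
    }

    fn regs(&self) -> [u64; 4] {
        self.r
    }
}
```

Running two `Toy` machines over the same program, one with `has_bswap` set and one without, reports the divergence at the exact step where the byte swap should have happened, which is the property that makes a rarely emitted instruction findable.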
Two ways to arrive at the prompt
Open the page and you get two buttons that both end at C:\, by very different routes.
flowchart TD
A[Open the page] --> B{Which button?}
B -->|Boot / Restart| C[Cold boot]
B -->|Fast Boot| D[Fast boot]
C --> C1[Load the kernel ELF into RAM]
C1 --> C2[Enter long mode, jump to _start]
C2 --> C3["~100,000,000 interpreted instructions;<br/>self-tests scroll past"]
C3 --> E["Interactive C prompt"]
D --> D1["Fetch snapshot.bin.gz, 901 KB"]
D1 --> D2[Gunzip with DecompressionStream]
D2 --> D3["Restore registers + 1,052 RAM pages"]
D3 --> E
Cold boot is the real thing. nanox loads the kernel image into emulated RAM, enters long mode, and starts interpreting. The kernel initializes its subsystems, runs its full self-test suite (sixty-seven passing checks scroll by), loads a PE driver and exercises its timer, DPC, and IOCTL paths, brings up a user-mode process against the kernel32, msvcrt, and ulib shims, and finally launches cmd.exe. Reaching the prompt this way costs about one hundred million interpreted instructions. On a modern laptop that is a couple of seconds of watching a machine actually power on, self-tests and all. This is what the Boot and Restart buttons do, and it is the honest demonstration: a 64-bit kernel booting from its image, in a tab.
Fast Boot cheats, on purpose, and it is worth explaining precisely what the cheat is, because it is the more interesting engineering.
The snapshot: freeze the whole machine, ship the delta
A hundred million instructions is cheap once, but if every page load re-interpreted the entire boot, the demo would feel like a loading screen. So we boot the kernel to the prompt one time, at build time, and capture the entire state of the emulated machine into a file. The browser can then restore that file and land at C:\ with zero interpreted boot.
The state of an x86-64 machine is, concretely, two things: the CPU and the RAM. nanox serializes both into a small self-describing blob (magic NXS1):
- The CPU and devices: the sixteen general registers, RIP and RFLAGS, the segment bases, all sixteen XMM registers, the control registers (CR0/CR3/CR4/EFER, plus CR2/CR8), the descriptor-table registers, the syscall MSRs (STAR/LSTAR/SFMASK, KERNEL_GS_BASE), the TSS pointers, and the device state for the UART, APIC, and PS/2 controller. All of that comes to 624 bytes. The entire architectural CPU state of the machine is smaller than this paragraph’s worth of text.
- The RAM: this is where it gets nice. The machine has 128 MiB of RAM, which is 32,768 pages of 4 KiB. But most of those pages were never touched. The snapshot walks memory a page at a time and writes out only the pages that contain a nonzero byte, each prefixed with its page index. Zero pages are simply omitted, and on restore the RAM is zeroed first and then the saved pages are dropped back into place.
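The sparse-page scheme described above can be sketched as a pair of functions. The exact byte layout is an assumption for illustration (a 4-byte magic, an opaque fixed-size state blob, then a little-endian 32-bit page index ahead of each saved page); the function names `save` and `restore` are hypothetical and are not nanox's API.

```rust
const PAGE: usize = 4096;
const MAGIC: &[u8; 4] = b"NXS1";

/// Serialize: magic, state blob, then (u32 LE page index, 4096 bytes) for
/// every page that contains at least one nonzero byte. Layout is assumed.
pub fn save(state: &[u8], ram: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    out.extend_from_slice(MAGIC);
    out.extend_from_slice(state);
    for (index, page) in ram.chunks_exact(PAGE).enumerate() {
        if page.iter().any(|&b| b != 0) {
            out.extend_from_slice(&(index as u32).to_le_bytes());
            out.extend_from_slice(page);
        }
    }
    out
}

/// Restore: zero all of RAM first, then drop each saved page back into place.
/// Returns the state blob, or None if the snapshot is malformed.
pub fn restore(blob: &[u8], state_len: usize, ram: &mut [u8]) -> Option<Vec<u8>> {
    if blob.len() < 4 + state_len || !blob.starts_with(MAGIC) {
        return None;
    }
    let state = blob[4..4 + state_len].to_vec();
    ram.fill(0);
    let mut rest = &blob[4 + state_len..];
    while !rest.is_empty() {
        if rest.len() < 4 + PAGE {
            return None; // truncated page record
        }
        let index = u32::from_le_bytes([rest[0], rest[1], rest[2], rest[3]]) as usize;
        let dst = ram.get_mut(index * PAGE..(index + 1) * PAGE)?; // reject out-of-range pages
        dst.copy_from_slice(&rest[4..4 + PAGE]);
        rest = &rest[4 + PAGE..];
    }
    Some(state)
}
```

Zeroing before restoring is what makes omission safe: a page missing from the file is indistinguishable from a page that was saved as all zeros.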
Here is the payoff. Of the 32,768 pages of RAM, exactly 1,052 are nonzero at the prompt. Everything else, 96.8 percent of the address space, was never written and costs nothing to store. So the snapshot is:
| Quantity | Value |
|---|---|
| RAM allocated | 128 MiB (32,768 pages) |
| Pages actually touched | 1,052 (3.2 percent) |
| Touched RAM | 4,308,992 bytes (4.11 MiB) |
| CPU + device state | 624 bytes |
| Raw snapshot | 4,313,828 bytes (4.11 MiB) |
| Gzipped snapshot shipped to the browser | 922,305 bytes (about 901 KB) |
The browser fetches that 901 KB file, gunzips it in the page with the DecompressionStream API (no library, it is built into the platform), hands the bytes to nanox, and calls restore. The compression ratio is about 4.7x, because even the touched pages are sparse: page tables, stacks, and freshly zeroed heap arenas are mostly runs of zeros inside otherwise-live pages.
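The snapshot figures can be rechecked with a little arithmetic. The page count, state size, and gzipped size below come from the text; the 4-byte magic and 4-byte per-page index are assumptions about the layout, chosen because they reproduce the stated raw size exactly.

```rust
pub const PAGE_SIZE: u64 = 4096;
pub const RAM_BYTES: u64 = 128 * 1024 * 1024;
pub const TOUCHED_PAGES: u64 = 1_052;
pub const STATE_BYTES: u64 = 624;
pub const GZIPPED_BYTES: u64 = 922_305;

/// Number of 4 KiB pages in 128 MiB of RAM.
pub fn total_pages() -> u64 {
    RAM_BYTES / PAGE_SIZE
}

/// Bytes of RAM in pages that contain at least one nonzero byte.
pub fn touched_bytes() -> u64 {
    TOUCHED_PAGES * PAGE_SIZE
}

/// Raw snapshot size under the assumed layout:
/// 4-byte magic + state blob + (4-byte index + page data) per saved page.
pub fn raw_snapshot_bytes() -> u64 {
    4 + STATE_BYTES + TOUCHED_PAGES * (4 + PAGE_SIZE)
}

/// Raw size divided by the gzipped size shipped to the browser.
pub fn compression_ratio() -> f64 {
    raw_snapshot_bytes() as f64 / GZIPPED_BYTES as f64
}
```

Under those assumptions the per-page index overhead is 1,052 × 4 = 4,208 bytes, which accounts for almost all of the gap between the touched RAM and the raw snapshot.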
The reason Fast Boot needed a banner in the UI, and the reason Boot and Restart now always do the real cold boot, is exactly that this works too well. Restoring a snapshot is so fast that a visitor can miss that a real operating system boots here at all. So the page tells you which one you are looking at, and Restart always shows you the genuine article.
Four megabytes is the whole operating system
Now the number that started this. The snapshot is 4.11 MiB not because we chose a budget, but because that is how much memory the running system actually occupies at an interactive prompt. It is a measurement, not a target.
Where does the 4.11 MiB go? About 1.60 MiB of it is the kernel itself: its code, its initialized data, and its zero-initialized BSS, measured from the loadable segments of the image (the 2.64 MB kernel file on disk is inflated by debug information that never gets mapped into RAM). The remaining two and a half megabytes or so is everything the running system stands up on top of that: the page tables, the kernel’s pool and stacks, and the live user-mode processes at the prompt, which means cmd.exe plus the kernel32, msvcrt, and ulib shim images it runs against.
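One way to measure a kernel's in-RAM size from its loadable segments, as opposed to its on-disk size, is to sum the memory size of every PT_LOAD program header. The sketch below does that for a little-endian ELF64 image using the standard header offsets; the function name `loaded_size` is hypothetical, and treating the sum of `p_memsz` as the footprint (ignoring alignment padding between segments) is an assumption about how such a figure might be derived, not a description of the authors' tooling.

```rust
const PT_LOAD: u64 = 1;

/// Read a little-endian unsigned field of `width` bytes (at most 8).
fn read_le(bytes: &[u8], offset: usize, width: usize) -> Option<u64> {
    let field = bytes.get(offset..offset.checked_add(width)?)?;
    Some(field.iter().rev().fold(0u64, |acc, &b| (acc << 8) | b as u64))
}

/// Sum p_memsz over all PT_LOAD program headers of a little-endian ELF64 image.
/// p_memsz includes zero-initialized BSS; debug sections are not loadable and
/// therefore never counted.
pub fn loaded_size(elf: &[u8]) -> Option<u64> {
    // Check the magic, EI_CLASS == 2 (64-bit), and EI_DATA == 1 (little-endian).
    if !elf.starts_with(b"\x7fELF") || *elf.get(4)? != 2 || *elf.get(5)? != 1 {
        return None;
    }
    let phoff = read_le(elf, 0x20, 8)? as usize; // e_phoff
    let phentsize = read_le(elf, 0x36, 2)? as usize; // e_phentsize
    let phnum = read_le(elf, 0x38, 2)? as usize; // e_phnum
    let mut total = 0u64;
    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        if read_le(elf, ph, 4)? == PT_LOAD {
            // p_memsz sits at offset 40 inside an ELF64 program header.
            total = total.checked_add(read_le(elf, ph.checked_add(40)?, 8)?)?;
        }
    }
    Some(total)
}
```

The difference between this sum and the file size is the point of the paragraph above: sections such as debug information occupy the file but have no loadable segment, so they never reach RAM.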
That is a complete, interactive, 64-bit operating system, with a driver model, an object manager, a scheduler, a syscall layer, and a running command shell, in four megabytes.
The bloat, for contrast
It is worth remembering how much the floor has moved. These are the minimum RAM requirements Microsoft published for shipping Windows releases:
| Release | Year | Minimum RAM |
|---|---|---|
| Windows 95 | 1995 | 4 MB |
| Windows 98 | 1998 | 16 MB |
| Windows XP | 2001 | 64 MB |
| Windows Vista | 2007 | 512 MB (1 GB for the Aero experience) |
| Windows 7 | 2009 | 1 GB (32-bit), 2 GB (64-bit) |
| Windows 11 | 2021 | 4 GB, plus 64 GB storage, TPM 2.0, and Secure Boot |
Vista is the inflection everyone remembers. The “Vista Capable” era is when the floor jumped eightfold in a single release and a generation of perfectly good machines was suddenly under-spec. From there the requirements only climbed, and Windows 11 added a hardware gate (TPM 2.0, Secure Boot) on top of the memory floor, which is how you end up throwing out working laptops.
Set nanokrnl next to that table and the point makes itself. The running system fits in about 4 megabytes, which is roughly the 1995 minimum for Windows 95, and it does so from inside a browser tab, driven by a 65 KB emulator, with no install, no headers, and nothing to throw away. This is obviously not feature parity with Windows, and it is not trying to be. It is a demonstration of a floor: how little a real, structured operating system actually needs to reach an interactive prompt, once you stop accreting.
An aside on page sizes, since it came up
The four-megabyte figure is measured at 4 KiB granularity, because that is how the snapshot scans memory: a page is either all zero (skip it) or not (save it). But the kernel does not think only in 4 KiB pages.
Both nanox’s MMU and the kernel’s page-table walker are large-page aware. The walk short-circuits at a 1 GiB leaf in the PDPT or a 2 MiB leaf in the PD when the page-size bit is set, on both sides. This is not decoration: the bootloader hands the kernel a direct map of physical memory built from large pages, so the kernel has to understand them to translate its own addresses. When it needs finer control, for instance to mark a loaded PE image’s pages executable without flipping the no-execute bit across a whole 2 MiB region, it splits the large page down into 4 KiB entries and adjusts the flags on just the pages it means to. The kernel’s own fresh allocations are still 4 KiB today; actively creating large-page mappings for kernel allocations is a known optimization we have not implemented yet. So the honest answer to “4K only, or large pages too?” is: the machinery consumes and manipulates 2 MiB and 1 GiB pages correctly, and creating them for the kernel’s own mappings is future work.
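A large-page-aware walk is short enough to sketch in full. This is a generic 4-level translation under simplifying assumptions, not code from nanox or the kernel: physical memory is modeled as a hypothetical `PhysMem` map from entry address to 64-bit entry, and permission, accessed, and dirty bits are ignored.

```rust
use std::collections::HashMap;

const PRESENT: u64 = 1 << 0;
const PAGE_SIZE_BIT: u64 = 1 << 7; // PS: the entry is a leaf at the PDPT or PD level
const ADDR_MASK: u64 = 0x000F_FFFF_FFFF_F000; // physical address bits 51:12

/// Hypothetical physical-memory model: entry address -> 64-bit table entry.
pub type PhysMem = HashMap<u64, u64>;

/// Translate a virtual address through PML4, PDPT, PD, and PT, stopping early
/// at a 1 GiB leaf (PDPT) or a 2 MiB leaf (PD) when the page-size bit is set.
pub fn translate(mem: &PhysMem, cr3: u64, vaddr: u64) -> Option<u64> {
    let mut table = cr3 & ADDR_MASK;
    // Bit positions where each level's 9-bit index starts.
    for shift in [39u32, 30, 21, 12] {
        let index = (vaddr >> shift) & 0x1FF;
        let entry = *mem.get(&(table + index * 8))?;
        if entry & PRESENT == 0 {
            return None; // not mapped
        }
        let is_large = entry & PAGE_SIZE_BIT != 0 && (shift == 30 || shift == 21);
        if is_large || shift == 12 {
            // Leaf: the low `shift` bits of the virtual address are the offset.
            let offset_mask = (1u64 << shift) - 1;
            return Some((entry & ADDR_MASK & !offset_mask) | (vaddr & offset_mask));
        }
        table = entry & ADDR_MASK;
    }
    None
}
```

Because the leaf case is the same expression at every level, with only the offset width changing, supporting 2 MiB and 1 GiB pages costs one extra condition in the walk rather than a separate code path.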
Next
That is the boot path and the footprint. The next posts in this series go deeper into the parts this one only gestured at: how nanox is validated and how the missing BSWAP was hunted down, how the 9P transport lets the browser page serve real host files into the kernel through an H: drive, and how the kernel runs unmodified Microsoft console binaries on its own NT syscalls. If you want to poke at it first, nanokrnl.ai has both buttons, and the source is on GitHub.