---
atroot: true
template:
slug: spindle-microvm
title: Spindle's new microVM engine
subtitle: How we built the new QEMU-based microVM engine
date: 2026-06-16
image: https://assets.tangled.network/blog/microvm.png
authors:
  - name: dawn
    email: dawn@tangled.org
    handle: ptr.pet
---

Spindle gains a second engine: `microvm`. Each workflow gets its own little
virtual machine, a whole real environment you can do anything inside. It's an
upgrade from the Nixery engine while staying fully compatible with it, so if you
already have a working Nixery workflow, just change `nixery` to `microvm` and it
will work!

The interesting part is NixOS images: you configure the machine directly from
the workflow file. A few things you can do:

You can bring services up:

```yaml
services:
  postgresql:
    enable: true
    ensureDatabases: ["spindle-workflow"]
    ensureUsers:
      - name: spindle-workflow
        ensureDBOwnership: true
```

You can build Docker containers:

```yaml
virtualisation:
  docker: true
steps:
  - name: "do the thing!"
    command: docker build ...
```

And you can use non-NixOS images too:

```yaml
image: alpine
steps:
  - name: install golang
    command: apk add go
```

It's quick on the second run, too, because it caches aggressively: your
dependencies, your services, and any other Nix derivation built inside the
microVM get pushed to spindle's Nix cache, so the next workflow that needs them
doesn't rebuild them. More on that [below](#nix-cache-both-ways).

And like everything else in tangled, the whole thing is self-hostable, so you
can run your own spindle with the microVM engine on your own hardware (see the
[self-hosting
guide](https://docs.tangled.org/spindles.html#self-hosting-guide)). If you want
fuller examples, there are [recipes in the
docs](https://docs.tangled.org/spindles.html#recipes) too.

## What's in a microVM

A microVM is just a VM with most of the boring parts removed. There's no BIOS,
no PCI bus to probe, no emulated graphics card, none of the slow legacy stuff a
normal QEMU machine drags along. You get virtio devices and not much
else, which means it boots very quickly and uses very little memory. Right now
QEMU is the only runner we support, but the engine is written so that other
runners (firecracker, for example) can slot in later.

Inside the guest there's a small piece of software we call the agent. Spindle
never SSHes in or runs commands "from the outside"; instead the agent dials back
to spindle over vsock the moment it boots, says hello, and from then on every
step of your workflow is sent to it as a message. The agent runs the command as
an unprivileged user, streams stdout and stderr back, and reports the exit code.
The host side of this lives in
[`spindle`](https://tangled.org/tangled.org/core/tree/master/spindle/engines/microvm/agent.go)
and the guest side is a little Rust binary called
[`shuttle`](https://tangled.org/tangled.org/core/tree/master/shuttle).
(`shuttle` implements
[`agentproto`](https://tangled.org/tangled.org/core/tree/master/spindle/), which
is the protocol used by `spindle`. Technically speaking, anyone could implement
it and, as long as the same side effects hold, you could have your own agent!)

<!-- DIAGRAM 1 (the big picture): A box representing a machine that spindle and
guests run on, both spindle service and guests are inside. Draw spindle as a
service running on the server (its own box). Inside that spindle box, draw the
small throwaway virtual machine your workflow actually runs in (the guest).
spindle is the parent that creates, owns and tears down the guest, so the guest
lives wholly inside it. Between spindle and the guest it contains, draw a
private direct wire (not the normal network) with a few labelled arrows showing
what travels across it: spindle sends each step's command in, the guest streams
the logs back out, the guest asks for software packages and gets them, and the
guest asks to look up website addresses (DNS). Make the point that the guest is
sealed off / isolated: it can only talk to the outside through spindle via this
one wire, and it can't reach anything else on the server's network on its own.
Probably show multiple guests?
-->

## Two kinds of images

There are two "flavours" of image you can boot, and they're aimed at fairly
different people.

The first is **NixOS images**. These are the interesting ones: because the whole
guest is built with Nix, you can configure it from your workflow file directly.
Things like `dependencies`, `services`, `virtualisation` (e.g. Docker),
`registry` and `caches` are all written right there in the YAML, and the guest
agent builds and activates that config before any of your steps run. If we've
built that exact base plus config before, spindle can just hand the guest a
store path to realize (fetching from whatever cache `spindle` has configured)
instead of rebuilding it, so the second run is quick.

The second is **non-NixOS images**, which today just means Alpine, but can be
anything. You don't get the workflow-level NixOS config here (there's no NixOS
to configure), but if Nix happens to exist inside the image, like it does in our
Alpine one, it can still talk to the spindle Nix cache just fine.

## An example NixOS workflow

If you've used spindle before, this will look familiar: it's the same manifest
you already know, just with a few extra keys that the NixOS image understands.
Here's a workflow that needs Postgres to test against and Docker to build an
image:

```yaml
# .tangled/workflows/test.yaml
engine: microvm

when:
  - event: ["push", "pull_request"]
    branch: ["master"]

image: nixos

dependencies:
  - go
  - github:nixos/nixpkgs#hello

registry:
  nixpkgs: github:nixos/nixpkgs/nixos-unstable

caches:
  https://nix-community.cachix.org: "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="

services:
  postgresql:
    enable: true
    ensureDatabases: ["spindle-workflow"]
    ensureUsers:
      - name: spindle-workflow
        ensureDBOwnership: true

virtualisation:
  docker: true

steps:
  - name: run tests
    environment:
      PGHOST: /run/postgresql
    command: |
      docker build -t app .
      psql -c "select 1"
      go test ./...
```

The new keys each do one job:

- **`dependencies`** are the packages your steps get to use. They go into a
  `mkShellNoCC` devshell that every step sources before it runs, so you get the
  whole stdenv environment (setup hooks like `pkg-config` wiring up
  `PKG_CONFIG_PATH`, etc.) and not just the bare binaries. That means you can
  use a dependency like `openssl` and compile the `openssl-sys` Rust crate
  without pain! A bare name like `go` is looked up in nixpkgs (same as Nixery),
  but you can also point at any flake with the `flakeref#attr` syntax, so
  `github:nixos/nixpkgs#hello` pulls `hello` straight out of that flake.
- **`registry`** is how you remap the global refs. Here we pin `nixpkgs` to
  `nixos-unstable`, so now the bare `go` above resolves from unstable. You can
  alias your own flakes the same way (`myflake: github:me/x`, then
  `myflake#tool` in `dependencies`).
- **`caches`** is a map of binary cache URL to its trusted public key. They get
  wired into the read proxy (more on that just below), so the guest can
  substitute prebuilt paths from them instead of building everything from
  scratch.
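
To make the `dependencies` and `registry` rules concrete, here is a minimal Python sketch of how an entry could be split into a flake reference and an attribute. The function name `resolve_dependency` is hypothetical, and the real lookup goes through Nix's own flake registry rather than a dictionary; this only models the outcome described above.

```python
def resolve_dependency(dep: str, registry: dict) -> tuple:
    """Split a `dependencies` entry into (flake reference, attribute)."""
    if "#" in dep:
        # `flakeref#attr` syntax: everything before the first `#` is the flake.
        flakeref, attr = dep.split("#", 1)
    else:
        # A bare name such as `go` is looked up in nixpkgs.
        flakeref, attr = "nixpkgs", dep
    # `registry` remaps a global ref or alias to a concrete flake reference.
    return registry.get(flakeref, flakeref), attr


# Assumed registry, mirroring the examples in the text.
registry = {
    "nixpkgs": "github:nixos/nixpkgs/nixos-unstable",
    "myflake": "github:me/x",
}

for entry in ("go", "myflake#tool", "github:nixos/nixpkgs#hello"):
    print(entry, "->", resolve_dependency(entry, registry))
```

With the registry above, the bare `go` ends up coming from `nixos-unstable`, while the fully spelled-out `github:nixos/nixpkgs#hello` is left alone because it is not a registry alias.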

`services` and `virtualisation` are the interesting parts: they're passed
straight through to NixOS, so anything you could write in a NixOS config you can
write here. `services.postgresql.enable` brings Postgres up before any of your
steps run.

Since steps run as the `spindle-workflow` user, naming a database after that
user with `ensureDBOwnership` is the easy path to a working DB -- Postgres peer
auth maps the unix user straight to the matching role, so `psql` connects over
the socket with no password and no extra setup (this name-matching is a NixOS
requirement for `ensureDBOwnership`; if you want a differently named DB, you'd
grant access yourself).

`virtualisation.docker: true` is shorthand for `virtualisation.docker.enable =
true`, which gets you a real Docker daemon inside the VM. By the time your first
step runs, Postgres is listening and the Docker socket is there. No sidecar
dance, it's just part of the machine.

(`true` works as shorthand for `.enable = true` anywhere an `enable` option
exists, so most "just turn this on" services are a one-liner!)
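
The shorthand can be pictured as a small rewrite over the parsed YAML. The sketch below is an assumption about the shape of that rewrite, not the engine's actual code: `expand_shorthand` is a hypothetical helper, and it does not check that an `enable` option really exists for the given name.

```python
def expand_shorthand(value):
    """Expand a bare `true` into `{"enable": True}`; leave anything else as is."""
    if value is True:
        return {"enable": True}
    return value


# Parsed form of the `services` and `virtualisation` keys from the example.
workflow = {
    "virtualisation": {"docker": True},
    "services": {
        "postgresql": {"enable": True, "ensureDatabases": ["spindle-workflow"]},
    },
}

expanded = {
    section: {name: expand_shorthand(value) for name, value in options.items()}
    for section, options in workflow.items()
}
print(expanded["virtualisation"])
```

Options that are already written out in full, like `postgresql` here, pass through unchanged.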

## The architecture

<!-- DIAGRAM 2 (the life of a workflow?): a left-to-right timeline (?) showing the
stages a workflow passes through, one after another. (1) Read the workflow file
and figure out which machine image it wants. (2) Wait in line. The workflow
asks for some memory, CPU and disk, and has to wait its turn until that much is
free. (3) Set up and boot the little virtual machine. (4) Run your steps, one at
a time, inside it. (5) Tear everything down and clean up afterwards.

There are also two mermaid diagrams in spindle/engines/microvm/README.md that could
be useful for this as source?
-->

### Nix cache, both ways

Spindle talks to its Nix cache through two proxies that run on the host, so the
guest never needs credentials or direct network access to reach it. Like the
agent connection, the guest reaches them over vsock.

The read proxy fans out to the configured substituters plus any caches you
listed in your workflow, so when the guest needs to realize a store path it asks
the proxy and the proxy fetches it. The request is sent concurrently to all the
read caches, and whichever answers first wins.
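
The "first answer wins" fan-out is a standard pattern. A minimal Python sketch, assuming each cache is represented by a callable that returns the payload or `None` on a miss (the names `fetch_first`, `slow_hit` and `fast_hit` are made up for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_first(caches: dict, store_path: str):
    """Ask every cache concurrently; return (cache name, payload) of the first hit."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(caches)))
    futures = {pool.submit(fetch, store_path): name for name, fetch in caches.items()}
    try:
        for future in as_completed(futures):
            try:
                payload = future.result()
            except Exception:
                continue  # one failing cache must not fail the whole lookup
            if payload is not None:
                return futures[future], payload
        return None  # every cache missed
    finally:
        # Don't wait for the slower caches once we have an answer.
        pool.shutdown(wait=False, cancel_futures=True)


def slow_hit(path):
    time.sleep(0.2)
    return b"late answer"


def fast_hit(path):
    return b"nar for " + path.encode()


caches = {"upstream": slow_hit, "workflow-cache": fast_hit}
print(fetch_first(caches, "/nix/store/example"))
```

The design choice worth noting is that a miss or an error from one cache is ignored as long as another cache can still answer.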

The upload proxy goes the other way: any path built inside the guest gets pushed
back out to spindle's Nix cache (if one is configured), so the next workflow
that needs it doesn't have to build it again. Paths that already exist on any of
the configured read caches are not uploaded. As the agent reports
built paths, they're queued and uploaded in the background while the rest of the
workflow keeps running, so uploads overlap with work instead of blocking it. If
any are still in flight when we reach VM teardown, the workflow waits until
everything has drained.
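
The queue-then-drain behaviour can be sketched with a worker thread and a joinable queue. This is an illustration under assumptions, not spindle's implementation: `UploadQueue`, `upload` and `exists_in_read_cache` are hypothetical names standing in for the real upload and lookup logic.

```python
import queue
import threading


class UploadQueue:
    def __init__(self, upload, exists_in_read_cache):
        self._upload = upload
        self._exists = exists_in_read_cache
        self._queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def report_built(self, store_path: str) -> None:
        """Called as the agent reports a built path; returns immediately."""
        self._queue.put(store_path)

    def _worker(self) -> None:
        while True:
            path = self._queue.get()
            try:
                if not self._exists(path):  # skip paths a read cache already has
                    self._upload(path)
            except Exception as err:
                print(f"upload of {path} failed: {err}")
            finally:
                self._queue.task_done()

    def drain(self) -> None:
        """Block until every queued path has been handled (used at teardown)."""
        self._queue.join()


uploaded = []
uploads = UploadQueue(uploaded.append, lambda path: path.endswith("-hello"))
uploads.report_built("/nix/store/aaa-hello")  # pretend a read cache has this one
uploads.report_built("/nix/store/bbb-app")
uploads.drain()
print(uploaded)
```

Steps keep running while the worker uploads, and `drain()` is the only place the workflow ever blocks on the cache.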

Spindle can be configured to use `http`, `ssh-ng` or `ssh` URLs as a binary
cache to upload to, so, for example, `ssh-ng://localhost` would just upload to
the local Nix store on the machine that the spindle runs on! `ssh-ng` and `ssh`
require Nix to be present in PATH so that the spindle can use `nix copy` to
upload to them, but if you are using a binary cache that supports `http` (for
example, [ncps](https://github.com/kalbasit/ncps)), Nix does not need to be
present.

### Building the images

Image builds are done with Nix. For NixOS we lean on
[microvm.nix](https://github.com/microvm-nix/microvm.nix) and layer our own bits
on top (stripping down kernel modules, configuring users, etc.). For Alpine
there's a smallish Nix definition that fetches the kernel, the initrd and the
kernel modules, sets up an init script that configures the machine on boot,
copies in the dependencies we want (`nix`, `git`, etc.) and compresses the whole
rootfs into a squashfs.

None of this *has* to be Nix, though. As far as spindle is concerned an image is
valid as long as a few things hold: a guest agent (that implements `agentproto`)
is present and gets started on boot, a `spindle-workflow` user exists, and the
work directory is set up at `/workspace`. That can be built however you like.

### Finding an image

Every built image ships a `spec.json` next to its artifacts. The spec is the
whole contract: where the kernel and initrd and read-only store disk live, the
boot args, how much memory and how many vCPUs to give it, the shell to run steps
in, the writable volumes, the network interfaces, and the runner-specific knobs
(machine type, CPU, extra QEMU args). NixOS images also carry a `baseConfigHash`
identifying the base config baked in (this is the hash of
`nixosSystem.config.system.build.toplevel.outPath`).

A workflow picks an image with the `image` key at the top level. The name is
matched literally against what's on disk: we look for a directory called
`<name>` with a `spec.json` in it, then fall back to a flat `<name>.json`. The
nice property here is that resolution depends *only* on the name and what's on
disk, never on the host doing the resolving, so the same workflow resolves to
the same image on every spindle. If an operator keeps multiple arches side by
side they can name them `nixos-x86_64`, `alpine-aarch64` and so on (that suffix
is just part of the name, it's not handled specially). If you want, for example,
`nixos` to work, you can just symlink `nixos` to `nixos-x86_64`.
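
The lookup order is simple enough to sketch in a few lines of Python. The function name `resolve_image` and the temporary directory layout are assumptions for the example; only the two-step order (directory with `spec.json`, then flat `<name>.json`) comes from the description above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_image(images_dir: Path, name: str) -> Optional[Path]:
    """Return the spec file for `name`, or None if no image by that name exists."""
    candidates = (
        images_dir / name / "spec.json",  # preferred: <name>/spec.json
        images_dir / f"{name}.json",      # fallback: flat <name>.json
    )
    for candidate in candidates:
        if candidate.is_file():  # follows symlinks, so aliases work
            return candidate
    return None


# Throwaway layout: one directory image, one flat spec, one symlink alias.
root = Path(tempfile.mkdtemp())
(root / "nixos-x86_64").mkdir()
(root / "nixos-x86_64" / "spec.json").write_text("{}")
(root / "alpine.json").write_text("{}")
(root / "nixos").symlink_to(root / "nixos-x86_64", target_is_directory=True)

for image in ("nixos", "nixos-x86_64", "alpine", "debian"):
    print(image, "->", resolve_image(root, image))
```

Nothing in the function looks at the host's architecture, which is what makes the result depend only on the name and the directory contents.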

Right before launch we double-check the referenced files actually exist
and that the host has the tools we need: `mkfs.ext4` for the volumes, the
QEMU binary for the spec's arch, `/dev/kvm` and `/dev/vhost-vsock`, plus
the `ip` / `mount` / `slirp4netns` / `unshare` toolchain if the image
wants networking.

### The life of a workflow

A workflow moves through a handful of stages: it gets parsed and its
image resolved, it waits for a slot, it gets set up, its steps run, and
then everything is torn down.

The waiting bit matters a lot. Each image declares how much memory, how many
vCPUs and how much disk it needs, and a workflow has to acquire a slot from a
resource scheduler before anything boots. The scheduler is work-conserving with
aging and per-user fairness, so one person submitting a hundred jobs won't
starve everyone else, and slots don't sit idle if there's work that fits in the
budget.
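
One way to picture "work-conserving with aging and per-user fairness" is a scoring function over the jobs that currently fit. The sketch below is a guess at the general shape, not spindle's scheduler: the `Job` fields, the `pick_next` name, the scoring formula and the `aging_per_second` weight are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    memory_mb: int
    vcpus: int
    disk_mb: int
    enqueued_at: float


def pick_next(pending, free, running_per_user, now, aging_per_second=0.01):
    """Return the pending job to start next, or None if nothing fits the budget."""

    def fits(job):
        return (
            job.memory_mb <= free["memory_mb"]
            and job.vcpus <= free["vcpus"]
            and job.disk_mb <= free["disk_mb"]
        )

    def score(job):
        # Lower is better. Fairness: users with fewer running jobs go first.
        # Aging: the longer a job has waited, the better its score gets.
        waited = now - job.enqueued_at
        return running_per_user.get(job.user, 0) - aging_per_second * waited

    # Work-conserving: consider every job that fits, not just the queue head.
    candidates = [job for job in pending if fits(job)]
    return min(candidates, key=score, default=None)


pending = [
    Job("alice", 4096, 2, 1024, enqueued_at=0.0),  # too big for the free budget
    Job("alice", 512, 1, 512, enqueued_at=1.0),
    Job("bob", 512, 1, 512, enqueued_at=2.0),
]
free = {"memory_mb": 1024, "vcpus": 2, "disk_mb": 2048}
chosen = pick_next(pending, free, running_per_user={"alice": 3}, now=10.0)
print(chosen)
```

Here alice already has three jobs running, so bob's job is picked even though it was queued later, and alice's oversized job does not block the queue while smaller work still fits.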

Once a slot is acquired, we do the setup. Spindle allocates a random vsock CID
for the guest and registers it with the agent hub. It creates the per-workflow
work directory, starts the two cache proxies (described earlier) and a DNS proxy
that resolves through the host and filters out private/special-use addresses,
then creates the VM: writable volumes become sparse files formatted ext4, the
store disk is attached read-only, and QEMU is started with `-sandbox on`,
`-nodefaults`, no display, no monitor, etc., with serial (on boot) /
`virtio_console` output to a log file and a QMP socket for control.

Then we wait for the machine. We poll QMP until QEMU says the guest is running,
then wait for the agent's handshake to arrive over vsock from the CID we expect.
The agent tells us its protocol and versions, and spindle sends back the job id,
the trusted cache public keys, and the cache and DNS proxy ports. From there
steps run one at a time as `$shell -lc <command>`, as the unprivileged workflow
user in `/workspace/repo`, with the right environment and any unlocked secrets.
If the workflow activates a NixOS config and we've already built that exact base
plus config, the activation step can realize a cached toplevel store path instead
of rebuilding. Either way, whether it's building the config fresh or pulling a
cached toplevel down, that output streams straight into the activation step's log
as it happens, so you can watch the closure come in instead of staring at a blank
screen wondering if anything's happening.

Timeouts are cooperative: we work out a deadline from the workflow timeout
and send it to the guest, with a little grace on our side so the guest
gets a chance to report the timeout itself rather than us just yanking the
machine out from under it. And if the VM crashes mid-step we tail the
serial and QEMU logs into the step's stderr, because "guest agent
connection lost: EOF" is a genuinely useless thing to read at 2am...
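
The cooperative deadline boils down to two timestamps. A minimal Python sketch, where the function names and the 5-second `host_grace` are placeholders rather than spindle's real values:

```python
def compute_deadlines(started_at: float, workflow_timeout: float, host_grace: float = 5.0):
    """Return (guest_deadline, host_deadline) as absolute timestamps in seconds."""
    guest_deadline = started_at + workflow_timeout
    # The host waits a little longer so the guest can report the timeout itself.
    host_deadline = guest_deadline + host_grace
    return guest_deadline, host_deadline


def timeout_state(now: float, guest_reported: bool, host_deadline: float) -> str:
    if guest_reported:
        return "timed out (reported by guest)"
    if now >= host_deadline:
        return "timed out (enforced by host)"
    return "running"


guest_deadline, host_deadline = compute_deadlines(started_at=0.0, workflow_timeout=600.0)
print(guest_deadline, host_deadline)
print(timeout_state(now=602.0, guest_reported=False, host_deadline=host_deadline))
```

Between the two deadlines the host still treats the workflow as running, which is the window in which the guest gets to report the timeout on its own.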

Teardown is the same whether the workflow passed, failed or timed out:
drain any pending Nix cache uploads, ask the agent to power off, wait for
QEMU to exit (falling back to a QMP `system_powerdown`, and finally a
kill if it's being stubborn), then close the proxies and remove the work
directory.
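
The power-off escalation is a "try the gentlest thing first" loop. Here is a small Python sketch against a fake VM; `stop_vm`, `FakeVM` and the wait time are invented for the example, and only the order of the three stages is taken from the text.

```python
def stop_vm(has_exited, stages, wait_seconds: float = 10.0):
    """Run each (name, action) in order until the VM exits; return the stage that worked."""
    for name, action in stages:
        action()
        if has_exited(wait_seconds):
            return name
    return None


class FakeVM:
    """A stubborn guest that ignores every polite request."""

    def __init__(self):
        self.alive = True
        self.log = []

    def agent_poweroff(self):
        self.log.append("agent poweroff")

    def qmp_powerdown(self):
        self.log.append("qmp system_powerdown")

    def kill(self):
        self.log.append("kill")
        self.alive = False

    def has_exited(self, timeout: float) -> bool:
        return not self.alive  # a real check would wait up to `timeout` seconds


vm = FakeVM()
stages = [
    ("agent", vm.agent_poweroff),
    ("qmp", vm.qmp_powerdown),
    ("kill", vm.kill),
]
stopped_by = stop_vm(vm.has_exited, stages, wait_seconds=0.0)
print(stopped_by, vm.log)
```

A well-behaved guest would exit after the first stage and the later, harsher stages would never run.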

### Locking down the network

A VM that can reach the host's local network is a VM that can reach things it
has no business reaching. So QEMU doesn't run in the host's network namespace at
all. We `unshare` into fresh user, net and mount namespaces first. Inside that
namespace a small wrapper bind-mounts a resolv.conf pointing at `127.0.0.1` so
that QEMU's built-in slirp DNS isn't used, then installs blackhole routes for
every special-use IP range (RFC 6890, so private networks, link-local, loopback,
etc.) before it execs QEMU. `slirp4netns` then provides the namespace's outbound
internet connection, with `--disable-host-loopback`, sandbox and seccomp all on.
QEMU runs *inside* that namespace, and the guest's network card is attached to
QEMU's own built-in user-mode networking. So every packet from the guest takes
two hops: guest → QEMU's slirp → the namespace's `slirp4netns` → the internet.
The guest never sees the host's network and the host's network never sees the
guest. All of this is done without needing any privileges!

Guest DNS doesn't use either slirp layer. The guest's `/etc/resolv.conf` points
at shuttle on `127.0.0.1:53`, and shuttle forwards DNS packets over vsock to
the host-side DNS proxy. That proxy resolves through the host's real resolver
and strips any answers that point at private or special-use addresses, so guest
traffic can only ever reach the outside world, never the host or anything on its
local networks.
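
The answer filtering can be approximated with Python's standard `ipaddress` module, whose `is_global` property follows the IANA special-purpose address registries. This is only a sketch of the idea: `filter_answers` is a hypothetical name, and the real proxy's list of blocked ranges may differ from what `is_global` covers.

```python
import ipaddress


def filter_answers(addresses):
    """Keep only DNS answers that point at globally routable addresses."""
    return [a for a in addresses if ipaddress.ip_address(a).is_global]


answers = [
    "93.184.216.34",         # public IPv4
    "10.0.0.5",              # private network
    "127.0.0.1",             # loopback
    "169.254.169.254",       # link-local
    "fe80::1",               # IPv6 link-local
    "2606:4700:4700::1111",  # public IPv6
]
print(filter_answers(answers))
```

Filtering at the DNS layer complements the blackhole routes: a hostname that resolves to an internal address never even reaches the guest as an answer.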

### Budgets and cgroups

On its own, the scheduler's budget is just bookkeeping: it tracks what it has
handed out, and the runner (QEMU) ensures that a workflow only gets that much. But
optionally the whole thing (QEMU and slirp4netns both) gets placed in a
per-workflow cgroup with memory, swap etc. limits, which is an extra enforcement
layer on top, considering QEMU and slirp4netns themselves also use resources. A
nice side effect is that when the cgroup OOM-kills the VM we can see that it was
an OOM and report it as such, instead of surfacing it as a generic crash and
leaving you guessing.
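
One common way to tell an OOM kill from an ordinary crash with cgroup v2 is the `oom_kill` counter in the cgroup's `memory.events` file. The post doesn't say how spindle detects it, so treat the Python sketch below as an assumption; `oom_kill_count` and `classify_exit` are made-up helper names and the sample text stands in for the real file contents.

```python
def oom_kill_count(memory_events: str) -> int:
    """Extract the `oom_kill` counter from the text of a cgroup v2 memory.events file."""
    for line in memory_events.splitlines():
        key, _, value = line.partition(" ")
        if key == "oom_kill":
            return int(value)
    return 0


def classify_exit(events_before: str, events_after: str) -> str:
    """Compare snapshots taken before the run and after the VM died."""
    if oom_kill_count(events_after) > oom_kill_count(events_before):
        return "out of memory"
    return "crash"


before = "low 0\nhigh 0\nmax 0\noom 0\noom_kill 0\n"
after = "low 0\nhigh 0\nmax 12\noom 1\noom_kill 1\n"
print(classify_exit(before, after))
```

Comparing two snapshots rather than reading the counter once avoids blaming the current workflow for an OOM kill that happened earlier in the same cgroup.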

The spindle itself also gets a cgroup with `memory.min` set, which means that in
a host OOM situation, it should be the workflows that die first, not the spindle
itself.

## On the roadmap

A few things that are coming next:

- [firecracker](https://github.com/firecracker-microvm/firecracker) runner
  support. QEMU microVMs are good and all, but firecracker VMs are more
  efficient to run concurrently and are leaner overall.
- ssh-on-fail: when a workflow fails, you should be able to ssh in to debug why.
  This can be really useful in situations where you need just *a little* bit
  more info when something fails unexpectedly, so you don't sit around running
  the workflow 10 times over.

Feel free to come and ask any questions you might have on <https://chat.tangled.sh>!