---
atroot: true
template:
slug: spindle-microvm
title: How the microVM engine comes together
subtitle: spindle has a microVM engine now!
date: 2026-06-16
image: https://assets.tangled.network/blog/seed.png
authors:
  - name: dawn
    email: dawn@tangled.org
    handle: ptr.pet
---

Since launching, [spindle](/ci) has run your CI inside Docker containers created
with nixery. That's been mostly okay if you are doing simple things, but if you
wanted to do anything more outside the box (maybe you wanted some services, or
to build & test containers inside), or if you wanted to use Nix inside it (which
is rough :P), it wouldn't meet your needs. That changes today!

spindle gains a microVM engine. Each workflow gets its own little virtual
machine. You get a full environment inside your workflows that you can do
whatever you want with, without any of the roughness of nixery containers.
Alongside this, you also get the ability to configure *services* that a workflow
will have (on the NixOS image), so that means you can easily have postgres,
Docker, and so on that will be alive through the workflow.

## what's in a microVM

A microVM is just a VM with most of the boring parts removed. There's no BIOS,
no PCI bus to probe, no emulated graphics card, none of the slow legacy stuff a
normal QEMU machine drags along. You get virtio devices and not much
else, which means it boots very quickly and uses very little memory. Right now
QEMU is the only runner we support, but the engine is written so that other
runners (firecracker for example) can slot in later.

Inside the guest there's a small piece of software we call the agent. Spindle
never SSHes in or runs commands "from the outside"; instead the agent dials back
to spindle over vsock the moment it boots, says hello, and from then on every
step of your workflow is sent to it as a message. The agent runs the command as
an unprivileged user, streams stdout and stderr back, and reports the exit code.
The host side of this lives in
[`spindle`](https://tangled.org/tangled.org/core/tree/master/spindle/engines/microvm/agent.go)
and the guest side is a little Rust binary called
[`shuttle`](https://tangled.org/tangled.org/core/tree/master/shuttle).
(`shuttle` implements
[`agentproto`](https://tangled.org/tangled.org/core/tree/master/spindle/) which
is the protocol used by `spindle`. Technically speaking anyone could implement
this and, assuming side effects hold, you could have your own agent!)
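
To make the "every step is a message" idea a bit more concrete, here's a tiny
sketch of one step round-trip. The real `agentproto` wire format isn't covered
in this post, so the length-prefixed JSON framing and the message names (`run`,
`stdout`, `stderr`, `exit`) are made up for illustration, and the output is
collected in one go rather than streamed.

```python
import json
import struct
import subprocess

# Hypothetical framing (not the real agentproto): 4-byte big-endian length,
# followed by a JSON body.
def encode(msg):
    body = json.dumps(msg).encode()
    return struct.pack(">I", len(body)) + body

def decode(buf):
    """Decode one frame from buf and return (message, remaining bytes)."""
    (size,) = struct.unpack(">I", buf[:4])
    return json.loads(buf[4 : 4 + size]), buf[4 + size :]

def handle_run(msg):
    """Guest side: run one step and build the replies the host would receive."""
    proc = subprocess.run(
        ["sh", "-c", msg["command"]], capture_output=True, text=True
    )
    return [
        {"type": "stdout", "data": proc.stdout},
        {"type": "stderr", "data": proc.stderr},
        {"type": "exit", "code": proc.returncode},
    ]

# Host encodes a step, guest decodes it, runs it and answers.
wire = encode({"type": "run", "command": "echo hi; exit 3"})
request, rest = decode(wire)
replies = handle_run(request)
```

The point is only the shape: the host never executes anything itself, it sends
a request and reads back output and an exit code.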

## two kinds of images

There are two "flavours" of image you can boot, and they're aimed at fairly
different people.

The first is **NixOS images**. These are the interesting ones: because the whole
guest is built with Nix, you can configure it from your workflow file directly.
Things like `dependencies`, `services`, `virtualisation` (e.g. Docker),
`registry` and `caches` are all written right there in the YAML, and the guest
agent builds and activates that config before any of your steps run. If we've
built that exact base plus config before, spindle can just hand the guest a
store path to realize (fetching from whatever cache `spindle` has configured)
instead of rebuilding it, so the second run is quick.
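
The "exact base plus config" lookup can be pictured as a keyed cache. This is
only a sketch of the idea: the key derivation, the `built` table and the
function names are assumptions, not how spindle actually stores this.

```python
import hashlib
import json

def config_key(base_config_hash, workflow_config):
    """Combine the base image identity with the workflow's NixOS config."""
    # Canonical JSON so key order in the YAML doesn't change the key.
    canonical = json.dumps(workflow_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{base_config_hash}\n{canonical}".encode()).hexdigest()

# Hypothetical table: key -> store path of a system we've already built.
built = {}

def remember(base_config_hash, workflow_config, store_path):
    built[config_key(base_config_hash, workflow_config)] = store_path

def store_path_for(base_config_hash, workflow_config):
    """Return a known store path, or None when the guest has to build."""
    return built.get(config_key(base_config_hash, workflow_config))

remember("base-abc", {"services": {"postgresql": True}}, "/nix/store/example-system")
hit = store_path_for("base-abc", {"services": {"postgresql": True}})
miss = store_path_for("base-abc", {"services": {"redis": True}})
```

A hit means the guest only has to realize the path; a miss means it builds the
config first.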

The second is **non-NixOS images**, which today just means Alpine, but can be
anything. You don't get the workflow-level NixOS config here (there's no NixOS
to configure), but if Nix happens to exist inside the image, like it does in our
Alpine one, it can still talk to the spindle Nix cache just fine.

### example nixos workflow

If you've used spindle before this will look familiar, it's the same manifest you
already know, just with a few extra keys that the NixOS image understands. Here's
a workflow that needs postgres to test against and Docker to build an image:

```yaml
# .tangled/workflows/test.yaml
engine: microvm

when:
  - event: ["push", "pull_request"]
    branch: ["master"]

image: nixos

dependencies:
  - go
  - github:nixos/nixpkgs#hello

registry:
  nixpkgs: github:nixos/nixpkgs/nixos-unstable

caches:
  https://nix-community.cachix.org: "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="

services:
  postgresql:
    enable: true
    ensureDatabases: ["spindle-workflow"]
    ensureUsers:
      - name: spindle-workflow
        ensureDBOwnership: true

virtualisation:
  docker: true

steps:
  - name: run tests
    environment:
      PGHOST: /run/postgresql
    command: |
      docker build -t app .
      psql -c "select 1"
      go test ./...
```

`dependencies` are packages that are added to `environment.systemPackages` (so,
`PATH`). A bare name like `go` is looked up in nixpkgs (same as regular
spindle), but you can also point at any flake with the `flakeref#attr` syntax,
so `github:nixos/nixpkgs#hello` pulls `hello` straight out of that flake.
`registry` is how you remap the global refs: here we pin `nixpkgs` to
`nixos-unstable`, so now the bare `go` above resolves from unstable. You can
alias your own flakes the same way (`myflake: github:me/x`, then `myflake#tool`
in `dependencies`). `caches` is a map of binary cache URL to its trusted public
key, and they get wired into the read proxy (more on that later), so the guest
can substitute prebuilt paths from them instead of building everything from
scratch.
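
Here's the resolution rule for `dependencies` written out as a small function.
It's a sketch of the behaviour described above, not spindle's code; `resolve`
is a made-up name, and in reality Nix's own flake registry does the alias
lookup.

```python
def resolve(dep, registry):
    """Turn a `dependencies` entry into a full `flakeref#attr` string."""
    flake, sep, attr = dep.rpartition("#")
    if not sep:
        # A bare name is shorthand for an attribute of nixpkgs.
        flake, attr = "nixpkgs", dep
    # Registry aliases replace the flake part only, never the attribute.
    return f"{registry.get(flake, flake)}#{attr}"

registry = {
    "nixpkgs": "github:nixos/nixpkgs/nixos-unstable",
    "myflake": "github:me/x",
}

go = resolve("go", registry)
hello = resolve("github:nixos/nixpkgs#hello", registry)
tool = resolve("myflake#tool", registry)
```

With that registry, `go` comes from `nixos-unstable`, `hello` comes from
exactly the flake that was written, and `myflake#tool` expands to
`github:me/x#tool`.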

`services` and `virtualisation` are the interesting parts: they're passed
straight through to NixOS, so anything you could write in a NixOS config you can
write here. `services.postgresql.enable` brings postgres up before any of your
steps run. Since steps run as the `spindle-workflow` user, naming a database
after that user with `ensureDBOwnership` is the easy path to a working db -
postgres peer auth maps the unix user straight to the matching role, so `psql`
connects over the socket with no password and no extra setup (this name-matching
is a NixOS requirement for `ensureDBOwnership`, if you want a differently named
db you'd grant access yourself). `virtualisation.docker: true` is shorthand for
`virtualisation.docker.enable = true`, which gets you a real Docker daemon
inside the VM. By the time your first step runs, postgres is listening and the
Docker socket is there, no sidecar dance, it's just part of the machine.

(`true` works as shorthand for `.enable = true` anywhere an `enable` option
exists, so most "just turn this on" services are a one-liner!)

## building the images

Image builds are done with Nix. For NixOS we lean on
[microvm.nix](https://github.com/microvm-nix/microvm.nix) and layer our own bits
on top (stripping down kernel modules, configuring users, etc.). For Alpine
there's a smallish Nix definition that fetches the kernel, the initrd and the
kernel modules, sets up an init script that configures the machine on boot,
copies in the dependencies we want (`nix`, `git`, etc.) and compresses the whole
rootfs into a squashfs.

None of this *has* to be Nix, though. As far as spindle is concerned an image is
valid as long as a few things hold: a guest agent (that implements `agentproto`)
is present and gets started on boot, a `spindle-workflow` user exists, and the
work directory is set up at `/workspace`. That can be built however you like.

## finding an image

Every built image ships a `spec.json` next to its artifacts. The spec is the
whole contract: where the kernel and initrd and read-only store disk live, the
boot args, how much memory and how many vCPUs to give it, the shell to run steps
in, the writable volumes, the network interfaces, and the runner-specific knobs
(machine type, CPU, extra QEMU args). NixOS images also carry a `baseConfigHash`
identifying the base config baked in (this is the hash of
`nixosSystem.config.system.build.toplevel.outPath`).

A workflow picks an image with the `image` key at the top level. The name is
matched literally against what's on disk: we look for a directory called
`<name>` with a `spec.json` in it, then fall back to a flat `<name>.json`. The
nice property here is that resolution depends *only* on the name and what's on
disk, never on the host doing the resolving, so the same workflow resolves to
the same image on every spindle. If an operator keeps multiple arches side by
side they can name them `nixos-x86_64`, `alpine-aarch64` and so on (that suffix
is just part of the name, it's not handled specially). If you want, for example,
`nixos` to work, you can just symlink `nixos` to `nixos-x86_64`.
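
The lookup order is simple enough to write down. This is a sketch of the rule
as described above; `resolve_image` and `images_dir` are illustrative names,
not spindle's.

```python
from pathlib import Path

def resolve_image(images_dir, name):
    """Return the spec path for an image name, or None if nothing matches."""
    candidates = [
        images_dir / name / "spec.json",  # directory form, tried first
        images_dir / f"{name}.json",      # flat fallback
    ]
    for candidate in candidates:
        # is_file() follows symlinks, so a `nixos -> nixos-x86_64` link works.
        if candidate.is_file():
            return candidate
    return None
```

Note there's no arch detection anywhere in there: the name is only ever used as
a path component, which is what keeps resolution host-independent.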

Right before launch we double-check the referenced files actually exist
and that the host has the tools we need: `mkfs.ext4` for the volumes, the
QEMU binary for the spec's arch, `/dev/kvm` and `/dev/vhost-vsock`, plus
the `ip` / `mount` / `slirp4netns` / `unshare` toolchain if the image
wants networking.
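
A preflight check along those lines might look like the sketch below. The
function name is made up, and the `qemu-system-<arch>` binary name is an
assumption based on QEMU's usual naming, not something read out of a spec.

```python
import os
import shutil

def missing_requirements(arch, networking, which=shutil.which, exists=os.path.exists):
    """List the host tools and devices a launch would need but can't find."""
    tools = ["mkfs.ext4", f"qemu-system-{arch}"]  # binary name is an assumption
    if networking:
        tools += ["ip", "mount", "slirp4netns", "unshare"]
    devices = ["/dev/kvm", "/dev/vhost-vsock"]

    missing = [tool for tool in tools if which(tool) is None]
    missing += [dev for dev in devices if not exists(dev)]
    return missing
```

`which` and `exists` are parameters only so the check can be exercised without
a real host; an empty list means the launch can go ahead.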

## the life of a workflow

A workflow moves through a handful of stages: it gets parsed and its
image resolved, it waits for a slot, it gets set up, its steps run, and
then everything is torn down.

The waiting bit matters a lot. Each image declares how much memory, how many
vCPUs and how much disk it needs, and a workflow has to acquire a slot from a
resource scheduler before anything boots. The scheduler is work-conserving with
aging and per-user fairness, so one person submitting a hundred jobs won't
starve everyone else, and slots don't sit idle if there's work that fits in the
budget.
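
To give a feel for how those three properties interact, here's a toy picker.
It is not spindle's scheduler: the scoring formula, `AGING_WEIGHT` and the
`Job` fields are all invented, and disk is left out because it would be handled
the same way as memory and vCPUs.

```python
from dataclasses import dataclass

AGING_WEIGHT = 0.1  # hypothetical: how quickly waiting offsets a busy user

@dataclass
class Job:
    user: str
    memory_mb: int
    vcpus: int
    waited: int = 0  # scheduling rounds spent in the queue

def pick_next(queue, free_memory_mb, free_vcpus, running_per_user):
    """Pick the next job to start, or None if nothing fits the free budget."""
    # Work-conserving: consider every job that fits, not just the queue head.
    fits = [j for j in queue if j.memory_mb <= free_memory_mb and j.vcpus <= free_vcpus]
    if not fits:
        return None
    # Fairness: users with fewer running jobs go first.
    # Aging: every round spent waiting lowers the score a little.
    return min(fits, key=lambda j: running_per_user.get(j.user, 0) - AGING_WEIGHT * j.waited)
```

So if alice already has three jobs running, bob's fresh job gets the next slot,
but an alice job that has waited long enough will eventually beat it.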

Once a slot is acquired, we do the setup. Spindle allocates a random vsock CID
for the guest and registers it with the agent hub. It creates the per-workflow
work directory, starts the two cache proxies (more on those later), then creates
the VM: writable volumes become sparse files formatted ext4, the store disk is
attached read-only, and QEMU is started with `-sandbox on`, `-nodefaults`, no
display, no monitor, etc. with serial / `virtio_console` output to a log file
and a QMP socket for control.
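
The CID allocation is the easiest of those steps to sketch. The reserved values
are properties of `AF_VSOCK` itself (0, 1 and 2 are taken by the hypervisor,
loopback and host, and the all-ones value means "any"); the retry loop and the
`in_use` set are assumptions about how one might track live guests.

```python
import random

FIRST_GUEST_CID = 3         # 0, 1 and 2 are reserved by AF_VSOCK
LAST_GUEST_CID = 2**32 - 2  # 2**32 - 1 is VMADDR_CID_ANY

def allocate_cid(in_use, rng=None):
    """Pick a random guest CID that no running VM is using, and reserve it."""
    rng = rng or random.Random()
    while True:
        cid = rng.randint(FIRST_GUEST_CID, LAST_GUEST_CID)
        if cid not in in_use:
            in_use.add(cid)
            return cid

live = set()
first = allocate_cid(live)
second = allocate_cid(live)
```

Because the CID is registered before the VM boots, a handshake arriving from
any other CID can simply be rejected.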

Then we wait for the machine. We poll QMP until QEMU says the guest is running,
then wait for the agent's handshake to arrive over vsock from the CID we expect.
The agent tells us its protocol and versions, and spindle sends back the job id,
the trusted cache public keys, the cache proxy ports, the NixOS config if it's
already cached, and so on. From there steps run one at a time as
`$shell -lc <command>`, as the unprivileged workflow user in `/workspace/repo`,
with the right environment and any unlocked secrets.

Timeouts are cooperative: we work out a deadline from the workflow timeout
and ship it to the guest, with a little grace on our side so the guest
gets a chance to report the timeout itself rather than us just yanking the
machine out from under it. And if the VM crashes mid-step we tail the
serial and QEMU logs into the step's stderr, because "guest agent
connection lost: EOF" is a genuinely useless thing to read at 2am.
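
The two deadlines from the timeout paragraph, as arithmetic. The ten second
grace period is a placeholder value, not spindle's actual setting.

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(seconds=10)  # hypothetical value

def deadlines(started_at, timeout):
    """Return (guest_deadline, host_deadline) for one workflow."""
    guest_deadline = started_at + timeout    # shipped to the agent
    host_deadline = guest_deadline + GRACE   # when the host stops waiting
    return guest_deadline, host_deadline

start = datetime(2026, 6, 16, 12, 0, tzinfo=timezone.utc)
guest, host = deadlines(start, timedelta(minutes=30))
```

The guest enforces the earlier deadline and reports the timeout itself; the
host only steps in if it still hasn't heard anything once the later one passes.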

Teardown is the same whether the workflow passed, failed or timed out:
drain any pending Nix cache uploads, ask the agent to power off, wait for
QEMU to exit (falling back to a QMP `system_powerdown`, and finally a
kill if it's being stubborn), then close the proxies and remove the work
directory.

## locking down the network

A VM that can reach the host's local network is a VM that can reach things it
has no business reaching. So QEMU doesn't run in the host's network namespace at
all. We `unshare` into fresh user, net and mount namespaces first. Inside that
namespace a small wrapper bind-mounts a resolv.conf pointing at the slirp DNS
and installs blackhole routes for every special-use IP range (RFC 6890, so
private networks, link-local, loopback, etc.) before it execs QEMU.
`slirp4netns` then provides outbound connectivity for that namespace, with
`--disable-host-loopback`, sandbox and seccomp all on. The guest itself sits
behind a *second* layer of QEMU user-mode networking inside that namespace. All
of this is done without needing any privileges!
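
For a sense of what the blackhole step amounts to, here's a sketch that builds
the `ip route` invocations and checks whether an address would be dropped. The
range list is a deliberately partial IPv4 subset of the RFC 6890 registry, and
the helper names are invented; the real wrapper covers more than this.

```python
import ipaddress

# Partial IPv4 subset of the special-purpose registry, for illustration only.
SPECIAL_USE = [
    "10.0.0.0/8",      # private
    "100.64.0.0/10",   # shared address space
    "127.0.0.0/8",     # loopback
    "169.254.0.0/16",  # link-local
    "172.16.0.0/12",   # private
    "192.168.0.0/16",  # private
]

def blackhole_commands(ranges=SPECIAL_USE):
    """Build one `ip route add blackhole <cidr>` argv per range."""
    return [["ip", "route", "add", "blackhole", cidr] for cidr in ranges]

def is_blocked(addr, ranges=SPECIAL_USE):
    """True if addr falls inside any blackholed range."""
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)
```

Anything on the public internet stays reachable through slirp4netns, while a
request to something like `192.168.1.10` goes nowhere.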

## budgets and cgroups

On its own, the scheduler's budget is just bookkeeping: it tracks what it's
handed out, and the runner (QEMU) makes sure a workflow only gets that much. But
optionally the whole thing (QEMU and slirp4netns both) gets placed in a
per-workflow cgroup with memory, swap etc. limits, which is an extra
enforcement layer on top. A nice side effect is when the cgroup OOM-kills the VM
we can see that it was an OOM and report it as such, instead of surfacing it as
a generic crash and leaving you guessing.
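
One way to tell an OOM from any other crash is the `oom_kill` counter in the
cgroup v2 `memory.events` file. Whether spindle reads exactly that file isn't
stated here, so treat this as a sketch of the idea with made-up function names.

```python
def oom_kills(memory_events):
    """Parse the `oom_kill` counter from the text of a `memory.events` file."""
    for line in memory_events.splitlines():
        key, _, value = line.partition(" ")
        if key == "oom_kill":
            return int(value)
    return 0

def classify_exit(memory_events):
    """Label an unexpected VM exit using the cgroup's OOM counter."""
    return "out of memory" if oom_kills(memory_events) > 0 else "crashed"

sample = "low 0\nhigh 0\nmax 12\noom 1\noom_kill 1\n"
verdict = classify_exit(sample)
```

Reading the counter after QEMU exits is enough, since the cgroup is
per-workflow and nothing else could have bumped it.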

The spindle itself also gets a cgroup, which means that in a host OOM situation,
it should be the workflows that die first, not the spindle itself.

## the nix cache, both ways

The two proxies I mentioned during setup are how the guest talks to spindle's
Nix cache, and they run on the host so the guest never needs credentials or
direct network access to do it. Like the agent, they also use vsock to
communicate with the spindle.

The read proxy fans out to the configured substituters plus any caches you
listed in your workflow, so when the guest needs to realize a store path it asks
the proxy and the proxy fetches it. The request is sent concurrently to the read
caches, so the one that answers it first wins.
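
The "first answer wins" fan-out looks roughly like this. It's a sketch with
threads and plain callables standing in for caches; the real proxy is part of
spindle and speaks HTTP to actual binary caches.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_first(path, caches):
    """Ask every cache concurrently and return the first successful answer.

    Each cache is a callable taking a path and returning bytes, or None on a
    miss (a stand-in for a real HTTP request).
    """
    if not caches:
        return None
    pool = ThreadPoolExecutor(max_workers=len(caches))
    try:
        futures = [pool.submit(fetch, path) for fetch in caches]
        for future in as_completed(futures):
            try:
                result = future.result()
            except Exception:
                continue  # one broken cache shouldn't fail the lookup
            if result is not None:
                return result
        return None
    finally:
        # Don't wait for the losers; drop anything that hasn't started yet.
        pool.shutdown(wait=False, cancel_futures=True)
```

A miss or an error from one cache just means we keep waiting on the others, so
a slow or flaky substituter only costs time when it's the only one that has the
path.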

The upload proxy goes the other way: any path built inside the guest gets pushed
back out to spindle's Nix cache (if one is configured), so the next workflow
that needs it doesn't have to build it again. Any paths that already exist on
any of the configured read caches won't be uploaded. Built paths are queued by
the agent and are immediately uploaded. If any paths are still left when we
reach VM teardown, the workflow will wait until everything is uploaded.
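
The skip-then-queue-then-drain behaviour can be sketched like so. The class and
its callbacks are invented for illustration, and it's simplified: here nothing
uploads until `drain` is called, whereas the real thing uploads in the
background and teardown only waits for the leftovers.

```python
import queue

class UploadQueue:
    def __init__(self, exists_upstream, upload):
        self._exists = exists_upstream  # callable: is the path on a read cache?
        self._upload = upload           # callable: push one path to the cache
        self._pending = queue.Queue()

    def enqueue(self, store_path):
        """Queue a freshly built path unless a read cache already has it."""
        if not self._exists(store_path):
            self._pending.put(store_path)

    def drain(self):
        """Upload everything still queued; returns how many paths were sent."""
        sent = 0
        while True:
            try:
                path = self._pending.get_nowait()
            except queue.Empty:
                return sent
            self._upload(path)
            sent += 1

uploaded = []
q = UploadQueue(lambda p: p.endswith("-hello"), uploaded.append)
q.enqueue("/nix/store/aaa-hello")  # already upstream, skipped
q.enqueue("/nix/store/bbb-app")    # new, queued
sent = q.drain()
```

Checking the read caches first is what keeps popular paths from being
re-uploaded by every workflow that happens to depend on them.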

## in the future

todo

Feel free to come and ask any questions you might have on https://chat.tangled.sh!