
---
atroot: true
template:
slug: spindle-microvm
title: How the microVM engine comes together
subtitle: spindle has a microVM engine now!
date: 2026-06-16
image: https://assets.tangled.network/blog/seed.png
authors:
  - name: dawn
    email: dawn@tangled.org
    handle: ptr.pet
---

Since launching, spindle has run your CI inside Docker containers created with nixery. That's been mostly okay for simple things, but if you wanted to do anything outside the box (maybe you needed some services, or wanted to build & test containers inside), or to use Nix inside it (which is rough :P), it wouldn't meet your needs. That changes today!

spindle gains a microVM engine: each workflow gets its own little virtual machine. You get a full environment inside your workflows that you can do whatever you want with, without any of the roughness of nixery containers. Alongside this, you can also configure the services a workflow has (on the NixOS image), so you can easily have postgres, Docker and so on alive for the whole workflow.

## what's in a microVM

A microVM is just a VM with most of the boring parts removed. There's no BIOS, no PCI bus to probe, no emulated graphics card: none of the slow legacy stuff a normal QEMU machine drags along. You get virtio devices and not much else, which means it boots very quickly and uses very little memory. Right now QEMU is the only runner we support, but the engine is written so that other runners (Firecracker, for example) can slot in later.

Inside the guest there's a small piece of software we call the agent. Spindle never SSHes in or runs commands "from the outside"; instead, the agent dials back to spindle over vsock the moment it boots and says hello, and from then on every step of your workflow is sent to it as a message. The agent runs the command as an unprivileged user, streams stdout and stderr back, and reports the exit code. The host side of this lives in spindle, and the guest side is a little Rust binary called shuttle. (shuttle implements agentproto, the protocol spindle speaks. Technically anyone could implement it, and as long as your implementation produces the same side effects, you could have your own agent!)
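The post doesn't describe agentproto's wire format, so here is a hedged sketch of the *shape* of that exchange, assuming length-prefixed JSON frames and made-up message names (`hello`, `run_step`, `output`, `step_result`). It shows the guest side: decode a step request, run it, and send back output and an exit code. The vsock socket, real streaming and the switch to the unprivileged user are all left out.

```python
import json
import struct
import subprocess

# Hypothetical framing: a 4-byte big-endian length prefix, then a JSON body.
# agentproto's real encoding may differ; this only illustrates the flow.


def encode_frame(msg: dict) -> bytes:
    body = json.dumps(msg).encode()
    return struct.pack(">I", len(body)) + body


def decode_frames(buf: bytes):
    """Split complete frames off the front of buf; return (messages, leftover bytes)."""
    msgs = []
    while len(buf) >= 4:
        (size,) = struct.unpack(">I", buf[:4])
        if len(buf) < 4 + size:
            break  # partial frame: wait for more bytes from the socket
        msgs.append(json.loads(buf[4:4 + size]))
        buf = buf[4 + size:]
    return msgs, buf


# What the guest might send right after dialing spindle (field names are made up).
HELLO = {"type": "hello", "protocol": "agentproto", "version": 1}


def handle(msg: dict) -> list:
    """Guest-side handler: run a step and report its output and exit code."""
    if msg["type"] != "run_step":
        return [{"type": "error", "reason": f"unknown message {msg['type']!r}"}]
    # A real agent streams output as it arrives and drops to the workflow user;
    # this sketch buffers everything and runs as the current user.
    proc = subprocess.run(["sh", "-c", msg["command"]], capture_output=True, text=True)
    return [
        {"type": "output", "stream": "stdout", "data": proc.stdout},
        {"type": "output", "stream": "stderr", "data": proc.stderr},
        {"type": "step_result", "exit_code": proc.returncode},
    ]
```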

## two kinds of images

There are two "flavours" of image you can boot, and they're aimed at fairly different people.

The first is NixOS images. These are the interesting ones: because the whole guest is built with Nix, you can configure it from your workflow file directly. Things like dependencies, services, virtualisation (e.g. Docker), registry and caches are all written right there in the YAML, and the guest agent builds and activates that config before any of your steps run. If we've built that exact base plus config before, spindle can just hand the guest a store path to realize (fetching from whatever cache spindle has configured) instead of rebuilding it, so the second run is quick.

The second is non-NixOS images, which today just means Alpine, but can be anything. You don't get the workflow-level NixOS config here (there's no NixOS to configure), but if Nix happens to exist inside the image, like it does in our Alpine one, it can still talk to the spindle Nix cache just fine.

## example nixos workflow

If you've used spindle before, this will look familiar: it's the same manifest you already know, just with a few extra keys that the NixOS image understands. Here's a workflow that needs postgres to test against and Docker to build an image:

```yaml
# .tangled/workflows/test.yaml
engine: microvm

when:
  - event: ["push", "pull_request"]
    branch: ["master"]

image: nixos

dependencies:
  - go
  - github:nixos/nixpkgs#hello

registry:
  nixpkgs: github:nixos/nixpkgs/nixos-unstable

caches:
  https://nix-community.cachix.org: "nix-community.cachix.org-1:mB9FSh9qf2dCimDSUo8Zy7bkq5CX+/rkCWyvRCYg3Fs="

services:
  postgresql:
    enable: true
    ensureDatabases: ["spindle-workflow"]
    ensureUsers:
      - name: spindle-workflow
        ensureDBOwnership: true

virtualisation:
  docker: true

steps:
  - name: run tests
    environment:
      PGHOST: /run/postgresql
    command: |
      docker build -t app .
      psql -c "select 1"
      go test ./...
```

dependencies are packages that are added to environment.systemPackages (so, PATH). A bare name like go is looked up in nixpkgs (same as regular spindle), but you can also point at any flake with the flakeref#attr syntax, so github:nixos/nixpkgs#hello pulls hello straight out of that flake. registry is how you remap the global refs: here we pin nixpkgs to nixos-unstable, so now the bare go above resolves from unstable. You can alias your own flakes the same way (myflake: github:me/x, then myflake#tool in dependencies). caches is a map of binary cache URL to its trusted public key, and they get wired into the read proxy (more on that later), so the guest can substitute prebuilt paths from them instead of building everything from scratch.
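Here's a small sketch of how a dependency entry could be turned into a fully qualified installable, assuming registry entries simply substitute for the part before `#` (that mirrors how Nix flake registries behave). The helper name is made up and this isn't spindle's actual code.

```python
def resolve_dependency(dep: str, registry: dict) -> str:
    """Map a `dependencies` entry to a flakeref#attr string (illustrative only)."""
    if "#" in dep:
        ref, attr = dep.split("#", 1)
    else:
        # A bare name like `go` means "this attribute from nixpkgs".
        ref, attr = "nixpkgs", dep
    # Registry aliases (nixpkgs, myflake, ...) are swapped for their targets;
    # anything that isn't an alias is already a full flake reference.
    return f"{registry.get(ref, ref)}#{attr}"
```

In this sketch, with no `registry` entry `nixpkgs` stays the global alias, so which revision it means is left to whatever registry the host already has.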

services and virtualisation are the interesting parts: they're passed straight through to NixOS, so anything you could write in a NixOS config you can write here. services.postgresql.enable brings postgres up before any of your steps run. Since steps run as the spindle-workflow user, naming a database after that user with ensureDBOwnership is the easy path to a working db: postgres peer auth maps the unix user straight to the matching role, so psql connects over the socket with no password and no extra setup. (This name-matching is a NixOS requirement for ensureDBOwnership; if you want a differently named db, you'd grant access yourself.) virtualisation.docker: true is shorthand for virtualisation.docker.enable = true, which gets you a real Docker daemon inside the VM. By the time your first step runs, postgres is listening and the Docker socket is there. No sidecar dance; it's just part of the machine.
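Passing `services` and `virtualisation` straight through implies turning the YAML mapping into a Nix attribute set at some point. A minimal renderer, assuming only strings, integers, booleans, lists and mappings (the function names are hypothetical, and spindle might well go through JSON and `builtins.fromJSON` instead), could look like:

```python
def nix_string(s: str) -> str:
    # Escape backslashes first, then quotes, interpolation and newlines.
    s = s.replace("\\", "\\\\").replace('"', '\\"').replace("${", "\\${").replace("\n", "\\n")
    return f'"{s}"'


def to_nix(value) -> str:
    """Render a YAML-ish value (dict/list/str/int/bool) as a Nix expression."""
    if isinstance(value, bool):  # check bool before int: True is an int in Python
        return "true" if value else "false"
    if isinstance(value, int):
        return f"({value})" if value < 0 else str(value)  # parenthesise negatives
    if isinstance(value, str):
        return nix_string(value)
    if isinstance(value, list):
        return "[ " + " ".join(to_nix(v) for v in value) + " ]"
    if isinstance(value, dict):
        # Quote every attribute name so keys like "spindle-workflow" stay valid.
        body = " ".join(f"{nix_string(k)} = {to_nix(v)};" for k, v in value.items())
        return "{ " + body + " }"
    raise TypeError(f"can't render {type(value).__name__} as Nix")
```

Escaping `${` matters here: without it, a string in the workflow file could smuggle a Nix interpolation into the generated config.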

(true works as shorthand for .enable = true anywhere an enable option exists, so most "just turn this on" services are a one-liner!)
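The shorthand only works if you know which options actually have an `enable` child; otherwise `ensureDBOwnership: true` would get rewritten too. A sketch, assuming a caller-supplied predicate `has_enable` (in practice that knowledge would come from the NixOS option tree; how spindle does it isn't described here):

```python
def expand_enable(cfg, has_enable, path=()):
    """Rewrite `opt: true` to `opt: {enable: true}` wherever has_enable(path) says so."""
    if isinstance(cfg, dict):
        return {k: expand_enable(v, has_enable, path + (k,)) for k, v in cfg.items()}
    if isinstance(cfg, list):
        # List items have no option name of their own; use "*" as a placeholder.
        return [expand_enable(v, has_enable, path + ("*",)) for v in cfg]
    if cfg is True and has_enable(".".join(path)):
        return {"enable": True}
    return cfg
```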

## building the images

Image builds are done with Nix. For NixOS we lean on microvm.nix and layer our own bits on top (stripping down kernel modules, configuring users, etc.). For Alpine there's a smallish Nix definition that fetches the kernel, the initrd and the kernel modules, sets up an init script that configures the machine on boot, copies in the dependencies we want (nix, git, etc.) and compresses the whole rootfs into a squashfs.

None of this has to be Nix, though. As far as spindle is concerned an image is valid as long as a few things hold: a guest agent (that implements agentproto) is present and gets started on boot, a spindle-workflow user exists, and the work directory is set up at /workspace. That can be built however you like.
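Since the contract is that small, an operator building their own image could sanity-check an unpacked rootfs before shipping it. This is a hypothetical helper, and `usr/bin/shuttle` is an assumed agent location. It can't verify that the agent actually starts on boot, only that the pieces are present.

```python
from pathlib import Path


def check_rootfs(root: Path, agent_path: str = "usr/bin/shuttle") -> list:
    """Return a list of contract violations found in an unpacked guest rootfs."""
    problems = []
    passwd = root / "etc" / "passwd"
    users = set()
    if passwd.is_file():
        for line in passwd.read_text().splitlines():
            if line and not line.startswith("#"):
                users.add(line.split(":", 1)[0])  # first field is the user name
    if "spindle-workflow" not in users:
        problems.append("no spindle-workflow user in /etc/passwd")
    if not (root / "workspace").is_dir():
        problems.append("/workspace does not exist")
    if not (root / agent_path).is_file():
        problems.append(f"agent binary /{agent_path} not found")
    return problems
```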

## finding an image

Every built image ships a spec.json next to its artifacts. The spec is the whole contract: where the kernel and initrd and read-only store disk live, the boot args, how much memory and how many vCPUs to give it, the shell to run steps in, the writable volumes, the network interfaces, and the runner-specific knobs (machine type, CPU, extra QEMU args). NixOS images also carry a baseConfigHash identifying the base config baked in (this is the hash of nixosSystem.config.system.build.toplevel.outPath).
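To make the contract concrete, here's a hedged sketch of loading a spec into a typed structure. Only `baseConfigHash` is named in this post; every other key (`kernel`, `storeDisk`, `memoryMiB` and so on) is an assumed name for a field described above, not the real schema.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ImageSpec:
    kernel: str
    initrd: str
    store_disk: str                                 # read-only store disk image
    boot_args: str
    memory_mib: int
    vcpus: int
    shell: str                                      # steps run as `$shell -lc <command>`
    volumes: list = field(default_factory=list)     # writable volumes
    interfaces: list = field(default_factory=list)  # network interfaces
    runner: dict = field(default_factory=dict)      # e.g. machine type, CPU, extra QEMU args
    base_config_hash: Optional[str] = None          # NixOS images only


def load_spec(text: str) -> ImageSpec:
    raw = json.loads(text)
    return ImageSpec(
        kernel=raw["kernel"],
        initrd=raw["initrd"],
        store_disk=raw["storeDisk"],
        boot_args=raw.get("bootArgs", ""),
        memory_mib=int(raw["memoryMiB"]),
        vcpus=int(raw["vcpus"]),
        shell=raw.get("shell", "/bin/sh"),
        volumes=raw.get("volumes", []),
        interfaces=raw.get("interfaces", []),
        runner=raw.get("runner", {}),
        base_config_hash=raw.get("baseConfigHash"),
    )
```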

A workflow picks an image with the image key at the top level. The name is matched literally against what's on disk: we look for a directory called `<name>` with a spec.json in it, then fall back to a flat `<name>.json`. The nice property here is that resolution depends only on the name and what's on disk, never on the host doing the resolving, so the same workflow resolves to the same image on every spindle. If an operator keeps multiple arches side by side, they can name them nixos-x86_64, alpine-aarch64 and so on (the suffix is just part of the name; it's not handled specially). If you want plain nixos to work, for example, you can just symlink nixos to nixos-x86_64.
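The lookup order above fits in a few lines. This is a sketch, not spindle's code; the name check at the top is our own addition to keep names from escaping the images directory.

```python
from pathlib import Path
from typing import Optional


def resolve_image(images_dir: Path, name: str) -> Optional[Path]:
    """Find the spec for `image: <name>`: <name>/spec.json first, then <name>.json."""
    # Guard against names escaping the images directory (our own addition).
    if not name or "/" in name or name in (".", ".."):
        raise ValueError(f"invalid image name {name!r}")
    nested = images_dir / name / "spec.json"
    if nested.is_file():  # follows symlinks, so nixos -> nixos-x86_64 works
        return nested
    flat = images_dir / f"{name}.json"
    if flat.is_file():
        return flat
    return None
```

Nothing in here looks at the host's architecture, which is what makes the result identical on every spindle with the same files on disk.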

Right before launch we double-check the referenced files actually exist and that the host has the tools we need: mkfs.ext4 for the volumes, the QEMU binary for the spec's arch, /dev/kvm and /dev/vhost-vsock, plus the ip / mount / slirp4netns / unshare toolchain if the image wants networking.
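A sketch of that host check, assuming an arch-to-binary mapping following QEMU's usual `qemu-system-<arch>` naming. `which` and `exists` are injectable only so it can be tested; a fuller version would also check the spec's referenced files and device permissions (e.g. with `os.access`).

```python
import os
import shutil

# Assumed mapping from spec arch to QEMU system emulator binary name.
QEMU_BINARY = {"x86_64": "qemu-system-x86_64", "aarch64": "qemu-system-aarch64"}


def preflight(arch: str, wants_network: bool, which=shutil.which, exists=os.path.exists) -> list:
    """Return the missing host tools/devices; an empty list means we can launch."""
    tools = ["mkfs.ext4", QEMU_BINARY[arch]]
    if wants_network:
        tools += ["ip", "mount", "slirp4netns", "unshare"]
    missing = [tool for tool in tools if which(tool) is None]
    missing += [dev for dev in ("/dev/kvm", "/dev/vhost-vsock") if not exists(dev)]
    return missing
```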

## the life of a workflow

A workflow moves through a handful of stages: it gets parsed and its image resolved, it waits for a slot, it gets set up, its steps run, and then everything is torn down.

The waiting bit matters a lot. Each image declares how much memory, how many vCPUs and how much disk it needs, and a workflow has to acquire a slot from a resource scheduler before anything boots. The scheduler is work-conserving with aging and per-user fairness, so one person submitting a hundred jobs won't starve everyone else, and slots don't sit idle if there's work that fits in the budget.
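The post doesn't give the scheduler's actual formula, so this is only a toy model of the three properties named above: work-conserving (anything that fits may start), aging (waiting raises priority) and per-user fairness (each job a user already has running lowers priority). The weights are made-up numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resources:
    mem_mib: int
    vcpus: int
    disk_mib: int

    def fits_in(self, free: "Resources") -> bool:
        return (self.mem_mib <= free.mem_mib and self.vcpus <= free.vcpus
                and self.disk_mib <= free.disk_mib)


@dataclass
class Job:
    id: str
    user: str
    needs: Resources
    submitted_at: float


def pick_next(queue: list, free: Resources, running_per_user: dict, now: float,
              aging_per_sec: float = 1.0, per_running_job_penalty: float = 60.0) -> Optional[Job]:
    """Choose the next job to start, or None if nothing fits right now."""
    # Work-conserving: skip jobs that don't fit instead of leaving capacity
    # idle while waiting for them.
    candidates = [job for job in queue if job.needs.fits_in(free)]
    if not candidates:
        return None

    def priority(job: Job) -> float:
        waited = now - job.submitted_at                   # aging
        busy = running_per_user.get(job.user, 0)          # fairness
        return aging_per_sec * waited - per_running_job_penalty * busy

    return max(candidates, key=priority)
```

This toy version has a known trade-off. Because it is work-conserving, a job that needs more than is usually free can keep getting overtaken by smaller ones, so a real scheduler may also need some kind of reservation once a job has waited long enough.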

Once a slot is acquired, we do the setup. Spindle allocates a random vsock CID for the guest and registers it with the agent hub. It creates the per-workflow work directory, starts the two cache proxies (more on those later), then creates the VM: writable volumes become sparse files formatted ext4, the store disk is attached read-only, and QEMU is started with -sandbox on, -nodefaults, no display, no monitor and so on, with serial / virtio_console output going to a log file and a QMP socket for control.
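Here's a hedged sketch of that setup. The reserved vsock CIDs (0, 1, 2 and `0xFFFFFFFF`) and the `-sandbox on` / `-nodefaults` flags come from vsock and QEMU themselves. The `microvm` machine type, the virtio-mmio device names, the file names and the function names are all assumptions for illustration; in spindle the machine type comes from the spec.

```python
import secrets
import subprocess
from pathlib import Path


def random_guest_cid(in_use: set) -> int:
    # CIDs 0 (hypervisor), 1 (local) and 2 (host) are reserved, as is
    # 0xFFFFFFFF (VMADDR_CID_ANY), so guests get something in [3, 0xFFFFFFFE].
    while True:
        cid = 3 + secrets.randbelow(0xFFFFFFFF - 3)
        if cid not in in_use:
            return cid


def make_sparse_volume(path: Path, size_bytes: int) -> None:
    """Create a sparse file and format it ext4 (needs mkfs.ext4 on the host)."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)  # no blocks are allocated until the guest writes
    subprocess.run(["mkfs.ext4", "-q", "-F", str(path)], check=True)


def qemu_args(kernel, initrd, append, mem_mib, vcpus, store_disk, volumes, cid, workdir: Path) -> list:
    args = [
        "-machine", "microvm", "-accel", "kvm",
        "-nodefaults", "-sandbox", "on", "-display", "none",  # -nodefaults also drops the monitor
        "-m", str(mem_mib), "-smp", str(vcpus),
        "-kernel", kernel, "-initrd", initrd, "-append", append,
        "-serial", f"file:{workdir / 'serial.log'}",
        "-qmp", f"unix:{workdir / 'qmp.sock'},server=on,wait=off",
        "-drive", f"id=store,file={store_disk},format=raw,if=none,readonly=on",
        "-device", "virtio-blk-device,drive=store",
        "-device", f"vhost-vsock-device,guest-cid={cid}",
    ]
    for i, vol in enumerate(volumes):  # writable volumes, one virtio-blk each
        args += ["-drive", f"id=vol{i},file={vol},format=raw,if=none",
                 "-device", f"virtio-blk-device,drive=vol{i}"]
    return args
```

On a PCI machine type the device names would be the `-pci` variants (`vhost-vsock-pci`, `virtio-blk-pci`) instead.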

Then we wait for the machine. We poll QMP until QEMU says the guest is running, then wait for the agent's handshake to arrive over vsock from the CID we expect. The agent tells us its protocol and versions, and spindle sends back the job id, the trusted cache public keys, the cache proxy ports, the NixOS config if it's already cached, and so on. From there, steps run one at a time as `$shell -lc <command>`, as the unprivileged workflow user in /workspace/repo, with the right environment and any unlocked secrets.
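Two small pieces of that, sketched with hypothetical names: rejecting a handshake that didn't come from the CID we allocated, and building the invocation for one step. Letting secrets override plain environment variables on a name clash is an assumption; the post doesn't say which wins.

```python
def check_handshake(hello: dict, peer_cid: int, expected_cid: int, supported: set) -> int:
    """Accept the agent only if it dialed from our guest's CID and speaks a known version."""
    if peer_cid != expected_cid:
        raise PermissionError(f"handshake from CID {peer_cid}, expected {expected_cid}")
    version = hello.get("version")
    if version not in supported:
        raise ValueError(f"unsupported agent protocol version {version!r}")
    return version


def step_invocation(shell: str, command: str, env: dict, secret_env: dict):
    """Build argv, cwd and env for one step, to be run by the agent as the workflow user."""
    argv = [shell, "-lc", command]  # login shell, so the profile gets sourced first
    merged = {**env, **secret_env}  # assumption: secrets win on a name clash
    return argv, "/workspace/repo", merged
```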

Timeouts are cooperative: we work out a deadline from the workflow timeout and ship it to the guest, with a little grace on our side so the guest gets a chance to report the timeout itself rather than us just yanking the machine out from under it. And if the VM crashes mid-step we tail the serial and QEMU logs into the step's stderr, because "guest agent connection lost: EOF" is a genuinely useless thing to read at 2am.
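A sketch of both halves of that paragraph, under stated assumptions. Here the guest is sent a *relative* duration rather than a timestamp, since host and guest clocks aren't guaranteed to agree (whether spindle sends a timestamp or a duration isn't stated). The 15-second grace and the function names are made up.

```python
from collections import deque
from pathlib import Path


def step_budget(workflow_started: float, timeout: float, now: float, grace: float = 15.0):
    """Return (seconds the guest gets, seconds before the host gives up)."""
    guest = max(0.0, workflow_started + timeout - now)
    return guest, guest + grace  # host waits a bit longer so the guest reports first


def tail(path: Path, lines: int = 50) -> str:
    """Last `lines` lines of a log, for appending to a crashed step's stderr."""
    try:
        with open(path, errors="replace") as f:
            return "".join(deque(f, maxlen=lines))
    except FileNotFoundError:
        return ""
```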

Teardown is the same whether the workflow passed, failed or timed out: drain any pending Nix cache uploads, ask the agent to power off, wait for QEMU to exit (falling back to a QMP system_powerdown, and finally a kill if it's being stubborn), then close the proxies and remove the work directory.
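The shutdown escalation can be sketched like this. The two callbacks are hypothetical stand-ins for "send the agent a power-off message" and "issue QMP `system_powerdown`", and the wait times are made up.

```python
import subprocess


def stop_vm(proc: subprocess.Popen, agent_poweroff, qmp_powerdown,
            agent_wait: float = 30.0, qmp_wait: float = 15.0) -> str:
    """Shut QEMU down politely, escalating; returns which step did it."""
    for name, request, wait in (("agent", agent_poweroff, agent_wait),
                                ("qmp", qmp_powerdown, qmp_wait)):
        try:
            request()
        except Exception:
            continue  # e.g. the agent connection is already gone: escalate right away
        try:
            proc.wait(timeout=wait)
            return name
        except subprocess.TimeoutExpired:
            pass  # still running: try the next, less polite, option
    proc.kill()  # being stubborn
    proc.wait()
    return "kill"
```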

## locking down the network

A VM that can reach the host's local network is a VM that can reach things it has no business reaching. So QEMU doesn't run in the host's network namespace at all. We unshare into fresh user, net and mount namespaces first. Inside that namespace a small wrapper bind-mounts a resolv.conf pointing at the slirp DNS and installs blackhole routes for every special-use IP range (RFC 6890, so private networks, link-local, loopback, etc.) before it execs QEMU. slirp4netns then provides outbound connectivity for that namespace, with --disable-host-loopback, sandbox and seccomp all on. The guest itself sits behind a second layer of QEMU user-mode networking inside that namespace. All of this is done without needing any privileges!
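A sketch of generating those blackhole routes for IPv4, using iproute2's `blackhole` route type. The list below is the IPv4 special-purpose registry as we understand it; spindle's exact list isn't given here, and IPv6 is left out.

```python
import ipaddress

# IPv4 special-purpose blocks (IANA registry, RFC 6890 and updates).
SPECIAL_USE_V4 = [
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24",
    "240.0.0.0/4",
]


def blackhole_commands(ranges=SPECIAL_USE_V4) -> list:
    """`ip route` invocations to run inside the new network namespace."""
    # ip_network() validates each CIDR before we hand it to iproute2.
    return [["ip", "route", "add", "blackhole", str(ipaddress.ip_network(cidr))]
            for cidr in ranges]


def is_blocked(addr: str, ranges=SPECIAL_USE_V4) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(cidr) for cidr in ranges)
```

Blackholing 10.0.0.0/8 doesn't cut the namespace off from slirp4netns itself, assuming its default 10.0.2.0/24 network: that subnet gets a more specific on-link route, and longest-prefix matching prefers it over the /8.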

## budgets and cgroups

On its own, the scheduler's budget is just bookkeeping: it tracks what it has handed out, and the runner (QEMU) ensures that a workflow only gets that much. Optionally, though, the whole thing (QEMU and slirp4netns both) is placed in a per-workflow cgroup with memory, swap, etc. limits, as an extra enforcement layer on top. A nice side effect is that when the cgroup OOM-kills the VM, we can tell it was an OOM and report it as such, instead of surfacing a generic crash and leaving you guessing.
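With cgroup v2 this comes down to writing a few interface files and reading `memory.events` afterwards. A sketch with hypothetical function names; the cgroup's location (something under `/sys/fs/cgroup`) is up to the host setup.

```python
from pathlib import Path


def apply_limits(cgroup: Path, memory_max: int, swap_max: int) -> None:
    """Write cgroup v2 memory limits (in bytes) for a workflow's cgroup."""
    (cgroup / "memory.max").write_text(f"{memory_max}\n")
    (cgroup / "memory.swap.max").write_text(f"{swap_max}\n")


def add_process(cgroup: Path, pid: int) -> None:
    """Move a process (e.g. QEMU or slirp4netns) into the cgroup."""
    (cgroup / "cgroup.procs").write_text(f"{pid}\n")


def oom_kills(cgroup: Path) -> int:
    """Read the oom_kill counter from memory.events."""
    for line in (cgroup / "memory.events").read_text().splitlines():
        key, value = line.split()
        if key == "oom_kill":
            return int(value)
    return 0
```

Since the counters in `memory.events` cover the cgroup's whole lifetime, a fresh per-workflow cgroup means any nonzero `oom_kill` after QEMU exits points at an OOM rather than a crash.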

The spindle itself also gets a cgroup, which means that in a host OOM situation, it should be the workflows that die first, not the spindle itself.

## the nix cache, both ways

The two proxies I mentioned during setup are how the guest talks to spindle's Nix cache, and they run on the host so the guest never needs credentials or direct network access to do it. Like the agent, they also use vsock to communicate with the spindle.

The read proxy fans out to the configured substituters plus any caches you listed in your workflow, so when the guest needs to realize a store path it asks the proxy and the proxy fetches it. The request is sent concurrently to the read caches, so the one that answers it first wins.
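A sketch of that fan-out with asyncio. Here `fetch(cache, path)` is a hypothetical coroutine that returns the response on a hit and `None` on a miss (for a Nix binary cache that would typically be a 404 on the `.narinfo`). In this version, "first to answer" means first *hit*: a fast miss or an error doesn't end the race.

```python
import asyncio


async def first_hit(caches, fetch, path):
    """Ask every cache at once; return the first non-None answer, or None."""
    tasks = [asyncio.create_task(fetch(cache, path)) for cache in caches]
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                result = await next_done
            except Exception:
                continue  # an erroring cache just drops out of the race
            if result is not None:
                return result  # first hit wins
        return None  # every cache missed or failed
    finally:
        # Cancel the slower lookups and wait for them to wind down.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```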

The upload proxy goes the other way: any path built inside the guest gets pushed back out to spindle's Nix cache (if one is configured), so the next workflow that needs it doesn't have to build it again. Any paths that already exist on any of the configured read caches won't be uploaded. Built paths are queued by the agent and are immediately uploaded. If any paths are still left when we reach VM teardown, the workflow will wait until everything is uploaded.
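The queue-then-drain behaviour can be sketched with a single worker thread. `exists_upstream` and `upload` are hypothetical callbacks standing in for "is this path already on a read cache?" and "push it to spindle's cache". A real implementation would likely also want to upload a path's references before the path itself.

```python
import queue
import threading


class Uploader:
    """Upload store paths in the background; drain() blocks until the queue is empty."""

    def __init__(self, exists_upstream, upload):
        self._exists_upstream = exists_upstream
        self._upload = upload
        self._q = queue.Queue()
        self.uploaded, self.failed = [], []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, path: str) -> None:
        self._q.put(path)  # called as the agent reports newly built paths

    def _run(self) -> None:
        while True:
            path = self._q.get()
            try:
                if path is None:  # sentinel from drain(): stop the worker
                    return
                if not self._exists_upstream(path):  # skip paths a read cache already has
                    self._upload(path)
                    self.uploaded.append(path)
            except Exception as exc:
                self.failed.append((path, exc))
            finally:
                self._q.task_done()

    def drain(self) -> None:
        """At teardown: wait for every queued path, then stop the worker."""
        self._q.join()
        self._q.put(None)
        self._worker.join()
```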

## in the future

todo

Feel free to come and ask any questions you might have on https://chat.tangled.sh!
