Deploying binary MirageOS unikernels

Written by hannes
Classified under: mirageosdeployment
Published: 2021-06-30 (last updated: 2021-07-05)

Introduction

MirageOS development focus has been a lot on tooling and the developer experience, but to accomplish our goal to "get MirageOS into production", we need to lower the barrier. This means for us to release binary unikernels. As described earlier, we received a grant for "Deploying MirageOS" from NGI Pointer to work on the required infrastructure. This is joint work with Reynir.

We provide at builds.robur.coop binary unikernel images (and supplementary software). Doing binary releases of MirageOS unikernels is challenging in two aspects: firstly to be useful for everyone, a binary unikernel should not contain any configuration (such as private keys, certificates, etc.). Secondly, the binaries should be reproducible. This is crucial for security; everyone can reproduce the exact same binary and verify that our build service did only use the sources. No malware or backdoors included.

This post describes how you can deploy MirageOS unikernels without compiling it from source, then dives into the two issues outlined above - configuration and reproducibility - and finally describes how to setup your own reproducible build infrastructure for MirageOS, and how to bootstrap it.

Deploying MirageOS unikernels from binary

To execute a MirageOS unikernel, apart from a hypervisor (Xen/KVM/Muen), a tender (responsible for allocating host system resources and passing these to the unikernel) is needed. Using virtio, this is conventionally done with qemu on Linux, but its code size (and attack surface) is huge. For MirageOS, we develop Solo5, a minimal tender. It supports hvt - hardware virtualization (Linux KVM, FreeBSD BHyve, OpenBSD VMM), spt - sandboxed process (a tight seccomp ruleset (only a handful of system calls allowed, no hardware virtualization needed), Linux only). Apart from that, muen (a hypervisor developed in Ada), virtio (for some cloud deployments), and xen (PVHv2 or Qubes 4.0) - read more. We deploy our unikernels as hvt with FreeBSD BHyve as hypervisor.

On builds.robur.coop, next to the unikernel images, solo5-hvt packages (FreeBSD 12.2, Ubuntu 20.04) are provided - download the binary and install it. A NixOS package is already available - please note that soon packaging will be much easier (and we will work on packages merged into distributions).

When the tender is installed, download a unikernel image (e.g. the traceroute described in an earlier post), and execute it:

$ solo5-hvt --net:service=tap0 -- traceroute.hvt --ipv4=10.0.42.2/24 --ipv4-gateway=10.0.42.1

If you plan to orchestrate MirageOS unikernels, you may be interested in albatross - we provide binary packages as well for this (albatross FreeBSD and albatross Ubuntu 20.04). An upcoming post will go into further details of how to setup albatross.

MirageOS configuration

A MirageOS unikernel has a specific purpose - composed of OCaml libraries - selected at compile time, which allows to only embed the required pieces. This reduces the attack surface drastically. At the same time, to be widely useful to multiple organisations, no configuration data must be embedded into the unikernel.

Early MirageOS unikernels such as mirage-www embed content (blog posts, ..) and TLS certificates and private keys in the binary (using crunch). The Qubes firewall (read the blog post by Thomas for more information) used to include the firewall rules until v0.6 in the binary, since v0.7 the rules are read dynamically from QubesDB. This is big usability improvement.

We have several possibilities to provide configuration information in MirageOS, on the one hand via boot parameters (can be pre-filled at development time, and further refined at configuration time, but those passed at boot time take precedence). Boot parameters have a length limitation.

Another option is to use a block device - where the TLS reverse proxy stores the configuration, modifiable via a TCP control socket (authentication using a shared hmac secret).

Several other unikernels, such as this website and our CalDAV server, store the content in a remote git repository. The git URI and credentials (private key seed, host key fingerprint) are passed via boot parameter.

Finally, another option that we take advantage of is to introduce a post-link step that rewrites the binary to embed configuration. The tool caravan developed by Romain that does this rewrite is used by our openvpn router (binary).

In the future, some configuration information - such as monitoring system, syslog sink, IP addresses - may be done via DHCP on one of the private network interfaces - this would mean that the DHCP server has some global configuration option, and the unikernels no longer require that many boot parameters. Another option we want to investigate is where the tender shares a file as read-only memory-mapped region from the host system to the guest system - but this is tricky considering all targets above (especially virtio and muen).

Behind the scenes: reproducible builds

To provide a high level of assurance and trust, if you distribute binaries in 2021, you should have a recipe how they can be reproduced in a bit-by-bit identical way. This way, different organisations can run builders and rebuilders, and a user can decide to only use a binary if it has been reproduced by multiple organisations in different jurisdictions using different physical machines - to avoid malware being embedded in the binary.

For a reproduction to be successful, you need to collect the checksums of all sources that contributed to the built, together with other things (host system packages, environment variables, etc.). Of course, you can record the entire OS and sources as a tarball (or file system snapshot) and distribute that - but this may be suboptimal in terms of bandwidth requirements.

With opam, we already have precise tracking which opam packages are used, and since opam 2.1 the opam switch export includes extra-files (patches) and records the VCS version. Based on this functionality, orb, an alternative command line application using the opam-client library, can be used to collect (a) the switch export, (b) host system packages, and (c) the environment variables. Only required environment variables are kept, all others are unset while conducting a build. The only required environment variables are PATH (sanitized with an allow list, /bin, /sbin, with /usr, /usr/local, and /opt prefixes), and HOME. To enable Debian's apt to install packages, DEBIAN_FRONTEND is set to noninteractive. The SWITCH_PATH is recorded to allow orb to use the same path during a rebuild. The SOURCE_DATE_EPOCH is set to enable tools that record a timestamp to use a static one. The OS* variables are only used for recording the host OS and version.

The goal of reproducible builds can certainly be achieved in several ways, including to store all sources and used executables in a huge tarball (or docker container), which is preserved for rebuilders. The question of minimal trusted computing base and how such a container could be rebuild from sources in reproducible way are open.

The opam-repository is a community repository, where packages are released to on a daily basis by a lot of OCaml developers. Package dependencies usually only use lower bounds of other packages, and the continuous integration system of the opam repository takes care that upon API changes all reverse dependencies include the right upper bounds. Using the head commit of opam-repository usually leads to a working package universe.

For our MirageOS unikernels, we don't want to stay behind with ancient versions of libraries. That's why our automated building is done on a daily basis with the head commit of opam-repository. Since our unikernels are not part of the main opam repository (they include the configuration information which target to use, e.g. hvt), and we occasionally development versions of opam packages, we use the unikernel-repo as overlay.

If no dependent package got a new release, the resulting binary has the same checksum. If any dependency was released with a newer release, this is picked up, and eventually the checksum changes.

Each unikernel (and non-unikernel) job (e.g. dns-primary outputs some artifacts:

  • the binary image (in bin/, unikernel image, OS package)
  • the build-environment containing the environment variables used for this build
  • the system-packages containing all packages installed on the host system
  • the opam-switch that contains all opam packages, including git commit or tarball with checksum, and potentially extra patches, used for this build
  • a job script and console output

To reproduce such a built, you need to get the same operating system (OS, OS_FAMILY, OS_DISTRIBUTION, OS_VERSION in build-environment), the same set of system packages, and then you can orb rebuild which sets the environment variables and installs the opam packages from the opam-switch.

You can browse the different builds, and if there are checksum changes, you can browse to a diff between the opam switches to reason whether the checksum change was intentional (e.g. here the checksum of the unikernel changed when the x509 library was updated).

The opam reproducible build infrastructure is driven by:

These tools are themselves reproducible, and built on a daily basis. The infrastructure executing the build jobs installs the most recent packages of orb and builder before conducting a build. This means that our build infrastructure is reproducible as well, and uses the latest code when it is released.

Conclusion

Thanks to NGI funding we now have reproducible MirageOS binary builds available at builds.robur.coop. The underlying infrastructure is reproducible, available for multiple platforms (Ubuntu using docker, FreeBSD using jails), and can be easily bootstrapped from source (once you have OCaml and opam working, getting builder and orb should be easy). All components are open source software, mostly with permissive licenses.

We also have an index over sha-256 checksum of binaries - in the case you find a running unikernel image where you forgot which exact packages were used, you can do a reverse lookup.

We are aware that the web interface can be improved (PRs welcome). We will also work on the rebuilder setup and run some rebuilds.

Please reach out to us (at team AT robur DOT coop) if you have feedback and suggestions.