xhyve: lightweight vm for Mac OS X

xhyve is a port of bhyve a qemu equivalent for FreeBSD to the Mac OS X Hypervisor framework. This means it has zero dependencies that are not already installed on every Mac that runs at least OS X Yosemite (10.10). The cool thing though is that Mac OS X has full control of the system all the time as no third party kernel driver hijacks the CPU when a VM is running, so the full power management that OS X provides is always in charge of everything. No more battery draining VMs \o/.

xhyve logo

xhyve is Open Source

This is really cool as everyone is able to hack it, so did I. The code is, like every bit of lowlevel C code I saw in my life, a bit complex to read and has not many comments but is very structured so you can easily find what you want to modify and do that.

The project is quite young so don't expect miracles. It has for example no graphics emulation. Running Ubuntu or FreeBSD is reduced to a serial tty and networking. If you want to run a virtual server on your Mac for development purposes it's quite perfect though.

There was one downer that got me: A virtual disk of say 30 GB has a backing file that is exactly 30 GB big even if you only store 400 MB on it. That's bad for everyone running on a SSD where space is limited as of now.

Introducing: Sparse HDD-Images for xhyve

Because the VM code is pretty small (the compiled executable of xhyve is about 230 KB) I though it might be possible for me to change this one last thing that prevented me to use xhyve on my Macbook. It turns out it is really easy to hack the virtual block device subsystem. All disk access code is neatly contained in one file: blockif.c. It is neatly separated from the virtio-block and ahci drivers.

So what I went out to do was three things:

  • Split the big disk image file into multiple segments (as for why read on)
  • Make the disk image segments only store blocks that have actual content in them (vs. storing only zeroes)
  • Make xhyve create the backing image files if they do not exist.

Splitting the disk image into segments

You may ask why, this is rather an optimization for maintaining speed and aid debugging but turned out to have the following advantages:

  • Some file systems may only allow files of a maximum size (prime example: FAT32 only allows 2GB per file)
  • Sparse image lookup tables can be filled with 32 bit values instead of defaulting to 64 bit (which saves 50% space in the lookup tables)
  • Debugging is easier as you may hexdump those smaller files on the terminal instead of loading a multi gigabyte file into the hex editor of your choice
  • Fragmentation of sparse images is reduced somewhat (probably not an issue for SSD backed files)
  • Growing disks is easy: just append a segment
  • Shrinking disks should be possible with help of the guest operating system, if it is able to clear the end of the disk of any data you could just delete a segment.

So splitting was implemented and rather easy to think of, just divide the disk read offset by the segment size and use a modulo operation to get to the in-segment-address. There's one catch: I had to revert from using preadv and pwritev to regular reads and writes. Usually you really want those v functions as they allow executing multiple read and write calls in one system call, thus beeing atomic. But these functions only work with one file descriptor and our reads probably span multiple segments and thus multiple file descriptors.

To make the thing easier and configurable I introduced two additional parameters for the block device configuration:

  • size the size of the backing file for the virtual disk
  • split the segment size. size should be a multiple of split to avoid wasting space.

You may use suffixes like k, m, g like on the RAM settings to avoid calculating the real byte sizes in your head ;)

Be aware: You may convert a disk from plain to split image either by using dd and splitting the image file (exact commands are left as an exercise to the reader) or by setting split to the old size of the image and size to a multiple of split effectively increasing the size of the disk by a multiple of the old size. New segments will be created automatically on next VM start.

Example config

Implementing sparse images

So the last step for making xhyve usable to me: Don't waste my disk space.

I think there are multiple methods for implementing efficient sparse images, I went for the following:

  • Only save sectors that contain actual data and not only zeroes
  • Minimum allocation size is one sector
  • Maintain a lookup table that references where each sector is saved in the image
  • Deallocation of a sector (e.g. overwriting with zeroes) is only handled by a shrink disk tool offline

So how does such a lookup table look?

A sparse disk lookup table is just an array of 32 bit unsigned integers, one for each sector. If you want to read sector 45 you just take the value of array position 45, multiply it by sector size and seek into the image segment to read from that address. Simple, isn't it?

In the current implementation the lookup table is written to a separate file with the extension .lut, all writes to this file are synchronous. The other backing files will be initially created as zero byte length files and when the guest os starts writing data the new sector is appended to the respective segment file and a new offset is written to the lookup table.

The lookup table starts as an array full of UINT32_MAX values (0xffffffff) as this is the marker used to describe that this sector is not yet in the image and thus should be returned as a series of zero values. If a read finds an entry other than that marker the corresponding data is read from the segment file.

All lookup tables for all segment files are appended to the .lut file, so it contains multiple tables, not just one. Positive side of this is that 32 bits of offset data map to a maximum segment size of about just under 4 TB divided by the sector size. If you use an SSD as backing storage you probably should configure your sector size to 4KB as that is the sector size of most SSDs and will result in additional performance. So this will result in a maximum segment size of about 16 PB and I never heard of a Mac that has this much storage. (If yours has please send me a photo)

Writes of new sectors (those appended to the segment file) are synchronous to avoid two sectors with the same address. Other writes are as the user configured them on the command line.

To enable sparse disk images just add sparse as a parameter to your configuration.

Sparse config example

Be aware: You'll have to recreate your disk image to profit of this setting, sparse disks are not compatible with normal disks.

Conclusion

I used this configuration:

-s 4,virtio-blk,test/hdd/hdd.img,sectorsize=4096,size=20G,split=1G,sparse

So this is how it looks on disk:

$ ls -lah
total 2785264  
drwxr-xr-x  23 dark  staff   782B Jan 16 01:27 .  
drwxr-xr-x   9 dark  staff   306B Jan 16 01:27 ..  
-rw-rw----   1 dark  staff   531M Jan 16 02:50 hdd.img.0000
-rw-rw----   1 dark  staff    12K Jan 16 01:18 hdd.img.0001
-rw-rw----   1 dark  staff   5.9M Jan 16 02:50 hdd.img.0002
-rw-rw----   1 dark  staff    24K Jan 16 01:18 hdd.img.0003
-rw-rw----   1 dark  staff   110M Jan 16 02:48 hdd.img.0004
-rw-rw----   1 dark  staff     0B Jan 16 01:11 hdd.img.0005
-rw-rw----   1 dark  staff   172M Jan 16 02:50 hdd.img.0006
-rw-rw----   1 dark  staff     0B Jan 16 01:11 hdd.img.0007
-rw-rw----   1 dark  staff   207M Jan 16 02:50 hdd.img.0008
-rw-rw----   1 dark  staff     0B Jan 16 01:11 hdd.img.0009
-rw-rw----   1 dark  staff    15M Jan 16 02:48 hdd.img.0010
-rw-rw----   1 dark  staff     0B Jan 16 01:11 hdd.img.0011
-rw-rw----   1 dark  staff    46M Jan 16 01:24 hdd.img.0012
-rw-rw----   1 dark  staff     0B Jan 16 01:11 hdd.img.0013
-rw-rw----   1 dark  staff   151M Jan 16 02:50 hdd.img.0014
-rw-rw----   1 dark  staff    12K Jan 16 01:18 hdd.img.0015
-rw-rw----   1 dark  staff    97M Jan 16 02:50 hdd.img.0016
-rw-rw----   1 dark  staff   5.3M Jan 16 02:48 hdd.img.0017
-rw-rw----   1 dark  staff     0B Jan 16 01:11 hdd.img.0018
-rw-rw----   1 dark  staff   4.0K Jan 16 01:15 hdd.img.0019
-rw-rw----   1 dark  staff    20M Jan 16 02:48 hdd.img.lut

The dump you see here is of an image where I just installed Ubuntu server 15.10.
Instead of wasting 20GB of space this one only needs about 1.3GB. Speed is about the same as before (with a SSD as backing storage) but may suffer severely on a spinning rust disk as there are way more seeks if the sectors become fragmented.

Where to get it

Currently you will have to compile yourself, just fetch the sparse-disk-image branch from https://github.com/dunkelstern/xhyve and execute make.

Johannes Schriewer

iOS and Linux developer. Tinkers with electronics and 3D-Printers, loves low-level stuff.

Augsburg, Germany, Planet Earth