Introducing EmulatorKit

After spending some time hacking on Numerical Cookbook, I’ve gained enough confidence in my Ada skills to try out another, more ambitious project, one that is more directly related to my long-term goal of resuming TOSP development. This week, the project became mature enough for me to consider it presentable, so let me introduce you to my latest software experiment, EmulatorKit.

What is EmulatorKit?

EmulatorKit is a library of emulator components, designed for the purpose of testing low-level AMD64 code on AMD64 hosts. To explain how this differs from other popular emulators and virtual machines such as Bochs, QEMU and VirtualBox, I will first explain which specific challenges of OS testing would benefit from emulator support, and how EmulatorKit plans to provide better support for this than alternatives.

Basic goal: Easing the pain of OS validation

If you have ever tested low-level code, you know that unlike (most) applications, OS kernels and drivers are not just the sum of their inputs, outputs, and internal state. The very purpose of low-level code, indeed, is to interact with external hardware, altering its state and ordering it around according to an internal policy and user-mode software instructions.

This makes low-level code very difficult to test, because your typical test harness does not really know what your code is doing. Sure, you can inspect the inputs, outputs, and internal state of all your OS components. And you can try to flood them with bad inputs to see if they crash, the way fuzz testers do. But unless you also have access to hardware state, all you are checking in the end is that the OS code under test is good at internal bookkeeping, and that it can detect inconsistencies in its inputs or internal state. You are never truly testing that said code is doing its job properly.

This is where emulators should come in handy, because they can give you a (slower) software version of the hardware you are targeting, whose state you can inspect like that of any other software. This is, in fact, the key difference between the commonly accepted meanings of “emulator” and “virtual machine”: virtual machines do not usually aim to correctly emulate the guest hardware, only to reproduce its behavior in a black box manner. This enables the use of more elaborate code translation techniques, or sometimes even direct native execution when host and guest architectures match, which leads to much better guest code performance.

In exchange for their lower guest code execution speed, what emulators bring to the table is a lot more control over the guest hardware. An emulator can throw an exception or emit a debug message anytime an invalid hardware access occurs, or anytime undefined hardware behaviour is invoked. It can warn of hardware usage patterns which are valid according to the specification, but almost certainly not what the calling code intended. It can, on demand, temporarily log all accesses to a specific piece of hardware, which is a lot more powerful than a mere core dump in debugging situations. Basically, it can check everything which real hardware cannot check, for performance reasons or because of specification limitations, and tell you about it in a centralized way.

EmulatorKit aims to help one build emulators which actually do that.
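To make the idea of trapping invalid accesses and temporarily logging a device more concrete, here is a minimal sketch. It is written in Python for brevity rather than in EmulatorKit's Ada, and every name in it (`CheckedDevice`, `HardwareAccessError`, the register offsets) is hypothetical, made up for illustration only.

```python
from contextlib import contextmanager


class HardwareAccessError(Exception):
    """Raised when guest code uses the emulated device illegally."""


class CheckedDevice:
    """Hypothetical device with a few registers, some of them read-only."""

    def __init__(self, registers, read_only=()):
        self._regs = dict(registers)        # register offset -> current value
        self._read_only = set(read_only)    # offsets that reject writes
        self._log = None                    # a list while logging, else None

    def _record(self, kind, offset, value):
        if self._log is not None:
            self._log.append((kind, offset, value))

    def read(self, offset):
        if offset not in self._regs:
            raise HardwareAccessError(f"read from reserved offset {offset:#x}")
        value = self._regs[offset]
        self._record("read", offset, value)
        return value

    def write(self, offset, value):
        if offset not in self._regs:
            raise HardwareAccessError(f"write to reserved offset {offset:#x}")
        if offset in self._read_only:
            raise HardwareAccessError(f"write to read-only offset {offset:#x}")
        self._regs[offset] = value
        self._record("write", offset, value)

    @contextmanager
    def logging(self):
        """Log every access made inside the `with` block, then stop."""
        self._log = []
        try:
            yield self._log
        finally:
            self._log = None


device = CheckedDevice({0x00: 0, 0x04: 0xCAFE}, read_only=[0x04])
device.write(0x00, 1)                 # not logged: logging is off
with device.logging() as trace:
    device.write(0x00, 42)            # logged
    device.read(0x04)                 # logged
```

After the `with` block, `trace` holds only the two accesses made inside it, and a later `device.write(0x04, 1)` raises `HardwareAccessError` instead of silently doing nothing, which is the kind of feedback real hardware rarely gives.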

Project vision

The main intended usage of EmulatorKit is to help low-level developers, such as me, to express their understanding of hardware specifications in code, for the purpose of:

Checking that low-level code follows the hardware specification
Understanding what said low-level code is doing

The resulting emulated hardware should be suitable for integration into automated regression tests, allowing for an OS development workflow that is both safer and more comfortable than any currently available alternative, especially that of relying on the error-checking behaviour of real hardware.

A side goal, which is nice to have, but not considered absolutely necessary, is to allow for the execution of legacy code (such as VESA VBE functions) in modern CPU modes, and to investigate the operation of hardware vendor-provided binary blobs at a higher level than hand disassembly can possibly provide.

Project goal

I will consider the EmulatorKit project a success if it achieves the following goals:

Highly accurate emulation of the AMD64 architecture, including its CPUs, memory subsystems, and de facto standard peripherals (e.g. IOAPIC).
Easy extensibility to any piece of x86 hardware with similar emulation accuracy.
Complete error checking of all illegal forms of hardware usage, including invalid memory accesses and reliance on undefined behaviour.
Ability to log the activity of any specific piece of emulated hardware for a limited period of time.
Ability to produce alternate versions of emulated hardware with custom behavior, for example by starting up the CPU in a specific initial state or having the emulated guest memory directly map into the host address space.
Usability in automated testing scenarios, where an OS developer loads the test harness with an OS image and a set of test cases before going to the coffee machine.

The following features are nice to have, but not necessary for me to consider the project successful:

“Fast enough” emulation, meaning emulation that is compatible with interactive usage of the emulated hardware.
Ability to detect hardware usage which is probably unintended, even if legal, such as reading from uninitialized memory.
Integration of various non-standard debugging features for the guest OS, such as the ability to send log messages and serialized data to the host.
Limited run-time dependencies, allowing integration in an operating system for the emulation of legacy or untrusted binary code.
Compatibility with developers who do not drink coffee

Comparison to alternatives

VirtualBox

Of the alternatives listed above, VirtualBox is the one that falls most squarely in the virtual machine category. It executes code in a black box manner, directly on the host hardware if it can, in order to reproduce the expected behavior of low-level code at the highest possible speed.

Because of this design, guest debugging support is minimal in VirtualBox, making this software just barely more useful than bare metal hardware execution for OS testing. In fact, making the most of VirtualBox requires modifying the guest code to improve its interface with the underlying virtual machine, which is very much the opposite of an OS developer’s usual goals.

Bochs

Unlike VirtualBox, Bochs is a true x86_64 emulator. It fully emulates the guest hardware, and can thus let you tune a great many things about said hardware before emulation, and probe a great deal of hardware state during emulation. This makes it a much better fit than your average virtual machine for OS development scenarios.

Differences between Bochs and EmulatorKit are a lot more philosophical in nature, so I expect that over the course of this project, I will have a great deal to learn from Bochs and, conversely, if this project is successful, the Bochs guys may be interested in my work too. Here are some key points where the two projects differ.

Bochs aims to emulate guest code on host CPUs which are incompatible with it, either because they run a different architecture or are too old. EmulatorKit does not have this goal, which allows its implementation to be much simpler and possibly faster as well.
Bochs is implemented in C++, whereas EmulatorKit is implemented in Ada 2012.
Bochs mostly aims for faithful guest hardware emulation, although it also features a small amount of debugging trickery such as trapping on RAM accesses at a specific address or using a nonexistent I/O port for text output. On the other hand, being designed as an OS testing tool, EmulatorKit aims for absolute paranoia in hardware spec conformance checks, and will probably take debugging features further as well.
Bochs is designed as a standalone application that is meant to be configured through configuration files. However, in practice, it must be recompiled for many features to be enabled or disabled. To achieve a more consistent user experience, EmulatorKit is instead designed as a library of emulator components, which the user is expected to fully configure in code and compile into an emulator.
The Bochs project has a long history of going halfway through implementing a new hardware feature, then marking it as incomplete or experimental and merging it into a release as if it were a finished component. EmulatorKit aims to enforce a stricter release policy where current hardware components should be fully implemented before work on new components is started.

QEMU

QEMU is a fascinating piece of software that defies categorization:

It tries to do a great many things, from full system emulation to user-mode native code virtualization.
It was originally designed for PC emulation, but may also emulate some MIPS or ARM systems for you today if you are lucky.
Rather than using configuration files or recompilation, it has the longest list of command line options that I have ever seen in a piece of software.
It claims to have host OS independence, but you are much better off using Linux as a host as many features are Linux-specific.

Due to its very broad focus (or lack thereof), QEMU shares with Bochs an unfortunate tendency to release features when they are half implemented, then leave them that way. For example, SIMD and 64-bit support are still marked as incomplete on x86 targets (“some support” and “limited support” respectively), despite having been available on commodity hardware for more than 10 years.

In system emulation mode, QEMU uses an extremely complex internal architecture in order to achieve near-optimal emulation speed. For example, it dynamically translates guest machine code into host machine code in a JIT design, and then uses lazy evaluation on condition code flags to reduce memory traffic. This means that modifying QEMU to meet specific system debugging needs would be a significant challenge, as one needs to take into account possible interactions with all of these performance optimizations.
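As an illustration of why such optimizations complicate debugging, here is a minimal sketch of the general idea behind lazy condition-code evaluation. It is written in Python, is in no way QEMU's actual implementation, and all names (`LazyFlags`, `record_add`, `record_sub`) are hypothetical. Operands are assumed to already be unsigned 64-bit values.

```python
MASK64 = (1 << 64) - 1


class LazyFlags:
    """Remembers the last ALU operation instead of computing flags eagerly.

    The flags are only meaningful once an operation has been recorded.
    """

    def __init__(self):
        self._last = None   # (operation, left operand, right operand, result)

    def record_add(self, a, b):
        result = (a + b) & MASK64
        self._last = ("add", a, b, result)   # no flag is computed here
        return result

    def record_sub(self, a, b):
        result = (a - b) & MASK64
        self._last = ("sub", a, b, result)   # no flag is computed here
        return result

    @property
    def zero(self):
        # Derived on demand from the stored result.
        return self._last[3] == 0

    @property
    def carry(self):
        # Derived on demand from the stored operands.
        op, a, b, _ = self._last
        if op == "add":
            return a + b > MASK64   # unsigned overflow
        return a < b                # unsigned borrow


flags = LazyFlags()
flags.record_add(MASK64, 1)   # wraps around to 0; flags are not computed yet
```

In a design like this, there is no flags register to inspect at any given time: a debugging hook that wants to display or check the flags has to go through the same deferred computation, which is one example of the interactions mentioned above.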

While QEMU’s complexity may certainly be necessary in its target use case of emulating any architecture on any other architecture at high performance, I believe that this complexity does not make it a good fit for OS testing scenarios where correctness and control are a lot more important than speed.

Development workflow

At this point, the project is still in its early days, and I am still figuring out what I am trying to build, so I cannot commit to a stable release policy yet.

However, my end goal is to use a rolling release model, in which a new or modified emulator component cannot end up in a stable release until it has been both used in the real world and fully covered by automated tests.

Real world use tests are important, because they reveal unforeseen problems with the usability of the interface and the implementation. For example, it is usually very difficult to tell if a software component has sufficient performance until it has been tested in its intended context and usage pattern.

Automated testing is important, because it dramatically eases future modification of the code through its ability to catch every regression that the tester could think of. It is also a very effective way to check whether the emulator runs as expected on a previously unused host configuration and, if not, to tell what exactly is failing.

Current status and plans

The current codebase is available online on GitHub. At this point, I have implemented a tentative memory subsystem interface, and provided a first trivial implementation of this interface based on a dynamically allocated buffer. As a next step, I plan to begin work on CPU emulation, and iteratively refine each component depending on the needs of the other one.
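To give a rough idea of what a memory subsystem interface with a buffer-backed implementation can look like, here is a minimal sketch. It is written in Python rather than Ada, and it does not reproduce the actual EmulatorKit interface: the names `Memory` and `BufferMemory`, the method signatures, and the bounds-checking policy are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class Memory(ABC):
    """Hypothetical memory subsystem interface: byte-addressed reads and writes."""

    @abstractmethod
    def read(self, address: int, length: int) -> bytes: ...

    @abstractmethod
    def write(self, address: int, data: bytes) -> None: ...


class BufferMemory(Memory):
    """Trivial implementation backed by one dynamically allocated buffer."""

    def __init__(self, size: int):
        self._buffer = bytearray(size)   # zero-filled guest memory

    def _check(self, address: int, length: int) -> None:
        # Reject any access that is not fully inside the buffer.
        if address < 0 or length < 0 or address + length > len(self._buffer):
            raise IndexError(
                f"access to [{address:#x}, {address + length:#x}) is out of bounds"
            )

    def read(self, address, length):
        self._check(address, length)
        return bytes(self._buffer[address:address + length])

    def write(self, address, data):
        self._check(address, len(data))
        self._buffer[address:address + len(data)] = data


ram = BufferMemory(4096)
ram.write(0x10, b"\x0f\x05")   # store two bytes at guest address 0x10
```

Keeping the interface separate from the buffer is what would later allow alternate implementations, such as guest memory that maps directly into the host address space, to be swapped in without touching the CPU emulation.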

If this iterative approach yields a satisfactory CPU and memory emulation, I plan to add a debugging output to the emulation, akin to Bochs’ port E9 hack, then try to resume TOSP development on top of this emulated hardware. This will be both the ultimate real-world test for EmulatorKit, and a superior testbed for TOSP.
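For readers unfamiliar with the port E9 hack: in Bochs, a byte written by the guest to I/O port 0xE9 is printed on the host console. A debugging output in the same spirit could be sketched as follows, in Python for brevity. The `PortBus` class and its methods are hypothetical and do not describe any existing EmulatorKit component.

```python
import io

DEBUG_PORT = 0xE9   # port number used by the Bochs debug-output hack


class PortBus:
    """Hypothetical I/O port bus that routes guest OUT instructions to handlers."""

    def __init__(self):
        self._handlers = {}   # port number -> callable taking one byte

    def attach(self, port, handler):
        self._handlers[port] = handler

    def out8(self, port, value):
        if port not in self._handlers:
            raise LookupError(f"write to unmapped I/O port {port:#x}")
        self._handlers[port](value & 0xFF)


host_console = io.StringIO()    # stands in for the host's terminal or log file
bus = PortBus()
bus.attach(DEBUG_PORT, lambda byte: host_console.write(chr(byte)))

for byte in b"TOSP booting\n":  # what a guest would emit, one OUT per character
    bus.out8(DEBUG_PORT, byte)
```

The same bus also shows where conformance checks would live: a write to a port with no attached device raises an error instead of being silently ignored.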

The OS|periment

Musings on personal computer operating systems