So, in this OS, we have decided to let the kernel manage the CPU and its RAM and to leave the rest to user-mode drivers. From the point of view of OS development, these two pieces of hardware translate into a range of hardware resources: RAM chunks and their virtual address translations, privileged CPU instructions, CPU time, incoming interrupts, and bidirectional communication with the rest of the hardware. Let's discuss how that last resource should be handled!
A quick overview of the IO landscape
From the point of view of operating systems, a CPU can directly communicate with external chips using two distinct methods. Either some CPU pins are directly connected to the target chip and data is exchanged using special CPU instructions (IO ports), or a controller on the memory bus intercepts some RAM access requests and redirects them to the hardware, which then acts as if it were a chunk of RAM located at a nonexistent memory address (memory-mapped IO).
Hardware engineers hate IO ports, because they make things more difficult for them: more wires around the CPU on the motherboard, more CPU instructions, more CPU pins… IO ports also have their drawbacks for OS and driver developers: since the instructions that send and receive data must be explicitly run by the CPU, typically a few bytes at a time and in a blocking fashion, port-based communication is at once slow, multitasking-unfriendly, and energy-inefficient. For these reasons, outside of the embedded world and of legacy (but important) x86 features, IO ports are becoming a thing of the past in favor of memory-mapped IO.
Indeed, once external hardware's registers can be accessed through the memory bus like normal RAM, it becomes easy to introduce so-called "Direct Memory Access" (DMA) techniques, in which an external controller performs memory-to-hardware and hardware-to-memory data copying while the CPU sits idle or does other things, apart from sending infrequent orders to the DMA controller and processing an interrupt from time to time. This solves the speed and CPU-hogging issues of port-based IO. However, all is not good in heaven, and memory-mapped IO has its fair share of issues, arguably turning the pain of implementing port-based IO in hardware into the pain of supporting memory-mapped IO in software.
Why memory-mapped IO hates developers
RAM is slow with respect to the speed at which a CPU executes code, and having software wait for RAM on every MOV is a waste of CPU time of epic proportions. This is why modern compilers and CPU architectures do their best to avoid this outcome, using such tricks as multilevel caches, out-of-order instruction execution, and pipelining. Software writing to RAM does not need to know about the intricacies of these performance optimizations, as the CPU hardware makes sure that future RAM reads from a recently written memory region give the right result no matter what is actually stored in the physical RAM cell at that moment. However, these mechanisms break down in two important scenarios:
- When external hardware attempts to access RAM cells, such as in DMA data transfers.
- When software does not write to RAM but to a memory-mapped hardware register, which is read not only by CPU-driven code but also by the device itself.
In short, as soon as memory-mapped IO is coupled with modern computer hardware and compilers, extreme care must be taken when writing OS and driver code that uses it. The “volatile” keyword of the C language family must be spammed everywhere in the code, along with manual CPU cache control and cache flushing instructions. Forgetting a single one of these leads to extremely weird and hardly reproducible bugs.
What’s more, memory-mapped IO often doesn’t even bother to behave like real RAM. As an example, it is common to find memory-mapped registers that only accept MOVs of one specific integer size, requiring OS developers to use assembly instructions (or carefully typed accesses) in a place where one would naturally just use regular pointers. Documentation is rarely very clear on these matters, and failure to write code that follows the manufacturer’s intent leads to pretty nasty undefined behavior, with the untraceable bugs that come with it. Fun times.
Overall, though, no matter how bad it gets sometimes, memory-mapped IO seems to be here to stay as the main method of bidirectional IO on current and future CPU architectures. So it is time to move on from here, and explain how the kernel is going to manage all these IO mechanisms on its side.
Kernel design considerations
Overall, as hinted at earlier, IO ports are extremely straightforward to manage in software. There is just one catch, though, and that is process isolation. Ideally, the driver of a given piece of hardware, which is user-mode software, would be given access to the IO ports that are connected to its device, and would not be able to access other IO ports. In practice, however, three situations may be envisioned:
- Hardware offers this kind of fine-grained per-port access restrictions
- Hardware can only let user-mode software access all IO ports or none
- Hardware only lets kernel-mode software access IO ports
In addition to these hardware protection mechanisms, it is also possible to envision putting port access restrictions in the kernel, in the sense that software would have to go through the kernel for any port-based data transfer. However, I tend to despise this sort of software-based isolation, as it tends to be extremely slow, bloat kernel code, and more generally add one more critical piece of code that can break in the OS. As for speed, thinking of asking the kernel to do bulk data transfers? Think again. There is no such thing as a universal port-based bulk data transfer protocol, and each peripheral has its own quirks, such as timings to respect between IO bursts or regular feedback to be acknowledged.
For this reason, I believe that the following philosophy should be adopted: per-port IO access controls are considered the best-case scenario, and the OS is built to support them. In case they are not present and user-mode software can only get all-or-nothing IO port access, it is silently given access to all IO ports in the hope that it will do no harm (after all, it is a top-level driver which should be relatively trusted). CPU architectures which don’t let user-mode software access IO ports obviously don’t care about microkernel-based OSes the tiniest bit, so I don’t see why I should care about them either: they would be unsupported in this scheme.
In a way, the kernel already has primitive support for memory-mapped IO, but this support is both insufficiently powerful and unnecessarily ugly. What the current codebase does is let the kernel give user-mode software access to the reserved memory regions declared by the computer’s firmware (BIOS, EFI…), through a process similar to that used for dynamic RAM allocation. There are several major problems with this approach:
- Not every piece of memory-mapped hardware is associated with a firmware-declared reserved memory region
- Drivers cannot adjust the processor’s caching policies and must therefore run cache flushing instructions constantly, which is both cumbersome and error-prone
- Finally, this method puts the hassle of managing memory-mapped IO on the memory manager, which is already a relatively crowded kernel component
For these reasons, I believe that this code should be thrown out in favor of a new system where the memory manager ignores reserved memory regions and where memory-mapped IO is fully managed by an “IO manager” kernel component. Said component would have to perform the following tasks:
- Let user-mode software access a specific range of physical memory addresses, even one that seems to point at nonexistent RAM, so that it can manipulate memory-mapped registers. Also provide a reciprocal unmapping operation.
- Make this access mutually exclusive unless sharing of the memory-mapped IO range is explicitly asked for.
- Let software learn about the set of available CPU caching policies, and apply the most suitable one to its range of memory-mapped addresses at allocation time.
- On architectures where caching policies cannot be set up by the kernel and where manual cache flushing is not accessible to unprivileged software, provide a system call to have the kernel perform cache flushing on the caller’s behalf. On other architectures, this call would be a no-op in the kernel, and its user-mode interface may either no-op as well or perform the cache flush directly, in order to avoid the cost of an unneeded system call.
I am confident that if all of this functionality were provided, software based on memory-mapped IO would be as painless to develop as possible, even though it would still not be completely painless, since modern optimizing compilers still require volatile keyword spam in order to generate code that acts on memory-mapped IO as expected.
To conclude, one last question must be tackled: is it the job of the kernel to manage DMA protocols? Since I am asking the question, you may guess that I tend to lean towards a “no”. While it may be relevant to develop a standard DMA interface, DMA implementations are always provided by some external chip, typically a bus controller, and as such are best left to the driver of that chip.
For a longer explanation, let’s consider some features that a good DMA manager should provide, according to the renowned hobby OSdever Brendan:
- Appropriate barriers should be used to avoid concurrent access to a single memory region by hardware and software
- IOMMUs (which are bus-specific) should be set up to prevent malicious or buggy driver code from accessing unauthorized memory regions through DMA
- When DMA-aware drivers crash, it should be possible to either prevent the associated RAM from being freed or stop the DMA transfer
All of these features require intimate knowledge of the DMA technology at work from their implementation. However, DMA itself is not a standard technology, so the kernel would always have to rely on a bus driver for such knowledge. This means that managing DMA in the kernel would imply constant round trips between hardware drivers, kernel code, and bus drivers, all that for little security benefit, since in the end it would always be the bus driver that can do whatever it wants. If all that matters is a standard interface, it can be devised without requiring any extra kernel code.
One interesting lesson to take from these guidelines, though, is that any driver that operates a DMA controller should be considered critical from a security point of view, and preferably put under partial control of the core OS development team.
I hope that this little post has helped clarify what the challenges of directly CPU-driven IO are, what a kernel-side IO manager would have to deal with, and why DMA, although an important part of modern IO, should not be part of the kernel’s IO code in my opinion. And that being said, this is it for now: thank you for reading!