A (wide) look at the process abstraction, part 1

So the other day, I decided to go back to kernel implementation. But then I found out that I couldn’t go very far, because I didn’t know where I was heading. So I went back to the drawing board (which in this case is actually a notepad and a blue Bic pen that I can carry around, but whatever) to take a wider look at what this kernel project is about, where I am, and where I want to go.

Design

Kernel-level goals

A core design goal of the microkernel I’m currently working on is to make software run in an isolated environment, called a process, in which it may only interact with other software in an opt-in fashion that is partially under the user’s control. Within the limits of what the hardware allows, of course.

This organization benefits three actors: users, application developers, and OS developers. Users enjoy the extra stability and security that weak cross-software interactions bring, and increased control over what software is doing. Application developers enjoy the convenience of not having to care much about other software when designing their own product. OS developers enjoy the fine-grained control that putting software in isolated processes brings (it’s easier to act on independent processes than on a blurry mess of intertwined threads).

My kernel only implements the minimal isolation required to make programs run independently of each other, that is, isolation at the CPU and RAM level. For full process isolation, independent processes also have to be isolated from each other with respect to other hardware and to higher-level system services such as the GUI, networking, or the filesystem. But in my opinion, that’s not the job of an OS kernel. The kernel is a vital component of the OS, and as such it should be kept as lean as possible to decrease the probability of failure. Higher-level system services, and the isolation of processes with respect to them, should rather be implemented by disposable user-mode system software. The kernel itself should only offer an extensible design that allows such services to easily add their own contribution to the process abstraction, rather than adding it itself.

In this microkernel model, the kernel should make sure that programs can share the following resources without any harmful interaction:

  • CPU state (registers, stack pointer…)
  • CPU time
  • Statically and dynamically allocated RAM
  • I/O ports of the CPU
  • Interrupts
The role of threads

For a very long time, personal computers have needed to provide an illusion of simultaneous software execution on single-core CPUs, since such multitasking behavior has many uses. More recently, as the CPU industry hit a technological wall and realized that single-core CPUs wouldn’t get much further, we have gotten computers with several independent CPU cores that are truly capable of running independent tasks in parallel. In this context, allowing software to easily execute several pieces of code at the same time, without using a full-blown and sometimes cumbersome process abstraction, became an OS design concern of prime importance. And thus threads were born: they have an independent CPU state and share CPU time, but are not strongly isolated from each other.

Modern operating systems combine processes and threads by encapsulating threads inside processes. A process can have several, one, or zero threads running at a given time. The first option is perfect for software that can benefit from multi-core CPUs (such as games or video editing), the second option suits the vast majority of software, and the third option, which alas is not widely supported by current OSs, allows system services to listen for incoming events without polluting the kernel’s thread database with permanently blocked or polling threads.

Inter-Process Communication through RPC

Although application processes are often quite pleased to be isolated from each other, they frequently need to interact with system services. The basic interaction scheme is as follows: application software gives orders, and system services check whether the process has the right to give them. If so, the request is processed. If not, the caller process is notified through an error notification channel or, if none is available, killed instantly in a cold-blooded fashion.
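
To make this scheme concrete, here is a minimal C sketch of the check a system service could perform on each request. The `caller` structure, the permission bitmask, and the three outcomes are hypothetical illustrations, not an existing API of this kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of a request sent to a system service. */
typedef enum { REQ_DONE, REQ_DENIED_NOTIFIED, REQ_CALLER_KILLED } req_outcome;

/* Hypothetical view of the calling process, as seen by the service. */
typedef struct {
    unsigned permissions;    /* bitmask of rights granted to this process */
    bool has_error_channel;  /* can it receive error notifications? */
    bool alive;
} caller;

/* Check the caller's rights before doing anything on its behalf. */
req_outcome handle_request(caller *c, unsigned required) {
    assert(c != NULL && c->alive);
    if ((c->permissions & required) == required) {
        /* ...the service would carry out the request here... */
        return REQ_DONE;
    }
    if (c->has_error_channel) {
        /* ...the service would send an error notification here... */
        return REQ_DENIED_NOTIFIED;
    }
    c->alive = false;  /* stands in for killing the caller: no way to report the error */
    return REQ_CALLER_KILLED;
}
```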

Since this happens frequently, it would be nice to have a simple way for a process A to request something from a process B in a fashion that is as familiar as, and almost as efficient as, a library function call. This is the purpose of the Remote Procedure Call (RPC) inter-process communication primitive, which is why I have spent a lot of time checking its viability. Results of – admittedly rough – simulations have been satisfactory so far (also read the comment thread, it is quite interesting), with call latencies that are negligible compared to the 10 ms time frame within which any user-software interaction has to complete in order to feel perfectly smooth.
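
From the client’s point of view, the appeal of RPC is that it looks like an ordinary function call. The C sketch below shows only that calling convention and makes heavy simplifying assumptions. `server`, `rpc_register()` and `rpc_call()` are hypothetical names, and everything runs in a single address space. A real implementation would instead switch to the server’s address space and run the entry point in a thread spawned inside the server process. While idle, the server needs no thread of its own, which is the zero-thread case mentioned earlier.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES 8

/* An RPC entry point exported by a server process (hypothetical signature). */
typedef long (*rpc_entry)(long arg);

/* A server process as seen by the RPC component: no thread of its own,
   only a table of entry points that can be run on demand. */
typedef struct {
    rpc_entry entries[MAX_ENTRIES];
    size_t count;
} server;

/* Server side: export a function and get back its entry index (-1 if full). */
int rpc_register(server *s, rpc_entry fn) {
    if (s->count >= MAX_ENTRIES) return -1;
    s->entries[s->count] = fn;
    return (int)s->count++;
}

/* Client side: reads like a plain function call. A real kernel would run
   the entry point in a new thread inside the server's address space; this
   single-address-space sketch simply calls it directly. */
long rpc_call(const server *s, int entry, long arg) {
    assert(entry >= 0 && (size_t)entry < s->count);
    return s->entries[entry](arg);
}

/* Example entry point that a server could export, for illustration only. */
long example_square(long x) { return x * x; }
```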

Implementation

Kernel code structure

In my OS, kernel code is defined as code that cannot run within the boundaries of an isolated process, or that would gain no significant benefit from doing so (if a scheduler that lives outside of the kernel runs amok, the whole OS still crashes). The fact that kernel features must be implemented in a single large address space, however, shouldn’t be an excuse for coding them as a huge, entangled mess of interdependent spaghetti code. Whenever possible, kernel features should be implemented as small code units that are largely independent of each other.

This philosophy obviously applies to the various elements of process isolation, which is one of the core tasks of the kernel. In my opinion, these should be implemented separately, and only loosely brought together in a single “process manager” class, which would take care of the tasks that require full knowledge of a process, such as spawning or killing it.
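
One way to keep the isolation components independent while still letting a process manager spawn and kill processes is to give every component the same small set of hooks. The C sketch below assumes a hypothetical `isolation_component` interface with `on_spawn`/`on_kill` callbacks. The process manager only iterates over these callbacks and never looks at the components’ internals. The two toy components exist solely to illustrate the rollback path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned proc_id;

/* Hypothetical interface shared by all isolation components
   (threads, memory, I/O ports, interrupts, RPC...). */
typedef struct {
    const char *name;               /* for diagnostics */
    bool (*on_spawn)(proc_id pid);  /* set up per-process state; false on failure */
    void (*on_kill)(proc_id pid);   /* discard per-process state */
} isolation_component;

/* Spawn: set up every component, rolling back already-initialized ones
   in reverse order if one of them fails. */
bool process_spawn(const isolation_component *comps, size_t n, proc_id pid) {
    assert(comps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!comps[i].on_spawn(pid)) {
            while (i-- > 0) comps[i].on_kill(pid);
            return false;
        }
    }
    return true;
}

/* Kill: tear components down in reverse order of setup. */
void process_kill(const isolation_component *comps, size_t n, proc_id pid) {
    for (size_t i = n; i-- > 0;) comps[i].on_kill(pid);
}

/* Two toy components, for illustration only: one counts live processes,
   the other refuses to set up process 0. */
int live_processes = 0;
static bool counter_spawn(proc_id pid) { (void)pid; live_processes++; return true; }
static void counter_kill(proc_id pid) { (void)pid; live_processes--; }
static bool picky_spawn(proc_id pid) { return pid != 0; }
static void picky_kill(proc_id pid) { (void)pid; }

const isolation_component example_components[] = {
    {"counter", counter_spawn, counter_kill},
    {"picky", picky_spawn, picky_kill},
};
```

Rolling back in reverse order on failure means that a half-spawned process never leaves stale state behind in the components that did succeed.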

Elementary isolation blocks

The various elements of process isolation which I currently consider at the kernel level are:

  • CPU state and CPU time: These will be isolated through the thread abstraction, each process holding a number of concurrently executing threads.
  • Statically and dynamically allocated RAM: This is already done. These kernel components take care of physical memory distribution and keep track of the virtual address space of each process. Also, the kernel has a linked-list-based memory allocator for internal use. I don’t know yet whether I’ll allow user-space processes to use it or whether they’ll have to use a library-based memory allocator like on other OSs, although I lean towards the former because it means less code duplication.
  • I/O ports: On architectures where access to the CPU’s I/O ports can be controlled on a per-process basis, such as x86, this kernel component will take care of allowing and denying each process access to I/O ports.
  • Interrupts: Since hardware interrupts, unlike memory allocation or I/O transactions, are by definition not initiated by software, I believe they should be treated differently. This kernel component will hold an internal interrupt table that associates each hardware interrupt with one or several RPC calls. When the interrupt bell rings, the RPC calls are made, and driver processes take over from there. Clock interrupts will be an exception, as they are managed internally by the kernel.
  • RPC: This could be the main non-bookkeeping task of a “process manager” kernel component, since it is not about any hardware but solely about processes. This kernel service would keep track of each “server” process’ RPC entry points and each “client” process’ active RPC connections.
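
Two of these components can be sketched as plain data structures. The C code below makes two assumptions. The first is a per-process I/O permission bitmap that follows the x86 Task State Segment convention: one bit per port, where a set bit denies access and a cleared bit allows it. The second is an interrupt table that maps each of 16 IRQs to a few handlers. Those handlers are modelled as function pointers standing in for RPC calls into driver processes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* I/O ports: one bit per port, x86 TSS convention (set bit = denied). */
#define IO_PORT_COUNT 65536
typedef struct {
    uint8_t bitmap[IO_PORT_COUNT / 8];
} io_permissions;

/* A new process starts with no I/O port access at all. */
void io_deny_all(io_permissions *p) { memset(p->bitmap, 0xFF, sizeof p->bitmap); }

void io_allow(io_permissions *p, uint16_t port) {
    p->bitmap[port / 8] &= (uint8_t)~(1u << (port % 8));
}

bool io_allowed(const io_permissions *p, uint16_t port) {
    return (p->bitmap[port / 8] & (1u << (port % 8))) == 0;
}

/* Interrupts: each IRQ maps to a few handlers in driver processes. */
#define IRQ_COUNT 16
#define MAX_HANDLERS 4
typedef void (*rpc_target)(unsigned irq);  /* stands in for an RPC call */

typedef struct {
    rpc_target handlers[IRQ_COUNT][MAX_HANDLERS];
    unsigned count[IRQ_COUNT];
} interrupt_table;

bool irq_attach(interrupt_table *t, unsigned irq, rpc_target fn) {
    assert(irq < IRQ_COUNT);
    if (t->count[irq] == MAX_HANDLERS) return false;
    t->handlers[irq][t->count[irq]++] = fn;
    return true;
}

/* Called when the interrupt fires; returns how many handlers were invoked. */
unsigned irq_dispatch(const interrupt_table *t, unsigned irq) {
    assert(irq < IRQ_COUNT);
    for (unsigned i = 0; i < t->count[irq]; i++) t->handlers[irq][i](irq);
    return t->count[irq];
}

/* Example driver entry point, for illustration only. */
unsigned last_irq_seen = 0;
void example_driver(unsigned irq) { last_irq_seen = irq; }
```

Note that the x86 bitmap has a single bit per port and does not distinguish reads from writes, so any finer-grained rule would have to be enforced some other way.
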
Other system services

As said before, everything process-related that is not about CPU and RAM isolation is not the job of the kernel. All higher-level system services should be implemented as sandboxed user-mode processes. But to this end, the kernel must provide a basic API for system services, so that they may bring their own contribution to the process abstraction. Said API would notably allow:

  • Storing all data about a process’ access permissions in a single, consistent place.
  • Making external system services document their access permissions in a human-readable way, so that users know what’s going on when the system asks them to confirm that they grant a process access to something.
  • Transmitting granted access permissions to the relevant services when the process is loaded.
  • Notifying said services when the process is killed, so that they may invalidate and discard their data about it.
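
The load and kill notifications could look like the following C sketch. It makes two hypothetical assumptions. First, each user-mode service registers a name and two callbacks with the kernel. Second, a process’ permission file is split into per-service sections, whose bodies the kernel forwards without parsing them. The hook that returns a human-readable description is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned proc_id;

/* Hypothetical interface a user-mode system service registers with the kernel. */
typedef struct {
    const char *name;                                 /* e.g. "FileSystem" */
    void (*on_load)(proc_id pid, const char *grants); /* granted permissions, opaque to the kernel */
    void (*on_kill)(proc_id pid);                     /* discard all data about pid */
} system_service;

/* One "Access permissions: <service> { ... }" section of a permission file. */
typedef struct {
    const char *service;
    const char *body;  /* forwarded as-is, never parsed by the kernel */
} permission_section;

/* On process load: hand each section to the service that owns it.
   Sections with no matching registered service (e.g. "Kernel") are skipped.
   Returns the number of sections delivered. */
size_t deliver_permissions(const system_service *svcs, size_t nsvc,
                           const permission_section *secs, size_t nsec, proc_id pid) {
    assert(svcs != NULL || nsvc == 0);
    size_t delivered = 0;
    for (size_t i = 0; i < nsec; i++) {
        for (size_t j = 0; j < nsvc; j++) {
            if (strcmp(secs[i].service, svcs[j].name) == 0) {
                svcs[j].on_load(pid, secs[i].body);
                delivered++;
                break;
            }
        }
    }
    return delivered;
}

/* On process death: every service gets a chance to invalidate its data. */
void notify_kill(const system_service *svcs, size_t nsvc, proc_id pid) {
    for (size_t j = 0; j < nsvc; j++) svcs[j].on_kill(pid);
}

/* Toy "FileSystem" service, for illustration only. */
proc_id fs_last_loaded = 0, fs_last_killed = 0;
static void fs_load(proc_id pid, const char *grants) { (void)grants; fs_last_loaded = pid; }
static void fs_kill(proc_id pid) { fs_last_killed = pid; }
const system_service example_services[] = {{"FileSystem", fs_load, fs_kill}};
```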

As an example, a process’ access permissions could be stored in a configuration file shaped like this:

***Built for process API v1***

Access permissions: Kernel {
    IOPorts:
        2,3,4R # RW access to ports 2 and 3, read-only access to port 4
    Interrupts:
        3 # Process may set up an interrupt handler for interrupt 3
}

Access permissions: FileSystem {
    FolderAccess:
        /system, R # Process may read inside of /system and its subfolders
        /system/boot, RW
}

There would be one such file within a privileged application’s private folder, specifying which permissions the application requires, and one per installed application within the kernel’s super-secret system databases. Each time software is to be loaded, the kernel would compare these two files. If they differ, and if new access permissions have been added to the application’s requests, the user would be asked to confirm that they grant them. As a performance optimization, filesystem timestamps could be used to avoid unnecessary checks when the application’s requested permissions have not changed (in which case the internal OS copy of the permissions would be used directly).
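
The comparison step can be sketched in C as follows. This sketch assumes that each permission file has already been split into one string per permission entry, and that modification times are available as `time_t` values. `check_permissions()` and its three outcomes are hypothetical names. Only permissions missing from the stored copy trigger a prompt, so an application that merely drops a permission does not cause the user to be asked anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef enum {
    USE_STORED_COPY,  /* file unchanged since last check: skip the comparison */
    NO_NEW_GRANTS,    /* file changed, but requests nothing new (e.g. only removals) */
    ASK_USER          /* at least one permission was never granted before */
} load_decision;

static bool contains(const char *const *list, size_t n, const char *item) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], item) == 0) return true;
    return false;
}

/* requested: entries from the application's own permission file.
   granted: the OS's stored copy from the last time the user agreed. */
load_decision check_permissions(const char *const *requested, size_t nreq, time_t requested_mtime,
                                const char *const *granted, size_t ngranted, time_t granted_mtime) {
    assert(requested != NULL || nreq == 0);
    /* Timestamp shortcut: the application's file has not been modified since the stored copy. */
    if (requested_mtime <= granted_mtime) return USE_STORED_COPY;
    for (size_t i = 0; i < nreq; i++)
        if (!contains(granted, ngranted, requested[i])) return ASK_USER;
    return NO_NEW_GRANTS;
}
```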

System services would be free to document their access permissions within this file however they want, as the kernel would pass that information along without reading it. They would, however, need to be able to provide a human-readable version at the kernel’s request, whenever an access permission request is to be displayed to the user.

With this last part, I believe I’ve covered the broad outlines of the process abstraction’s implementation, the kernel components that have to be implemented, and their respective roles. Now, for those who are interested, I also offer one last part on a slightly unrelated subject that I believe is of critical importance: task prioritization, why it doesn’t work well in today’s operating systems, and how it could work in an OS based on the RPC system-application communication primitive.

Appendix: RPC and task prioritization

Hardware resources are sometimes scarce, and as such task prioritization is a critical concern for any self-respecting modern OS. On personal computers, it generally boils down to putting user-computer interaction far above everything else. Foreground tasks, which directly interact with the user through visuals, sounds, vibrations, or any other form of HMI, should never see their performance degraded by background tasks, which the user is not focused on and has only vague awareness of (e.g. backing up a hard drive, ripping a DVD, seeding a live CD image).

If task prioritization is not done properly, perceived computer responsiveness ends up depending strongly on the number of simultaneously running tasks. As soon as the computer becomes a bit busy, it starts feeling more and more sluggish. This leads to an unsatisfied user, and also limits the room for maneuver of OS developers, because features that need background processing end up ruining the user experience. Also, although I have so far mostly talked about performance, something must also be said about power management. If tasks cannot be prioritized with respect to each other, and in particular separated into “foreground” and “background” ones, it is very difficult to apply aggressive forms of power usage optimization, such as slowing down or entirely shutting down some expensive background tasks when on battery. The result is worse battery life, so again an unhappy user.

In traditional OS designs, task prioritization is typically done by putting a priority on users, processes, and threads. The idea is that users get a fair share of time, that a user’s share of time is fairly distributed among their running processes, and that threads from a given process fairly share its available execution time. Now, it may not be obvious at first sight, but this design breaks down as soon as one attempts to separate system services from user application code through process boundaries, as should always be done. Here is why: if one increases the priority of an application’s process without increasing the priority of every system service it depends on (a quite cumbersome operation), every request the application makes to the OS will still be processed at the services’ usual speed, with no performance gain. Worse yet, since the application now takes a larger slice of CPU time, system services will run less often in proportion to total available CPU time, so in some cases the application may actually be slowed down! Increasing the priority of system services themselves is not a good idea either, because it will just increase the speed at which every transaction with these services is processed, not just the transactions of our application in particular. So, what do we do?

For a very long time, the answer of OS manufacturers to this has basically been: “We don’t know how to prioritize tasks properly, you should probably get faster hardware or give up on useful features”. From time to time, an OS came up with a clever trick to make some specific tasks run faster, such as accelerating GUI operations with GPU power (since no one but games and advanced image editors uses that chip, system code is always alone on it and therefore runs perfectly). But overall, the consumerist way of handwaving task prioritization problems away with “Your computer is too old” statements remained dominant.

It is my hope that as CPU manufacturers reach the limits of silicon chip performance and power management becomes a prime concern, more love will go into task prioritization and its applications to better user responsiveness and lower power consumption. In the meantime, since I consider the planned obsolescence of computer hardware advocated by Microsoft, Apple, and others to be a shame and a waste of natural resources, here is my take on the subject.

The core issue with the task prioritization model above is that it doesn’t take the notion of responsibility into account. Who are the system services working for? Which process does the kernel’s current work benefit? We need to answer these kinds of questions if we want to optimize the performance of a full application, and not just that of the specific code paths that reside within its main process’ boundaries. My proposal is an alteration to the RPC and system call mechanisms, enabled by default but which may be disabled if needed, that would make the system’s scheduler consider the resulting threads as threads of the caller process. A thread would no longer only have a single parent process that defines most of its properties, but also a “process in charge” or “scheduler parent”, which is the one that matters for all priority-related decisions. To put it more simply: if process A requests something from process B using RPC, the thread that is spawned in process B will be isolated within the process boundaries of B with respect to most system services, but scheduled as a thread of process A. This way, all work that process A is responsible for, and nothing else, will run faster when A’s priority is increased. And therefore it actually becomes possible to efficiently prioritize one user or system task above others.
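
The “scheduler parent” idea only takes one extra field per thread. In the C sketch below, `owner` governs isolation and `sched_parent` governs priority. An RPC spawns a thread that is owned by the server but inherits the caller’s scheduler parent, so nested calls (A calls B, which calls C) are all charged to A. The types, function names, and the strict-priority `pick_next()` are hypothetical simplifications; a real scheduler would more likely use weighted fair shares.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int id;
    int priority;  /* higher runs first */
} process;

typedef struct {
    const process *owner;         /* isolation: address space, permissions... */
    const process *sched_parent;  /* "process in charge": used for all priority decisions */
} thread;

/* Ordinary thread creation: a process's own threads are scheduled as itself. */
thread thread_create(const process *p) {
    return (thread){p, p};
}

/* RPC from a caller thread into a server process: the new thread is isolated
   inside the server but inherits the caller's scheduler parent. */
thread rpc_spawn(const thread *caller, const process *server) {
    return (thread){server, caller->sched_parent};
}

/* Pick the ready thread whose scheduler parent has the highest priority. */
size_t pick_next(const thread *ready, size_t n) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (ready[i].sched_parent->priority > ready[best].sched_parent->priority)
            best = i;
    return best;
}
```

For instance, suppose an interactive application with priority 10 and a batch job with priority 5 both call a filesystem service with priority 1. The filesystem thread working for the application is picked first, whereas a traditional scheduler would see two identical priority-1 threads.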
