System design 1 – Core capabilities

Three of our goals (adaptability, security, and reliability) require our operating system to favor a programming approach based on many small, independently managed programs that work together. This has consequences for parts of its design other than processes themselves, as we’ll explain below.


Let’s first consider the following task in a world of independent programs: a user clicks on a button whose sole function is to hide itself. Sounds simple. It would work roughly as follows (each blue box being an independent program):

The mouse click on the button is sent as a cryptic message (1) to a process called the mouse driver, which decodes it and interprets it as a click. A higher level of abstraction may be required, especially if several mice or a multi-touch screen are in use, in the form of a “pointer manager” to which the click is forwarded (2). Either way, a process managing the UI of all programs is eventually informed (3) that a click occurred at a given position on the screen. Using some internal logic, it determines which program the click concerns and that a button was clicked. It then looks up how the program asked to be notified when someone clicks that button, and notifies it accordingly (4). The program executes the planned instruction, hiding the button. But it can’t do that all by itself, so it asks the UI manager process to do it (5). The UI manager determines what effect hiding the button will have on the program’s window (revealing what’s behind it), and tells a drawing process to display that effect on the screen (6). The drawing process turns what the UI manager said into simple, generic graphics instructions (e.g. OpenGL commands) that are passed to a graphics driver (7), whose job is to communicate with the graphics hardware. This driver turns the hardware-independent instructions into model-specific cryptic instructions, which are then sent to the hardware so that the screen’s appearance changes to reflect the disappearance of the button (8).

As we can see, even the simplest GUI manipulations involve a significant amount of work. This highlights the importance of using isolated, independent programs: a single program doing all of this would be huge, which tends to mean slow, buggy, and hard to modify. We also see that, for performance reasons, our multi-program design will require communication between processes to be fast.
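To make the chain of steps (1) to (8) more concrete, here is a minimal Python sketch under loose assumptions: each stage is a thread standing in for an isolated process, stages talk only through message queues, and all stage names and message formats are hypothetical. The optional pointer manager (2) is left out, and no real isolation or hardware access is modeled.

```python
import queue
import threading

def start_stage(transform, inbox, outbox):
    """Run one 'process': read messages from inbox, transform them, pass them on."""
    def run():
        while True:
            msg = inbox.get()
            if msg is None:              # shutdown marker: forward it and stop
                outbox.put(None)
                return
            outbox.put(transform(msg))
    threading.Thread(target=run, daemon=True).start()

# Hypothetical stages and message formats, for illustration only.
def mouse_driver(raw):                   # handles (1): decode raw device bytes
    return {"event": "click", "x": raw[0], "y": raw[1]}

def ui_manager_dispatch(click):          # handles (3), sends (4): hit-test, notify owner
    return {"event": "button_clicked", "button": "hide_me",
            "pos": (click["x"], click["y"])}

def application(event):                  # handles (4), sends (5): ask for the button to be hidden
    return {"request": "hide", "widget": event["button"]}

def ui_manager_layout(request):          # handles (5), sends (6): work out what hiding reveals
    return {"redraw": "background", "widget": request["widget"]}

def drawing(effect):                     # handles (6), sends (7): generic draw commands
    return [("fill", effect["redraw"], effect["widget"])]

def graphics_driver(commands):           # handles (7), sends (8): model-specific instructions
    return [f"HW:{op}:{what}:{where}" for op, what, where in commands]

stages = [mouse_driver, ui_manager_dispatch, application,
          ui_manager_layout, drawing, graphics_driver]
queues = [queue.Queue() for _ in range(len(stages) + 1)]
for i, transform in enumerate(stages):
    start_stage(transform, queues[i], queues[i + 1])

queues[0].put(bytes([120, 45]))          # a click at (120, 45)
queues[0].put(None)
result = queues[-1].get()
print(result)                            # ['HW:fill:background:hide_me']
```

Every hop here is a cheap in-process queue operation; in the real system each hop would be an interprocess message going through the core, which is why its speed matters so much.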


But now it is time to say something sad: processes don’t actually exist in the hardware. Hardware is dumb and near-sighted, so that it stays inexpensive. What typical computer processors can actually do is roughly the following:

  • Separation between the core and user applications: The processor can run in “kernel mode”, where everything is possible, as some parts of the operating system require, and in a restricted “user mode”, better suited to most applications, which limits their capabilities so that they can’t break everything when they run amok, as we’ll see below. For this separation to be effective, the part of the software running in kernel mode should be kept as small as possible.
  • Crunching numbers: That’s what programs spend much of their time doing.
  • Reading and writing data in main memory: Actually, the hardware is a little smarter than that: it can confine a program to a subset of main memory, restricting its use of the rest (preventing writes, or any access at all) and reporting forbidden accesses to kernel-mode software.
  • Offering access to chipsets and peripherals in a very primitive way: Only kernel-mode software may do that.
  • Running specific instructions when some part of the hardware or software requests attention: This “interrupt” mechanism is handy in countless situations, and only kernel-mode software may change which instructions are run.
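The memory-protection point in the list above can be illustrated with a toy model. This Python sketch is not how real hardware is programmed: it assumes a single-level page table with 4 KiB pages and a per-page write bit, and all class and function names are hypothetical. Permitted accesses are translated to physical addresses; anything else raises a fault that is handed to “kernel-mode” code.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed 4 KiB pages

class PageFault(Exception):
    """Raised by the toy 'hardware' when an access is not allowed."""

@dataclass
class PageEntry:
    frame: int        # physical frame number
    writable: bool    # False means read-only

class ToyMMU:
    def __init__(self, page_table):
        self.page_table = page_table      # virtual page number -> PageEntry

    def translate(self, vaddr, write=False):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        entry = self.page_table.get(vpn)
        if entry is None:
            raise PageFault(f"no mapping for {vaddr:#x}")
        if write and not entry.writable:
            raise PageFault(f"write to read-only {vaddr:#x}")
        return entry.frame * PAGE_SIZE + offset

def user_access(mmu, vaddr, write=False):
    """A user-mode memory access; faults are reported to 'kernel' code."""
    try:
        return mmu.translate(vaddr, write)
    except PageFault as fault:
        return kernel_fault_handler(fault)

def kernel_fault_handler(fault):
    # A real kernel might map a page in, or terminate the faulty process.
    return f"process terminated: {fault}"

mmu = ToyMMU({0: PageEntry(frame=7, writable=True),
              1: PageEntry(frame=3, writable=False)})
print(user_access(mmu, 0x10))                # 28688 (frame 7, offset 16)
print(user_access(mmu, 0x1004, write=True))  # process terminated: write to read-only 0x1004
print(user_access(mmu, 0x5000))              # process terminated: no mapping for 0x5000
```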

And that’s about all. This means that we’ll have to write software for everything else. To meet all of our goals, the core of the operating system, which manages the process abstraction and direct access to the hardware, will have to provide the following capabilities:

  • Process/thread management and scheduling: Isolating applications from each other in memory (turning each into what’s called a process) and running several parts of a process simultaneously (each part being called a thread) requires some work from the core. In particular, when there are more tasks than processor cores, simultaneity often has to be simulated by quickly switching from one task to another. The order in which tasks run and the moments when switches happen are then decided by a set of rules, a concept called scheduling, which has to be implemented at the core of the system.
  • Resource allocation and hardware-process communication: Sharing memory and other hardware resources between several processes is generally unwanted, as it makes programming harder, even though it can be convenient in some cases. Such sharing should therefore only be possible if all involved processes want it. This can be achieved by having the core of the operating system manage all hardware resources and give (allocate) them to programs in a way that prevents any unwanted sharing.
    It may also happen that the hardware has something to say (e.g. “HDD full” or “Operation complete”, but in cryptic hardware language). For the same reason as above, the operating system will handle every hardware message and then forward it to the process managing that hardware through interprocess communication techniques (see below).
  • Interprocess communication: Processes should be able to communicate with each other and work together. Since this is a very important issue in this particular operating system, we choose to implement several mechanisms, some faster than others, to give developers plenty of freedom and encourage multi-process programming. If some go unused, we can still remove them later.
    • Signals: The dumbest but fastest form of interprocess communication. Related to hardware interrupts, this feature triggers a planned action in the receiving process, with the OS possibly providing “generic” actions when the receiver did not plan to receive that signal.
    • Sending and receiving data: Similar to UNIX-style pipes, this feature gives a process a number of sockets through which it can send data to, or receive data from, another process. Note that the receiving program must explicitly wait for that data, for example by defining a function that is triggered when data arrives; otherwise the data just accumulates in a system-provided buffer and is eventually lost. For legacy reasons, each program has three default sockets: standard text input, standard text output, and error (text) output.
    • Shared memory: Sending and receiving data requires copying it, which is costly and generally unwanted for large data. Moreover, it is sometimes useful to keep a block of data shared between multiple applications, so that when one modifies the data, the others see the changes right away. It’s also useful for reliability: if one process crashes, the other keeps the shared data safe. There should hence be a way to make a region of memory genuinely shared by two programs. Sharing would be initiated by the program owning the data and result in the other one receiving a pointer that, on first use, triggers the actual sharing and makes the shared part of the owner’s memory part of its own memory too.
    • Distant procedure call: DPC (more widely known as remote procedure call, or RPC) is when a program calls a function from another program. It can be implemented either statically (as with shared libraries: the program looks for its sibling at launch time and launches it if needed, after which the functions shared between the processes are effectively shared) or dynamically (at some point while running, the program asks the system to find another process and run one of its “public” functions). Dynamic RPC is slower than static RPC but brings more flexibility and robustness.
    • Synchronization: This is especially important when shared memory is involved. There should be a way of preventing multiple processes from doing certain things at once, for example writing to the same place in memory. This is called mutual exclusion and involves making sure that no more than a fixed number of processes (generally one) can do something at the same time.
      Another useful way of synchronizing processes and threads is the barrier: all participants wait until every one of them has completed a task before moving on to something else. This is useful, for example, when a task is split into smaller tasks distributed across multiple cores of a processor.
    • Capabilities: All forms of IPC first require *finding* a process to communicate with. This may be done using a unique identifier for each running process (the PID) or the process’s file name, but also using a “capability”-based system that allows a process to be found based on what it does. As far as I know, this has not been implemented on a wide scale, and it needs careful testing as it may affect performance, but it has interesting potential applications in the adaptability area.
  • One more thought: if we want to make programming for multiple processors easy, we must help developers with it. In practice, we’ll do this by making it possible for every IPC request that runs a command in another process to create a new thread for running that command, a scheme called “pop-up threads”. This reduces speed on single-processor systems, but introduces more parallelism, which may allow much higher speed on multi-processor systems. Testing will tell whether the performance cost remains or whether the benefits of this capability outweigh it.

  • Permissions: Another thing to manage about processes is what they are allowed to do at the core level. Enforcing separation between kernel-mode and user-mode software is of little use if user-mode software can ask kernel-mode software for anything. Hence, for each behavior above that may affect system stability and security, there should be a way to tell whether the software is allowed to do it, and to what extent.
  • Live updating and backing up of processes: These are higher-level features. They are only listed here because they affect process management.
    Keeping programs up to date is necessary for reliability and security. Until now, desktop OSs have implemented updating entirely through user applications, and it has worked as follows: on Windows (and sometimes on Mac OS X), you often have to reboot after updates, which is annoying and leads people to ignore them, while on other UNIXes updates are applied to files on disk but not to running programs. This means that a program that is never stopped is never updated (some people always keep their computer in sleep mode because they dread long boot times), and it may eventually crash by reading files meant for a newer, incompatible release. In this operating system, we would like to go one step further by allowing running processes to be updated live, with only a little lag for the user. This clearly requires help from the operating system, as the updating program will have to manipulate the running one. How live updating will work has not been decided yet; we’ll go into more detail in a later post.
    Then there’s the backup issue. How many of us have lost hours of work to a program crashing abruptly? Of course, developers are responsible for that, because they released buggy software. We all know that. But debugging is not just a boring task that developers want to avoid; some bugs are also extremely *hard* to find. Even the most carefully designed, stability-oriented program has bugs in its first release, because discovering them would have required some kind of testing the developer couldn’t think of. So buggy software is not going to disappear anytime soon, and the operating system has to deal with it. What I propose is backing up processes at regular intervals, so that after a crash (or anything else going wrong), users can “go back in time” and restore the program’s state from a few minutes earlier, limiting data loss and getting back to work more quickly. Again, the details don’t belong in this introductory article and will be covered later.
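Scheduling, mentioned first in the list of core capabilities above, can follow many policies. A minimal Python simulation of one classic policy, round-robin with a fixed time quantum, looks like this; the task names and durations are made up, and a real scheduler would be driven by timer interrupts rather than by a loop.

```python
from collections import deque

def round_robin(tasks, quantum):
    """Simulate round-robin scheduling on a single CPU.

    tasks:   dict of task name -> CPU time units still needed
    quantum: time units a task may run before it is preempted
    Returns the timeline as (task, start, end) tuples.
    """
    ready = deque(tasks.items())       # ready queue, in arrival order
    clock = 0
    timeline = []
    while ready:
        name, remaining = ready.popleft()
        run = min(quantum, remaining)
        timeline.append((name, clock, clock + run))
        clock += run                   # the task used up its slice or finished
        if remaining > run:            # preempted: back to the end of the queue
            ready.append((name, remaining - run))
    return timeline

print(round_robin({"A": 3, "B": 1, "C": 2}, quantum=2))
# [('A', 0, 2), ('B', 2, 3), ('C', 3, 5), ('A', 5, 6)]
```

A smaller quantum makes the system feel more responsive but spends a larger share of time switching between tasks, which is the kind of trade-off the scheduling rules have to settle.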
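As an illustration of signals, here is how a POSIX process registers a planned action with Python’s standard `signal` module and then receives the signal. This relies on existing UNIX signals rather than on this project’s own design, it only runs on POSIX systems, and the process signals itself purely to keep the example self-contained. Had no handler been registered, the system’s default action for `SIGUSR1` would terminate the process, which corresponds to the “generic” actions mentioned in the signals item above.

```python
import os
import signal

received = []

def on_sigusr1(signum, frame):
    # The "planned action": runs when the signal is delivered.
    received.append(signal.Signals(signum).name)

signal.signal(signal.SIGUSR1, on_sigusr1)   # register the planned action
os.kill(os.getpid(), signal.SIGUSR1)        # another process would use its target's PID
print(received)
```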
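The send/receive mechanism in the list above is closely related to UNIX pipes, so a sketch built on the existing POSIX `pipe()` and `fork()` calls (through Python’s `os` module, POSIX-only) shows the idea: a child process writes lines into the pipe, and the parent, which explicitly reads, receives them. The function name is hypothetical. One difference from the design described above: with a real UNIX pipe, unread data is not lost; a writer facing a full pipe buffer simply blocks until the reader catches up.

```python
import os

def send_through_pipe(messages):
    """Child process sends messages; parent process receives them."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:                               # child: the sender
        os.close(read_fd)
        with os.fdopen(write_fd, "w") as out:
            for message in messages:
                out.write(message + "\n")
        os._exit(0)                            # exit without running the parent's cleanup
    os.close(write_fd)                         # parent: the receiver
    with os.fdopen(read_fd) as inp:
        # Reading stops at end-of-file, once the child has closed its end.
        received = [line.rstrip("\n") for line in inp]
    os.waitpid(pid, 0)
    return received

print(send_through_pipe(["hello", "from", "the", "child"]))
# ['hello', 'from', 'the', 'child']
```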
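For the shared memory item above, a simple POSIX illustration is an anonymous shared mapping created before `fork()`: both processes then see the same physical memory, so the parent reads what the child wrote without any copy through a pipe. This uses Python’s `mmap` and `os` modules; it does not model the lazy “pointer that triggers sharing on first use” proposed above, and the function name is hypothetical. Waiting for the child to exit stands in here for proper synchronization.

```python
import mmap
import os
import struct

def value_written_by_child(value):
    """Child stores an integer in shared memory; parent reads it back."""
    shared = mmap.mmap(-1, 8)          # anonymous mapping, MAP_SHARED by default on Unix
    pid = os.fork()
    if pid == 0:                       # child writes directly into the shared region
        struct.pack_into("q", shared, 0, value)
        os._exit(0)
    os.waitpid(pid, 0)                 # parent waits, then sees the child's write
    result = struct.unpack_from("q", shared, 0)[0]
    shared.close()
    return result

print(value_written_by_child(42))     # 42
```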
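Mutual exclusion and barriers, from the synchronization item above, both exist in Python’s `threading` module, which makes it easy to sketch the scenario of a task split into smaller parallel tasks. The sketch uses threads (which share memory within one process) rather than separate processes, and the function name and the way the work is split are illustrative choices.

```python
import threading

def parallel_sum(chunks):
    """Each thread sums one chunk; a lock guards the shared total and a
    barrier makes every thread wait until all partial sums are in."""
    total = [0]                               # shared state
    lock = threading.Lock()                   # mutual exclusion
    barrier = threading.Barrier(len(chunks))  # all workers must arrive
    seen_after_barrier = []

    def worker(chunk):
        partial = sum(chunk)                  # independent work, no lock needed
        with lock:                            # one writer at a time
            total[0] += partial
        barrier.wait()                        # wait for every other worker
        with lock:
            seen_after_barrier.append(total[0])  # always the complete sum now

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0], seen_after_barrier

print(parallel_sum([[1, 2], [3, 4], [5]]))   # (15, [15, 15, 15])
```

Without the barrier, a fast worker could read the total before slower ones had added their share; without the lock, two concurrent `+=` updates could overwrite each other.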
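The pop-up thread idea can be sketched in Python as a hypothetical in-process “server” that exposes public functions by name, in the spirit of dynamic RPC, and starts a fresh thread for every incoming call. Each call returns a one-slot queue the caller can read when it needs the result. This only illustrates the structure: in CPython, the global interpreter lock prevents CPU-bound threads from actually running in parallel, and a real implementation would create the thread inside the target process.

```python
import queue
import threading

class PopUpServer:
    """Runs every incoming call on a newly created ("pop-up") thread."""

    def __init__(self, public_functions):
        self.public_functions = public_functions   # name -> callable

    def call(self, name, *args):
        reply = queue.Queue(maxsize=1)              # where the result will arrive
        func = self.public_functions[name]          # unknown names raise KeyError

        def run():
            reply.put((func(*args), threading.current_thread().name))

        threading.Thread(target=run, name=f"popup-{name}-{args}").start()
        return reply

server = PopUpServer({"square": lambda x: x * x})
replies = [server.call("square", n) for n in range(4)]
results = [reply.get() for reply in replies]
print([value for value, _ in results])             # [0, 1, 4, 9]
print(results[2][1])                                # popup-square-(2,)
```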
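A per-process permission check, as described in the permissions item above, could be sketched with Python’s `enum.Flag`: the toy “kernel” below records which privileged operations each PID may perform and refuses the rest. The permission names and the `ToyKernel` class are hypothetical, and the “to what extent” part (quotas, limits) is not modeled.

```python
from enum import Flag, auto

class Perm(Flag):
    NONE = 0
    SHARE_MEMORY = auto()
    SEND_SIGNALS = auto()
    ACCESS_HARDWARE = auto()

class ToyKernel:
    def __init__(self):
        self.perms = {}                       # pid -> granted Perm flags

    def grant(self, pid, perm):
        self.perms[pid] = self.perms.get(pid, Perm.NONE) | perm

    def request(self, pid, perm):
        """Checked on every privileged request coming from user mode."""
        if perm not in self.perms.get(pid, Perm.NONE):
            raise PermissionError(f"process {pid} may not {perm.name}")
        return f"{perm.name} granted to process {pid}"

kernel = ToyKernel()
kernel.grant(42, Perm.SHARE_MEMORY | Perm.SEND_SIGNALS)
print(kernel.request(42, Perm.SEND_SIGNALS))     # SEND_SIGNALS granted to process 42
try:
    kernel.request(42, Perm.ACCESS_HARDWARE)
except PermissionError as err:
    print(err)                                   # process 42 may not ACCESS_HARDWARE
```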
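The periodic backup idea can be illustrated at the level of a single program’s state with a hypothetical `Checkpointer` class that keeps the last few deep copies of that state and can restore any of them. Backing up a whole process (its entire address space, open files, and so on) is much harder; on Linux, the existing CRIU project performs checkpoint/restore of running processes.

```python
import copy
import time

class Checkpointer:
    """Keeps the `keep` most recent snapshots of some program state."""

    def __init__(self, keep=5):
        self.keep = keep
        self.snapshots = []                    # list of (timestamp, state copy)

    def save(self, state):
        self.snapshots.append((time.time(), copy.deepcopy(state)))
        del self.snapshots[:-self.keep]        # forget snapshots beyond `keep`

    def restore(self, steps_back=0):
        """Return a copy of the newest snapshot, or an older one."""
        _, state = self.snapshots[-1 - steps_back]
        return copy.deepcopy(state)

backups = Checkpointer(keep=2)
document = {"text": ""}
for word in ["Hello", " ", "world"]:
    document["text"] += word
    backups.save(document)                     # a timer would call this in practice

document = None                                # simulate a crash losing the state
print(backups.restore())                       # {'text': 'Hello world'}
print(backups.restore(steps_back=1))           # {'text': 'Hello '}
```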
That’s a lot of work, isn’t it? It would hence be a bad idea to put all of this in a single kernel-mode program: we would lose the benefits of our multi-process design in the core, which is precisely the most heavily used part of the system! For that reason, we’ll try to move as much work as we can into user processes, keeping only what’s strictly necessary in one or more kernel-mode processes, collectively called the kernel. This is called a micro-kernel design, and it will be discussed in more detail in later articles that describe each core feature presented above in more depth. For the moment, thank you for reading!
(As an aside, we’ll be focusing on the core for a long time: it works almost independently of any other operating system function, it requires a lot of work, its correct operation is vital, and it is needed to implement any other part of the OS, so it deserves first-class attention. We will probably implement the core right after designing it and before designing any other part of the OS, for debugging and testing purposes, but that still has to be confirmed.)

3 thoughts on “System design 1 – Core capabilities”

  1. Amenel May 31, 2010 / 1:26 pm

    The process backup is a brilliant idea! Same goes with the live updating of processes although this one will require some thought. I have a few usability propositions to make but you’ve probably posted a more relevant article.

    Btw, although I get your point, not all updates on Mac OS X need a reboot. Even with Vista, some updates don’t need it either.

    The problem with this article is that by reading it, the contributions/original ideas are not clear. We don’t succeed in distinguishing what exists in current OSes and what is specific to your project. But this was the first article I read here… so, congratulations!

  2. Hadrien May 31, 2010 / 4:09 pm

    Thanks ;) About the updates, I’m talking about system updates. I didn’t know that OS X didn’t always require a reboot after updates, because I only play with it from time to time on my girlfriend’s notebook. I’m going to slightly reword this article to take this into account.

    Unless your usability ideas target the kernel API, I’m afraid they’ll have to stay on paper for a while: it may take several months (years???) before I’m able to make anything GUI-related at my current development pace (though I’ll be able to go much faster once my internship is over). Once I start publishing things about the GUI and interface on this blog, we’ll be able to discuss that further, but at the moment it would be like asking for the price of a car whose engine hasn’t left the lab yet ;)

    About original ideas, I don’t think that there’s anything brand new in here. I just picked some operating system design ideas here and there that sounded great and went well together. Major sources of inspiration were Andy Tanenbaum’s Modern Operating Systems and the EROS project, I think, but I’d have a hard time writing a CREDITS section for this article ^^’
