This document is an updated version of “The OS-periment’s service model: release candidate” that takes into account the feedback received when that document was published on OSnews.
Okay, so let’s sum things up again, hopefully more clearly this time.
Following the microkernel tradition and the philosophy behind early UNIX systems, this OS is to be structured as a set of relatively independent modules, separated by process boundaries, which provide services to each other in a controlled way. The question I aim to answer here is: which way? How should processes communicate with each other in order to provide these services? Read on to find out.
I. Principle and use cases
If we take two processes, a client and a server, the action “server provides a service to the client” consists of the client asking the server to do work on its behalf. The server does the work, then notifies the client when it’s done (if the client is interested in that information), optionally providing results through that notification. The programming abstraction commonly used for such purposes on library-based OSs is the function call: client code gives the server (the library) work to do, the server does the work, then notifies the client when it’s done, optionally providing results (through the function’s return mechanism). So the most naive implementation of a “service” would be an IPC primitive which emulates a function call, having the client “call” a function of the server. This IPC primitive is called a Remote Procedure Call, or RPC.
A problem with RPC is that it is not as universal an IPC primitive as one might wish. Consider, as an example, a notification mechanism. When a server sends a notification to a client, it certainly doesn’t care what the client will do with it. So the server would like to just send the notification, then go do something else. But if notifications were implemented using RPC, the server would stay blocked while the client processes the notification, until the client’s code returns. There would be, of course, ways to avoid this, like having the server create an extra thread to send the notification, but all in all one must admit that RPC simply doesn’t fit this use case well. The reason: it requires the server to care about when the client has completed its task, even when it doesn’t want or need to, since it blocks while the call is processed. There are several other issues with blocking calls: they aren’t suitable for handling unexpected events, they cause deadlocks, and getting around them leads processes to create several threads only to have them block shortly thereafter, wasting system resources… All in all, the RPC model could be quite good, but blocking is definitely a part of it which we don’t want around.
So let’s remove it.
Our reworked IPC primitive would be something that works like a function call on the client’s side and runs a function of the server with the parameters given by the client, as in RPC. But this time, the client’s call returns immediately after the work has been handed over to the server. If the server wants to provide results, it does so by calling, in turn, a function of the client whose parameters are standard: a callback.
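To make the idea concrete, here is a minimal single-process sketch of the primitive, with entirely hypothetical names (`ToyServer`, `remote_call`, `drain`): the client’s call merely queues the work and returns at once, and the server later runs the function and fires the client’s callback. Real kernel-mediated marshalling across address spaces is, of course, elided.

```cpp
#include <functional>
#include <queue>
#include <utility>

// One queued "remote call": the server-side function to run, its
// marshalled parameter, and the client-side callback for the result.
struct RemoteCall {
    std::function<int(int)> server_fn;
    int arg;
    std::function<void(int)> callback;
};

class ToyServer {
public:
    // Client side: hand the work over and return immediately.
    void remote_call(std::function<int(int)> fn, int arg,
                     std::function<void(int)> cb) {
        queue_.push({std::move(fn), arg, std::move(cb)});
    }
    // Server side: process queued calls one at a time, firing the
    // "callback" remote call back at the client when each one is done.
    void drain() {
        while (!queue_.empty()) {
            RemoteCall c = std::move(queue_.front());
            queue_.pop();
            int result = c.server_fn(c.arg);
            if (c.callback) c.callback(result);
        }
    }
private:
    std::queue<RemoteCall> queue_;
};
```

The key property is visible in the control flow: `remote_call()` never waits on the server’s work, so a notification-style call can simply pass an empty callback.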
Now, what should this primitive be called? Since it’s basically RPC with callbacks and without the requirement for the calling code to block, I’d have spontaneously called it non-blocking RPC, but as Brendan points out, the notion of a non-blocking call makes little sense. The notion of messaging doesn’t fit well either, as the word “message” is linked to the notion of sending (or receiving) data, while this IPC primitive is about sending work to do. The name “service request” also comes to mind, but it fails to capture the full generality of this communication primitive (which can just as well be used for notifications and callbacks). “Task sending” would be sufficiently generic, but the word is already widely used to describe threads, fibers, processes, etc. In the end, I’m open to suggestions, and will stick with “remote call” in the meantime, because even though it is not entirely appropriate, I have to use something.
II. How should it work in practice?
II.1) Service broadcasting
II.1.a) Server side
Client code should obviously not be able to call every function of the server code, otherwise our IPC mechanism totally breaks process isolation, and that’s not good at all. Therefore, as part of its initialization, the server should specify to the kernel which of its functions may be called by client processes, along with function pointers so that the kernel may actually run them when they are called.
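The broadcast step can be sketched as a simple table the server hands to the kernel, pairing each exposed name with a function pointer (the names `CallTable`, `broadcast` and `lookup` are hypothetical). Anything absent from the table is simply not callable from outside:

```cpp
#include <map>
#include <string>

// Signature of a callable server function, kept trivial for the sketch.
using RemoteFn = int (*)(int);

class CallTable {
public:
    // Server side: expose one function under a given public name.
    void broadcast(const std::string& name, RemoteFn fn) {
        table_[name] = fn;
    }
    // Kernel side: resolve a client's call. Returns nullptr for
    // anything the server never exposed, preserving process isolation.
    RemoteFn lookup(const std::string& name) const {
        auto it = table_.find(name);
        return it == table_.end() ? nullptr : it->second;
    }
private:
    std::map<std::string, RemoteFn> table_;
};

// Example server function to be broadcast.
int double_it(int x) { return 2 * x; }
```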
To allow for function overloading, the server should broadcast not only function names, but also full function prototypes. To allow for interoperability between clients and servers written in different programming languages, function prototypes should be written in one specific programming language, no matter which language the server is written in, so that the kernel may easily parse them. Translation to and from other languages’ function prototypes can be done by library code as needed. We adopt C/C++’s conventions for this task, as this is what most of the operating system will be written in and there’s no reason not to make our life simpler :)
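As a taste of what “easily parse them” could mean, here is a tiny, hypothetical helper that extracts the function name from a C/C++ prototype string (a real parser would also need to break down the parameter list, which is skipped here):

```cpp
#include <string>

// Extract the function name from a C/C++ prototype string: the token
// between the return type and the opening parenthesis, with any '*'
// glued to the name (as in "char *strdup(...)") skipped.
std::string prototype_name(const std::string& proto) {
    size_t paren = proto.find('(');
    if (paren == std::string::npos) return "";  // not a prototype
    size_t space = proto.rfind(' ', paren);
    size_t name_begin = (space == std::string::npos) ? 0 : space + 1;
    while (name_begin < paren && proto[name_begin] == '*') ++name_begin;
    return proto.substr(name_begin, paren - name_begin);
}
```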
As the programming language used to code the server may not support function overloading itself, it should be possible for a server to broadcast a function of its code under a different name (i.e. the prototype broadcasting process should not be fully automated by development tools).
To allow for breaking API changes that do not alter a function’s parameter structure (for example, when a structure’s definition changes), a versioning system should be introduced. The server process may broadcast an integer version number along with each function prototype; if it does not, the function is assumed to be at version 0.
To allow for non-breaking API changes that extend a function’s feature set by appending extra parameters to it, the remote call subsystem should accept default parameter values. Conversely, when a callback is sent by a newer server to an older client, the new parameters of that callback should be silently ignored by the kernel.
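Both directions of this compatibility machinery boil down to padding or trimming a parameter list, which can be sketched as follows (a toy model where parameters are plain integers; the function names are hypothetical):

```cpp
#include <vector>
#include <cstddef>

// Kernel side, older client -> newer server: the client supplied only
// the first args.size() parameters, so the server's declared defaults
// fill in the rest.
std::vector<int> pad_with_defaults(std::vector<int> args,
                                   const std::vector<int>& defaults) {
    for (std::size_t i = args.size(); i < defaults.size(); ++i)
        args.push_back(defaults[i]);
    return args;
}

// Kernel side, newer server -> older client callback: parameters the
// client never declared are silently dropped.
std::vector<int> trim_to_client(std::vector<int> args,
                                std::size_t client_count) {
    if (args.size() > client_count) args.resize(client_count);
    return args;
}
```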
Up to this point, the method through which the server sends its function prototypes to the IPC subsystem has been left unspecified. One possible way would be to use a system call and hard-code remote call definitions in the server’s code, while another option, suggested by xiaokj, is to use a “spec sheet” text file which includes the function prototypes and may optionally include compatibility kludges. After some thought, the latter suggestion is rejected, for the following reasons:
- Putting compatibility kludges (code) inside of a config file (data) is definitely very ugly, so it’s not something I’d want to do. But without that, the spec sheet becomes nothing more than a list of function prototypes, and loses most of its interest.
- It is unclear how the function prototypes would be linked to the actual server code. If hexadecimal function pointers within the server’s binary are written in the spec sheet, they will be invalidated every time the server is recompiled, unless development tools automatically update the spec sheet (which puts a heavy constraint on them; what about developers who hate IDEs and prefer to write their code in a text editor and compile it directly with GCC?). If the server has to broadcast function pointers anyway, then we end up with part of the broadcasting work in the server code and part of it in the spec sheet, which is not exactly tidy.
- The spec sheet should only be modified during development periods, during which the code is available, so putting function prototypes in an external text config file sounds like an unnecessary burden. Allowing compatibility kludges within the spec sheet does let old clients run with much newer servers that have dropped compatibility with the API version the client used. But I’d argue that it’s the server designer’s job to maintain backwards compatibility with old clients, not the user’s or the OS’s. Designing an interpreter for the language that would be used to implement these compatibility kludges takes time and effort, and I’m not ready to spend those because of people who can’t even make the basic effort of keeping their server software backwards-compatible. They’ll get the bad reputation they deserve if they break compatibility.
- User-accessible config files require the presence and initialization of a filesystem. Purely code-based RPC does not.
- I have to agree that the spec sheet idea enforces separation between “core” server code and compatibility kludges, if server designers choose to use that possibility. But well-intentioned designers could just as well separate compatibility kludges from server code within their source by putting the compatibility stuff in a separate code module, and I have less work to do on my side in that case :)
II.1.b) Client side
To make different versions of the server and the client work well together using the features described above, it is necessary that the client, too, sends as part of its initialization the names, prototypes, versions, and default parameter values of the functions it has been designed to work with.
Client developers wouldn’t have to care about that in practice, however, as most considerate server developers will provide, either by themselves or using automated development tools, “stub” library calls which hide the internals of the remote call mechanism and make it as simple as calling a library function with the proper parameters. The only thing the client’s developer will have to work on is callbacks, and again libraries can make that as simple as writing the callback function itself and providing a function pointer to it as one of the library function’s parameters.
II.2) Making the remote call
Okay, so the server and the client have sent their respective prototypes to the kernel, which has done its compatibility checks and confirmed that everything is okay. Now, how does the client call one of the server’s functions?
II.2.a) Transferring the parameters

“Normal” function calls are done by pushing the function’s parameters onto the stack, in an order and fashion specified by the programming language and compiler’s calling convention, then jumping to the location of the function’s code. Obviously, this cannot be done here, because if client code can call server code seamlessly we have a major process isolation malfunction. Sharing the client’s stack with the server cannot be considered either, for the same reason. So we have to somehow transfer the function parameters from the client’s stack (where they are when the stub library code is called) to a stack within the server’s address space, and we want that transfer to be as fast as possible, because this remote call mechanism is to be used even in the lowest layers of the OS.
As mentioned earlier, most of the OS (at least its lowest layers) is going to be written in C/C++. If the client and the server are both written in C/C++ and use the same calling conventions, all we have to do is copy the stack region associated with the function parameters onto a stack within the server’s address space, add the new default parameters if the client was built for an older version of the server’s function, and then make the call on the server side while, on the client side, we remove the parameters from the stack and return. That is as fast as one can possibly get, and it only requires the stub library code on the client side to specify the size of the function parameters on the client’s stack.
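Stripped of the address-space switching, the transfer really is a single raw copy, as this sketch illustrates (the `Params` layout and `forward_params` name are invented for the example; in reality the source and destination would live in different address spaces):

```cpp
#include <cstring>
#include <cstdint>
#include <cstddef>

// An example parameter block, laid out as it would sit on the
// client's stack under a shared calling convention.
struct Params { std::int32_t fd; std::int32_t count; };

// Kernel side: copy the raw parameter bytes from the client's stack
// into a stack buffer in the server's address space. The client stub
// only has to report param_bytes. Returns 0 if the copy would
// overflow the server-side buffer.
std::size_t forward_params(const void* client_stack,
                           std::size_t param_bytes,
                           void* server_stack,
                           std::size_t server_capacity) {
    if (param_bytes > server_capacity) return 0;
    std::memcpy(server_stack, client_stack, param_bytes);
    return param_bytes;
}
```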
If communication between clients and servers written using two different programming languages is involved, it is possible to use translation layers at the library level. In fact, this is how languages which do not belong to the C family do system calls on most C-based modern OSs, and it works just fine.
Care must be taken with pointers, to ensure that the client does not send pointers into its own address space as parameters to remote calls, which could lead either to a breach of process isolation or to server code which simply doesn’t work. To address involuntary errors, a special pointer type should be used in stub library calls, so that client writers remember that they have to share all data they point to with the server. To address malicious attempts by the client to alter the server’s address space or peek at it, the kernel should check every pointer parameter within a remote call to ensure that it only points to parts of the server’s address space that are shared with the client. On their side, server developers must be careful to make sure that their remote calls’ parameters do not include pointers “hidden” inside structures, a typical example of such a mistake being including a linked list within a remote call’s parameters, unless the library stub serializes it first.
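The kernel-side pointer check sketched above amounts to a range test against the regions the client actually shares with the server (names and structure are, again, purely illustrative):

```cpp
#include <cstdint>
#include <cstddef>

// One region of the server's address space that the client shares.
struct SharedRange { std::uintptr_t base; std::size_t size; };

// Kernel side: a pointer parameter is allowed only if the whole
// [ptr, ptr + len) span fits inside one shared range; anything else
// is rejected before the call is forwarded.
bool pointer_allowed(std::uintptr_t ptr, std::size_t len,
                     const SharedRange* ranges, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (ptr >= ranges[i].base &&
            ptr + len <= ranges[i].base + ranges[i].size)
            return true;
    }
    return false;
}
```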
II.2.b) Running the code
At this point, server-side code is ready to be run. Two modes of server operation are supported. One of them is the asynchronous mode: if the server is idle, a thread is spawned within it to handle the task. While a task is being processed, further remote calls are put in a queue, either as inactive threads or as mere stacks of function parameters. When the processing is done, the next task in the queue is processed, and so on, one task at a time.
Another mode of server operation is the threaded mode. The difference with the asynchronous mode is that each time a new remote call occurs, a new thread is spawned within the server process in an active state, until the number of running threads within the server process equals the number of CPU cores, at which point remote call processing falls back to an asynchronous behaviour.
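The two modes differ only in how many calls may run at once, which can be captured in a small dispatch policy (a toy model with hypothetical names; real thread creation is elided and only the accounting is shown):

```cpp
#include <queue>
#include <cstddef>

// Dispatch policy for incoming remote calls. Asynchronous mode allows
// one running call; threaded mode allows one per CPU core, then falls
// back to queuing, exactly as in asynchronous mode.
class Dispatcher {
public:
    Dispatcher(bool threaded, std::size_t cores)
        : threaded_(threaded), cores_(cores) {}
    // New remote call arrives. Returns true if it starts running
    // immediately, false if it was queued for later.
    bool submit() {
        std::size_t limit = threaded_ ? cores_ : 1;
        if (running_ < limit) { ++running_; return true; }
        pending_.push(0);
        return false;
    }
    // A running call finished; promote the next pending one, if any.
    void complete() {
        --running_;
        if (!pending_.empty()) { pending_.pop(); ++running_; }
    }
    std::size_t running() const { return running_; }
    std::size_t pending() const { return pending_.size(); }
private:
    bool threaded_;
    std::size_t cores_;
    std::size_t running_ = 0;
    std::queue<int> pending_;
};
```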
A cache of “dead” threads may be used to greatly speed up thread creation by removing the need to allocate memory each time a new thread is created.
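The dead-thread cache is essentially a free list of thread resources; the following sketch (with invented names, and a plain byte vector standing in for a real thread stack) shows how a release/acquire pair skips the allocation:

```cpp
#include <vector>
#include <utility>
#include <cstddef>

// Stand-in for the per-thread resources worth recycling.
struct ThreadSlot { std::vector<unsigned char> stack; };

class ThreadCache {
public:
    explicit ThreadCache(std::size_t stack_bytes)
        : stack_bytes_(stack_bytes) {}
    // Get resources for a new thread, reusing a "dead" one if possible.
    ThreadSlot acquire() {
        if (!free_.empty()) {
            ThreadSlot s = std::move(free_.back());
            free_.pop_back();
            ++reused_;
            return s;
        }
        ++allocated_;  // cache miss: a fresh allocation is needed
        return ThreadSlot{std::vector<unsigned char>(stack_bytes_)};
    }
    // A thread died: park its resources instead of freeing them.
    void release(ThreadSlot s) { free_.push_back(std::move(s)); }
    std::size_t allocated() const { return allocated_; }
    std::size_t reused() const { return reused_; }
private:
    std::size_t stack_bytes_;
    std::size_t allocated_ = 0, reused_ = 0;
    std::vector<ThreadSlot> free_;
};
```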
The threaded operating mode is recommended for independent CPU-bound tasks, as it makes optimal use of the computer’s multiple CPU cores. On the other hand, for IO-bound tasks which do not benefit from the extra parallelism, and in situations where running tasks in parallel would require so much synchronization that they would run slower than in a sequential fashion, asynchronous operation should be preferred.
When the server is done with its work, it may have to send a notification and results to the client using a “callback” remote call. It is interesting to investigate what happens then. At first glance, it looks exactly like the kind of remote call used by the client, but in practice there is an important difference: in this case, the kernel must deal with the compatibility of newer server callbacks with older client implementations. So it must remove function parameters from a stack. Unlike adding parameters, the kernel may only do this for the server if the length of the new parameters is known to it in advance. This can be done in two ways: either the server specifies the length of each function parameter each time it calls a callback function, adding overhead to the interprocess communication protocol, or variable-length parameters are forbidden in remote calls.
Considering how much of a bad idea variable-length parameters generally are (pushing a whole array on the stack is a memcpy() hell and a stack overflow waiting to happen), I can’t help but lean towards the second solution, especially considering that adding variable-length parameters later is possible if there’s demand for it. Considering that this is not a distributed operating system, shared memory buffers and pointers to the server’s address space sound like a very valid alternative to variable-length function parameters. But this point is open for discussion.
III. Possible extensions to this model
All this leads to a beautifully simple and fast remote call model, but sometimes extra features are needed. This is why the remote call infrastructure should offer room for extensions that alter the behaviour of the calls. Here are examples of what such extensions could be.
III.1) Timeouts

Nothing can cause more painful problems than a server stuck in the equivalent of a while(1) loop, especially if that server uses the asynchronous operating mode. In some situations, however, it is possible to prevent this issue by having the server automatically abort its current task if it is not completed after a certain amount of time, send an error to the caller process, and revert its internal state to what it was before the killed thread started to run.
This extension can improve the reliability of the operating system significantly, but it should be optional, for two reasons:
- It adds a lot of overhead to client and server development, which may just be too much for non-critical services. The server must support reverting its internal state to an earlier version, which implies regularly saving it (causing performance hits) and can become quite complicated, especially when the threaded operating mode is being used. On its side, the client must cope with its operations on the server being killed at any time.
- It is hard to predict how much time a wide range of operations will take, as such figures are hardware-dependent. Timeouts which work well on the developer’s PC may be too short on a user’s machine. Also, as soon as the remote call operates on a variable amount of data, predicting in advance how long the remote call should last becomes even trickier.
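The abort mechanism itself can be sketched as a periodic watchdog sweep over the server’s running tasks (hypothetical names; the state rollback and the error callback to the caller are represented only by comments):

```cpp
#include <vector>

// One in-flight remote call with its deadline, in abstract time units.
struct Task {
    int id;
    long deadline;
    bool aborted = false;
};

// Watchdog sweep at time `now`: abort every task past its deadline and
// return the ids of the tasks that were killed. In the real system the
// server would also roll back its internal state and the kernel would
// deliver an error callback to each caller.
std::vector<int> sweep_timeouts(std::vector<Task>& tasks, long now) {
    std::vector<int> killed;
    for (Task& t : tasks) {
        if (!t.aborted && now > t.deadline) {
            t.aborted = true;
            killed.push_back(t.id);
        }
    }
    return killed;
}
```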
III.2) Cooldown time
Sometimes, a notification may be fired very frequently, and it may not actually matter whether all the notification calls are processed. As an example, imagine a program where, each time a data structure is modified, a notification that says nothing about which changes have occurred is fired at the UI layer to ensure that said UI is updated. Updating a UI more than, say, 30 times per second may simply be a waste of CPU power.
For those situations, it is advantageous to give the notification remote call a “cooldown time”. If two calls are made with a time interval between them smaller than that time, the second call is scheduled to only run once the cooldown time has elapsed, and all further notification calls in the interval are dropped.
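The policy has exactly three outcomes for each incoming notification, which a small state machine makes explicit (a sketch with hypothetical names; time is an abstract integer and actually running the deferred call is left to the caller):

```cpp
enum class Decision { Run, Schedule, Drop };

// Cooldown policy for one notification remote call.
class Cooldown {
public:
    explicit Cooldown(long window) : window_(window) {}
    // A notification arrives at time `now`: run it, defer it to the
    // end of the cooldown window, or drop it entirely.
    Decision notify(long now) {
        if (now >= next_allowed_) {       // window elapsed: run now
            next_allowed_ = now + window_;
            return Decision::Run;
        }
        if (!pending_) {                  // first early call: defer it
            pending_ = true;
            return Decision::Schedule;
        }
        return Decision::Drop;            // later early calls vanish
    }
    // The deferred notification finally ran at time `now`.
    void deferred_ran(long now) {
        pending_ = false;
        next_allowed_ = now + window_;
    }
private:
    long window_;
    long next_allowed_ = 0;
    bool pending_ = false;
};
```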
III.3) Out of order execution
For asynchronous operation, it is normally guaranteed that remote requests are processed by the server in the order in which they arrived. But sometimes, some upcoming requests should be prioritized over others. Consider, as an example, an HDD driver, which must ensure that data blocks which are physically close to each other are written together in order to maximize performance.
In that case, server software may want a hack-free way to reorder its pending requests. The out-of-order execution extension provides just that: when a new incoming request arrives and is queued, a notification call occurs on a separate asynchronous remote call interface of the server. The handler of that notification can then decide whether the newly queued element should be reordered or not, while various mutual exclusion mechanisms can be used to ensure that the pending task queue is not modified during the most critical periods of this decision process.
Obviously, server software should then point out clearly in its API documentation that calls may be reordered, in order to avoid disasters caused by the client assuming the order in which its requests will be processed.
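The reorder hook can be sketched as a server-provided predicate consulted each time a request is queued; the example hook below is a deliberately crude stand-in for a real elevator algorithm, and all names are invented for illustration:

```cpp
#include <deque>
#include <cstddef>

// A pending request: an id plus the disk block it targets.
struct Request { int id; int block; };

class ReorderQueue {
public:
    // Server-provided hook: returns true when the incoming request
    // should jump ahead of the queued one it is compared against.
    using Hook = bool (*)(const Request& incoming, const Request& queued);
    explicit ReorderQueue(Hook hook) : hook_(hook) {}
    // Queue a request, letting the hook place it among pending ones.
    void push(Request r) {
        auto it = pending_.begin();
        while (it != pending_.end() && !hook_(r, *it)) ++it;
        pending_.insert(it, r);
    }
    Request pop() {
        Request r = pending_.front();
        pending_.pop_front();
        return r;
    }
    std::size_t size() const { return pending_.size(); }
private:
    Hook hook_;
    std::deque<Request> pending_;
};

// Example hook: serve lower block numbers first, so requests for
// nearby blocks end up adjacent in the queue.
bool closer_first(const Request& incoming, const Request& queued) {
    return incoming.block < queued.block;
}
```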
III.4) Security checks
At the core of this OS’ philosophy stands the statement that system modules shouldn’t be allowed to do much more than they need in order to complete their task. This sandboxing-centric philosophy also applies to services: why should processes be able to access system services which are not useful for their tasks? As remote calls are the way system services are requested on this OS, a remote call sounds like the perfect place to check whether a process has the right to access a given service, by having a look at its security tokens, or whatever other security mechanism I end up implementing.
This extension could work by having the kernel itself examine the process’ security permissions, or by having a remote call sent to the server the first time the client asks to run a specific function, with the server itself checking the client’s security permissions. The latter is potentially worse for privacy and may increase the harm which a compromised server can do in ways I can’t think of right now, but the former implies putting complex security checks in the kernel, while all complexity which can be taken out of the kernel should be taken out of it.