I am starting to get the feeling that my body hates me. Why am I always so tired? Don’t I sleep well enough? Well, I can’t do much about pollen in the air and warm nights, but improving the rest didn’t help. Is it because I am a bit hypothyroid, then? But medicine has yet to find a curable cause for that, although there is still one last chemical to study in my blood. Or is it a normal side effect of one of my allergies? I’m starting to lose hope. How much time is left before I have to postpone all the good stuff to the holidays? But that hasn’t happened yet, and while I can still write and design stuff, while I still have some energy left to rebel against that senseless fatigue that tries to fill my life with chaotic thoughts and a depressing absence of in-depth intellectual activity, here are some additions to the RPC system which I am currently pondering.
Task completion notification and results
RPC is, by design, a member of the family of asynchronous programming primitives. You send some work to a system service, the function returns immediately, and you are free to do whatever you want from then on. This programming model is traditionally useful for building massively parallel and highly responsive software, since one need not have threads blocked or care about how long they will remain so. The inevitable downside, however, is that it is harder to return results, or even to notify the caller that some chunk of work has completed.
The traditional solution proposed for this problem is callbacks. When it is done, the asynchronous thread “returns” by calling a function of the caller to notify it of the completion of the requested work and of its result. Callbacks, however, are not the easiest thing in the world for clients to manage, since they mean stray threads that have to be dealt with and properly synchronized with the rest of the program’s code. RPC makes it worse: while it is possible to use it to implement a form of inter-process callbacks, doing so means that service clients have to mess with the complex part of the RPC system, service broadcasting. This, in turn, is not a desirable outcome: clients spawning and killing RPC entry points all the time hurts system performance, is a potential attack vector, and violates the important rule that OS structure should work towards making the life of user-mode software easier, even when it means making the life of system developers harder.
An interesting high-level approach to this problem is futures: when an asynchronous function is run, it returns a “future” object that looks much like a pointer to the result, but causes any function that attempts to touch it to block until said result is actually available. However, while this is an extremely elegant way to give asynchronous calls a more synchronous feel, I sense several problems with it. First, as with many other high-level programming constructs, future implementations tend to be highly language-specific, and mean a lot of work for an OS that tries to use them as a core primitive. A more pernicious side effect of futures is that, since they are designed to feel a lot like synchronous results, programs can all too easily trigger the lookup mechanism by accident and block.
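To make the accidental-blocking concern concrete, here is a minimal sketch using the futures of Python's standard `concurrent.futures` module. The language choice and the `slow_service_call` stand-in are assumptions made purely for illustration; nothing here is part of the RPC design itself.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_service_call(x):
    """Hypothetical stand-in for a remote procedure that takes a while."""
    time.sleep(0.2)
    return x * 2


executor = ThreadPoolExecutor(max_workers=1)

start = time.monotonic()
future = executor.submit(slow_service_call, 21)  # returns a future right away
submit_delay = time.monotonic() - start

# Touching the result is where the hidden blocking happens: this line
# does not return until the worker thread has finished.
value = future.result()
total_delay = time.monotonic() - start

executor.shutdown()
```

The call to `submit` costs almost nothing, while the innocent-looking `future.result()` silently absorbs the whole duration of the request, which is exactly the kind of surprise described above.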
Once these elegant approaches have been discussed, less elegant approaches remain. After all, one may argue, since RPC supports pointers and references by design, RPC routines could solely return results through those. The main issue, then, would be that the caller of the RPC routine does not know when it will complete, and when it will be able to access and trust the value of the processed variables. However, solving the task completion notification problem, alone, is easier than solving the full task completion notification and result retrieval one, and could perhaps be done with an ad-hoc solution.
Let’s imagine that any time a program makes an RPC call, it gets as a result some sort of “task identifier” that uniquely identifies the request. Let’s then add at least one of these functionalities:
- Threads can synchronously wait for the completion of a task when they have nothing else to do
- Threads can check the status of a task (with the usual warning that polling is wasteful)
- Threads can set up a parameter-less callback to be notified of task completion (though I don’t like this last one, as it basically requires a re-implementation of RPC)
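The first two options can be sketched roughly as follows. This is only an illustration under stated assumptions: `TaskTable`, `rpc_call`, `is_complete` and `wait` are hypothetical names, Python threads stand in for the threads spawned by the RPC mechanism, and a mutable list plays the role of the pointer through which results come back.

```python
import itertools
import threading


class TaskTable:
    """Hypothetical table mapping task identifiers to completion state."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_id = itertools.count(1)
        self._done = {}  # task identifier -> threading.Event

    def rpc_call(self, routine, *args):
        """Start `routine` asynchronously; return its task identifier at once."""
        with self._lock:
            task_id = next(self._next_id)
            done = threading.Event()
            self._done[task_id] = done

        def run():
            try:
                routine(*args)
            finally:
                done.set()  # the task counts as complete when its thread ends

        threading.Thread(target=run).start()
        return task_id

    def is_complete(self, task_id):
        """Polling variant: non-blocking status check."""
        with self._lock:
            return self._done[task_id].is_set()

    def wait(self, task_id, timeout=None):
        """Blocking variant: sleep until completion; False on timeout."""
        with self._lock:
            done = self._done[task_id]
        return done.wait(timeout)


# Usage: the routine writes its result through a caller-provided buffer.
table = TaskTable()
buffer = []
gate = threading.Event()  # keeps the task pending until we open it


def fill(out):
    gate.wait()
    out.append("data")


task = table.rpc_call(fill, buffer)
pending = table.is_complete(task)  # the task cannot have finished yet
gate.set()
finished = table.wait(task, timeout=5)
```

Once `wait` has returned `True`, the caller knows that the buffer can be read and trusted, which is all that the notification mechanism needs to guarantee.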
Now, this starts to look an awful lot like a small extra IPC mechanism, a tweaked take on UNIX signals, that could just as well be used in other places (such as for notifying user processes of external events). Task completion detection itself could work by considering, as a default, a task completed when the execution of the associated thread is over, while letting RPC services take back manual control of that mechanism if they need to (such as in task-serializing RPC services like disk drivers). There is one last problem to tackle, though.
Over time, as RPC calls are made, the OS will have to manage an increasing number of task identifiers, each linked with the boolean information of whether the associated task has terminated or not. In effect, this looks suspiciously like a memory leak: as system uptime grows, the number of managed task identifiers also grows, possibly to unacceptable levels. Clearly, something must be done about this problem.
Among the various options that I could think of, the only one that could work fairly well would be to have task identifiers “time out” some time after a task has completed, which would naturally regulate their population. But then, the associated delay would have to be well chosen, and that in turn brings extra complexity to the implementation (what if, say, aggressive power management rules make tasks run extremely slowly, like one second every 15 minutes, when a device is not in use?). If someone has a better idea in mind, I’m all ears.
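As a rough sketch of that time-out idea, and nothing more, one could imagine something like the following. `ExpiringTaskStates`, its method names and the retention value are all hypothetical, and the clock is injectable only so that the example can be run without actually waiting.

```python
import time


class ExpiringTaskStates:
    """Hypothetical store that forgets task identifiers after completion."""

    def __init__(self, retention_seconds, clock=time.monotonic):
        self._retention = retention_seconds
        self._clock = clock
        # task identifier -> completion time, or None while still running
        self._completed_at = {}

    def register(self, task_id):
        self._completed_at[task_id] = None

    def mark_complete(self, task_id):
        self._completed_at[task_id] = self._clock()

    def known(self, task_id):
        return task_id in self._completed_at

    def sweep(self):
        """Drop identifiers whose task completed more than the retention ago."""
        now = self._clock()
        expired = [
            task_id
            for task_id, completed in self._completed_at.items()
            if completed is not None and now - completed > self._retention
        ]
        for task_id in expired:
            del self._completed_at[task_id]
        return expired


# Usage with a fake clock: task 1 completes, task 2 is still running.
fake_now = [0.0]
states = ExpiringTaskStates(retention_seconds=10, clock=lambda: fake_now[0])
states.register(1)
states.register(2)
states.mark_complete(1)
fake_now[0] = 11.0  # pretend that eleven seconds have passed
removed = states.sweep()
```

Running tasks are never swept here, only completed ones. The sketch does nothing about the hard part, though: a client that is scheduled too rarely can still wake up after its identifier has expired.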
Task cancellation and error management
Sometimes, users start some big request in a piece of software, then realize that they have made a mistake and want to cancel it. To provide this option easily, applications in turn need system services to offer the same cancellation feature. Conversely, services also need to be able to return before having completed their work if they run into errors. This is particularly a problem with synchronous APIs, which need extra threads to be spawned and a way for API functions to return an “invalid” result when they have been cancelled.
Now, what if there was a way to send a UNIX-like signal to an RPC-driven daemon, which it could periodically poll during long tasks (I really can’t think of a more elegant way) so as to abort and return earlier when a request is cancelled? Reminds you of something? Well, that is for a reason. The task state storage mechanism mentioned above, if adopted, could be used to manage an extra task state, the “Cancelled” state, that services and clients could probe to adapt their behaviour as they see fit. That would be, as far as I can tell, one more reason to embrace it.
(And as a bonus feature, since that mechanism is kernel-managed, all pending RPC calls could automatically be put in an error state in the sad event that the associated server freezes or crashes: a useful, if a bit frustrating, error reporting feature.)
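Here is a minimal sketch of how a service could poll such a shared task state between chunks of work. The state names, the `Task` class and `long_service_routine` are assumptions made for illustration; in particular, the `FAILED` state is only my guess at what the crash-reporting bonus could look like.

```python
import enum
import threading


class TaskState(enum.Enum):
    RUNNING = "running"
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    FAILED = "failed"  # hypothetical: set by the kernel if the server crashes


class Task:
    """Hypothetical kernel-managed state shared by client and service."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = TaskState.RUNNING

    def state(self):
        with self._lock:
            return self._state

    def finish(self, new_state):
        """Leave RUNNING at most once; later transitions are ignored."""
        with self._lock:
            if self._state is TaskState.RUNNING:
                self._state = new_state


def long_service_routine(task, chunks, processed):
    """Service side: poll the task state before each chunk of work."""
    for chunk in chunks:
        if task.state() is TaskState.CANCELLED:
            return  # abort early, leaving the remaining chunks untouched
        processed.append(chunk)
    task.finish(TaskState.COMPLETED)


# Usage: the "client" cancels just before the third chunk is handed over.
task = Task()
processed = []


def chunks_with_cancel():
    for i in range(5):
        if i == 2:
            task.finish(TaskState.CANCELLED)
        yield i


long_service_routine(task, chunks_with_cancel(), processed)
```

The cost of this approach is visible in the loop: cancellation is only noticed between two chunks, so a service has to slice its long tasks finely enough for the polling to be useful.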
Some system services are naturally suited to being manipulated through a small number of short and expressive requests, or can easily be adjusted to deal with those. One can think, as an example, of the file system. Other system services, however, are cursed to be forever spammed with endless amounts of service requests, unless they decide to embrace some sort of scripting API, which is slightly at odds with the philosophy of the RPC mechanism, which normally exposes the programming interface of services directly to clients.
Due to this, Alfman recently suggested here that a performance optimization be brought to the RPC mechanism so as to allow clients to send large amounts of RPC requests all at once. And I have to agree in principle, although it is the kind of idea that I would tend to keep for a v2 of this operating system, since it is not essential to its core operation and only represents a nice extra if added on top of a working OS.
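To give a rough idea of what such batching could look like from the client side, here is a small sketch. It is not Alfman's proposal in any detail: `DrawingService`, `BatchingClient` and the `set_pixel` routine are hypothetical, and a direct method call stands in for the single IPC message that would carry the whole batch.

```python
class DrawingService:
    """Hypothetical service exposing its interface directly, RPC-style."""

    def __init__(self):
        self.pixels = {}
        self.messages_received = 0

    def set_pixel(self, x, y, color):
        self.pixels[(x, y)] = color

    def handle_batch(self, batch):
        """Process one message that carries many requests."""
        self.messages_received += 1
        for name, args in batch:
            # A real system would validate `name` against the set of
            # routines the service has chosen to expose.
            getattr(self, name)(*args)


class BatchingClient:
    """Queues calls locally and sends them all in a single message."""

    def __init__(self, send):
        self._send = send
        self._pending = []

    def call(self, name, *args):
        self._pending.append((name, args))

    def flush(self):
        batch, self._pending = self._pending, []
        if batch:
            self._send(batch)


service = DrawingService()
client = BatchingClient(service.handle_batch)
for x in range(100):
    client.call("set_pixel", x, 0, "red")
client.flush()  # one message instead of a hundred
```

The service keeps exposing its ordinary routines; only the transport changes, which is why this looks more like an optimization than like a scripting API.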
So, what do you think about these ideas and their possible inclusion in this OS? Do you like or dislike some of them, and why? Do you have ideas on the points that I don’t know how to address yet? As usual, do not hesitate to express yourself in the comments.