The OS-periment’s service model, RC2

This document is an updated version of “The OS-periment’s service model : release candidate” that takes into account the feedback received when the previous version was published on OSnews.

Okay, so let’s sum things up again, hopefully in a clearer way this time.

Following the microkernel tradition and the philosophy behind early UNIX systems, this OS is to be structured as a set of relatively independent modules, separated by process boundaries, which provide services to each other in a controlled way. The question which I aim to answer here is, which way ? How would processes mainly communicate with each other in order to provide these services ? Read more to find out.

I. Principle and use cases

If we take two processes, a client and a server, the action of “the server providing a service to the client” boils down to the client asking the server to do work for it. The server then does the work, and notifies the client when it’s done, if the client is interested in that information, optionally providing results that way. The programming abstraction commonly used for this purpose on library-based OSs is the function call : client code gives work to the server (the library), the server does the work and then notifies the client when it’s done, optionally providing results (through the library function’s return mechanism). So the most naive implementation of a “service” would be an IPC primitive which emulates a function call, having the client “call” a function of the server. This IPC primitive is called Remote Procedure Call, or RPC.

A problem with RPC is that it is not as universal an IPC primitive as one might wish. Consider, as an example, a notification mechanism. When a server sends a notification to a client, it certainly doesn’t care about what the client will do with it. So what the server would want is to just send the notification, and then go do something else. But if notifications were implemented using RPC, the server would have to stay blocked while the client processes the notification, until the client’s code returns. There would be, of course, ways to avoid this, like having the server create an extra thread to send the notification, but all in all one must admit that RPC simply doesn’t fit this use case well. The reason : it requires the server to care about when the client has completed its task, even when it doesn’t want or need to, because it is blocked while the call is processed. There are several other issues with blocking calls : they aren’t suitable for handling unexpected events, they cause deadlocks, and getting around them leads processes to create several threads only to have them block shortly thereafter, wasting system resources… All in all, the RPC model could be quite good, but blocking is definitely not a part of it which we want around.

So let’s remove it.

Our reworked IPC primitive would be something that works like a function call on the client’s side and runs a function of the server with the parameters given by the client, like in RPC. But this time, the client’s call returns immediately after the work to do has been handed to the server. If the server wants to provide results, it does so by calling in turn a function of the client whose prototype has been agreed upon in advance : a callback.
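
To give a more concrete picture, here is a minimal sketch of what such a call could look like from the client’s side, in C++. All the names (disk_read_async, on_read_done) are purely hypothetical, and the “server” is simulated locally so that the snippet stays self-contained :

    #include <cstdint>
    #include <cstdio>

    // Callback prototype agreed upon in advance between the client and the server.
    using ReadCallback = void (*)(int status, uint64_t block);

    // Client-side stub : in the real system this would hand the parameters over to
    // the kernel and return at once, without waiting for the server to do the work.
    void disk_read_async(uint64_t block, ReadCallback callback) {
        // ...the kernel would copy the parameters and wake the disk server here...
        callback(0, block);  // Simulated callback ; normally it arrives later.
    }

    // The client's callback, run when the server has finished the work.
    void on_read_done(int status, uint64_t block) {
        std::printf("block %llu read, status %d\n", (unsigned long long)block, status);
    }

    int main() {
        disk_read_async(42, on_read_done);  // Hand the work to the server...
        // ...and keep going immediately, instead of blocking like plain RPC would.
        return 0;
    }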

Now, how should this primitive be called ? Since it’s basically RPC with callbacks and without the requirement for the calling code to block, I’d have spontaneously called it non-blocking RPC, but as Brendan points out, the notion of a non-blocking call makes little sense. The notion of messaging doesn’t fit well either, as the “message” word is linked to the notion of sending (or receiving) data while this IPC primitive is about sending work to do. The name of “service request” also comes to mind, but it fails to grasp the full generality of this communication primitive (which can as well be used for notifications and callbacks). “Task sending” would be sufficiently generic, but the word “task” is already widely used to describe threads, fibers, processes, etc. In the end, I’m open to suggestions, and will stick with “remote call” in the meantime, because even though it is not entirely appropriate, I have to use something.

II. How should it work in practice ?

II.1) Service broadcasting

II.1.a) Server side

Client code should obviously not be able to call all the functions of the server code, otherwise our IPC mechanism totally breaks process isolation and that’s not good at all. Therefore, as part of its initialization, the server should specify to the kernel which of its functions may be called by client processes, along with function pointers so that the kernel may actually run them when they are called.

To allow for function overloading, the server should not only broadcast function names, but also a full function prototype. To allow for interoperability between clients and servers written in different programming languages, function prototypes should be written in one specific programming language, no matter which programming language the server is written in, so that the kernel may easily parse them. Translation to and from other languages’ function prototypes can be done by library code as needed. We adopt C/C++’s conventions for this task, as this is what most of the operating system will be written in and there’s no reason not to make our life simpler :)

As the programming language used to code the server may not support function overloading itself, it should be possible for a server to broadcast a function of its code under a different name (i.e. the prototype broadcasting process should not be fully automated by development tools).

To allow for breaking API changes that do not alter a function’s parameter structure (as an example, if a structure’s definition changes), a versioning system should be introduced. The server process may or may not broadcast an integer version number along with each function prototype ; if it does not, the function is assumed to be at version 0.

To allow for non-breaking API changes that extend a function’s feature set by appending extra parameters to it, the remote call subsystem should accept default parameter values. Conversely, when a callback is sent by the newer server to the older client, the new parameters of that callback should be silently ignored by the kernel.

Up to this point, the method through which the server sends its function prototypes to the IPC subsystem is unspecified. One possible way would be to use a system call and hard-code remote call definitions in the server’s code, while another option, suggested by xiaokj, is to use a “spec sheet” text file which includes the functions’ prototypes and may optionally include compatibility kludges. After some thought, the latter suggestion is rejected, for the following reasons :

  • Putting compatibility kludges (code) inside of a config file (data) is definitely very ugly, so it’s not something which I’d want to do. But without that, the spec sheet becomes nothing more than a list of function prototypes, and loses most of its interest.
  • It is unclear how the function prototypes will be linked to the actual server code. If hexadecimal function pointers within the server’s binary are written in the spec sheet, then they will be invalidated every time the server is recompiled, unless development tools automatically update the spec sheet (which puts a high constraint on them. What about developers who hate IDEs and prefer to write their code in a text editor and compile it directly using GCC ?). If the server has to broadcast function pointers anyway, then we end up having part of the broadcasting work in the server code and part of it in the spec sheet, which is not exactly tidy.
  • The spec sheet should only be modified during development periods, during which the code is available, so putting function prototypes in an external text config file sounds like an unnecessary burden. Allowing compatibility kludges within the spec sheet makes it possible to run old clients with much newer servers that have dropped compatibility with the API version the client used. But I’d argue that it’s the server designer’s job to maintain backwards compatibility with old clients, not the user’s or the OS’. Designing an interpreter for the language that will be used to implement these compatibility kludges takes time and effort, and I’m not ready to spend those because of people who can’t even make the basic effort of keeping their server software backwards-compatible. They’ll get the bad reputation they deserve if they break it.
  • User-accessible config files require the presence and initialization of a filesystem. Purely code-based RPC does not.
  • I have to agree that the spec sheet idea enforces separation between “core” server code and compatibility kludges, if server designers choose to use that possibility. But well-meaning designers could just as well separate compatibility kludges from server code within their source by putting the compatibility stuff in a separate code module, and I have less work to do on my side in that case :)
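
Assuming the system call route is the one retained, server-side broadcasting could then boil down to something like the sketch below. The rpc_broadcast_function call, its signature and the open_file example are assumptions made for the sake of illustration, not a defined interface :

    #include <cstdint>
    #include <cstdio>

    // Stand-in for the (yet unspecified) broadcasting system call ; its name and
    // signature are assumptions made for this sketch only.
    void rpc_broadcast_function(const char* prototype, int version,
                                void (*entry_point)(uint32_t, uint32_t),
                                const char* default_values) {
        std::printf("broadcasting %s (version %d, defaults : %s)\n",
                    prototype, version, default_values);
        (void)entry_point;
    }

    // The actual server-side entry point. "flags" was appended in version 1.
    void open_file(uint32_t inode, uint32_t flags) {
        (void)inode; (void)flags;  // ...server-side work...
    }

    int main() {
        // Prototype in C/C++ convention, version number, function pointer, and
        // default values for the parameters appended in later versions.
        rpc_broadcast_function("void open_file(uint32_t inode, uint32_t flags)",
                               1, &open_file, "flags = 0");
        // A client built against version 0 (without "flags") keeps working : the
        // kernel appends the default value when building the server-side call.
        return 0;
    }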

II.1.b) Client side

To make different versions of the server and the client work well together using the features described above, it is necessary that the client, too, sends, as part of its initialization, the names, prototypes, versions, and default parameter values of the functions which it has been designed to work with.

Client developers wouldn’t have to care about that in practice, however, as most nice server developers will develop, either by themselves or using automated development tools, “stub” library calls which hide the internals of the remote call mechanism from them and make it as simple as calling a library function with the proper parameters. The only thing which the client’s developer will have to work on is callbacks, and again libraries can be used to make it as simple as writing the callback function itself and providing a function pointer to it as one of the library function’s parameters.
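
As a rough illustration of that division of labour, here is what a stub shipped with a server’s client library might look like. Every name below (rpc_declare_client_side, rpc_remote_call, open_file) is a hypothetical placeholder, and the kernel side is simulated so that the example runs on its own :

    #include <cstdint>
    #include <cstdio>

    // Callback type that the client developer has to implement.
    using OpenCallback = void (*)(int status, uint32_t handle);

    // --- Stand-ins for the kernel interface, assumed for this sketch only ---
    void rpc_declare_client_side(const char* prototype, int version) {
        std::printf("client was built against : %s (version %d)\n", prototype, version);
    }
    void rpc_remote_call(const char* function, uint32_t inode, OpenCallback callback) {
        std::printf("remote call : %s(%u)\n", function, (unsigned)inode);
        callback(0, 7);  // Simulated ; in reality the server calls back later.
    }

    // --- What the server's stub library would provide ---
    // Run once at client initialization : declare which server functions, and
    // which versions of them, this client was built against.
    void stub_init() {
        rpc_declare_client_side("void open_file(uint32_t inode)", 0);
    }
    // The stub itself : to the client developer it is just a library function
    // taking the work parameters plus a pointer to the callback.
    void open_file(uint32_t inode, OpenCallback on_done) {
        rpc_remote_call("open_file", inode, on_done);
    }

    // --- Client code ---
    void on_open_done(int status, uint32_t handle) {
        std::printf("file opened : status %d, handle %u\n", status, (unsigned)handle);
    }

    int main() {
        stub_init();
        open_file(42, on_open_done);  // Looks like an ordinary function call.
        return 0;
    }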

II.2) Making the remote call

II.2.a) Parameters

Okay, so the server and the client have sent their respective prototypes to the kernel, which has itself done its compatibility checks and confirmed that everything is okay. Now, how does the client call one of the server’s functions ?

“Normal” function calls are done by pushing the function’s parameters on the stack, in an order and a fashion specified by the programming language and compiler’s calling convention, then jumping to the location of the function’s code. Obviously, this cannot be done here, because if client code can call server code seamlessly we have a major process isolation malfunction. Sharing the client’s stack with the server cannot be considered either, for the same reason. So we have to somehow transfer the function parameters from the client’s stack (where they are when the stub library code is called) to a stack within the server’s address space, and we want that transfer to be as fast as possible because this remote call mechanism is to be used even in the lowest layers of the OS.

As mentioned earlier, most of the OS (at least its lowest layers) is going to be written in C/C++. If the client and the server are both written in C/C++, and use the same calling conventions, all we have to do is to copy the stack region associated with the function parameters somewhere on a stack within the server’s address space, add the new default parameters if the client was built for an older version of the server’s function, and then we can make the call on the server side while on the client side we remove the parameters from the stack and return. That is as fast as one can possibly get, and only requires the stub library code on the client side to specify the size of the function parameters on the client stack.
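
A very rough sketch of that transfer, treating the parameter region as an opaque block of bytes and ignoring calling convention details (the structures and sizes are purely illustrative) :

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Parameters as laid out by a client built against version 0 of the function.
    struct OpenFileParamsV0 {
        uint32_t inode;
    };

    // Parameters expected by a server exposing version 1 ("flags" was appended).
    struct OpenFileParamsV1 {
        uint32_t inode;
        uint32_t flags;
    };

    // Kernel-side sketch : copy the client's parameter block onto a server-side
    // stack, then append the default value of any parameter the client does not
    // know about. Real code would work on raw stack memory, not on a vector.
    std::vector<uint8_t> build_server_parameters(const void* client_params,
                                                 std::size_t client_size) {
        std::vector<uint8_t> server_stack(sizeof(OpenFileParamsV1));
        std::memcpy(server_stack.data(), client_params, client_size);
        const uint32_t default_flags = 0;  // default value broadcast by the server
        std::memcpy(server_stack.data() + client_size, &default_flags, sizeof(default_flags));
        return server_stack;
    }

    int main() {
        OpenFileParamsV0 from_old_client{42};
        std::vector<uint8_t> params =
            build_server_parameters(&from_old_client, sizeof(from_old_client));
        // "params" now holds { inode = 42, flags = 0 }, ready for the server-side call.
        return 0;
    }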

If communication between clients and servers written using two different programming languages is involved, it is possible to use translation layers at the library level. In fact, this is how languages which do not belong to the C family do system calls on most C-based modern OSs, and it works just fine.

Care must be taken about pointers, to ensure that the client does not send pointers into its own address space as parameters of remote calls, which could lead either to a breach of process isolation or to server code which simply doesn’t work. To address involuntary errors, a special pointer type should be used in stub library calls, so that client writers remember that they have to share with the server every piece of data which they point to. To address malicious attempts from the client to alter the server’s address space or have a look at it, the kernel should check every pointer parameter within a remote call in order to ensure that they only point in parts of the server’s address space that are shared by the client. On their side, server developers must be careful and make sure that their remote calls’ parameters do not include pointers that are “hidden” inside structures, a typical example of such mistake being including a linked list within a remote call’s parameters, unless the library stub serializes it first.
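
As an illustration, that special pointer type could take a shape along these lines ; share_with_server stands in for the (yet unspecified) memory sharing primitive and only exists for the sake of the example :

    #include <cstddef>
    #include <cstdio>

    // Stand-in for the memory manager's sharing primitive : maps the client's
    // buffer into the server's address space and returns its address over there.
    // (Assumed name ; here it just returns the same address to keep the demo simple.)
    void* share_with_server(void* client_buffer, std::size_t size) {
        std::printf("sharing %zu bytes with the server\n", size);
        return client_buffer;
    }

    // The special pointer type used in stub prototypes : it cannot be built from a
    // raw pointer by accident, which reminds client writers that the data must be
    // explicitly shared before the remote call is made.
    struct SharedPtr {
        void* server_address;
    };

    template <typename T>
    SharedPtr share(T* data, std::size_t count = 1) {
        return SharedPtr{share_with_server(data, sizeof(T) * count)};
    }

    // A stub taking a SharedPtr rather than a raw pointer : passing an unshared
    // buffer simply does not compile.
    void write_file(SharedPtr buffer, std::size_t length) {
        std::printf("remote call with server-side address %p and length %zu\n",
                    buffer.server_address, length);
    }

    int main() {
        char data[128] = "hello";
        write_file(share(data, sizeof(data)), sizeof(data));
        // write_file(data, sizeof(data));  // would not compile : data was not shared
        return 0;
    }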

II.2.b) Running the code

At this point, server-side code is ready to be run. Two modes of server operation are supported. One of them is the asynchronous mode : if the server is idle, a thread is spawned within it to handle the task. While a task is being processed, further remote calls are put in a queue, either as inactive threads or as mere saved parameter stacks. When the processing is done, the next task in the queue is processed, and so on, one task at a time.

Another mode of server operation is the threaded mode. The difference with the asynchronous mode is that each time a new remote call occurs, a new thread is spawned within the server process in an active state, until the number of running threads within the server process equals the number of CPU cores, at which point remote call processing falls back to an asynchronous behaviour.

A cache of “dead” threads may be used to greatly speed up thread creation by removing the need to allocate memory each time a new thread is created.

Threaded operating mode is recommended for independent CPU-bound tasks, as it makes optimal use of the computer’s multiple CPU cores. On the other hand, for IO-bound tasks which do not benefit from the extra parallelism, and in situations where running tasks in parallel would require so much synchronization that they would run slower than in a sequential fashion, asynchronous operation should be preferred.
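
The per-call dispatching decision described above can be summed up by the following sketch, which illustrates the policy rather than any actual kernel code (all types are illustrative) :

    #include <cstddef>
    #include <queue>

    enum class ServerMode { Asynchronous, Threaded };

    struct PendingCall {
        // copied parameter block, target function, caller identity, ...
    };

    struct ServerState {
        ServerMode mode;
        std::size_t running_threads;
        std::size_t cpu_cores;
        std::queue<PendingCall> pending;  // calls waiting to be processed
    };

    // Assumed helper : create a thread inside the server process (or reactivate
    // one from the cache of "dead" threads) to run the requested function.
    void spawn_server_thread(ServerState& server, const PendingCall& call) {
        ++server.running_threads;
        (void)call;
    }

    // One dispatching decision per incoming remote call.
    void dispatch_remote_call(ServerState& server, const PendingCall& call) {
        const bool can_run_now =
            (server.mode == ServerMode::Threaded && server.running_threads < server.cpu_cores) ||
            (server.mode == ServerMode::Asynchronous && server.running_threads == 0);

        if (can_run_now)
            spawn_server_thread(server, call);  // run it immediately
        else
            server.pending.push(call);          // queue it ; processed later, one at a time
    }

    int main() {
        ServerState disk_server{ServerMode::Threaded, 0, 4, {}};
        for (int i = 0; i < 6; ++i)
            dispatch_remote_call(disk_server, PendingCall{});
        // With 4 cores, the first 4 calls run at once and the last 2 are queued.
        return 0;
    }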

II.3) Callback

When the server is done with its work, it may have to send a notification and results to the client using a “callback” remote call. It is interesting to investigate what happens then. At first glance, it looks exactly like the kind of remote call used by the client, but in practice there is an important difference : in this case, the kernel must deal with the compatibility of newer server callbacks with older client implementations. So it must remove function parameters from a stack. Unlike adding parameters, the kernel can only do this for the server if the length of the new parameters is known to it in advance. This can be done in two ways : either the server specifies the length of each function parameter each time it calls a callback function, adding overhead to the interprocess communication protocol, or variable-length parameters are forbidden in remote calls.

Considering how much of a bad idea variable-length parameters generally are (pushing a whole array on the stack is a memcpy() hell and a stack overflow waiting to happen), I can’t help but lean towards the second solution, especially considering that adding variable-length parameters later is possible if there’s demand for it. Considering that this is not a distributed operating system, shared memory buffers and pointers to the server’s address space sound like a very valid alternative to variable-length function parameters. But this point is open for discussion.
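
With fixed-length parameters only, trimming a newer callback down to what an older client expects amounts to dropping the trailing bytes of the parameter block, along the lines of this sketch (structures and sizes are illustrative) :

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // Callback parameters as sent by a newer server ("duration_ms" was added in v1).
    struct ReadDoneParamsV1 {
        int32_t  status;
        uint32_t bytes_read;
        uint32_t duration_ms;
    };

    // What an older client, built against version 0, expects to receive.
    struct ReadDoneParamsV0 {
        int32_t  status;
        uint32_t bytes_read;
    };

    // Kernel-side sketch : since every parameter has a fixed, known length, the
    // size the old client expects is known from its registered prototype, and the
    // extra trailing parameters are simply not copied onto the client-side stack.
    std::vector<uint8_t> trim_callback_parameters(const void* server_params,
                                                  std::size_t client_expected_size) {
        std::vector<uint8_t> client_stack(client_expected_size);
        std::memcpy(client_stack.data(), server_params, client_expected_size);
        return client_stack;
    }

    int main() {
        ReadDoneParamsV1 from_newer_server{0, 4096, 12};
        std::vector<uint8_t> trimmed =
            trim_callback_parameters(&from_newer_server, sizeof(ReadDoneParamsV0));
        // "trimmed" holds { status = 0, bytes_read = 4096 } ; "duration_ms" was dropped.
        return 0;
    }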

III. Possible extensions to this model

All this leads to a beautifully simple and fast remote call model, but sometimes extra features are needed. This is why the remote call infrastructure should offer room for extensions that alter the behaviour of the calls. Here are examples of what such extensions could be.

III.1) Timeouts

Nothing can cause more painful problems than having a server stuck in the equivalent of a while(1) loop, especially if that server uses an asynchronous operating mode. However, in some situations, it is possible to prevent these issues by having the server automatically abort its current task if it is not completed after a certain amount of time, send an error to the caller process, and revert its internal state to what it was before the killed thread started to run.
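
One way for a server to keep its state revertible is to have each handler work on a copy of the relevant state and only commit it on success. The following user-space simulation sketches that idea ; std::async merely stands in for the kernel-spawned handler thread, which the kernel would actually kill once the timeout expires :

    #include <chrono>
    #include <cstdio>
    #include <future>
    #include <thread>

    // The part of the server's internal state that a request may modify.
    struct ServerState {
        int open_handles = 0;
    };

    // A request handler that works on a copy of the state and never finishes in
    // time, simulating a server stuck in an endless loop.
    ServerState handle_request(ServerState state_copy) {
        ++state_copy.open_handles;
        std::this_thread::sleep_for(std::chrono::seconds(2));
        return state_copy;
    }

    int main() {
        ServerState state;  // the server's committed state

        auto task = std::async(std::launch::async, handle_request, state);
        if (task.wait_for(std::chrono::milliseconds(100)) == std::future_status::ready) {
            state = task.get();  // completed in time : commit the new state
        } else {
            // Timed out : keep the previous state untouched and report an error
            // to the caller (here, just a message).
            std::printf("request timed out, error sent back to the caller\n");
            task.wait();  // simulation only : let the "hung" thread finish cleanly
        }
        return 0;
    }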

This extension can improve the reliability of the operating system significantly, but it should be optional for two reasons :

  • It adds a lot of overhead to client and server development, which may just be too much for non-critical services. The server must support reverting its internal state to an earlier version, which implies regularly saving it (causing performance hits) and can become quite complicated, especially when a threaded operating mode is being used. On its side, the client must cope with its operations on the server being killed at any time.
  • It is hard to predict how much time a wide range of operations will take, as such data is hardware-dependent. Timeouts which work well on the developer’s PC may be too small on a user’s machine. Also, as soon as the remote call operates on a variable amount of data, predicting in advance how long the remote call should last becomes even more tricky.

III.2) Cooldown time

Sometimes, a notification may be fired very frequently, and it may not actually matter whether all of the notification calls are processed. As an example, imagine a program where, each time a data structure is modified, a notification that says nothing about which changes have occurred is fired to the UI layer to ensure that said UI is updated. Updating a UI more than, say, 30 times per second may simply be a waste of CPU power.

For those situations, it is advantageous to give the notification remote call a “cooldown time”. If two calls are made with a time interval between them smaller than that time, the second call is scheduled to only run once the cooldown time has elapsed, and any further notification calls made in the meantime are dropped.
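
The scheduling rule itself fits in a few lines ; this is a sketch of the policy only, with illustrative types :

    #include <chrono>

    using Clock = std::chrono::steady_clock;

    // Cooldown bookkeeping attached to one notification remote call.
    struct CooldownState {
        Clock::duration cooldown{};
        Clock::time_point last_run{};
        bool call_pending = false;         // is a deferred call already scheduled ?
        Clock::time_point scheduled_for{};
    };

    // Decides what to do with an incoming notification call. Returns true if the
    // call should run right now, false if it was deferred or dropped.
    bool on_notification_call(CooldownState& s, Clock::time_point now) {
        if (now - s.last_run >= s.cooldown) {
            s.last_run = now;
            return true;                                // enough time has elapsed : run it
        }
        if (!s.call_pending) {
            s.call_pending = true;
            s.scheduled_for = s.last_run + s.cooldown;  // defer to the end of the cooldown
        }
        return false;                                   // further calls are simply dropped
    }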

III.3) Out of order execution

For asynchronous operation, it is normally guaranteed that remote requests are processed by the server in the order in which they arrived. But sometimes, some upcoming requests should be prioritized above others. Consider, as an example, an HDD driver, which must ensure that data blocks which are physically close to each other are written together in order to maximize its performance.

In that case, server software may want a hack-free way to reorder its pending requests. The out-of-order execution extension provides just that : when a new incoming request arrives and is queued, a notification call occurs on a separate async remote call interface of the server. The handler of that notification can then decide whether the newly queued element should be reordered or not, while various mutual exclusion mechanisms can be used to ensure that the pending task queue is not modified during the most critical periods of this decision process.
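
Seen from the server, the extension could look roughly like this ; the types and the disk-driver-like reordering policy are only meant as an illustration :

    #include <algorithm>
    #include <deque>
    #include <mutex>

    struct PendingRequest {
        unsigned long long block_number;  // e.g. the disk block an HDD driver must access
    };

    struct RequestQueue {
        std::mutex lock;
        std::deque<PendingRequest> pending;
    };

    // Handler run (as an async remote call) each time the kernel queues a new
    // request. Here, a disk-driver-like policy : keep the queue sorted by block
    // number so that physically close blocks end up being processed together.
    void on_request_queued(RequestQueue& queue) {
        std::lock_guard<std::mutex> guard(queue.lock);
        std::stable_sort(queue.pending.begin(), queue.pending.end(),
                         [](const PendingRequest& a, const PendingRequest& b) {
                             return a.block_number < b.block_number;
                         });
    }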

Obviously, server software should then point out clearly in its API’s documentation that calls may be reordered, in order to avoid disasters caused by the client assuming a specific order in which its requests will be processed.

III.4) Security checks

At the core of this OS’ philosophy stands the statement that system modules shouldn’t be allowed to do much more than they need in order to complete their task. This sandboxing-centric philosophy also applies to services : why should processes be able to access system services which are not useful for their tasks ? As remote calls are the way system services are requested on this OS, a remote call sounds like the perfect place to check whether a process has the right to access a given service, by having a look at its security tokens, or whatever other security mechanism I end up implementing.

This extension could work by having the kernel itself examine the process’ security permissions, or it could work by having a remote call sent to the server when the client asks to run a specific function for the first time, and having the server itself check the client’s security permissions. The latter is potentially worse for privacy and may increase the harm which a compromised server can do in a way which I can’t think of right now, but the former implies putting complex security checks in the kernel while all complexity which can be taken out of the kernel should be taken out of the kernel.

16 thoughts on “The OS-periment’s service model, RC2”

  1. Alfman June 5, 2011 / 2:40 am

    “I’d have spontaneously called it non-blocking RPC, but as Brendan points out, the notion of a non-blocking call makes little sense.”

    I thought Brendan was joking in his statement about non-blocking calls, maybe he wasn’t.

    I just wanted to put this into context of how the dotnet SOAP implementation handles non-blocking RPC.

    http://www.developer.com/net/net/article.php/3408531/Working-With-Asynchronous-NET-Web-Service-Clients.htm

    The article is too long, jump to the source code.

    In short, when you add a soap service to the .net project, the IDE downloads the specification and it automatically generates a proxy class (which you can view the source code for, parsing the soap XML and wiring together parameters, etc). This proxy class is given a name/namespace and becomes available to your project.

    Here’s an example of the author performing a synchronous blocking call against the auto generated class wrapper to the SOAP call.

    “ws.FindPhoneNumberForName(tbName.Text);”

    Now, the class generator also creates non-blocking versions of all the soap functions (it does this for us out of the box).

    “ws.BeginFindPhoneNumberForName(tbName.Text,
    new AsyncCallback(lookupHandler),ws);”

    Inside the callback handler, we can retrieve the result:

    “mLangResult = webServ.EndMyWebMethod(asyncResult);”

    So the RPC/callback mechanisms work almost the same as you describe, but there’s a slight difference. The project actually has a class representing the RPC calls.

    “To allow for function overloading, the server should not only broadcast function names, but also a full function prototype.”

    Instead of a function prototype, could the server send the client an actual class which could be late bound to the client? This way the server could provide functionality to the client in certain cases without incurring IPC overhead.

    For example, a GUI window service could send the client a class which records several operations at once locally within the client until either the queue is full or it is explicitly submitted?

    Another example is for files, a local class could eliminate the need for trivial syscalls (I’m thinking things like “lseek”, which are just as easily merged with an actual read/write)

    In this model the service could update the class implementation without affecting the interface which the client depends on.

    I realize this idea is out of the blue, and I haven’t given it much thought, but merging requests could have the benefit of shortcutting unnecessary round trips between processes.

    I don’t have time right now to explain it further…I just wanted to post something before forgetting.

  2. Alfman June 5, 2011 / 5:10 am

    My comments…you’ve mentioned a lot, so I’ll try to keep them brief.

    “Client code should obviously not be able to call any function of server code, otherwise our IPC mechanism totally breaks process isolation and that’s not good at all.”

    I know what you meant to say, but it reads as though no server functions are able to be called.

    “To allow for interoperability between clients and servers written in different programming languages, function prototypes should be written in one specific programming language, no matter which programming language the server is written in, so that the kernel may easily parse them and translate them to other languages as needed.”

    I don’t like the idea of adding many language specific bindings in the kernel.
    If the translation were in a user space library, then it could be both forward and backward compatible with kernels having no knowledge of the language.

    “Conversely, when a callback is sent by the newer server to the older client, the new parameters of that callback should be silently ignored by the kernel.”

    It might be good to mention input/output parameters here?

    “…make sure that their remote calls’ parameters do not include pointers that are ‘hidden’ inside structures, a typical example of such mistake being including a linked list within a remote call’s parameters, unless the library stub serializes it first.”

    I’d like to know more about the serialization.

    “To address malicious attempts from the client to alter the server’s address space or have a look at it, the kernel should check every pointer parameter within a remote call in order to ensure that they only point in parts of the server’s address space that are shared by the client.”

    What about the reverse, does the client trust the server?
    Do all processes share the same address space? Or are they local addresses which need to be mapped to one another?

    “Now, how does the client call one of the server’s functions ? … Considering how much of a bad idea variable-length parameters generally are…shared memory buffers and pointers to the server’s address space sound like a very valid alternative to variable-length function parameters”

    I understand your reasoning for disallowing variable length arguments. And I agree that the shared memory mitigates the need for passing variable length data on the stack. However I’m curious how the processes will manage the shared memory between them? Hopefully there will be a standard memory allocator for shared buffers which the two processes can use, otherwise the memory allocation mechanisms would need to be specified in the documentation for the server IPC call.

    Consider these scenarios:
    1. Client alloc, server free (client passes structures into server, and discards)
    2. Client alloc, client free (client manages shared memory)
    3. Server alloc, client free (server returns structures to client)
    4. Server alloc, server free (not sure if realistic, but could happen over multiple calls)

    “Timeouts”

    If a client or server dies, I think the other end should immediately be notified instead of waiting for a timeout.

    “Out of order execution”

    Can you supply an example of why explicit OOE is needed?
    A threaded server can clearly be “out of order” by returning when it wants to.
    In async, I presume the server can send its completion notifications back to the client in any order it wants to, no?

    Dynamic RPC?

    Will it be possible to RPC dynamically? For example, a service might be a VOIP pbx where the client tells the server to notify it via another RPC call when a call comes in. The VOIP service would obviously not have knowledge of the client at compile time (although the RPC interface should be known ahead of time).

  3. Hadrien June 5, 2011 / 8:57 am

    I thought Brendan was joking in his statement about non-blocking calls, maybe he wasn’t.

    You’re right, maybe he was. Anyway, I think I’ll end up using the “nonblocking remote call” terminology, it just best describes the concept among what I have at the moment, unless I find something better.

    (…)So the RPC/callback mechanisms work almost the same as you describe, but there’s a slight difference. The project actually has a class representing the RPC calls.

    I see, but I’m not sure I see the benefit at this point. This could just make things harder for people who don’t use OO programming languages, except if implemented in the bindings of each specific language (classes for OO languages, functions for non-OO)…

    Instead of a function prototype, could the server send the client an actual class which could be late bound to the client? This way the server could provide functionality to the client in certain cases without incurring IPC overhead.

    I’m a bit uncomfortable with setting “this uses a class” among the requirements of the core IPC model, because it means that clients written in languages that don’t have classes would get classes shoehorned in language paradigms where they don’t fit well. Think GObject.

    I also don’t feel comfortable with the idea of dynamically linking different versions of server code with client code. This would break one of the interesting features of this RPC model, which is that different versions of a given code can communicate with each other with the RPC code acting like a glue. Here, the library code would have to link with the client code, which sets some extra requirements.

    On the other hand, things like caching and RPC call stacking could totally be done in the client-side stub library code, using classes in the bindings of languages which support it. The kernel should only be able to accept batches of RPC calls at once, which is totally something that can be added to the model and implemented.

  4. Hadrien June 5, 2011 / 9:46 am

    “Client code should obviously not be able to call any function of server code, otherwise our IPC mechanism totally breaks process isolation and that’s not good at all.”

    I know what you meant to say, but it reads as though no server functions are able to be called.

    Fixed. The correct way to do this in English is indeed using “all”, not “any”.

    “To allow for interoperability between clients and servers written in different programming languages, function prototypes should be written in one specific programming language, no matter which programming language the server is written in, so that the kernel may easily parse them and translate them to other languages as needed.”

    I don’t like the idea of adding many language specific bindings in the kernel.
    If the translation were in a user space library, then it could be both forward and backward compatible with kernels having no knowledge of the language.

    In fact, we totally agree, but maybe this sentence wasn’t well-written either. This time I can’t see why, maybe you could try to read it again and tell me what’s wrong.

    “Conversely, when a callback is sent by the newer server to the older client, the new parameters of that callback should be silently ignored by the kernel.”

    It might be good to mention input/output parameters here?

    Sorry, I don’t understand what you mean.

    “…make sure that their remote calls’ parameters do not include pointers that are ‘hidden’ inside structures, a typical example of such mistake being including a linked list within a remote call’s parameters, unless the library stub serializes it first.”

    I’d like to know more about the serialization.

    Well, I know that it is possible and that popular linked list implementations do it, but I haven’t thought about how exactly it might work. Probably by saving the linked list’s elements in an array, then modifying their “linking” pointers so that those which are nonzero are turned into array indexes starting with 1 instead. In my opinion, this should be implemented on a per-structure basis, and the kernel shouldn’t have to care about the structure hierarchy used by a program.
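
    Roughly, something along these lines (a quick, untested sketch) :

        #include <cstddef>
        #include <vector>

        // Flattening a linked list into an array so that it can travel through a
        // remote call : "next" pointers become 1-based array indexes, with 0 meaning
        // "end of list", so the result is position-independent.
        struct Node {
            int   value;
            Node* next;
        };

        struct FlatNode {
            int         value;
            std::size_t next_index;  // 1-based index of the next element, 0 = end of list
        };

        std::vector<FlatNode> serialize(const Node* head) {
            std::vector<FlatNode> flat;
            for (const Node* n = head; n != nullptr; n = n->next) {
                // Elements are stored in list order, so the next element (if any)
                // will simply be the one appended right after this one.
                flat.push_back({n->value, n->next ? flat.size() + 2 : 0});
            }
            return flat;
        }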

    “To address malicious attempts from the client to alter the server’s address space or have a look at it, the kernel should check every pointer parameter within a remote call in order to ensure that they only point in parts of the server’s address space that are shared by the client.”

    What about the reverse, does the client trust the server?
    Do all processes share the same address space? Or are they local addresses which need to be mapped to one another?

    I don’t think the client should trust the server either. This would provide servers with the capability to instantly kill clients which they don’t like, something which clearly doesn’t fit well in a sandboxing model.

    About the address space, I think it’s much easier to do it using local addresses. At the time when the client shares its data with the server, it receives a pointer to the location of the data in the server’s address space. To reduce the amount of mistakes on the client’s side, this pointer could be implemented as a special pointer type (something like struct DistantPointer{ void* the_pointer; }), so that using a normal pointer as a remote call parameter fails.

    Another way to do it, that I start to think might be better, is to have the client provide pointers to shareable data on its side, and have the kernel do the sharing job. This way, the validity of pointers is automatically ensured : no more costly checks. On the other hand, the RPC protocol must then manage sharing failures.

    I understand your reasoning for disallowing variable length arguments. And I agree that the shared memory mitigate the need for passing variable length data on the stack. However I’m curious how the processes will manage the shared memory between them? Hopefully there will be a standard memory allocator for shared buffers which the two processes can use, otherwise the memory allocation mechanisms would need to be specified in the documentation for the server IPC call.

    Actually, we’re talking about one of the few pieces of code which I’ve already implemented there, and it’s with great happiness that I can say : yes, I’ve thought about that.

    There is a specific memory allocator for shareable data (because it needs to take some extra precautions which standard malloc doesn’t need, e.g. only one allocated object per page), and the MM system includes a call for sharing memory between two processes. Data liberation is managed in a beautifully simple fashion which I’m quite proud of (though I’m sure someone has already invented it in the past) : when a process frees data that is shared with other processes, that data is only removed from its address space. After all, why should a process be allowed to discard another process’ data ?

    Consider these scenarios:
    1. Client alloc, server free (client passes structures into server, and discards)

    Client allocates, shares, sends the RPC call, then frees the shared data on its side since it doesn’t need it anymore. When the server is done, it frees the data on its side too. The MM system notices that the shared chunk has no owner anymore, so this time it is freed for good.

    2. Client alloc, client free (client manages shared memory)

    Client allocates, shares, sends the RPC call. Server does its thing, then frees the data on its side. Client frees the data too, and the data is discarded for good. This scenario may be better handled if sharing is done automatically by the kernel as described above. This way, the server doesn’t have to care about freeing the shared data, the sharing and liberation on server side is fully managed by the kernel.

    3. Server alloc, client free (server returns structures to client)

    Same scenario as client alloc/server free, if callbacks work like regular remote calls (which I’d like a lot).

    4. Server alloc, server free (not sure if realistic, but could happen over multiple calls)

    Same scenario as client alloc client free again. And again, in this case, it might be better if the sharing is managed by the kernel.

    “Timeouts”

    If a client or server dies, I think the other end should immediately be notified instead of waiting for a timeout.

    Certainly, but timeouts are for situations where the server is not exactly dead but hung in an infinite loop or something similar. In that case, the kernel needs a timeout before it declares that the server is dead.

    Can you supply an example of why explicit OOE is needed?
    A threaded server can clearly be “out of order” by returning when it wants to.
    In async, I presume the server can send its completion notifications back to the client in any order it wants to, no?

    Never. The server can always do it by itself. I thought that having it access its pending task queue could make OOE cleaner, but you’re right that in the end it might cause more harm than good. *STRIKE*

    Dynamic RPC?

    Will it be possible to RPC dynamically? For example, a service might be a VOIP pbx where the client tells the server to notify it via another RPC call when a call comes in. The VOIP service would obviously not have knowledge of the client at compile time (although the RPC interface should be known ahead of time).

    Couldn’t this feature be managed in exactly the same way as a callback ?

  5. Alfman June 5, 2011 / 7:55 pm

    “In fact, we totally agree, but maybe this sentence wasn’t well-written either. This time I can’t see why, maybe you could try to read it again and tell me what’s wrong.”
    “so that the kernel may easily parse them and translate them to other languages as needed”

    The way I read it is that the kernel is doing translations on behalf of userspace applications written in multiple languages.

    “Sorry, I don’t understand what you mean.”

    I guess I’m a little unclear as to what the server can send back to the client. Can the server change the parameters as though they’re passed by reference?

    “In my opinion, this should be implemented on a per-structure basis, and the kernel shouldn’t have to care about the structure hierarchy used by a program.”

    I agree in principle, but other RPCs do have to step in and either dictate a structure, or fundamentally understand a program’s existing structure in order to serialize it. For you the first approach is always possible but undesirable. The second approach is easier using reflective languages, but in C there’s no such thing.

    “Another way to do it, that I start to think might be better, is to have the client provide pointers to shareable data on its side, and have the kernel do the sharing job. This way, the validity of pointers is automatically ensured”

    I’m assuming you’re talking about making RPC copy client data into a new page at the server? The alternative (of sharing the existing page) would not have the appropriate granularity, since the server would see more data than it’s entitled to.

    “it needs to take some extra precautions which standard malloc doesn’t need, e.g. only one allocated object per page… Client allocates, shares, sends the RPC call, then frees the shared data on its side since it doesn’t need it anymore.”

    Yes, that seems pretty neat, I’ve never seen it done that way. It’s very transparent.
    However doesn’t it make some traditional structures, like linked lists, impractical due to large allocated object size?
    Nevertheless, I can see how it gets you the semantics you want.

    I’m wondering if there’s any issue with a client which calls the server multiple times simultaneously? What happens when the client shares the same page over and over again? If one page is shared by multiple RPCs, the server cannot free the page until the last RPC finishes – is there a reference counter on the shared memory?

    “Couldn’t this feature be managed in exactly the same way as a callback?”

    Well I don’t know.

    If a server tries to send a completion notification for an event which lacked an RPC from the client, what happens?
    I guess you could require that, in order to receive notifications from a server, then the client must always have a pending/idle RPC request in place to the server. However this could make it more difficult for the server to remember to notify the client of two quick phone calls in a row (before the client was able to re-establish the event listening RPC).

    Ideally, from a developer perspective, it’s far easier to send the notification immediately (from the server to the client). A normal RPC “callback” mechanism doesn’t usually provide those semantics, which is the reason I thought the server might want to RPC into the client dynamically. One solution might be to set up a special mode where the RPC remains active after a callback and can handle multiple events.

    I still ask whether there would ever be a scenario where we want to RPC between two processes dynamically at run time?

  6. Hadrien June 5, 2011 / 9:15 pm

    “In fact, we totally agree, but maybe this sentence wasn’t well-written either. This time I can’t see why, maybe you could try to read it again and tell me what’s wrong.”
    “so that the kernel may easily parse them and translate them to other languages as needed”

    The way I read it is that the kernel is doing translations on behave of userspace applications written in multiple languages.

    Indeed. Edited out.

    I guess I’m a little unclear as to what the server can send back to the client. Can the server change the parameters as though they’re passed by reference?

    Of course, if the server sends to the client a pointer to a shared memory buffer, it can work just like a reference.

    “In my opinion, this should be implemented on a per-structure basis, and the kernel shouldn’t have to care about the structure hierarchy used by a program.”

    I agree in principal, but other RPCs do have to step in and either dictate a structure, or fundamentally understand a program’s existing structure in order to serialize it. For you the first approach is always possible but undesirable. The second approach is easier using reflective languages, but in C there’s no such thing.

    As you say, there are two options :
    A/ I dictate that there should be no pointers within structures used as RPC call parameters. It’s simple, but ugly for developers, who then have to serialize their structures into versions which are usable as RPC call parameters themselves.
    B/ I ask the client and server to provide detailed information on the data structures which they use, on the data structures which these structures themselves use, etc., so that the kernel may parse a function call whose parameters include structures as if those parameters were solely integer data, pointers, and padding. This gives developers optimal freedom, but is significantly more complex to implement.

    Do you think that the benefits of the second solution would be worth the costs ?

    “Another way to do it, that I start to think might be better, is to have the client provide pointers to shareable data on its side, and have the kernel do the sharing job. This way, the validity of pointers is automatically ensured”

    I’m assuming your talking about making RPC copy client data into a new page at the server? The alternative (of sharing the existing page) would not have the appropriate granularity, since the server would see more data than it’s entitled to.

    Nope, I was talking about two ways of sharing an existing page. In the first case, the client and server explicitly care about the sharing, and in the second case, clients and servers just send pointers to shareable data and let the kernel do the sharing job.

    I must emphasize the “shareable data” notion here. That data has been allocated using a special allocator, so that it is alone in the pages of memory which it occupies and can be safely shared between the client and the server without extra information leaking in an unwanted way.

    I’ve thought about copying before, but I think that sharing is more powerful. It is faster as soon as the transferred data starts to become large (no copy is involved, so the memory bus is left alone), and it allows the recipient to communicate data to the sender by writing in the shared memory region (like pointers and references do in normal function calls).

    “it needs to take some extra precautions which standard malloc doesn’t need, e.g. only one allocated object per page… Client allocates, shares, sends the RPC call, then frees the shared data on its side since it doesn’t need it anymore.”

    Yes, that seems pretty neat, I’ve never seen it done that way. It’s very transparent.
    However doesn’t it make some traditional structures, like linked lists, impractical due to large allocated object size?

    Not necessarily. My “allocated object” terminology was maybe inadequate : you can totally allocate several objects at once in the same shareable buffer, using an equivalent of the C calloc() and the C++ new[] if you want syntactical sugar to make things look nicer.

    I’m wondering if there’s any issue with a client which calls the server multiple times simultaneously? What happens when the client shares the same page over and over again? If one page is shared by multiple RPCs, the server cannot free the page until the last RPC finishes – is there a reference counter on the shared memory?

    Good point ! There’s none currently, but I can easily implement it. And it would result in better behaviour under repeated sharing than the current one, which is to create several pointers to the shared memory region in the server’s address space, wasting said address space and polluting the TLB’s cache.

    I’ll start working on that tonight, it should be pretty fast.

    EDIT : Done already !

    If a server tries to send a completion notification for an event which lacked an RPC from the client, what happens?

    I guess you could require that, in order to receive notifications from a server, then the client must always have a pending/idle RPC request in place to the server. However this could make it more difficult for the server to remember to notify the client of two quick phone calls in a row (before the client was able to re-establish the event listening RPC).

    Ideally, from a developer perspective, it’s far easier to send the notification immediately (from the server to the client). A normal RPC “callback” mechanism doesn’t usually provide those semantics, which is the reason I thought the server might want to RPC into the client dynamically. One solution might be to set up a special mode where the RPC remains active after a callback and can handle multiple events.

    I still ask whether there would ever be a scenario where we want to RPC between two processes dynamically at run time?

    The way I currently see callbacks being implemented, they would simply take the form of the server doing a remote call on the client, using a function prototype that is agreed between the client and server in advance and implemented by the client. So the server can send an RPC call to the client through the “callback” function even though it is not currently processing a task from the client yet.

    If, for security reasons as an example, it must be ensured that only servers which the client has sent work to may call the callback function, then a mechanism can be introduced to enforce that. But this is not a fundamental limitation of the remote call model, and could be optional. At the fundamental level, the client simply remote-calls a client function.

    Would that answer your concerns ?

  7. Alfman June 5, 2011 / 10:39 pm

    “Do you think that the benefits of the second solution would be worth the costs ?”

    I don’t know.
    I prefer builtin RPC serialization, but only when language reflection is available.
    Without reflection, attempting to do serialization at the RPC level is futile, since defining the recursive structure to the RPC library may be more complex than serializing the data oneself.

    “you can totally allocate several objects at once in the same shareable buffer, using an equivalent of the C calloc() and the C++ new[] if you want syntactical sugar to make things look nicer.”

    So your kernel will provide page allocation, and allow userspace to define a malloc/calloc on top of it?

    Is there any problem if pointers in one page allocation reference objects in another page allocation in terms of shared pages? What if one page is shared and the other is not, will the server end up in a page fault when it tries to follow the pointer?

    “So the server can send an RPC call to the client through the ‘callback’ function even though it is not currently processing a task from the client yet…. Would that answer your concerns ?”

    I’m not sure.
    Are you saying that the server’s notification would queue at the client until the client performs the RPC call?
    Or would the client receive the event even without a pending RPC?

    The first approach is probably manageable, although it may violate causality: “here’s the response to your request, once you ask it”.

    The second approach makes me uncomfortable because the callback is going to be called without the proper context of the initial request. Normally RPC occurs in the context of a thread or some other async task. Without the request, this context will not be set up.

  8. Hadrien June 5, 2011 / 11:07 pm

    “Do you think that the benefits of the second solution would be worth the costs ?”

    I don’t know.
    I prefer builtin RPC serialization, but only when language reflection is available.
    Without reflection, attempting to do serialization at the RPC level is futile, since defining the recursive structure to the RPC library may be more complex than serializing the data oneself.

    Then considering that most of the system will be written in C and C++, maybe the complexity is not worth it ?

    So your kernel will provide page allocation, and allow userspace to define a malloc/calloc on top of it?

    Actually, I currently have malloc-like functionality implemented in the kernel. But it is too rough to be directly accessible to user space (it does not check parameters enough), so a user-mode interface would still be needed. That user-mode interface could easily implement a calloc call on top of a malloc implementation, considering that it’s as easy as multiplying two integers.

    Is there any problem if pointers in one page allocation reference objects in another page allocation in terms of shared pages? What if one page is shared and the other is not, will the server end up in a page fault when it tries to follow the pointer?

    Thinking about it, the situation is worse than that.

    I can’t think of a popular CPU architecture which supports data-relative pointer addressing (data pointer A points to some data which is N bytes ahead of/before A). At best they support IP/PC-relative addressing, like x86_64 or ARM. And that’s still not enough, since the relative position of the program’s PC and the shared data in memory is going to change when making the switch from client code to server code.

    If languages want support for such “fully relative” pointers, they have to emulate it (which amounts to working with serialized data at their core). That’s slow, so compiled programming languages that aim at performance don’t do it unless they have explicit support for it which is explicitly enabled by the developer (and I don’t think C has such a thing).

    This means in turn that if the shared memory is located at a different location in the caller’s and the callee’s address space, all pointers within the linked list will become invalid, no matter if they point within the shared buffer or not.

    Which in turn means that either linked lists must be manually serialized by the developer, or the kernel must gain intimate knowledge of every pointer within them in some way in order to do the address space translation and check that they do not point to “private” places within the server’s address space. There’s no way around it.

    “So the server can send an RPC call to the client through the ‘callback’ function even though it is not currently processing a task from the client yet…. Would that answer your concerns ?”

    I’m not sure.
    Are you saying that the server’s notification would queue at the client until the client performs the RPC call?
    Or would the client receive the event even without a pending RPC?

    The first approach is probably manageable, although it may violate causality: “here’s the response to your request, once you ask it”.

    The second approach makes me uncomfortable because the callback is going to be called without the proper context of the initial request. Normally RPC occurs in the context of a thread or some other async task. Without the request, this context will not be set up.

    Okay, forget that I talked about callbacks then, and let’s go back to a clean slate. The server would have a phone call notification, and a remote call prototype associated with it. Clients which want to be informed of phone calls have to implement a remote call matching this prototype, and then contact the server to inform it that they want to receive the phone call notification through that remote call.

    When a phone call occurs, the server notifies the client, as instructed, through the remote call that has been provided.

    Is this dynamic RPC ?

  9. Alfman June 6, 2011 / 1:38 am

    “This means in turn that if the shared memory is located at a different location in the caller’s and the callee’s address space, all pointers within the linked list will become invalid, no matter if they point within the shared buffer or not.

    …There’s no way around it.”

    Well, that’s assuming each process has a separate address space.

    With relocatable code, there’s no reason a process *needs* to have its own address space; addresses are arbitrary and never hard coded in typical programs. Page level security is enforced all the same.

    I can think of limitations to using this approach though.
    1. On 32bit systems this will limit total RAM to 4GB (many people don’t know this, but it’s possible for separate processes to exceed 4GB in total. For example, on 32bit x86 it’s possible to run 5 processes with 2GB a piece.) This probably won’t matter to you.

    2. Address space fragmentation affects all processes. Sequential page requests go against a system-wide free pool. If a client requests a 10MB buffer, then those pages must be linear across the system, not just within the application’s address space.

    3. Virtual memory will not be possible beyond 4GB.

    These problems are greatly diminished in 64 bit since there are so many more addresses available.
    Since you’re targeting devices which may have 512MB anyways, address space fragmentation may not be as big a deal.

    You could do a hybrid model. A trivial example is using the most significant bit in the pointer to indicate that memory is allocated against a global address space. And all lower addresses indicate a local process address space. This way all shared pages always come out of a global address space (and are therefore compatible between all processes). Each process would be guaranteed 2GB of unfragmented local address space. This isn’t so different from how Linux divides RAM between userspace and kernel.

    In the hybrid design, it would only be valid to place pointers to global address space (bit 31=1) in shared memory. Data in local address space would not have this restriction.

    It’s a tough problem.

    “When a phone call occurs, the server notifies the client, as instructed, through the remote call that has been provided.

    Is this dynamic RPC ?”

    Yes, that’s what I had meant.

    Some RPC designs might have a problem with instantiating dynamic endpoints for RPC at runtime.
    To support “dynamic RPC”, not only must the endpoints be modifiable, but there should be support for multiple calls to different endpoints simultaneously. (This could be an issue for compile time RPC code generators).

    I don’t think there’s any technical reason you can’t do this, but I was wondering about explicit support for it.

  10. Hadrien June 6, 2011 / 10:06 am

    “This means in turn that if the shared memory is located at a different location in the caller’s and the callee’s address space, all pointers within the linked list will become invalid, no matter if they point within the shared buffer or not.

    …There’s no way around it.”

    Well, that’s assuming each process has a separate address space.

    With relocatable code, there’s no reason a process *needs* to have its own address space; addresses are arbitrary and never hard coded in typical programs. Page level security is enforced all the same.(…)

    Good point. With AMD64, x86 has finally come to support position-independent code without any OS-level hackery required, and with 64-bit addressing swapping has become a thing of the past except maybe for some obscure server use cases, so maybe all programs could share the same address space and only use paging for protection and isolation.

    Compiler and binary format support would be required, though. If I understand the ELF specification correctly (which is far from being granted), an ELF binary has a hard-coded entry point in the process’ virtual address space, and all pointers are defined with absolute addressing based on it at link time. Relocatable ELF code, like shared libraries, basically works by deferring the linking operation until the OS actually loads the library, at which time symbol tables are parsed and the actual values of pointers are defined.

    Maybe there’s an obscure ELF extension that allows working with position-independent code from the start without dynamic linking being required, though. I’d have to ask some ELF gurus about this if I want to go further down this path.

    But before that, another problem, which you mention, is that of memory fragmentation. This is more fundamental, and I think you underestimate it. Assuming that allocated buffers won’t be larger than a few MB is a strong assumption : the basic OS certainly doesn’t have to eat up more, but some software (multimedia software, for example) needs large buffers by its very nature. And it is its right to allocate all the memory it needs at once for optimal performance.

    Imagine, as an example, that someone does video editing on a machine with 1GB of RAM. That was not unheard of in the past, even though people tend to use more powerful machines for this nowadays. That person has a few background tasks running (a web browser, an e-mail client, stuff like that), and total system consumption is around 150MB. Sadly, the system has been running for a long time and doing a wide variety of tasks, so a significant part of these 150MB is scattered all around the physical address space. The user doesn’t know about this, and even if he knew, I’m not sure he would care.

    Suddenly, the video editing application wants to allocate a buffer holding some twenty seconds of 800×600 24-bit 24fps raw video frames in order to ensure good performance when applying video effects in real time. At roughly 1.4MB per frame, that’s around 700MB of RAM. So it could theoretically fit in the available RAM. But as soon as our busy 150MB are scattered around RAM a bit, it will be impossible to find that much contiguous space.

    You might argue that in this case, your hybrid model might be used, and the process could use its unfragmented local address space. But why wouldn’t the video editing application share this buffer ? The video effect might well be processed by external software running in a separate process. In the audio world, JACK and ReWire work like this, and such systems are a joy to use when they do work.

    So I’m not convinced yet that “address space fragmentation may not be as big a deal”, although I agree with your other points.

    Yes, that’s what I had meant.

    Some RPC designs might have a problem with instantiating dynamic endpoints for RPC at runtime.
    To support “dynamic RPC”, not only must the endpoints be modifiable, but there should be support for multiple calls to different endpoints simultaneously. (This could be an issue for compile-time RPC code generators).

    I don’t think there’s any technical reason you can’t do this, but I was wondering about explicit support for it.

    The way I currently envision RPC, it would be set up entirely in the software’s own code, at run time, hence “dynamically”.

    Let’s imagine, as an example, that the kernel internally keeps track of available RPC calls through their prototype and the PID of the process which broadcasts them, and that we only allow one instance of each server to run at a time (because it is significantly simpler).

    When a client does an RPC call for the first time, the client library stub asks the kernel “Hello kernel ! What is the PID associated with the server which I know by the name of X today ?”. The kernel sends back the PID, and the library stub caches it for future use, then it asks the kernel “Hello again, may I use the server_PID.prototype remote call or a compatible equivalent ?”. The kernel checks if the call is indeed available on the server side, sets up any internal structure needed for compatibility between the client and the server, and then sends the library stub a unique integer identifier which can be used as a short reference to the call later (parsing prototypes is slow, no need to do it several times). The client library stub caches that integer identifier for future use, and all it has to do to make the remote call now and afterwards is to provide the kernel with the integer identifier associated with the call and the parameters.

    When a remote call is made, the server is provided with, at the very least, the PID of the client, and for things which require a callback, a unique identifier that differentiates the different occurrences of a given remote call would be required too. When the server is done with its work, if it’s the first time it sends the callback to the client, it asks the kernel “Hello kernel, may I use the client_PID.callback_prototype remote call or a compatible equivalent ?”. The kernel then checks if the callback is indeed available on the client side, sets up any internal structure needed for compatibility between the server and the client, and then sends the server a unique integer identifier which can be used as a short reference to the callback later (parsing prototypes is slow, no need to do it several times). The server caches that integer identifier for future use, and all it has to do to send the callback now is to provide the kernel with the integer identifier associated with the call, the parameters, and the unique identifier associated with the transaction. So if the unique identifier is managed as a callback parameter, the situation is quite symmetric. And if it’s not, the kernel will probably manage it on its own anyway.
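    To make these two lookup steps a bit more concrete, here is a rough sketch of what the client-side stub could look like. Every name in it (lookup_server_pid, open_remote_call, do_remote_call, the “filesystem” server) is a made-up placeholder for whatever the kernel interface will actually be, not a defined API :

        #include <stdint.h>

        /* Hypothetical kernel interface -- names and signatures are placeholders. */
        extern int32_t lookup_server_pid(const char *server_name);
        extern int32_t open_remote_call(int32_t server_pid, const char *prototype);
        extern int     do_remote_call(int32_t call_id, const void *params, uint32_t params_size);

        /* Client-side stub for a fictional "read_file" service. The PID and the
         * call identifier are resolved once, then cached for later invocations. */
        int read_file_stub(const void *params, uint32_t params_size)
        {
            static int32_t cached_call_id = -1;

            if (cached_call_id < 0) {
                /* "What PID is the server I know as X today ?" */
                int32_t server_pid = lookup_server_pid("filesystem");
                /* "May I use this call or a compatible equivalent ?" */
                cached_call_id = open_remote_call(server_pid,
                                                  "read_file(int fd, void *buf, size_t len)");
                if (cached_call_id < 0)
                    return -1;      /* call not available on the server side */
            }
            /* Fast path from now on: just the identifier and the parameters. */
            return do_remote_call(cached_call_id, params, params_size);
        }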

    One of my goals is to make the situation of the client and the server as symmetric as possible, so that a client can become a remote call recipient, exactly like a server.

    Now, let’s investigate the “phone call” scenario in this model.

    Through the remote call mechanism, the client states to the server that it wants to receive phone calls, and provides the function name of the remote call which it wants the server to use in order to notify it when a phone call arrives. The server then puts this function name within its own “phone call notification” prototype, and asks the kernel if it can provide an identifier associated with the client_PID.server_prototype remote call. The kernel then does compatibility checks between the client prototype and the server prototype, and if everything is good, sets up the remote call connection and sends the server an integer identifier associated with it. When a phone call arrives, the server uses that integer identifier and the function parameters to notify the client through a remote call, and since the roles of the client and the server are symmetric, it is just as if the server was a client, asking the client (which becomes a server) to perform the tasks associated with the handling of phone calls.
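    Sketched in the same hypothetical style as above, the server side of this scenario could look something like this (again, every name is an assumption, not something that exists yet, and a single client is kept for brevity) :

        #include <stdint.h>
        #include <stdio.h>

        /* Hypothetical kernel interface, same placeholders as before. */
        extern int32_t open_remote_call(int32_t client_pid, const char *prototype);
        extern int     do_remote_call(int32_t call_id, const void *params, uint32_t params_size);

        static int32_t phone_notify_call_id = -1;

        /* Remote call exposed by the server: the client registers the name of the
         * callback through which it wants to be notified. The kernel provides the
         * client PID with every incoming remote call. */
        void subscribe_phone_calls(int32_t client_pid, const char *callback_name)
        {
            char prototype[256];
            /* Build "callback_name(phone_call_info *info)" and ask the kernel for an
             * identifier on the *client's* side -- the roles are now reversed. */
            snprintf(prototype, sizeof(prototype), "%s(phone_call_info *info)", callback_name);
            phone_notify_call_id = open_remote_call(client_pid, prototype);
        }

        /* Later, when a phone call actually arrives... */
        void on_incoming_phone_call(const void *info, uint32_t info_size)
        {
            if (phone_notify_call_id >= 0)
                do_remote_call(phone_notify_call_id, info, info_size);
        }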

    Couldn’t it work ?

  11. Alfman June 6, 2011 / 11:34 am

    “Maybe there’s an obscure ELF extension that allows working with position-independent code from the start without dynamic linking being required, though. I’d have to ask some ELF gurus about this if I want to go further down this path.”

    Well, I thought ELF was relocatable by default, but maybe it isn’t, since a static binary doesn’t need to be.

    “So I’m not convinced yet that ‘address space fragmentation may not be as big a deal’, although I agree with your other points.”

    Your point is taken.

    I wrongly assumed you’d be targeting devices/apps which are RAM-limited before being address-space-limited.

    Nevertheless, I’d argue that allocating a 512MB buffer is bad practice since there is a high chance of fragmentation problems within a process on 32-bit platforms. I don’t think a huge multidimensional array would be that advantageous over an array of pointers to individual frames. Of course this is completely beside the point; you’re right that fragmentation isn’t pretty on 32-bit, but then we’re back at the original problem of mapping pointers between processes. You could do like Linux and require processes to use relative pointers in shared memory.

    “Couldn’t it work ?”

    The only thing, and it’s minor, is that a server may terminate and come back under a different PID.
    When servers are restarted, will the old clients be able to continue sending requests to the new server? As you described it, the client would have the old server’s PID instead of something more persistent. To the user (or even a dev), it might not be clear why RPC stopped working since both the client and server are running.

    Something a tad more reliable would be for the client to “getserverhandle()” and use that instead of the PID. The server handle would be a structure maintained in the kernel pointing to the name of the service and the PID of the server (and a reference count).

    This way, if the server dies and comes back, it will pick up the same serverhandle it had before. Idle clients wouldn’t need to know the server had gone down.
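    A rough sketch of what the kernel might keep behind such a handle (a hypothetical layout, of course):

        #include <stdint.h>

        /* Hypothetical kernel-side structure behind getserverhandle(). Clients keep
         * the handle; if the server dies and restarts, only the pid field is updated,
         * so idle clients keep a valid reference to the service. */
        struct server_handle {
            char     service_name[64];   /* persistent name, e.g. "filesystem"    */
            int32_t  pid;                /* current PID of the server, may change */
            uint32_t refcount;           /* number of clients holding the handle  */
        };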

  12. Alfman June 6, 2011 / 11:38 am

    I guess there could still be problems with the server and client becoming unsynchronized in my last comment. I need to go sleep.

  13. Hadrien June 6, 2011 / 12:04 pm

    Your point is taken.

    I wrongly assumed you’d be targeting devices/apps which are RAM-limited before being address-space-limited.

    Hmmm… I think it’s me who hasn’t understood what you were trying to advocate.

    I thought that you wanted to enforce that all processes share a single address space by forgetting page translation altogether and directly working with the physical address space. But now, if you advocate that all processes should share a single virtual address space but keep using page translation for making contiguous virtual blocks out of noncontiguous physical blocks, that’s a different story. For the years to come, most computers will be able to address much more memory than they have RAM, so fragmentation is not going to come into play anytime soon. Except on 32-bit platforms like ARM, which could be problematic as soon as they start to have RAM amounts close to 4GB.

    The only nasty thing is that I’ll now have to play with CPUID to know what the limits of the address space are, while before, I knew that an individual process’ address space necessarily had to be smaller than the available RAM, and thus within the limits of what the machine can support. But if I am to implement swapping at some point, I’ll have to do this sooner or later anyway.

    A bigger problem is that I don’t think the following two concepts are compatible with each other :

    1/ Relocatable code based on IP-relative pointer addressing.
    2/ Shared pointers that work for all processes, based on having all processes share a common address space.

    The latter would require absolute pointer addressing within the common address space to work, if I’m not mistaken. So all in all, we’d still end up using a different kind of pointer for sharing purposes than for regular code, and requiring some sort of pointer translation.

    Nevertheless, I’d argue that allocating a 512MB buffer is bad practice since there is a high chance of fragmentation problems within a process on 32-bit platforms. I don’t think a huge multidimensional array would be that advantageous over an array of pointers to individual frames. Of course this is completely beside the point; you’re right that fragmentation isn’t pretty on 32-bit, but then we’re back at the original problem of mapping pointers between processes. You could do like Linux and require processes to use relative pointers in shared memory.

    Development simplicity is the thing. If there’s contiguous physical RAM available, you get the full performance benefit from it. If there isn’t, the program works no worse than an array of pointers would. The developer doesn’t have to care about the memory layout.

    Requiring processes to use relative pointers in shared memory is what I meant by “enforcing serialization”.
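    For reference, here is what that relative-pointer discipline looks like in practice : offsets from the shared buffer’s base are stored instead of absolute addresses, so the data stays valid whatever address each process maps the buffer at. A minimal sketch, not tied to any actual implementation :

        #include <stdint.h>
        #include <stddef.h>

        /* A linked-list node stored inside a shared buffer. "next_offset" is an
         * offset from the start of the buffer rather than an absolute pointer, so
         * the list stays valid even if the two processes map the buffer at
         * different virtual addresses. Offset 0 plays the role of NULL here. */
        struct shared_node {
            uint32_t next_offset;
            int32_t  value;
        };

        static inline struct shared_node *
        node_at(void *shared_base, uint32_t offset)
        {
            return offset ? (struct shared_node *)((char *)shared_base + offset) : NULL;
        }

        /* Walking the list only needs the local mapping address of the buffer. */
        static int32_t sum_list(void *shared_base, uint32_t first_offset)
        {
            int32_t sum = 0;
            for (struct shared_node *n = node_at(shared_base, first_offset);
                 n != NULL;
                 n = node_at(shared_base, n->next_offset))
                sum += n->value;
            return sum;
        }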

    The only thing, and it’s minor, is that a server may terminate and come back under a different PID.
    When servers are restarted, will the old clients be able to continue sending requests to the new server? As you described it, the client would have the old server’s PID instead of something more persistent. To the user (or even a dev), it might not be clear why RPC stopped working since both the client and server are running.

    As you said in an earlier comment, the client should receive a notification when the server dies anyway (as its previous state may not be fully brought back). The client-side library stub could use this notification to re-establish a new link with the server. Alternatively, as you mention, the communication channel set up by the kernel and referenced by the “unique identifier” mentioned earlier could be designed so that the same identifier keeps working with a different communication channel (to the new server process) in the event that the server dies.

    All in all, tolerance to the death of servers may be a very good thing not only for reliability but also for upgradeability : it means that when a security update for a server (one that keeps perfect API compatibility) arrives, the server can tell the kernel to put upcoming events on hold, finish what it was doing, save its state, and restart, allowing for live updating of software without any reboot needed.

    I guess there could still be problems with the server and client becoming unsynchronized in my last comment. I need to go sleep.

    You don’t live in a GMT+1/+2 region like me, do you ? Around here, it’s midday :)

  14. Alfman June 7, 2011 / 3:51 am

    “As you said in an earlier comment, the client should receive a notification when the server dies anyway (as its previous state may not be fully brought back).”

    Yes, however I think we were referring to server failure during an RPC request, not a server failure between RPC requests. How would an idle client receive a notification that the server was reset (without polling)?

    On Linux, the client knows when an IPC pipe goes down since read returns -1 and errno is set. However, the difference here is that your RPC requests are connectionless, and there’s no “connection” to break (this may be another flawed assumption on my part).

  15. Hadrien June 7, 2011 / 9:24 am

    Yes, however I think we were referring to server failure during an RPC request, not a server failure between RPC requests. How would an idle client receive a notification that the server was reset (without polling)?

    Good point. There’s no fully reliable and efficient way to do this (short of regular saving of the server’s state).

    On Linux, the client knows when an IPC pipe goes down since read returns -1 and errno is set. However, the difference here is that your RPC requests are connectionless, and there’s no “connection” to break (this may be another flawed assumption on my part).

    There is such a thing as a connection. Remember the initial steps discussed previously. The client asks the kernel to access some service on the server, the kernel sets up some management structures, then returns a number that uniquely identifies these structures to the client. The client then uses this number to quickly perform RPC calls.

    If the kernel has support for server death notification, it can, by itself, send a notification to all processes which were connected to the server this way.
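    To illustrate, here is a very rough sketch of what such kernel-side bookkeeping and notification could look like (none of these structures exist yet, they are purely hypothetical) :

        #include <stdint.h>

        /* Hypothetical kernel-side bookkeeping: one record per open remote call
         * connection, as set up when a client first asks for a call identifier. */
        struct rc_connection {
            int32_t client_pid;
            int32_t server_pid;
            int32_t call_id;       /* the integer identifier handed to the client */
            struct rc_connection *next;
        };

        extern struct rc_connection *connection_list;           /* assumed global list   */
        extern void notify_process(int32_t pid, int32_t event); /* assumed notification  */
        #define EVENT_SERVER_DIED 1

        /* Called by the kernel when a server process exits: every client that holds
         * a connection to it gets a death notification, even if it is idle. */
        void on_server_exit(int32_t server_pid)
        {
            for (struct rc_connection *c = connection_list; c != NULL; c = c->next) {
                if (c->server_pid == server_pid)
                    notify_process(c->client_pid, EVENT_SERVER_DIED);
            }
        }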

    Again, that’s some support to implement. To be added to the next version of this document, once we have discussed the details.
