My current investigation of the Ada programming language has led me to an interesting discovery: there is such a thing as programming languages with integrated InterProcess Communication (IPC) functionality. In the case of Ada, that functionality is based on a combination of Remote Procedure Calls (RPC) and distributed objects, and designed for network transparency. Apparently, Ada’s designers also reached the conclusion that message-based communication is the Assembly of IPC, in that using it is error-prone, frustrating, and time-consuming. This discovery led me to think a bit about what I’m trying to achieve with RPC, and whether I’m doing it the way I should.
The point of IPC in general
There are multiple situations in which independent software processes must communicate with each other, such as…
- …when tasks must be spread out across multiple computers due to hardware specialization (e.g. a factory control program directing the work of multiple autonomous embedded systems)
- …when a task, implemented by a given process, gets input from and/or sends output to an unrelated process (e.g. any modern userspace program communicating with the system’s I/O facilities)
- …when the functionality of a large software system is broken up across multiple smaller processes, so as to shrink the purpose and liability of each process, increasing security and reducing crash impact (e.g. traditional UNIX programming, microkernel operating systems)
The family of technologies enabling controlled communication between processes in such settings is called “InterProcess Communication”, or IPC for short. Since they partially break process isolation, implementing them on the local scale requires some assistance from the operating system. The OS can, however, only provide a set of simple low-level communication primitives, and leave higher-level communication protocols to user-mode libraries.
The minimal functionality required for IPC is a way for processes to exchange data packets. It can be provided through a simple “pipe” mechanism that lets applications exchange streams of bytes, or through more advanced “message passing” mechanisms like D-Bus, which allow data to be handled in a more structured way. Since constantly polling input buffers for new data is inefficient, such data links are generally coupled with a signalling mechanism that lets the OS notify processes when data arrives. With these two mechanisms, one already has pretty much the whole POSIX IPC API covered, setting aside some optimizations for local IPC based on hardware shared between processes.
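To make these two ingredients concrete, here is a minimal Python sketch of a byte-stream pipe coupled with a readiness notification, using POSIX pipes and `select()` as a stand-in for whatever notification primitive a given kernel provides:

```python
# Minimal sketch of the two core IPC ingredients: a byte-stream pipe,
# and a notification mechanism so the receiver need not busy-poll.
import os
import select

def demo_pipe_with_notification() -> bytes:
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"hello")  # sender pushes a data packet
    # Instead of repeatedly checking the buffer, block until the OS
    # signals that the read end has data available.
    ready, _, _ = select.select([read_fd], [], [], 1.0)
    assert read_fd in ready
    data = os.read(read_fd, 4096)
    os.close(read_fd)
    os.close(write_fd)
    return data

print(demo_pipe_with_notification())  # → b'hello'
```

Everything higher-level, message framing included, can be layered on top of these two primitives in user space.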
When RPC comes into play
Not everything is cleanly expressed in terms of data transfers between processes, however. Revisiting some of the examples above, the factory control program doesn’t really want to transmit data to the embedded systems; it wants to make them perform some action. The same goes for user-space programs calling system software: they do not merely want to send data to the system I/O facilities, they want those facilities to perform some task with that data (such as printing text on a console). Message-passing mechanisms alone cannot express such “call this function of that process” imperative intents, known as Remote Procedure Calls or RPC, which essentially leaves two possible options:
- Implement a protocol for RPC across a message-passing link, and leave it up to the communicating processes to make sure that the data being sent and received is actually valid
- Directly integrate mechanisms for RPC into the operating system’s IPC primitives, alongside or in place of traditional message passing
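The first option can be sketched in a few lines of Python. All names below (`encode_call`, `dispatch`) are illustrative, not taken from any real system; the point is that validation is entirely the processes’ own responsibility:

```python
# Sketch of option 1: layering an ad-hoc RPC protocol over a plain
# message-passing link, with JSON as the wire format.
import json

def encode_call(func: str, args: list) -> bytes:
    # Marshal the imperative intent ("call this function with these
    # arguments") into a flat message the link can carry.
    return json.dumps({"func": func, "args": args}).encode()

def dispatch(message: bytes, handlers: dict) -> bytes:
    # The receiving process must validate the message itself: an
    # unknown function name is exactly the kind of error that the
    # OS cannot catch for us in this design.
    request = json.loads(message)
    handler = handlers.get(request["func"])
    if handler is None:
        return json.dumps({"error": "no such function"}).encode()
    return json.dumps({"result": handler(*request["args"])}).encode()

# A toy "remote" process exposing a single function:
handlers = {"add": lambda a, b: a + b}
reply = dispatch(encode_call("add", [2, 3]), handlers)
print(json.loads(reply))  # → {'result': 5}
```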
The trade-off here is between the implementation simplicity of the operating system and that of user-space programs. RPC is a relatively high-level IPC primitive, and supporting it at the OS level requires a more complex kernel, which in turn can be a source of security flaws or reliability issues. However, if there is no standard OS solution for RPC, user-space programs have to implement it in an ad-hoc fashion, which means either dealing with unwieldy “generic” library constructs, lots of computer-generated wrapper code, or tons of duplicated effort as every communicating process wraps its own RPC mechanism in its own way.
If, on the other hand, programming languages themselves supported standard semantics for RPC over message-passing channels, then this trade-off might be avoided, at least for single-language projects. Which leads me to write a bit about Ada…
Language-provided RPC constructs: the Ada 95 way
When the 1995 revision of the Ada programming language standard was devised, one of its design goals was to ease the development of distributed systems. More precisely, the goal was to be able to start from a non-distributed program, make minor changes to its modules, adjust the compiler configuration so as to split it into isolated “partitions” (similar in spirit to OS processes), and then transparently deploy those partitions on multiple machines, without losing core local development assets such as language constructs or type safety in the process.
For this scenario to work, code from a given Ada package had to be able to call functions of another Ada package, even when those packages are implemented in two different processes running on two different machines. Consequently, the Ada designers needed to support a form of RPC, and more specifically one that works not only on a local machine but also over a network. Since networked machines in a distributed system may be linked together in a wide variety of ways, the chosen RPC solution also had to be link-agnostic. Tasks such as locating the various components of a distributed system, establishing connections between them, and keeping those connections alive should therefore not be part of the standard itself, but rather be left to third-party library implementations.
The end result of this effort is Annex E, “Distributed Systems”, of the Ada 95 reference manual. Readers interested in the design process behind it, which I quickly alluded to above, can also refer to the appropriate section of the Ada 95 rationale. Effectively, what this annex does is specify a set of compiler pragmas that define the public interface of each partition of the distributed system, so that the compiler may automatically generate RPC stubs and other communication primitives, and raise errors when unauthorized communication is attempted between partitions. It also specifies an interface to the aforementioned link layer, called the “Partition Communication Subsystem”, which the generated RPC stubs call to perform the actual cross-partition communication, in a message-passing fashion. The actual link layer is implementation-dependent, a commonly used implementation being AdaCore et al.’s PolyORB.
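The division of labor between generated stubs and the link layer can be sketched outside of Ada. The Python sketch below is purely illustrative: `LoopbackLink`, `RemotePartitionStub` and `compute_checksum` are hypothetical names, not part of Annex E, and the loopback link stands in for whatever the Partition Communication Subsystem implementation actually does:

```python
# Sketch of the stub/link-layer split: the caller sees an ordinary
# function, while the stub marshals the call and hands it to a
# pluggable link layer.
import json

class LoopbackLink:
    """Stand-in link layer; a real one would cross process or
    network boundaries. Here it delivers to a local handler table."""
    def __init__(self, handlers):
        self.handlers = handlers
    def send_request(self, payload: bytes) -> bytes:
        req = json.loads(payload)
        return json.dumps(self.handlers[req["func"]](*req["args"])).encode()

class RemotePartitionStub:
    """Caller-side stub: exposes the same signature as the remote
    function, hiding all marshalling behind it."""
    def __init__(self, link):
        self.link = link
    def compute_checksum(self, data: str) -> int:
        payload = json.dumps({"func": "compute_checksum", "args": [data]})
        return json.loads(self.link.send_request(payload.encode()))

# The serving "partition" registers its implementation with the link:
link = LoopbackLink({"compute_checksum": lambda s: sum(map(ord, s)) % 256})
stub = RemotePartitionStub(link)
print(stub.compute_checksum("ada"))  # → 38
```

In Annex E, the equivalent of both classes is generated by the compiler from the partition’s declared interface, which is what makes the whole scheme transparent to application code.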
All in all, I think that what the Ada language designers achieved here is pretty neat. They have managed to abstract away much of the complexity of distributed system development, and to cleanly separate communication concerns (which are up to the Partition Communication Subsystem) from system design concerns (which are handled through the set of compiler pragmas specifying the remote interface of each partition). They also tackled more advanced issues along the way, such as dispatching calls (when an RPC function pointer is dereferenced or an abstract object method is invoked) and easy partition layout reconfiguration (packages may be reassigned from one partition to another without recompilation).
Their work, as designed, is admittedly specific to the realm of distributed systems, in that the code for both endpoints of an RPC call must be available at compile time. But there is no theoretical reason why such a system could not be extended into a full RPC implementation that is also capable of dynamic binding to one or several foreign targets. In my opinion, this showcases the power of language-integrated RPC in single-language software projects. To put it in perspective, I will now discuss how language-agnostic RPC mechanisms work…
CORBA and other language-agnostic RPC mechanisms
Let’s first state the obvious: programming languages can be very different from each other. There is, for example, little in common between Assembly and Python, even though both languages can be translated to machine code and executed on a computer. This raises the question of how a language-agnostic RPC system should be designed. As a first step, since different languages have different feature sets, one has to wonder how to handle features which are present in one programming language but missing in another. From this issue, two extreme design approaches emerge:
- A “minimalist” design, which tries to follow a subset of all languages’ feature set so that no wrappers are required
- An “all-inclusive” design, which tries to follow a superset of all languages’ feature set and produces wrappers for missing features in each language
Of course, neither of these options is very practical in the real world. A truly universal minimalist design would be extremely limiting, since it would force the extremely restricted feature set of very low-level languages like Assembly upon developers of higher-level languages. Conversely, a truly all-inclusive RPC design would completely hide the hassle of adapting to other languages’ constructs, at the cost of a design so complex that it could not effectively be implemented.
Consequently, most language-agnostic RPC designs end up somewhere in the middle: they define a “sufficiently complete” abstract feature set for the RPC interface, provide wrappers for popular languages which do not implement this full feature set, and ignore less popular languages altogether. Said feature set typically includes synchronous and/or asynchronous function calls, a type-safe parameter marshalling mechanism, and some object-orientation glitter such as a type system allowing for dispatching function calls.
After that comes the issue of type incompatibilities between languages. As a trivial example, array and string implementations vary tremendously from one language to another, such that the associated data structures cannot be safely passed from one language to another and have to be converted. To simplify this conversion, language-agnostic RPC libraries generally define their own type system, along with a mechanism for converting data to and from it. Needless to say, this double conversion through a “language-agnostic” type is horribly inefficient, making these RPC mechanisms useless for any kind of high-performance task, even though single-language RPC on a local machine could theoretically work well with extremely few data conversions.
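The double conversion can be made concrete with a small Python sketch. The wire format below (a length prefix followed by big-endian 32-bit integers) is an invented stand-in for a real library’s neutral type system, but it shows where the two full copies come from:

```python
# Sketch of the "neutral type" round trip: native array -> wire
# format -> native array, i.e. two conversions per transfer.
import array
import struct

def to_neutral(values: "array.array") -> bytes:
    # First copy: convert the language-native array into a
    # self-describing, language-agnostic wire representation.
    return struct.pack(f">I{len(values)}i", len(values), *values)

def from_neutral(wire: bytes) -> "array.array":
    # Second copy: rebuild a native array on the receiving side.
    (count,) = struct.unpack_from(">I", wire)
    return array.array("i", struct.unpack_from(f">{count}i", wire, 4))

native = array.array("i", [1, 2, 3])
roundtripped = from_neutral(to_neutral(native))
assert roundtripped == native
# A single-language local RPC could instead hand over the native
# representation directly (e.g. via shared memory), with no copies.
```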
Reconsidering the TOSP approach to RPC
So far, I had not planned to address communication between programs written in different languages in TOSP’s RPC system. I provided an abstraction for type-safe single-language function calls across process boundaries, which allowed for a minimal level of compatibility between service version N and service version N+1, and that was about it. Looking back at it now, I feel that there is a need to go back to the drawing board and work more on this RPC abstraction.
What, exactly, do I want to achieve with RPC? Here are the main criteria which I see for RPC between client applications and system services and between multiple parts of a large client application:
- Programs which are written in the same programming language can transparently call each other’s functions across process boundaries, with very good performance (no data conversion, as few syscalls as possible)
- Programs which are written in different supported programming languages can also communicate in a relatively seamless way, although reduced performance and some type conversion hurdles are acceptable
- I’m only interested in local RPC operation and want to take advantage of all the optimizations that come with it
- It should be easy to write RPC bindings for other languages, so that users of obscure languages don’t come yelling at me for favoring the small bunch of languages which I am comfortable with
- If a language already has an existing framework for RPC or for communication with functions written in other languages, there should be a way to leverage this functionality instead of reinventing the wheel
Asynchronous calls are very important, but can potentially be built using synchronous calls that return instantly and behave in a standard way (for errors, completion monitoring, results…).
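The last point can be sketched as follows. This is not a proposed TOSP API, just a Python illustration (with invented names `Completion` and `start_call`) of how asynchronous calls can be layered on synchronous primitives that return instantly and report errors, completion, and results in a uniform way:

```python
# Sketch of async-on-sync: `start_call` returns immediately with a
# completion handle; waiting is itself an ordinary blocking call.
import threading

class Completion:
    def __init__(self):
        self._done = threading.Event()
        self.result = None
        self.error = None
    def wait(self, timeout=None):
        # Standardized completion monitoring: block, then either
        # re-raise the remote error or hand back the result.
        self._done.wait(timeout)
        if self.error is not None:
            raise self.error
        return self.result

def start_call(func, *args) -> Completion:
    # Returns instantly; the actual work proceeds in the background
    # (here, a thread stands in for the remote process).
    handle = Completion()
    def run():
        try:
            handle.result = func(*args)
        except Exception as exc:
            handle.error = exc
        handle._done.set()
    threading.Thread(target=run, daemon=True).start()
    return handle

handle = start_call(lambda a, b: a * b, 6, 7)
print(handle.wait())  # → 42
```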
With this in mind, here are some thoughts on how I should tweak the existing TOSP RPC design, or dramatically change it altogether:
- Provide message-passing IPC primitives at the kernel level, so as to leverage the built-in message-based RPC primitives of certain languages and RPC libraries
- Decide on a list of programming languages that will be officially supported by TOSP, for which RPC primitives are to be provided
- Precisely define the desired feature set of a TOSP RPC primitive for every supported language, discuss the pros and cons of having it as a kernel primitive at all
- Build wrappers for communication between languages which are not designed to communicate with each other. These libraries are to be used on the receiving end of an RPC call, i.e. the sender shouldn’t have to care which programming language the receiver is written in unless it explicitly wants to.
- Ponder using Ada for the bulk of the system codebase, as it interfaces with other languages better than C/C++ and has built-in RPC primitives (among other niceties)