Moving data around: on dataflow IPC semantics

As we’ve seen last week, a core security goal of a personal computer operating system is to ensure that programs only do what the computer users want. Unfortunately, not all software is equal on this front. Large, monolithic software with a broad feature set, such as modern web browsers, is for example hard to sandbox properly. In this blog, I will discuss an old software design trick from UNIX which can be used to reduce the need for such large software monoliths, and ways it can be made competitive with monolithic software from a performance point of view.

The joys and pains of dataflow programming

Have you ever programmed using UNIX script shells, Labview, VHDL, Verilog, Max/MSP, CSound, PureData, SPAD or GLSL? These domain-specific programming languages have a varied range of purposes, but what they all have in common is their intensive use of the dataflow programming paradigm.

Dataflow programming revolves around building complex data processing functionality out of simple black boxes which receive a number of data inputs, and produce a number of data outputs. This programming model has a number of nice features:

It can be easily understood by non-programmers
With stateless building blocks, it lends itself well to parallelization
It models the behavior of hardware very well
It encourages building complex data pipelines out of simple building blocks

Back in the days where UNIX was designed, its creators understood the power of the dataflow programming model, and put it at the center of their design. The original UNIX command line tools are a nearly textbook example of dataflow programming, where people can build complex commands out of a restricted set of primitive programs by connecting the input and output of programs to one another. Nearly because as in other general-purpose dataflow programming environments, the pipelining elements of the UNIX CLI were mixed with some sequential elements of imperative programming in order to achieve higher expressivity.

This use of dataflow programming in UNIX was probably chosen for CLI usage convenience, and unfortunately fell out of favor as UNIX software went in a GUI direction, in favor of the monolithic approach that everyone else used. But the programming metaphor turns out to have very significant benefits from a security point of view, which keeps it relevant even to this day. Chaining simple software together, each running in a separate, isolated process, means that each software only needs to perform a few tasks, and consequently can run with very low security privileges. This is a win from a security point of view, because it reduces the potential impact of buggy or malicious software by an order of magnitude.

The software engineering reasons why UNIX was built this way remain valid as well. Software which has a narrow scope usually has a better design and simpler implementation, which means that it is easier to develop and debug, and will ultimately be of higher quality than a monolithic equivalent given equal development effort.

However, there is a major caveat to dataflow programming with isolated processes. The data pipeline elements work by internally performing some operation on their input data and returning the result as output, and in some implementations, this requires explicitly copying data multiple times. Whereas as anyone who’s done graphics programming or high performance computing knows, copying large data structures around is the single most sure-fire way to create a performance bottleneck in a data pipeline, and must be avoided at all cost.

In the face of the low performance caused by such copying, developers will sometimes think that dataflow programming is intrinsically inefficient, and rewrite their program as an imperative monolith in order to achieve higher performance. However, that conclusion is actually incorrect. There is nothing about the paradigm of dataflow programming that intrinsically forces data to be copied around, and the secret to avoiding it lies in a better understanding of inter-process communication semantics.

Arguing semantics: copying vs sharing vs moving

When a process A wants to transmit data to a process B running on the same machine, it can do so in a myriad of ways, but all these implementations ultimately boil down to three abstract communication semantics:

Copying: Process A keeps access to its output and process B receives a private copy as input
Sharing: Process A and process B end up having simultaneously access to the same copy of the data
Moving (aka message passing): Process A loses access to its output data and process B gains access to it

As discussed before, on your typical personal computer where the memory bus is a performance bottleneck, copying is expensive, so we want to avoid it. But fortunately, there are very few software engineering scenarii where process A actually needs to keep a private copy of its output. In most cases, process A would actually want to discard or erase its output copy right after it is sent, so as to avoid memory leaks in long-running pipelines. This means that sharing and moving would work just as well, with much better performance.

In operating systems/hardware combinations where process isolation is based on paging, sharing has a very efficient implementation, provided that the relevant block of memory is page-aligned. All that the OS has to do is to map the pages corresponding to the output of process A into process B, after an optional acknowledgement from process B that it accepts them as input. So long as the memory architecture of the host computer is not pathological and overtly hostile to software development (a situation also known as NUMA), this will result in both processes having equally direct access to the shared memory block, at a very low performance cost.

Data sharing has its problems, however. Here is a quick summary of the main ones:

Processes need to agree on, and rigorously follow, a synchronization scheme for data writes. This is a hard programming problem, and the reason why pretty much every sane and experienced developer hates threads.
Any time process A writes anywhere into the shared memory block, it becomes untrusted from a security point of view, and process B needs to reiterate its input validation procedure.
As the data pipeline becomes longer, the complexity of managing the interactions between many unrelated processes sharing the same memory block grows extremely fast.

For this reason, many developers prefer to use data moving semantics for interprocess communication. Fortunately, there is a fairly straightforward way to implement data moving semantics on any hardware architecture which supports data sharing semantics. The key is to provide an atomic operation which unbinds the output block from process A, and binds it to process B, in a fashion that is transparent to every process involved.

Message passing semantics solve all the safety problems of data sharing semantics, because processes are never able to write on the transmitted memory block at the same time, A is not able to modify its output after sending it so data validation is only required once on B’s side, and there are never more than two processes involved in a communication operation no matter how long the data pipeline gets. It also has the benefit of being network-transparent, although for TOSP’s hardware target, this is usually not a concern. Consequently, whenever this semantic is good enough for the problem at hand, it should be preferred over both copying and sharing for interprocess communication.

However, it is important to remember that local message passing is implemented as a voluntarily limited form of data sharing, and that sometimes, it will not be able to solve some problems that data sharing adresses perfectly well.

For example, consider the scenario of a ring buffer where process A can continuously write up to the point where process B is reading, and process B can continuously read up to the point where process A is writing. This communication channel is well synchronized by design, and allows process B to operate on data as quickly as A is able to generate it. If we tried to implement such a communication channel using message passing, either process B would have to wait until process A is able to send a full linear buffer, which is bad for concurrency and output latency; or process A would have to send data blocks as very small chunks packed into rather large memory pages, which would use RAM inefficiently, and overuse the kernel sharing primitives to the point of potentially making them a performance bottleneck.

Conclusions

Dataflow programming is a very effective metaphor for building complex data processing pipelines out of simple software processes. The level of insight exhibited by the UNIX designers when they made it a central paradigm to their CLI interface cannot be understated, it really was a Very Good Idea.

Today, this paradigm is still relevant as a core operating system metaphor. Not only because it still allows for software to be easier to design, develop, and test, leading to higher-quality programs. Not only because it is a natural metaphor for many application domains, and lends itself well to parallel implementations. It also has massive security benefits, by drastically reducing the system resource access permissions that each process in a data pipeline needs in order to do its duty.

When implementing a dataflow programming environment, one needs to be mindful of performance considerations, as it is very easy to create a performance bottleneck by copying large data structures around in many parts of the pipeline. Whenever possible, one must ensure that data is processed in-place by each element of the pipeline. A way to achieve this goal is to pay close attention to the semantics of interprocess communication across the data pipeline.

Two interprocess communication semantics which can work pretty well are sharing and moving data. Sharing is natively supported by modern hardware, but hard to get right due to synchronization, validation, and scalability issues. Whereas moving can be trivially implemented on top of a sharing implementation, and avoids all the issues of naive sharing, but its granularity creates a performance compromize between pipeline latency and throughput, which restricts its range of applications.

My conclusion will be that in a modern operating system using the dataflow programming metaphor at its core, moving semantics should be favored in interprocess data transfers, as it keeps most of the benefits of sharing while being much easier to get right. But that as usual in OS development, hardware power should not be hidden, and sharing should remain available as an expert option for software developers who know what they are doing and cannot operate within the limitations of message passing semantics (cue appropriate scary documentation warning).

The OS|periment

Musings on personal computer operating systems