RPC viability testing plan

The design of the previously defined RPC model now seems bullet-proof enough, so the only thing which remains to be seen is whether it can be fast enough for use down to the lowest kernel levels. This is why I’m doing some performance tests to check the viability of it all. These tests roughly evaluate the performance which the final RPC system would have, under improbably high workloads to compensate for the roughness of the tests themselves (they are not run on the real thing, but on a simplified model of it).

If the system proves to perform badly under the improbably high workloads defined here, I will determine the “limit” workload under which it still performs acceptably.

Here’s what I’ll be testing…

Server startup

Question to be answered

Would the overhead of 50 “heavy” servers starting up have a measurable impact on system boot times?

Behavioral model

Each server broadcasts the remote calls which it provides to the kernel, in the form of a 30-entry array, each entry containing (see the sketch after this list):

  • A pointer to the relevant function in the server’s address space
  • The function name and return type, as a string (50 characters long in the current implementation)
  • The number of parameters which the function takes (10 in current implementation)
  • A pointer to an array describing the function parameters, each entry of the array containing…
    • The parameter name (30 characters long in the current implementation)
    • The length of the corresponding parameter, in bytes. All parameters are 64-bit (8 bytes) here.
    • A pointer to this parameter’s default value, or NULL if there’s none. Nine parameters will have default values; the first one won’t.
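To make this layout concrete, here is a minimal C sketch of what one such broadcast entry could look like. The type and field names (rpc_param_desc, rpc_func_desc and so on) are mine, purely for illustration, not part of the actual design:

    #include <stdint.h>
    #include <stddef.h>

    /* One parameter descriptor (all names here are hypothetical) */
    typedef struct {
        char        name[30];       /* parameter name, 30 characters in the current implementation */
        size_t      length;         /* parameter length in bytes (8 here: everything is 64-bit) */
        const void *default_value;  /* default value, or NULL (only the first parameter has none) */
    } rpc_param_desc;

    /* One of the 30 entries which each server broadcasts to the kernel */
    typedef struct {
        void (*function)(void);     /* pointer to the function in the server's address space */
        char  signature[50];        /* function name and return type, 50 characters long */
        int   param_count;          /* number of parameters, 10 in the current implementation */
        rpc_param_desc *params;     /* array of param_count parameter descriptors */
    } rpc_func_desc;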

The kernel will have to copy all the information included in this array (and the various buffers it points to) into memory allocated within its own address space. The system call overhead is taken as that of two context switches and two lock acquisition/release cycles.
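Here is a rough sketch of the kernel-side deep copy which this implies, reusing the hypothetical types above (kernel_alloc is a stand-in for whatever allocator the kernel would really use):

    #include <stdlib.h>
    #include <string.h>

    /* Stand-in for the kernel's own allocator */
    static void *kernel_alloc(size_t size) { return malloc(size); }

    /* Deep-copy one broadcast entry (and everything it points to) into
     * kernel-owned memory, as the behavioral model requires. */
    rpc_func_desc *kernel_copy_desc(const rpc_func_desc *src)
    {
        rpc_func_desc *dst = kernel_alloc(sizeof *dst);
        *dst = *src;   /* function pointer, signature and parameter count */

        dst->params = kernel_alloc(src->param_count * sizeof *dst->params);
        for (int i = 0; i < src->param_count; i++) {
            dst->params[i] = src->params[i];
            if (src->params[i].default_value) {   /* copy the default value buffer too */
                void *def = kernel_alloc(src->params[i].length);
                memcpy(def, src->params[i].default_value, src->params[i].length);
                dst->params[i].default_value = def;
            }
        }
        return dst;
    }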

Client connection setup

Question to be answered

Would the overhead of 50 “heavy” clients starting up have a measurable impact on system boot times?

Behavioral model

Let’s assume that in the worst case, each client would contact 10 different servers and set up 10 connections with each. We’ll also assume that those clients all have a slightly outdated interface definition, and that they only know about the first 9 parameters of each server-side function call.

Hence, what will take place is…

  • 10 server PID lookups per client. We’ll assume that there are 100 running processes with 40-character names, and that the process manager uses array-like storage (the PID is the index); see the lookup sketch below.
  • 10 connections set up per client/server pair. Each connection setup implies sending the kernel a prototype similar to the one above, with the overhead of a system call each time a prototype is sent, plus…
    • Having the kernel compare the prototypes with the server prototypes (20 prototypes) to check their mutual compatibility
    • Having the kernel notice that they mismatch and set up a “compatibility stack frame” to be appended to each stack frame set up by the client.

Each of these operations will have the extra overhead of a system call.
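For illustration, here is what the PID lookup step could boil down to with such array-like storage; a hypothetical sketch, not the actual process manager code:

    #include <string.h>

    #define PROCESS_COUNT 100   /* 100 running processes, as assumed above */
    #define NAME_LEN      40    /* 40-character process names */

    /* Array-like process storage: the PID is simply the index */
    static char process_names[PROCESS_COUNT][NAME_LEN];

    /* Returns the PID of the named process, or -1 if there is none:
     * a plain linear scan over the whole table. */
    int pid_lookup(const char *name)
    {
        for (int pid = 0; pid < PROCESS_COUNT; pid++) {
            if (strncmp(process_names[pid], name, NAME_LEN) == 0)
                return pid;
        }
        return -1;
    }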

Actually, I think that this benchmark will fail: there’s no way so much can be done in a second or two. The interesting question is, what will the true client connection limit prove to be? Or could a multi-threaded boot lead to sufficient performance?

RPC overhead

Question to be answered

Once the RPC connection is set up, is its run-time cost as low as I claim it to be?

Behavioral model in threaded operation

In threaded operation, a new thread is spawned within the server each time the client makes a call. For these aggressive benchmarking purposes, I’ll neglect the performance gains that could be achieved by keeping a pool of unused threads around, and will have a new thread allocated each time instead. We’ll also investigate the case where the client shares two 1MB buffers with the server as part of the call. This leaves a remote call model of…

  • Pushing the client-side function parameters on the client stack
  • System call overhead
  • Allocating a 1MB buffer (emulates the server thread stack + thread management structures)
  • Sharing anything that’s pointed to by a pointer parameter, and making the pointer parameters point to the server’s version of it
  • Moving the client parameters and the compatibility stack frame onto the server stack
  • Freeing the client buffers
  • Freeing the server stack

We’ll test the overhead of 1000 calls to begin with, then fewer or more if needed.
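As a hypothetical illustration of what the core of this benchmark could look like (ignoring the system call and buffer sharing steps, which need kernel support), one iteration reduces to allocating and tearing down the emulated server stack:

    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>

    #define STACK_SIZE (1 << 20)   /* 1MB: emulated server thread stack + management structures */
    #define N_CALLS    1000

    /* Toy model of one threaded remote call: a fresh 1MB "server stack" is
     * allocated, the parameters are moved onto it, then it is torn down. */
    static void threaded_call(const uint64_t params[10])
    {
        uint8_t *server_stack = malloc(STACK_SIZE);
        /* 9 known client parameters + the compatibility frame filling in the 10th */
        memcpy(server_stack, params, 10 * sizeof(uint64_t));
        free(server_stack);   /* free the server stack */
    }

    int main(void)
    {
        uint64_t params[10] = {0};
        for (int i = 0; i < N_CALLS; i++)
            threaded_call(params);
        return 0;
    }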

Behavioral model in asynchronous operation

In asynchronous operation, the same server thread is used over and over again, simply having its state reset each time. This allows for less allocation overhead, at the cost of more copying. Initially, in addition to the client buffers, a 10MB buffer will be allocated within the server as an “asynchronous queue”, and a single 1MB server stack will be allocated too. The RPC overhead model becomes…

  • Pushing the client-side function parameters on the client stack
  • System call overhead
  • Sharing anything that’s pointed to by a pointer parameter, and making the pointer parameters point to the server’s version of it
  • Copying the client parameters to the async queue
  • <Kernel waits for the async server to run, then…>
  • Context switch overhead
  • Moving the client parameters and the compatibility stack frame onto the server stack
  • Freeing the client buffers

Again, the test is carried out for 1000 calls initially, to be adjusted as needed…
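Again purely as a hypothetical sketch, the copying at the heart of the asynchronous model could be emulated along these lines, with the one-time allocations kept outside the measured loop:

    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>

    #define QUEUE_SIZE (10 << 20)            /* 10MB asynchronous queue inside the server */
    #define FRAME_SIZE (10 * sizeof(uint64_t))
    #define N_CALLS    1000

    int main(void)
    {
        /* One-time allocations: the async queue and a single reused 1MB server stack */
        uint8_t *queue        = malloc(QUEUE_SIZE);
        uint8_t *server_stack = malloc(1 << 20);
        size_t   head = 0;
        uint64_t params[10] = {0};

        for (int i = 0; i < N_CALLS; i++) {
            /* Client side: copy the parameters into the async queue... */
            memcpy(queue + head, params, FRAME_SIZE);
            /* ...then, once the async server gets to run, move them onto its stack */
            memcpy(server_stack, queue + head, FRAME_SIZE);
            head = (head + FRAME_SIZE) % QUEUE_SIZE;
        }

        free(server_stack);
        free(queue);
        return 0;
    }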
