As mentioned in the last post of this series, two notions are ill-defined in the process abstraction design presented so far: process properties and live updating. After giving the subject some thought, it appeared to me that these notions are important and influence the interface design of kernel components, which is what we’re interested in right now. This post is therefore here to clear up the confusion and explain what changes.
What are process properties?
The process abstraction jails each independent piece of software running on the system within a well-defined set of allowed and denied actions. It is implemented by a network of user-mode services and kernel components, called “insulators”, each having a limited, well-defined responsibility with respect to the jailing of processes. The action of insulators is coordinated by a central component, the process manager, which manages the basic lifecycle of a process (creation, destruction, PID lookup, updating) and broadcasts the associated orders to the other insulators.
When a process is created or updated, the process manager is provided with the in-memory equivalent of a configuration file. This file specifies the non-default properties of a process with respect to each and every insulator that said process deals with. Like any configuration file, I believe it should be written in plain text for easy tinkering and processing. The process manager extracts from it the part that is relevant to each insulator and sends it as the second parameter of its add_process() or update_process() method, the first one being the process’ PID. This method must, as such, be perfectly standardized across all insulators, so that the process manager may always call it in the same way.
Standardization is incompatible with the use of insulator-specific data structures as the second parameter of add_process() and update_process(), as was partially suggested by the previous posts in this series. Instead, I believe the process manager should just tear off the relevant part of the process properties and send it, without any further processing, to the relevant insulator, which will then parse it itself.
As an example, if process properties were written as a Python-like indentation-formatted file…
*** TOSP process properties format v1 ***
PhyMemManager:
    maximum_memory: 20 MB
VirMemManager:
    allowed_virtual_flags: R, RW, R
    can_map_manually: false
Then PhyMemManager would receive as its process properties the text
maximum_memory: 20 MB
and VirMemManager would receive as its process properties the text
allowed_virtual_flags: R, RW, R
can_map_manually: false
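To make the "tear off the relevant part" step concrete, here is a minimal Python sketch of what the process manager could do. It is only an illustration, not actual kernel code. It assumes a layout that the example above suggests but does not fully specify: a header line, then one unindented "Name:" line per insulator, followed by that insulator's indented body. The function name split_by_insulator() is hypothetical.

```python
import textwrap

# Header taken from the example above; treating it as mandatory is an assumption.
HEADER = "*** TOSP process properties format v1 ***"

SAMPLE = """\
*** TOSP process properties format v1 ***
PhyMemManager:
    maximum_memory: 20 MB
VirMemManager:
    allowed_virtual_flags: R, RW, R
    can_map_manually: false
"""


def split_by_insulator(properties: str) -> dict[str, str]:
    """Return {insulator name: raw section text}, without interpreting the text."""
    lines = properties.splitlines()
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("unrecognized process properties header")

    sections: dict[str, list[str]] = {}
    current = None
    for line in lines[1:]:
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            # An unindented line opens a new insulator section.
            name = line.rstrip()
            if not name.endswith(":"):
                raise ValueError(f"expected 'Name:' but got {line!r}")
            current = name[:-1]
            sections[current] = []
        elif current is None:
            raise ValueError("indented line before any insulator section")
        else:
            sections[current].append(line)

    # Strip the common indentation; the content itself is left untouched.
    return {name: textwrap.dedent("\n".join(body))
            for name, body in sections.items()}


for insulator_name, section_text in split_by_insulator(SAMPLE).items():
    print(f"--- sent to {insulator_name} ---")
    print(section_text)
```

Note that the process manager never looks inside a section: it only needs to know where one insulator's block ends and the next begins.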
It seems to me that as a standard communication channel for the process manager that provides insulators with enough freedom to use whatever configuration options they like, this is one of the most sensible options. Standard parsing tools can still be provided to insulator developers in order to ease the pain of dealing with configuration files and reduce risky code duplication across insulators.
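As a rough idea of what such shared parsing tools might look like, here is a small Python sketch that an insulator could use on the text it receives. Everything in it is an assumption made for illustration: the helper names (parse_flat_properties(), parse_bool(), parse_list()), the "key: value" line syntax, and the rule that duplicate keys are rejected.

```python
def parse_flat_properties(text: str) -> dict[str, str]:
    """Parse 'key: value' lines into a dict of raw, uninterpreted strings."""
    result: dict[str, str] = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        key, separator, value = line.partition(":")
        key = key.strip()
        if not separator or not key:
            raise ValueError(f"line {number}: expected 'key: value', got {line!r}")
        if key in result:
            raise ValueError(f"line {number}: duplicate key {key!r}")
        result[key] = value.strip()
    return result


def parse_bool(value: str) -> bool:
    """Accept only the literal spellings 'true' and 'false'."""
    if value == "true":
        return True
    if value == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {value!r}")


def parse_list(value: str) -> list[str]:
    """Split a comma-separated value into its stripped, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


# Example: what VirMemManager could do with the text it receives.
received = "allowed_virtual_flags: R, RW, R\ncan_map_manually: false"
options = parse_flat_properties(received)
flags = parse_list(options["allowed_virtual_flags"])
can_map = parse_bool(options["can_map_manually"])
```

Each insulator stays free to decide which keys it accepts and what they mean; the shared helpers only take care of the tedious and error-prone text handling.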
In practice, this means that every add_process() method but the process manager’s would receive the new syntax PID add_process(PID, ProcessProperties), where ProcessProperties is just a fancy typedef for a Unicode string or a binary blob.
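The following Python sketch models that uniform interface and the way the process manager could broadcast through it. It is a toy model under several assumptions that the post does not state: the value of PID_INVALID, the idea that add_process() returns PID_INVALID on failure, the convention that an insulator with no section receives an empty string (meaning "all defaults"), and the names RecordingInsulator and broadcast_add_process().

```python
PID = int
ProcessProperties = str      # "just a fancy typedef" for a string
PID_INVALID: PID = -1        # assumption: the actual value is not specified


class RecordingInsulator:
    """Hypothetical insulator that only records what it is sent."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: dict[PID, ProcessProperties] = {}

    def add_process(self, pid: PID, properties: ProcessProperties) -> PID:
        # A real insulator would parse `properties` here and could refuse
        # the process by returning PID_INVALID (assumed failure convention).
        self.received[pid] = properties
        return pid


def broadcast_add_process(insulators, pid: PID,
                          sections: dict[str, ProcessProperties]) -> PID:
    """Call every insulator's add_process() in exactly the same way."""
    for insulator in insulators:
        # No section for this insulator: send an empty string, i.e. defaults only.
        properties = sections.get(insulator.name, "")
        if insulator.add_process(pid, properties) == PID_INVALID:
            return PID_INVALID
    return pid


phy = RecordingInsulator("PhyMemManager")
vir = RecordingInsulator("VirMemManager")
status = broadcast_add_process(
    [phy, vir], 42, {"PhyMemManager": "maximum_memory: 20 MB"}
)
```

Because the second parameter is opaque text, the broadcasting loop does not need to know anything about the insulators it is talking to.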
What is live updating?
On today’s mainstream desktop and mobile operating systems, there are basically two schools of software updating: the Windows school, which requires software to be restarted for any kind of useful update to be applied, and the UNIX school, which may update files in the background but still requires running programs to be manually restarted, along with everything that depends on them, for the changes to take effect. In the context of system services, restarting the updated components often involves restarting the whole OS, which is slow and unnecessarily disruptive to the user experience. Live updating is a low-level OS feature which aims at improving this situation.
The principle is that updates should go as follows: update the files of the running process, then restart it in a way that does not have any noticeable effect on the user experience. Since live updating is mostly useful for system services, asking that binaries be specifically crafted for this purpose in some circumstances is acceptable, although undesirable. Most importantly, however, the process should be entirely painless for the user: no random errors, no lengthy lag, no major symptom overall that an important system service is being restarted. It should be as if the new version of the service had always been there and running.
How would it work in practice?
- The system-wide software updater silently updates the service’s files, without shutting it down (this means that the service should preferably not access its files at run time, which is a good practice from a performance point of view anyway)
- It then starts the new binary of the service in “update mode”. The process manager is aware of the situation, and will make the new service process invisible to filename lookup until its initialization is over (to avoid confusion with the old running service). The updated service itself is aware of the situation, and will apply due care when acquiring mutually exclusive hardware resource access, typically sharing them with caution or asking the old service to give up on them through RPC. It will also find a way to synchronize any internal state information with the old service.
- Once the updated service is initialized and ready to replace the old one, it will call the update_process() method of the process manager. The goal of this method is to make the newly initialized system service take the place of its predecessor. The two processes will switch PIDs, the new process will be made visible by filename lookup while the old one becomes invisible, all remaining RPC connections from the old process will be transferred to the new process in a compatible fashion, etc. Basically, the old service will keep enough resources to complete its currently running jobs, but make sure that all new jobs are redirected to the new service.
- At the same time, the system will send the old service a request to terminate after its pending jobs are over, with a reasonable timeout to make sure that termination actually happens at some point should the old service hang. The mechanism that is used to do this should be akin to what happens when the computer is shut down. If it uses RPC, care should be taken that the associated RPC connections are not transferred to the new version of the service or deleted as part of the updating process. Flags on RPC connections could probably be used to this end.
In this scheme of live updating, update_process() would take the form PID update_process(PID, PID), in which the first parameter is the PID of the old service and the second parameter is the PID of the new service. If successful, it would return the altered PID of the old process (which is the former PID of the new service), and PID_INVALID otherwise. Also, live updating should be a reversible operation: in the event that one of the update_process() operations fails, it should be possible to transfer resources back from the new process to the old process using update_process(), in a fashion that is just as seamless as transfers in the other direction.
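The PID swap and its reversal can be modelled with a toy process table. The Python sketch below is only an illustration built on assumptions: the ProcessTable class, its spawn(), lookup() and version_of() methods, the value of PID_INVALID and the validity checks are all hypothetical. Transfer of RPC connections and the termination request sent to the old service are not modelled.

```python
from dataclasses import dataclass

PID = int
PID_INVALID: PID = -1  # assumption: the actual value is not specified


@dataclass
class Process:
    name: str      # file name used for lookup
    version: str
    visible: bool  # whether filename lookup can find this process


class ProcessTable:
    """Toy stand-in for the process manager's bookkeeping."""

    def __init__(self) -> None:
        self._by_pid: dict[PID, Process] = {}
        self._next_pid: PID = 1

    def spawn(self, name: str, version: str, update_mode: bool = False) -> PID:
        pid = self._next_pid
        self._next_pid += 1
        # A process started in update mode stays hidden from filename lookup.
        self._by_pid[pid] = Process(name, version, visible=not update_mode)
        return pid

    def lookup(self, name: str) -> PID:
        for pid, process in self._by_pid.items():
            if process.name == name and process.visible:
                return pid
        return PID_INVALID

    def version_of(self, pid: PID) -> str:
        return self._by_pid[pid].version

    def update_process(self, old_pid: PID, new_pid: PID) -> PID:
        old = self._by_pid.get(old_pid)
        new = self._by_pid.get(new_pid)
        if old is None or new is None or old_pid == new_pid:
            return PID_INVALID
        if old.name != new.name or not old.visible or new.visible:
            return PID_INVALID
        # Swap PIDs and visibility: clients keep using the same PID and name.
        self._by_pid[old_pid], self._by_pid[new_pid] = new, old
        new.visible, old.visible = True, False
        return new_pid  # the altered PID of the old process


table = ProcessTable()
service_pid = table.spawn("service", "1.0")
updated_pid = table.spawn("service", "1.1", update_mode=True)

# Forward direction: the new binary takes over the old PID.
moved_pid = table.update_process(service_pid, updated_pid)

# Reverse direction: the same call, with the roles exchanged, rolls back.
rolled_back_pid = table.update_process(service_pid, moved_pid)
```

In this model, rolling back needs no special code path: it is the same operation, applied to the process that currently serves the PID and the one that was set aside.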
At this point, I am pretty satisfied with what I have done so far. This post should therefore mark the end of the “process abstraction” series. In the upcoming weeks, I will likely be updating the earlier “process abstraction” articles, then beginning work on memory management refactoring and implementation. For now, I would be very interested in comments on the two mechanisms described above, should you see a flaw in them: it is always easier to make changes early than late if something is wrong…