Until some clever electronics engineers come up with a form of computer memory that is simultaneously fast, cheap, and nonvolatile, we will have RAM in our computers, and in fairly limited amounts. And like with any hardware resource of limited availability, this RAM will be filled at some point. In this article, I will discuss what should happen then.
Swap is not a solution
Some may argue that RAM shortage can be prevented by using swap, that is, using files on mass storage devices as extra RAM. This is, however, only partly true, and overall a very ugly solution to the problem, for the following reasons:
- Swap files also have a limited size so they can be filled up too, though admittedly more slowly
- Software treats slow swap as if it were fast RAM, causing major system performance hits
- OSs often set many restrictions on what can be put in swap, reducing its usefulness
- Removing the drive on which the swap file is located will cause a near-instant major system crash
- Swapping causes massive strain and wear on mass storage devices as compared to normal OS usage
In short, I believe that swap is a kludge that can be used as a temporary solution to address a well-known lack of RAM on a machine, but shouldn’t be used as a long-term protection against RAM shortage. Other countermeasures should be used instead.
Separating “vital” RAM and “extra” RAM
More often than not, large programs do not strictly require all the RAM that they allocate. A part of it is actually necessary for their job, while another part is made of caches and similar stuff that is used to speed them up. Effectively, this can be handled by evaluating the amount of free RAM when the program is started, and then allocating a reasonable amount of cache memory, or even none at all if the situation is truly critical. However, this is not the proper way to do it.
The Linux disk cache shows an example of how such situations should be handled instead: as long as there's free RAM, it keeps allocating into it to reduce the impact of disk accesses, but does it in such a fashion that if other software ends up allocating more RAM than is available, that "cached" RAM can be promptly reclaimed by the kernel without issues. On the inside, it likely works as follows: each time the kernel filesystem driver wants to access a part of the disk cache, it first probes the associated memory region to see if it is still allocated. If not, it considers the associated cache line to be invalid, and reloads it from the disk to another cached RAM region.
Obviously, such a mechanism cannot work without explicit support from the caller application. However, nothing prevents the system library from featuring some sort of “cache memory” abstraction that hides away the low-level work and makes it easier to use for user-mode devs. The benefit would be to ensure that as long as applications are kept well-coded, RAM shortage would only happen when all allocated RAM chunks are strictly necessary to the well-being of the associated software, and not, as an example, when some web browser’s in-RAM web cache has grown anarchically beyond reasonable dimensions.
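To make this more concrete, here is a minimal Python sketch of what such a "cache memory" abstraction could look like. The class name, the `reload` callback, and the explicit `reclaim()` call (standing in for the kernel reclaiming pages behind the application's back) are all hypothetical, chosen for illustration only:

```python
class DiscardableCache:
    """Sketch of a cache whose entries the OS may reclaim at any time.

    The application supplies a `reload` callback that can regenerate an
    entry (e.g. by rereading it from disk) if the memory behind it has
    been reclaimed by the kernel.
    """

    def __init__(self, reload):
        self._reload = reload      # callback: key -> freshly loaded data
        self._entries = {}         # key -> data; any entry may vanish

    def reclaim(self, key):
        """Stands in for the kernel reclaiming this entry's RAM."""
        self._entries.pop(key, None)

    def get(self, key):
        # Probe the entry first: if it was reclaimed, it is transparently
        # reloaded from its backing store instead of causing an error.
        if key not in self._entries:
            self._entries[key] = self._reload(key)
        return self._entries[key]


# Usage with a toy "disk" as the backing store.
disk = {"block0": b"hello"}
cache = DiscardableCache(reload=lambda k: disk[k])
assert cache.get("block0") == b"hello"   # loaded and cached
cache.reclaim("block0")                  # kernel takes the RAM back
assert cache.get("block0") == b"hello"   # transparently reloaded
```

The key property is that the application never sees reclaimed memory as an error: at worst, it pays the cost of a reload, which is exactly the cost it would have paid without a cache.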
Preventing RAM shortages
Even with the aforementioned precautions, there would be some situations in which software would fill up the available RAM through perfectly legitimate operations. All it takes is a single application that is given enough of a heavy workload, massive amounts of independent applications running simultaneously, or an application that is subject to memory leaks that gradually eat up all of the available RAM as time passes. Before discussing what should happen when all RAM has been legitimately exhausted in such a way, one can wonder if and when such an event could be planned in advance, allowing the system to warn the user so that they can take action.
Sudden allocation of massive amounts of RAM can rarely be planned in advance. Clean failure, which will be discussed in the next section, sounds like the only option in such a case. But when RAM is filled up more slowly, and especially if this happens on the user's request, one can envision setting a warning threshold of RAM exhaustion (which can be absolute, relative, or a mix of both), and displaying a warning notification when this threshold is reached. This threshold could either be an arbitrary number, or something more advanced based on statistical data about users' and applications' RAM usage.
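As an illustration, a mixed absolute/relative threshold could be checked as follows. This is a Python sketch; the 256 MiB and 10% figures are placeholder values for the sake of the example, not recommendations:

```python
def should_warn(total_ram, used_ram, abs_margin=256 * 2**20, rel_margin=0.10):
    """Warn when free RAM drops below a threshold mixing an absolute
    margin (here 256 MiB) and a relative one (here 10% of total RAM).
    Taking the larger of the two keeps the warning meaningful on both
    small and large machines."""
    free = total_ram - used_ram
    threshold = max(abs_margin, rel_margin * total_ram)
    return free < threshold


gib = 2**30
assert not should_warn(total_ram=4 * gib, used_ram=2 * gib)       # 2 GiB free
assert should_warn(total_ram=4 * gib, used_ram=4 * gib - 2**20)   # 1 MiB free
```

A statistics-based threshold would replace the fixed margins with values derived from observed allocation rates, but the shape of the check would stay the same.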
Failing with and without casualties
There are a number of ways a lack of RAM can be managed when it happens. Two solutions tend to stand out, however: the one in which the allocating software manages the failure by itself and is brutally killed by the OS if it doesn't (as is implemented by most BSDs in the form of returning NULL to the memory allocation request), and the one in which the OS directly kills some memory hog in order to free up some RAM to answer the allocation request (as is implemented by Linux in the form of the "out of memory killer").
I tend to be biased towards the first solution, since it is my opinion that software should never ignore the error return codes of system functions. It is also undoubtedly the best option when a user asks software to do an impossible task, like loading 64GB of data in RAM. However, not all processes are equal in the world of operating systems, and this creates situations in which rejecting a process’ memory allocation requests might be highly undesirable.
Consider the following scenario, as an example: as the machine is running close to its RAM limits, a low-level system service such as the kernel or a hardware driver suddenly ends up requiring more RAM to do its job. If this RAM request is not satisfied, it will result in some sort of driver failure (though not a full crash if that driver is well-coded), that may affect all running user-mode processes in some way. In such circumstances, wouldn’t it be better to sacrifice just one user-mode process so that the driver may continue to operate properly?
From this, it appears that having the RAM allocation request fail is the best option in some situations, and that killing some process to free up RAM is the best option in other situations. Which is likely why both approaches have their defenders. The solution which I propose, and have in fact already implemented to some extent, is to allow malloc() to have both behaviours, switching from one to the other through an optional "force" flag. If that force flag is disabled, malloc will fail and return NULL when there isn't enough available RAM. If that force flag is enabled, it is guaranteed that malloc calls will either succeed or kill the caller, no matter how much process-killing it takes. Since this malloc flag puts a lot of nuisance power in the hands of the caller, its use will obviously be subject to the possession of a special security permission that only trusted system services should have.
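The two behaviours can be sketched as follows, modelled in Python with a toy `Ram` class. The `force` flag matches the proposal above; the `pick_victim` policy is a hypothetical stand-in for the killing heuristic, and the security-permission check is left out for brevity:

```python
class Ram:
    """Toy model of system RAM: a capacity plus per-process allocations."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.alloc = {}                       # process name -> bytes allocated

    def free(self):
        return self.capacity - sum(self.alloc.values())

    def malloc(self, caller, size, force=False, pick_victim=None):
        """Return True if the allocation succeeds, False if it fails
        (or, with force=True, if the caller itself had to be killed)."""
        while self.free() < size:
            if not force:
                return False                  # malloc() returns NULL
            victim = pick_victim(self.alloc, caller)
            self.alloc.pop(victim, None)      # kill the victim, reclaim its RAM
            if victim == caller:
                return False                  # last resort: the caller is gone
        self.alloc[caller] = self.alloc.get(caller, 0) + size
        return True


def biggest_first(alloc, caller):
    # Placeholder victim-selection policy: biggest memory hog first,
    # the caller itself only as a last resort.
    others = {p: s for p, s in alloc.items() if p != caller}
    return max(others, key=others.get) if others else caller


ram = Ram(capacity=100)
ram.alloc = {"browser": 70, "editor": 20}
assert not ram.malloc("editor", 30)          # no force: fails cleanly (NULL)
assert ram.malloc("driver", 30, force=True,  # force: succeeds by killing
                  pick_victim=biggest_first)
assert "browser" not in ram.alloc            # ...the memory hog
```

Note how the non-forced path leaves the decision to the caller, while the forced path guarantees termination of the loop: either enough RAM is freed, or the caller itself ends up as the victim.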
Deciding to kill processes to free up RAM is not enough, though. It is also of primary importance to kill the right process. On Linux, as an example, I can’t count the number of times when the OOM killer has decided to kill X and all GUI applications simply because I asked too much from VirtualBox, and this shouldn’t happen. The decision should probably be based on rock-solid heuristics, and since out of memory events are getting pretty rare these days as computers now sport multiple gigabytes of RAM, it might even be a good idea to interactively ask for user feedback before proceeding.
Here is my take on the basic building blocks of an optimal “out of memory killer” heuristic:
- The more RAM a process has allocated, the more likely it is that no additional murders will be needed.
- The more third-party RPC clients a process has, the higher impact killing it is likely to have.
- The higher the scheduling priority of a process, the more its existence is likely to matter to users.
- Processes which have no unsaved data or "writing" file connections are better candidates for killing than processes that do.
- The kernel always comes last on the list, since killing it amounts to halting the system with a panic screen.
- In any case, that killing spree stops once the malloc caller has been killed, and never starts if it is determined that killing “better” candidates won’t solve the problem.
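Taken together, these building blocks could be combined into a single scoring function. The following Python sketch only illustrates the ordering logic: the weights are arbitrary placeholders, and the process attributes (`ram`, `rpc_clients`, and so on) are hypothetical names for data the kernel would track:

```python
def oom_score(proc):
    """Higher score = better victim. The weights are placeholders meant
    to encode the relative importance of each criterion, nothing more."""
    if proc["is_kernel"]:
        return float("-inf")                 # the kernel always comes last
    score = proc["ram"]                      # more RAM freed per kill
    score -= 1000 * proc["rpc_clients"]      # killing servers hurts clients
    score -= 100 * proc["priority"]          # high priority: matters to users
    if proc["has_unsaved_data"]:
        score -= 10000                       # avoid destroying user data
    return score


def pick_victim(processes):
    """Select the process whose death frees RAM at the lowest cost."""
    return max(processes, key=lambda name: oom_score(processes[name]))


# A scenario like the VirtualBox one above: the VM hog should die,
# not the display server that everything else depends on.
procs = {
    "X_server": {"ram": 200, "rpc_clients": 12, "priority": 5,
                 "has_unsaved_data": False, "is_kernel": False},
    "vm_guest": {"ram": 4000, "rpc_clients": 0, "priority": 0,
                 "has_unsaved_data": False, "is_kernel": False},
    "kernel":   {"ram": 100, "rpc_clients": 50, "priority": 10,
                 "has_unsaved_data": False, "is_kernel": True},
}
assert pick_victim(procs) == "vm_guest"
```

A real implementation would also need the stopping conditions from the last bullet: abort the spree once the malloc caller is dead, and never start it if even the best candidates cannot free enough RAM.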
You may notice that there is no direct notion of system services here, save for the kernel’s special treatment. That is because in a sandboxed OS design, root access does not exist anymore, and determining what is most likely to be a system process using lists of security privileges becomes extremely complex and error-prone. Which is why an RPC-based heuristic is used instead to determine more pragmatically how many processes may rely on the candidate to operate.
Another thing which I should point out is that some sensible killing heuristics may use fairly high-level concepts. As an example, the RPC one requires making sure that a malicious process cannot spawn lots of children and use them as RPC clients to reduce its likelihood of being killed. In a sandboxed design, ensuring with fair reliability that the connected processes use third-party code may be done by checking that they belong to a separate sandboxed application, since that limitation requires much more elaborate schemes for crackers to bypass (the malicious application must have received permission to call executables from other applications and the user must have installed several applications from the cracker).
An interesting limitation of this heuristic, however, is that it only works if RPC is mostly used for client-to-server communication, further strengthening the asymmetric role of something which I originally envisioned as a symmetric communication protocol.
And I think that's it for out of memory issues. Just one more unrelated thing though: while working on the documentation of kernel components, it has occurred to me that some of them might be ill-named, in the sense that their names do not clearly reflect what they do. So while no third-party code relies on the names yet, I thought that it would be a good idea to change that. Here goes:
- “Physical memory manager” is quite a long name and does not clearly state that this component deals with the allocation of page-aligned chunks of RAM. Thus, I would spontaneously change the name to “RAM manager”.
- “Virtual memory manager” is again a long and misleading name for something which deals with the virtual address space of processes. Since I only support page-based address translation anyway, I might as well call it “Paging manager”.