A word about processes and reactive scheduling
(Warning : having played rather extensively Ace Attorney : Phoenix Wright between courses on a friend’s DS recently, it’s possible that some elements from this excellent game pollute this post. Don’t pay attention.)
Well, as I told you last time, I’m currently busy re-writing some code that had been made needlessly complicated, so I’m afraid I have no fancy stuff to show today.
On the other hand, I’ve been thinking for some time about stuff that will be very soon a major concern : processes. Why I’ve already talked for some times about what processes are, what kind of resources they can be granted access to, and what kind of security model I’m thinking about, thinking about memory outage management and scheduling made me realize that processes can’t just be a PID with some security permissions and resources attached to it and a priority number. The following shows some examples.
So… What’s the problem ? Let’s see what happens in case of a memory outage on most Unices :
- If we can swap things out or discard cached memory and free enough memory in the way, then do so.
- Otherwise, kill the process which uses the most memory.
- Problem solved.
So, what happens if a large number of relatively small processes are spawned and manage to eat up a lot of system memory ? Are larger but legit system process killed ? The answer seems obvious to me : there must be a way to flag some processes as vital system processes that are not to be killed lightly, meaning that they cannot be chosen for such decisions where an arbitrary process must be killed. They are only to be killed when personally at fault. The criteria for applying this flag to a process would be that if this process goes down and is not brought back up, most or all of the basic system features stop working.
Obviously, with great power comes great
heat dissipation responsibility. Processes which are attributed this flag should be triple-checked for memory leaks and other suspicious behaviors, since they are harder to kill when faulty.
But it does not end here…
We’ve said “let’s kill the process which uses the most memory”. This is an obvious and rather effective defense against malicious software doing some variant of “while(1) malloc(sizeof(int));”. But there is a simple problem here : legit software may require huge amounts of ram, too.
Let’s suppose that our user wants to use some piece of Adobe software. Photoshop CS4 could easily get above 600 MB of RAM footprint for editing a single 3 MB JPEG from a digital camera. Playing some recent 3D game ? They start to recommend 3 GB of RAM. Or we can think about our user getting into video editing or 3D graphics creation, which are intrinsically computationally intensive tasks and use a lot of RAM even when the software you use does not belong to the bloatware family.
In short : if when we lack RAM we randomly point our finger on some piece of software which eats up most of the RAM, scream “bloat !”, and kill it, we may well kill the software which our user was currently using, losing all unsaved data in the way.
To the question “Can we discriminate malware eating up all RAM and legit software actually needing all that RAM ?”, the short answer is “no”.
So what do we do now ?
We can try the ostrich algorithm : refuse to allocate that memory, have the malloc return 0, and close our eyes hoping that it will work well. Supposing that we’re in the malware case, it’s not a good idea either. We can also blame the software manufacturer for requiring that much RAM without checking if it’s available first, but that would not be very fair since our goal as an OS manufacturer is that software should not have to care about such things.
So we indeed have no choice but to kill the memory-hogging process.
And this gets us back to an old discussion : there must be some way of “backing up” processes at regular intervals, so that not all active data is lost when they are killed. But this is a topic that should be raised later, once we have working processes at hand.
And now, a word about scheduling…
Most operating systems use the so-called round robin scheduling algorithm. The basis of this algorithm is fairly simple, although it can be made as complicated as needed
- Each process has an associated “priority level”.
- A process is only allowed to continuously run for a certain amount of time, related to its priority level (sample rule : a process of priority n runs for n milliseconds). After that, the operating system is awakened and switches to the next thread in his list.
- This way, over a large amount of time, each thread received more or less CPU time depending on its priority level.
- On a multi-threaded operating system, additional tweaks are introduced to ensure that each thread gets a relevant share of the available CPU time for the process.
- On a multi-user operating system, similar tweaks are used to ensure that a user with a lot of running processes does not steal all CPU time from users with a small number of running processes.
That’s more or less all that can be said about round robin scheduling. It’s quite simple, requires only a few CPU power, and works very well most of the time… However, it does have its issues.
Round robin guarantees that CPU time will be distributed fairly accross all running processes if we have an infinite amount of time available, but it does not offer any guarantee about how much time it effectively takes to do something. This is a major issue as soon as we have reactivity to mind. And as I’ve stated before, it’s one of my main goals that this OS should be extremely reactive.
Let’s suppose we want to drag and drop a window. When we do that, the window manager has a large amount of work to do, since it must constantly redraw the window on screen and what was behind it. It is critical that the window manager gets enough CPU time to do its task in sync and smoothly (say, at least 30 times per second), otherwise the user will perceive this as sluggish performance from the operating system as a whole. Problem is, we have a lot of background processes running, and it takes a lot of time before the scheduler has run all other processes and goes back to the window manager. So as high as its priority might be, the window manager does not run very often. So if scheduling remains perfectly fair, there’s no way we can get smooth window drag and drop.
Another example of reactivity problem is audio and video I/O : when recording, AV frames are regularly put in a buffer by the recording hardware. If the operating system is too slow at handling those incoming frames, part of them will be discarded, and information will be lost. During playback, it’s not much better : if the operating system is unable to provide the raw audio and video data to the output hardware at a regular rate, playback will skip, and the user will be angry too.
So what do we do to ensure maximal reactivity of tasks which need it ? We can, like Linux, introduce a special “real time” process priority. Threads from those processes run until they have all completed their task, without any switch to “normal” processes occuring : they basically own the CPU cores they run on for their whole lifetime. Problem : if such a process is frozen, either because of a bug or due to an intentional malfunction, the whole OS is basically frozen, just like Mac OS 9 in the good old days… Not a good idea as far as reliability is concerned.
This kind of problem is familiar to those who work on computer networks : sometimes, especially on a wireless network, packets fly away and never come back. Programs should never wait for an answer from the network during an infinite amount of time. And the answer to this problem is, of course, a timeout : each real time process should be granted a limited amount of time in order to complete its tasks. If a thread goes past that limit, actions are taken to ensure that the system does not remain locked any longer. As an example, the thread may be killed, or it may be “downgraded” to usual round-robin scheduling among the other threads.
Take our window management thread drawing its moving window, as an example, and suppose that we keep our previous goal of reaching at least 30 frames per second in this task. Our time limit for drawing a frame would then be a bit less than 1/30th of a second. If a drawing operation lasts any longer, it would be killed. The remaining time could be used at will by background processes. This way, reactivity is ensured first, but other background processes can still do their work in the background without starving from lack of CPU power.
Things start to shape up, but some details of this still have to be sorted out at this stage, though :
- If a thread from a real time process spawns another thread, that thread will be killed (or punished in another way) along with its parent thread once the time limit is reached. In meantime, that thread will be put on another CPU core, if available, or a variant of round robin will be used to switch back and forth between the various threads of the real time process.
- But the real problem is, of course, what happens when another process spawns a thread in a running real-time process (e.g. by giving it some new work to do before it has completed the previous one). If we decide to allow several threads with different deadlines to run inside a single real-time process, it’s pretty obvious that said process may permanently hog the CPU and freeze the whole OS. This is obviously not something we want to happen. So we have three options :
- Have the calling process wait until the real-time thread has completed its task. Not a good idea in practice : the only thing it does is have the real time process become more and more out of sync with incoming events. At some point, when we’re late, it’s time to drop some incoming events, just like video players skip frames in order to keep on sync.
- Kill the current thread of the real-time process and run the new thread instead. That would be difficult to implement in practice because making thread killing-proof is not a trivial task. Moreover, it still allows the real time process to permanently hog the CPU.
- Have the call fail. The process is busy and can’t answer the incoming request. It’s up to the caller to decide what to do next depending on the situation : either try again later, or forget the idea altogether and try again when there’s another new incoming event. This sounds like the most sensible option, I think.
- Choices 2 and 3 are more or less equivalent from the point of view of allowing only one thread at once in a real-time process : some incoming events are dropped (kind of like frameskip in multimedia), the only thing which changes is which events are drop and when they are dropped. But the third option is more interesting in that it allows to permanently stop a thread from hogging the CPU all the time, as I said previously. How is it so ? Simple : along with the maximal CPU time T which a thread may take to complete its job, a real time process could have an additional characteristic, a minimal “cooldown” time Tco. Between the moment where the last thread of that process stops and the moment where a new thread may be spawned, a time of at least Tco must have been spent.
With all this in place, we start to have a pretty solid mechanism to prevent real-time processes from hogging the CPU all the time. Let’s check :
- A thread from a real-time process, along with all thread it creates, has a limited lifetime.
- No other process may permit a real-time process to run all the time. Pauses in the activity of a real-time process are enforced.
A last issue remains, though : if there’s many real time processes running at once, or if malicious processes are given RT permissions, there is a chance that the following happens :
- Process A and B are running with RT priority, and process A has an active thread T1.
- T1 spawns a thread T2 in process B, then dies.
- T2 waits a bit, then spawns a thread in process A, then dies.
- Go back to 2.
The obvious fix which comes to mind is to be extend the reach of our previous decisions about spawned threads and have T2 die when T1 dies. But this means that T1 will have to run for at least as long as T2, which may be highly problematic. Suppose that process A manage mouses interrupt and process B is the window manager : how is the creator of process A, who plans the maximum running time of his threads, supposed to know how much time process B needs in order to do its job ?
Then maybe it could be a good idea to prevent T2 from spawning a thread in parent process A ? After all, legit software shouldn’t ever need to do this, and we can’t allow it either. But then there’s a problem with the following three-process problem :
- Process A, B, and C are running with RT priority, process A has an active thread T1 and process B has an active thread T2.
- T1 spawns a new thread T3 in process C, then dies.
- T2 spawns a new thread T1 in process A, then dies.
- T3 spawns a new thread T2 in process B, then dies.
- Go back to 2.
I think we’ve reached a dead end at this point. Except by keeping around the full genealogy of each thread and spending lots of computational time making complex scheduling decisions based on that genealogy, it seems that we can’t fully prevent malware which has been granted RT privileges from freezing all non-privileged processes of the system. So better stay with the relatively simple model which we initially had, which should be enough to prevent legit processes from locking the system, and remind the user to be careful when they grant a process with the right to run at RT priority…
No, just kidding, I have a last trick around.
Suppose we’re in the worst situation possible with the current system. Malware has been given RT priority by the user despite the warning, and has completely frozen all processes which do not have RT priority. We know what’s happening next : the system becomes totally unresponsive, the user waits for one second, two seconds, three seconds, and starts to panic. The only thing left is to brutally turn off the computer and lose all unsaved data.
I hereby declare that… there is still a flaw which we can exploit in this malware strategy !
Real time processes are made for simple tasks which have to be done very quickly. There is no reason why there should be more than a few RT processes running simultaneously, and even in this worst case everything should be done in less than a second. So if there are real time processes around consistently using all CPU for more than a few seconds, we can safely bet that there is malicious software around.
The exact time before the OS gets suspicious should be an adjustable setting, but I bet a good default would be 3 to 5 seconds. Once the OS has detected the problem, it’s time for it to solve it. A simple solution is to kill all running RT threads and report the incident to the user. There might be some lost interrupts, a bit of memory might leak, but at least non-RT processes will get to run properly for a moment.
There are ways to counter this strategy, of course. As an example, malware might have a non-RT process which spawns a thread in one of the malicious RT processes anytime it runs and thus hangs the system again. Striking back is difficult, but still possible. However, I think it’s time to stop here. Every attack from malware can be countered, but each workaround makes the OS more complicated at the core, which we really don’t want to happen (be it only because it makes it more easy to exploit). Moreover, when fighting suspicious behaviors, we make more and more assumptions about the way program works and thus introduce more and more constraints in the way of legit programs. This way has been taken by some computer companies in the last few years, but it is not the way we want to go.
So, I think that we’ll stop where we were some times ago : once we had stopped the most likely issues from legit processes, have definitely established that any future issue comes from the user deliberately putting the security of the system at risk, and can detect a locked up OS and take some basic actions, I think it’s safe to say that we’ve reached a good compromise. After all, we’ll only know what malware is up to once we’ve seen some on our OS, which is not likely to happen anytime soon.
So that’s all for now. Hope you found today’s rather theoretical post interesting, myself I just loved thinking about this and sharing my thoughts with you !
(Ace Attorney : Phoenix Wright sprites from http://www.doulifee.com/Storage/aceatt/. I won’t do it again, I promess !
Round robin description mostly from Andy Tanenbaum’s Modern Operating Systems)