
A word about processes and reactive scheduling

November 13, 2010

(Warning: having recently played Ace Attorney: Phoenix Wright rather extensively between classes on a friend’s DS, it’s possible that some elements from this excellent game pollute this post. Pay no attention.)

Well, as I told you last time, I’m currently busy rewriting some code that had been made needlessly complicated, so I’m afraid I have no fancy stuff to show today.

On the other hand, I’ve been thinking for some time about something that will very soon be a major concern: processes. While I’ve already talked before about what processes are, what kind of resources they can be granted access to, and what kind of security model I’m thinking about, thinking about out-of-memory management and scheduling made me realize that a process can’t just be a PID with a priority number and some security permissions and resources attached to it. The following shows some examples.

So… what’s the problem? Let’s see what happens when memory runs out on most Unices:

  • If we can swap things out or discard cached memory and free enough memory that way, then do so.
  • Otherwise, kill the process which uses the most memory.
  • Problem solved.

So, what happens if a large number of relatively small processes are spawned and manage to eat up a lot of system memory? Are larger but legit system processes killed? The answer seems obvious to me: there must be a way to flag some processes as vital system processes that are not to be killed lightly, meaning that they cannot be chosen when an arbitrary process must be killed. They are only to be killed when personally at fault. The criterion for applying this flag to a process would be that if the process goes down and is not brought back up, most or all of the basic system features stop working.
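
To make this concrete, here is a minimal sketch of what such a victim selection could look like. This is only an illustration of the idea, not actual OS code: the process table layout, the vital flag, and pick_oom_victim are all hypothetical names.

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical process descriptor; only the fields relevant to
       out-of-memory handling are shown. */
    struct process {
        int    pid;
        size_t resident_bytes;  /* memory currently used by the process */
        bool   vital;           /* "never an arbitrary victim" flag */
    };

    /* Pick an out-of-memory victim: the biggest memory user that is NOT
       flagged as vital. Returns NULL if only vital processes remain. */
    struct process *pick_oom_victim(struct process *table, size_t count)
    {
        struct process *victim = NULL;
        for (size_t i = 0; i < count; i++) {
            if (table[i].vital)
                continue;  /* vital processes are skipped entirely */
            if (victim == NULL || table[i].resident_bytes > victim->resident_bytes)
                victim = &table[i];
        }
        return victim;
    }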

Obviously, with great power comes great heat dissipation responsibility. Processes which are granted this flag should be triple-checked for memory leaks and other suspicious behaviors, since they are harder to kill when faulty.

But it does not end here…

We’ve said “let’s kill the process which uses the most memory”. This is an obvious and rather effective defense against malicious software doing some variant of “while(1) malloc(sizeof(int));”. But there is a simple problem here: legit software may require huge amounts of RAM, too.

Let’s suppose that our user wants to use some piece of Adobe software. Photoshop CS4 can easily grow above a 600 MB RAM footprint while editing a single 3 MB JPEG from a digital camera. Playing some recent 3D game? They now recommend 3 GB of RAM. Or think of our user getting into video editing or 3D graphics creation, which are intrinsically computationally intensive tasks and use a lot of RAM even when the software does not belong to the bloatware family.

In short: if, when we lack RAM, we randomly point our finger at some piece of software which eats up most of the RAM, scream “bloat!”, and kill it, we may well kill the software which our user was currently using, losing all unsaved data in the process.

To the question “Can we tell apart malware eating up all the RAM from legit software actually needing all that RAM?”, the short answer is “no”.

So what do we do now?

We can try the ostrich algorithm: refuse to allocate that memory, have malloc return NULL, and close our eyes hoping that everything will turn out fine. If we’re in the malware case, that’s not a good idea either. We can also blame the software manufacturer for requiring that much RAM without checking whether it’s available first, but that would not be very fair, since our goal as an OS manufacturer is that software should not have to care about such things.
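
To see why the ostrich option doesn’t really save the day, here is a deliberately buggy C snippet showing what typically happens on the application side when malloc returns NULL and, as is sadly common, nobody checks the result:

    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        /* Ask for a lot of memory; under the "ostrich" policy, the OS
           simply has malloc return NULL when it cannot satisfy this. */
        char *buffer = malloc((size_t)64 * 1024 * 1024);

        /* Typical application code never checks the result, so when the
           allocation fails, the line below dereferences NULL and the
           process crashes anyway. */
        memset(buffer, 0, (size_t)64 * 1024 * 1024);

        free(buffer);
        return 0;
    }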

So we indeed have no choice but to kill the memory-hogging process.

And this gets us back to an old discussion: there must be some way of “backing up” processes at regular intervals, so that not all active data is lost when they are killed. But this is a topic that should be raised later, once we have working processes at hand.

And now, a word about scheduling…

Most operating systems use the so-called round-robin scheduling algorithm. The basics of this algorithm are fairly simple, although it can be made as complicated as needed (a minimal sketch follows the list):

  • Each process has an associated “priority level”.
  • A process is only allowed to run continuously for a certain amount of time, related to its priority level (sample rule: a process of priority n runs for n milliseconds). After that, the operating system is awakened and switches to the next thread in its list.
  • This way, over a long period of time, each thread receives more or less CPU time depending on its priority level.
  • On a multi-threaded operating system, additional tweaks are introduced to ensure that each thread gets a relevant share of the CPU time available to its process.
  • On a multi-user operating system, similar tweaks are used to ensure that a user with a lot of running processes does not steal all CPU time from users with a small number of running processes.
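
Here is a minimal sketch of the core loop in C, using the sample rule above (a thread of priority n runs for n milliseconds). The names (struct thread, run_for_ms, round_robin) are hypothetical placeholders, not actual scheduler code:

    /* Hypothetical descriptor for priority-weighted round robin. */
    struct thread {
        int priority;          /* doubles as the time slice, in ms */
        struct thread *next;   /* circular list of runnable threads */
    };

    /* Placeholder: a real kernel would arm a timer interrupt for `ms`
       milliseconds and context-switch to `t`. */
    static void run_for_ms(struct thread *t, int ms)
    {
        (void)t;
        (void)ms;
    }

    /* The scheduler's core loop: run each thread for a slice
       proportional to its priority, then pick the next in the ring. */
    void round_robin(struct thread *current)
    {
        for (;;) {
            run_for_ms(current, current->priority);
            current = current->next;
        }
    }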

That’s more or less all that can be said about round-robin scheduling. It’s quite simple, requires very little CPU power, and works very well most of the time… However, it does have its issues.

Round robin guarantees that CPU time will be distributed fairly across all running processes if we have an infinite amount of time available, but it does not offer any guarantee about how long it effectively takes to get something done. This is a major issue as soon as we have reactivity in mind. And as I’ve stated before, it’s one of my main goals that this OS should be extremely reactive.

Let’s suppose we want to drag and drop a window. When we do that, the window manager has a large amount of work to do, since it must constantly redraw the window on screen along with what was behind it. It is critical that the window manager gets enough CPU time to do its task smoothly and in sync (say, at least 30 times per second), otherwise the user will perceive this as sluggish performance from the operating system as a whole. Problem is, we have a lot of background processes running, and it takes a long time before the scheduler has run all the other processes and gets back to the window manager. So, as high as its priority might be, the window manager does not run very often. As long as scheduling remains perfectly fair, there’s no way we can get smooth window drag and drop.

Another example of a reactivity problem is audio and video I/O: when recording, AV frames are regularly put in a buffer by the recording hardware. If the operating system is too slow at handling those incoming frames, some of them will be discarded, and information will be lost. During playback, it’s not much better: if the operating system is unable to provide the raw audio and video data to the output hardware at a regular rate, playback will skip, and the user will be angry too.

So what do we do to ensure maximal reactivity for the tasks which need it? We can, like Linux, introduce a special “real time” process priority. Threads from those processes run until they have completed their task, without any switch to “normal” processes occurring: they basically own the CPU cores they run on for their whole lifetime. Problem: if such a process is frozen, either because of a bug or due to an intentional malfunction, the whole OS is basically frozen, just like Mac OS 9 in the good old days… Not a good idea as far as reliability is concerned.

This kind of problem is familiar to those who work on computer networks: sometimes, especially on a wireless network, packets fly away and never come back. Programs should never wait for an answer from the network for an unbounded amount of time. And the answer to this problem is, of course, a timeout: each real-time process should be granted a limited amount of time in which to complete its tasks. If a thread goes past that limit, action is taken to ensure that the system does not remain locked any longer. As an example, the thread may be killed, or it may be “downgraded” to usual round-robin scheduling among the other threads.

Take, as an example, our window manager thread drawing its moving window, and suppose that we keep our previous goal of reaching at least 30 frames per second in this task. Our time limit for drawing a frame would then be a bit less than 1/30th of a second. If a drawing operation lasts any longer, the thread running it is killed. The remaining time can be used at will by background processes. This way, reactivity is ensured first, but background processes can still do their work without starving from lack of CPU power.
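
As a sketch of how the scheduler’s timer handler might enforce such a budget (all names here are hypothetical, and the accounting is shown in isolation):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical accounting for a real-time thread with a CPU budget.
       For the 30 frames-per-second goal above, budget_ns would be set a
       bit below one frame: 1000000000 / 30, minus a safety margin. */
    struct rt_thread {
        uint64_t budget_ns;   /* max CPU time allowed for the current job */
        uint64_t used_ns;     /* CPU time consumed so far by that job */
    };

    /* Called from the timer handler after the thread has run for
       `delta_ns`. Returns true when the thread blew its budget and must
       be killed, or downgraded to ordinary round-robin scheduling. */
    bool rt_charge_time(struct rt_thread *t, uint64_t delta_ns)
    {
        t->used_ns += delta_ns;
        return t->used_ns > t->budget_ns;
    }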

Things start to shape up, but some details still have to be sorted out at this stage:

  • If a thread from a real-time process spawns another thread, that thread will be killed (or punished in another way) along with its parent thread once the time limit is reached. In the meantime, that thread will be put on another CPU core, if one is available, or a variant of round robin will be used to switch back and forth between the various threads of the real-time process.
  • But the real problem is, of course, what happens when another process spawns a thread in a running real-time process (e.g. by giving it some new work to do before it has completed the previous task). If we decide to allow several threads with different deadlines to run inside a single real-time process, it’s pretty obvious that said process may permanently hog the CPU and freeze the whole OS. This is obviously not something we want to happen. So we have three options:
    1. Have the calling process wait until the real-time thread has completed its task. Not a good idea in practice: the only thing it does is make the real-time process drift more and more out of sync with incoming events. At some point, when we’re late, it’s time to drop some incoming events, just like video players skip frames in order to stay in sync.
    2. Kill the current thread of the real-time process and run the new thread instead. That would be difficult to implement in practice, because making threads killing-proof is not a trivial task. Moreover, it still allows the real-time process to permanently hog the CPU.
    3. Have the call fail. The process is busy and can’t answer the incoming request. It’s up to the caller to decide what to do next depending on the situation: either try again later, or forget the idea altogether and wait for another new incoming event. This sounds like the most sensible option, I think.
  • Choices 2 and 3 are more or less equivalent from the point of view of allowing only one thread at a time in a real-time process: some incoming events are dropped (kind of like frameskip in multimedia), and the only things which change are which events are dropped and when. But the third option is more interesting in that it makes it possible to stop a thread from permanently hogging the CPU, as I said previously. How so? Simple: along with the maximal CPU time T which a thread may take to complete its job, a real-time process could have an additional characteristic, a minimal “cooldown” time Tco. Between the moment when the last thread of that process stops and the moment when a new thread may be spawned, a time of at least Tco must have elapsed (see the sketch just after this list).
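
Here is a minimal sketch of how such an admission check might look, combining option 3 (the call fails when the process is busy) with the cooldown time Tco. Again, every name is hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical per-process real-time accounting: T is the maximal
       CPU time a thread may take, Tco the enforced cooldown between jobs. */
    struct rt_process {
        uint64_t T_ns;             /* per-thread CPU budget            */
        uint64_t Tco_ns;           /* minimal cooldown between threads */
        uint64_t last_finish_ns;   /* when the previous thread ended   */
        bool     thread_running;   /* only one RT thread at a time     */
    };

    /* Admission check for spawning a new thread in a real-time process.
       If the process is busy, or still cooling down, the call simply
       fails and the caller decides what to do with the dropped event. */
    bool rt_try_spawn(struct rt_process *p, uint64_t now_ns)
    {
        if (p->thread_running)
            return false;                        /* busy: drop the event */
        if (now_ns < p->last_finish_ns + p->Tco_ns)
            return false;                        /* still cooling down   */
        p->thread_running = true;                /* accept the thread    */
        return true;
    }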

With all this in place, we start to have a pretty solid mechanism to prevent real-time processes from hogging the CPU all the time. Let’s check:

  • A thread from a real-time process, along with all the threads it creates, has a limited lifetime.
  • No other process can make a real-time process run all the time. Pauses in the activity of a real-time process are enforced.

One last issue remains, though: if there are many real-time processes running at once, or if malicious processes are given RT permissions, there is a chance that the following happens:

  1. Processes A and B are running with RT priority, and process A has an active thread T1.
  2. T1 spawns a thread T2 in process B, then dies.
  3. T2 waits a bit, then spawns a thread in process A, then dies.
  4. Go back to 2.

The obvious fix which comes to mind is to extend the reach of our previous decisions about spawned threads and have T2 die when T1 dies. But this means that T1 would have to run for at least as long as T2, which may be highly problematic. Suppose that process A manages mouse interrupts and process B is the window manager: how is the developer of process A, who sets the maximum running time of his threads, supposed to know how much time process B needs in order to do its job?

Then maybe it would be a good idea to prevent T2 from spawning a thread in its parent process A? After all, legit software shouldn’t ever need to do this, so we could simply forbid it. But then we run into the following three-process problem:

  1. Processes A, B, and C are running with RT priority; process A has an active thread T1, and process B has an active thread T2.
  2. T1 spawns a new thread T3 in process C, then dies.
  3. T2 spawns a new thread T1 in process A, then dies.
  4. T3 spawns a new thread T2 in process B, then dies.
  5. Go back to 2.

I think we’ve reached a dead end at this point. Short of keeping around the full genealogy of each thread and spending lots of computational time making complex scheduling decisions based on that genealogy, it seems that we can’t fully prevent malware which has been granted RT privileges from freezing all non-privileged processes of the system. So we had better stick with the relatively simple model which we initially had, which should be enough to prevent legit processes from locking up the system, and remind the user to be careful before granting a process the right to run at RT priority…

No, just kidding, I still have one last trick up my sleeve.

Suppose we’re in the worst situation possible with the current system. Malware has been given RT priority by the user despite the warning, and has completely frozen all processes which do not have RT priority. We know what happens next: the system becomes totally unresponsive, the user waits for one second, two seconds, three seconds, and starts to panic. The only thing left to do is brutally turn off the computer and lose all unsaved data.

I hereby declare that… there is still a flaw which we can exploit in this malware strategy!

Real-time processes are made for simple tasks which have to be done very quickly. There is no reason why there should be more than a few RT processes running simultaneously, and even in that worst case everything should be done in less than a second. So if real-time processes are consistently using all of the CPU for more than a few seconds, we can safely bet that there is malicious software around.

The exact time before the OS gets suspicious should be an adjustable setting, but I bet a good default would be 3 to 5 seconds. Once the OS has detected the problem, it’s time to solve it. A simple solution is to kill all running RT threads and report the incident to the user. Some interrupts might be lost, a bit of memory might leak, but at least non-RT processes will get to run properly for a while.
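
As an illustration, here is what such a watchdog might look like, assuming a periodic timer that still fires while RT threads hog the CPU. Everything in it (the names, the 100% heuristic) is hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical watchdog state, updated from a periodic timer tick. */
    struct rt_watchdog {
        uint64_t hog_start_ns;   /* when RT threads started hogging the CPU */
        uint64_t limit_ns;       /* adjustable setting, e.g. 3 to 5 seconds */
        bool     hogging;
    };

    /* Placeholder actions; a real kernel would do much more here. */
    static void kill_all_rt_threads(void) { /* ... */ }
    static void notify_user(const char *msg) { (void)msg; }

    /* Called on every watchdog tick with the share of recent CPU time
       consumed by real-time threads (0 to 100). */
    void rt_watchdog_tick(struct rt_watchdog *w, int rt_cpu_percent,
                          uint64_t now_ns)
    {
        if (rt_cpu_percent < 100) {
            w->hogging = false;            /* non-RT code got to run */
            return;
        }
        if (!w->hogging) {
            w->hogging = true;             /* start counting */
            w->hog_start_ns = now_ns;
        } else if (now_ns - w->hog_start_ns > w->limit_ns) {
            kill_all_rt_threads();         /* last-resort recovery */
            notify_user("Real-time tasks were killed: suspected malware");
        }
    }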

There are ways to counter this strategy, of course. As an example, malware might include a non-RT process which spawns a thread in one of the malicious RT processes whenever it runs, and thus hangs the system again. Striking back is difficult, but still possible. However, I think it’s time to stop here. Every attack from malware can be countered, but each workaround makes the core of the OS more complicated, which we really don’t want (be it only because it becomes easier to exploit). Moreover, when fighting suspicious behaviors, we make more and more assumptions about the way programs work, and thus introduce more and more constraints in the way of legit programs. Some computer companies have taken this path in the last few years, but it is not the way we want to go.

So, I think that we’ll stop where we were some time ago: once we have stopped the most likely issues coming from legit processes, definitely established that any future issue comes from the user deliberately putting the security of the system at risk, and can detect a locked-up OS and take some basic action, I think it’s safe to say that we’ve reached a good compromise. After all, we’ll only know what malware is up to once we’ve seen some on our OS, which is not likely to happen anytime soon.

So that’s all for now. Hope you found today’s rather theoretical post interesting; I, for one, just loved thinking about this and sharing my thoughts with you!

(Ace Attorney: Phoenix Wright sprites from http://www.doulifee.com/Storage/aceatt/. I won’t do it again, I promise!
Round-robin description mostly from Andy Tanenbaum’s Modern Operating Systems.)



3 Comments
  1. I still have System design 1 through 5 to read but I think this is your first or second best post to date.

    I too have been thinking about the reactivity feature, especially because I use Vista (yeah, nobody’s life is perfect), and also because Opera hangs a lot on network operations. Studying how SWT (a Java-specific toolkit, like Gtk or Qt) works has strengthened my belief that having a main thread devoted to UI interaction and painting, and auxiliary threads for all other tasks in an application, is probably the way to go. I will experiment on that by having the main thread boost its priority in response to user events and fall back to “normal” after responding. Had Vista been properly coded, designed, or whatever in the first place, there would be no reason why the Calculator window couldn’t be refreshed more than 30 times a second. Just like the mouse pointer (although I understand the mouse is tied to an interrupt), which very rarely suffers from lag, even in Vista. Boosting priority may be an answer implemented not by the OS but by applications.

    And as a user, I don’t really care if the video encoding that’s running in the background at “below normal” priority takes milliseconds or seconds longer to complete. There is a limited number of instances when I launch realtime processes as a user: music player, sound capture, VoIP calls, or video player, in which case all I really care about is the video and audio not skipping. In fact, the whole of my realtime experience has something to do with sound. At this moment, I can’t think of a situation when I’ve had two (or more) realtime processes running side by side. Is that even commonplace in the desktop space?

    On the processes side of your post, why is it that killing a process would lead to memory leaks? Doesn’t the OS keep track of thread inheritance and memory allocations by each thread?

  2. I still have System design 1 through 5 to read but I think this is your first or second best post to date.

    Well, it is arguably one of those which took the longest time to write too ;) Writing this alone took me a full afternoon, but I just wanted to show a bit more seriously than before that this project is still alive.

    I too have been thinking about the reactivity feature, especially because I use Vista (yeah, nobody’s life is perfect), and also because Opera hangs a lot on network operations. Studying how SWT (a Java-specific toolkit, like Gtk or Qt) works has strengthened my belief that having a main thread devoted to UI interaction and painting, and auxiliary threads for all other tasks in an application, is probably the way to go.

    Well, for “usual” applications (i.e. those which can and do use the standard UI widget toolkit), I envision having a real-time system process solely dedicated to GUI rendering and basic interaction management. Let’s call our application “appli” and that system process “guirender” for the purposes of this explanation.

    Suppose the mouse is moved. The mouse driver is awakened, translates the mangled babbling from the hardware into a hardware-independent pointing event, and then wakes guirender and tells it about this event.
    Guirender takes care of the basic consequences of the mouse movement: if buttons are hovered, it highlights them according to the GUI theme in use. If text content in a memo is scrolled, it updates the scrollbar position and the visual aspect of the memo to reflect this. And so on: all basic functions of the standard UI widgets are handled this way. The application using standard widgets has nothing to do; in fact, it’s sleeping during all that time.
    Now, suppose that a button is clicked. Guirender takes care of the visual aspect of the button during the click, but the rest is up to the application: guirender wakes it up and spawns a pop-up thread to take care of the OnClick event, then goes back to sleep.

    This way of doing things is interesting in that it enforces perfect reactivity of the UI under all circumstances, without inconsiderately giving a high priority to untrusted processes. On other desktop OSs, it’s not uncommon to see the UI of an application become totally unresponsive when the system is under load (Windows and KDE in particular used to be good at this, last time I checked). But with this architecture, the standard widgets are guaranteed to keep perfectly smooth performance.
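
    To make the idea a bit more concrete, here is a rough C sketch of the dispatch logic I have in mind for guirender; every name in it is hypothetical:

        #include <stddef.h>

        enum gui_event_kind { EV_POINTER_MOVE, EV_BUTTON_CLICK };

        /* Hypothetical event record handed to guirender by a driver. */
        struct gui_event {
            enum gui_event_kind kind;
            void (*on_click)(void);   /* the application's OnClick handler */
        };

        /* Placeholder: theme-aware drawing done by guirender itself. */
        static void redraw_widgets(const struct gui_event *ev) { (void)ev; }

        /* Placeholder: start a short-lived thread inside the application's
           process to run its handler (shown inline here for simplicity). */
        static void spawn_popup_thread(void (*handler)(void)) { handler(); }

        /* Guirender's dispatch: basic visuals are handled here, at RT
           priority; application code only runs for real events like clicks. */
        void guirender_handle(const struct gui_event *ev)
        {
            redraw_widgets(ev);   /* hover highlights, scrollbars, etc. */
            if (ev->kind == EV_BUTTON_CLICK && ev->on_click != NULL)
                spawn_popup_thread(ev->on_click);   /* wake the application */
        }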

    I will experiment on that by having the main thread boost its priority in response to user events and fall back to “normal” after responding.

    This can help, but sadly it does not replace a good scheduling policy. If you boost a thread’s priority, the result you’ll get is that each time the process associated with your application runs, you maximize the chances that the thread responsible for rendering the UI and handling user interactions will be the one run by the scheduler. But let’s be a monster like me and simultaneously run 3 raw physics simulations in the background without adjusting their process priority. This means that your application’s process gets 1/4 of the CPU time. So even if you set the priority of the thread reacting to user events so high that it’s basically the only thread which ever runs, it will still only get 1/4 of the CPU time. If that amount of CPU time is not enough, the UI will still feel sluggish (like my KDE desktop did the day I decided to try that).

    Had Vista been properly coded, designed, or whatever in the first place, there would be no reason why the Calculator window couldn’t be refreshed more than 30 times a second. Just like the mouse pointer (although I understand the mouse is tied to an interrupt), which very rarely suffers from lag, even in Vista. Boosting priority may be an answer implemented not by the OS but by applications.

    As previously said, boosting the priority of threads does not help much when there’s a large number of (legit or malicious) CPU-bound processes running and round-robin scheduling is used.
    Boosting the priority of the process may help more, but I’m not sure that a process should be able to boost its own priority (that sounds strange from a security point of view), and it’s still a brute-force solution which does not scale.

    Example: suppose we use my simple scheduling model of “1 priority level = 1 quantum of time during which the process runs when called”. If you have a process A of priority 4 and four processes B1, B2, B3, B4 of priority 1 in the background, A gets half of the CPU time. Now, if B1, B2, B3, and B4 fork and give birth to C1, C2, C3, and C4 (also of priority 1), A only gets a third of the CPU time, which might be insufficient… See my problem?

    That’s why, in my opinion, real-time processes are a better way of ensuring reactivity of the system in situations where this is required: they scale better. I may well have 20 processes hogging my CPU in the background; if my UI is managed by real-time processes, it will remain perfectly smooth, since all those processes are simply stopped while interaction with the user is handled.

    And as a user, I don’t really care if the video encoding that’s running in the background at “below normal” priority takes milliseconds or seconds longer to complete. There is a limited number of instances when I launch realtime processes as a user: music player, sound capture, VoIP calls, or video player, in which case all I really care about is the video and audio not skipping. In fact, the whole of my realtime experience has something to do with sound. At this moment, I can’t think of a situation when I’ve had two (or more) realtime processes running side by side. Is that even commonplace in the desktop space?

    Well, I can think of some circumstances where there could be several realtime processes running side by side in my OS, though indeed it’s not the common case. These are more or less linked to the fact that I see device drivers as a perfect use case for real-time processes.
    As an example, getting smooth video playback requires good performance from the disk driver (which fetches frames), the video decoder (which decodes them and sends them to the audio/video output), and the AV output drivers (which update the screen and keep audio buffers full and in sync). Suppose that we see smooth video playback as a critical task and put RT priority on all those processes. While each of them only works intermittently (after all, that’s in the very nature of RT processes), chances are that they may work simultaneously at one time or another.

    On the processes side of your post, why is it that killing a process would lead to memory leaks? Doesn’t the OS keep track of thread inheritance and memory allocations by each thread?

    We can keep track of that, sure, but it does not mean that after a thread is killed we can immediately free all the memory it has allocated. That’s because threads from the same process share a common address space, so the following scenario is legit (a small C sketch follows the list):
    - Thread 1 allocates memory and puts some data in it.
    - Thread 1 stores a pointer to the allocated memory in a global variable.
    - Some time later, thread 1 is killed by the operating system.
    - A bit later, thread 2 is spawned and tries to access the allocated memory using the pointer found in the global variable.
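
    Written out as a C sketch (using POSIX threads purely for illustration), the scenario looks like this:

        #include <pthread.h>
        #include <stdlib.h>
        #include <string.h>

        /* All threads of a process share one address space, so this
           global pointer is reachable from any of them. */
        static char *shared_buffer;

        /* Thread 1: allocates memory and publishes the pointer, then
           ends (a stand-in for being killed by the operating system). */
        static void *thread1(void *arg)
        {
            (void)arg;
            shared_buffer = malloc(256);
            if (shared_buffer != NULL)
                strcpy(shared_buffer, "data left behind for later threads");
            return NULL;
        }

        /* Thread 2, spawned later: legitimately reads the memory through
           the global pointer. If the OS had freed the block when thread 1
           died, this would be a use-after-free. */
        static void *thread2(void *arg)
        {
            (void)arg;
            if (shared_buffer != NULL) {
                /* ... use shared_buffer ... */
            }
            return NULL;
        }

        int main(void)
        {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, thread1, NULL);
            pthread_join(t1, NULL);
            pthread_create(&t2, NULL, thread2, NULL);
            pthread_join(t2, NULL);
            free(shared_buffer);
            return 0;
        }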

Trackbacks & Pingbacks

  1. Desktop OSs and the power management dilemma « The OS-periment
