Desktop OSs and the power management dilemma

Wow, this post required a lot of polish time, considering that I started work on it on Tuesday evening. And I’m still not quite satisfied with it, as it still seems a bit disorganized and confusing to me when I re-read it, but I had to publish it someday. Obviously, reflection on this subject is far from over. This is really long-term thinking, as I plan to get a full kernel, a working and fully functional GUI, and some programming tools on my OS before I even consider diving into the mess that is x86 power management through ACPI. But I think it’s worth taking an early look at this now, as this aspect of mobile computing seems to be often overlooked.

The last few theoretical posts were mostly about very general OS theory, covering topics such as memory allocation, scheduling, and security. Those apply to all desktop computers in the world, and even to all computers up to a point. But today, I would like to get a bit more platform-specific and dedicate this post to all users of battery-powered computers, also called “mobile devices” by some, beginning with a first observation…

Power is scarcer than ever

You might rightfully find this observation paradoxical. After all, we live in a Li-ion world, and battery energy density has increased several-fold over the years, right? Modern mobile CPUs and GPUs have a power consumption in the mW-to-W range, LCD screens have switched to LED backlights and become a bit less of a power drain, OLED is arriving in small devices… So what’s with this alarmist title?

Well, look at the average cellphone. In 2000, the Nokia 3310, apart from being bullet-proof, easily lasted one week on battery. Nowadays, phones which last more than 3 days are becoming the exception rather than the norm. What exactly has happened?

  • The will to make phones smaller and lighter has dramatically shrunk them, and their batteries along with them. Therefore, the effective increase in energy storage is relatively small: from 3.2 Wh in the 3310’s BMC-3 to 3.7 Wh in the most powerful variant of its modern BL-4C cousin.
  • Meanwhile, phone displays have grown bigger, brighter, more colorful, and more detailed. Cameras and more complex phone apps have made their appearance. On-screen menus have grown much more complex as new features kept being added. To keep this mess running smoothly, processors had to get faster, too.
  • This trend is accelerating: touchscreen smartphones, the high end of today’s phone market and probably the low end of tomorrow’s, waste power in every single area you can think of. Huge screens with capacitive sensors, 3G connections kept permanently on, GPU-rendered eye candy everywhere in the user interface, and all of that running on top of a poor 5-6 Wh battery… It’s already a miracle that some of them manage to run for two days of average use.

In the laptop market, the situation is a bit less miserable, but it’s still not exactly brilliant. With the smallest battery, my notebook lasts for a bit more than 3 hours of light work, provided you’re careful about things like keeping screen brightness low and Wi-Fi and Bluetooth off. Five years ago, you got 2h30 with a similarly priced model from the same manufacturer, targeted at the same audience. At least it’s not going down, but there’s nothing mind-blowing here, especially considering how much hardware has evolved in terms of power management. So what’s the trick, you might ask? Simple: the old laptop runs XP, the new one runs Windows 7. And which version of Windows you use does matter.

The role of software

Now, note that the purpose of this article is not to bash Microsoft in particular. There are other people who do that better than I do, especially since I switched to Windows 7 as my main OS. In my opinion, this is just a symptom of a wider problem affecting the whole computer industry: carelessly wasting hardware resources.

Let’s start with a popular example of resource waste: have you ever wondered how MS-DOS could boot within a few seconds while current desktop operating systems can sometimes need minutes, on hardware that’s 1000 times faster? Well, obviously, recent operating systems do a lot more during boot. In fact, that’s precisely the problem. At the risk of sounding extremist, I’d say that the boot sequence of a desktop operating system should be thought out in the following way:

  • What is absolutely needed, technically speaking, to show a working login prompt or desktop?
  • What does the user expect to have at hand the moment his desktop or command line prompt shows up?
  • Anything else can wait.

Currently, the way I see it, OS manufacturers do it more like this:

  • Wow, this feature idea is cool, let’s add it!
  • It’s so cool I’m sure the user will want to use it right away, so let’s put it in system startup so that he gets it with no lag!
  • Oh my, why is my operating system so slow? I’m going to do more caching and parallelize startup, conveniently ignoring the real issue. And after all, a 2-minute boot time is good enough.
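
To make the first of those two mindsets a bit more concrete, here is one way the “what can wait?” questions could translate into code: an init system tagging each service with the earliest boot phase that genuinely needs it. This is only an illustrative sketch; the services, phase names, and structure are all invented for the example and describe no existing system.

    /* One illustrative way to classify boot-time services: tag each one with
     * the earliest phase that genuinely needs it. The services and names are
     * made up for the example; this is not a description of any real system. */

    enum boot_phase {
        PHASE_CORE,       /* strictly required to reach a login prompt           */
        PHASE_DESKTOP,    /* the user expects it the moment the desktop shows up */
        PHASE_DEFERRED    /* everything else: start it later, in the background  */
    };

    struct service {
        const char      *name;
        enum boot_phase  phase;
    };

    static const struct service services[] = {
        { "display server",        PHASE_CORE     },
        { "login manager",         PHASE_CORE     },
        { "network configuration", PHASE_DESKTOP  },
        { "file indexer",          PHASE_DEFERRED },
        { "update checker",        PHASE_DEFERRED },
    };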

Now, this rant was about resource waste in general, and you might argue that it does not directly relate to battery life. So let’s invoke another textbook example of resource waste, this time with the wasted resource being electrical power: GPU-accelerated GUI rendering.

What led to the rise of this abomination is probably one of the following ways of thinking:

  • It would be really cool to have transparent menus and fading animations everywhere!
  • Software rendering alone can’t do it smoothly, but modern GPUs provide enough power for that.
  • Let’s do it!

or…

  • Damn, when I move a window it’s just too sluggish. I must make my OS snappier!
  • Optimizing code so that windows are not refreshed each time they move by one pixel is too hard and breaks compatibility with some awfully coded software, so I’m going to use GPU power instead.

I bet that the questions “is it useful?”, “do users prefer it over a better battery life?”, and “can’t we make a nice-looking and snappy GUI which doesn’t eat batteries for dinner?” never, ever got seriously asked. And in the blink of an eye, there goes 20% of your cellphone’s or laptop’s battery life.

We could apply this minimalist philosophy everywhere, with good results, to all software bundled with the OS. There’s a catch, though: as an OS manufacturer, I only have control over that software. So I somehow have to find a way to deal with third-party software which carelessly wastes power, like, say, a daemon permanently spinning in the background in a while(1) loop, preventing the CPU from being halted to wait for an interrupt when nothing is happening.
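
For illustration, here is roughly what such an ill-behaved daemon looks like next to a better-behaved one that lets the CPU halt between events. This is a minimal, hypothetical sketch; check_for_work() and handle_work() are made-up placeholders for whatever the daemon actually does.

    #include <unistd.h>

    /* Hypothetical stand-ins for whatever the daemon actually does. */
    static int  check_for_work(void) { return 0; }   /* pretend nothing ever happens */
    static void handle_work(void)    { }

    /* Power-hungry version: the scheduler sees a runnable process forever,
     * so the CPU never gets a chance to halt and wait for an interrupt. */
    void busy_daemon(void)
    {
        while (1) {
            if (check_for_work())
                handle_work();
        }
    }

    /* Better-behaved version: sleeping (or, better yet, blocking on an event
     * with select()/poll() or a message queue) lets the OS halt the CPU in
     * a low-power state until something actually happens. */
    void polite_daemon(void)
    {
        while (1) {
            if (check_for_work())
                handle_work();
            else
                sleep(1);
        }
    }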

When third-party software comes into play

The slide issue

Non-cooperative software is one of the biggest challenges an OS may meet. You can publish as much documentation as you want, write hundreds of pages of recommendations and guidance for developers each year, but your OS is still expected to perform reasonably well while running quick-and-dirty software written by some underpaid hack or beginner who didn’t read the docs. Otherwise, the OS gets the blame.

When doing power management, optimizations fall into two categories:

  • Manual optimization (software says “look, I don’t need much power; if I’m the only one running, you can underclock the CPU”). It’s great when it happens, but we obviously can’t assume it’s there. A sketch of what this could look like follows this list.
  • Automatic optimization (the OS decides, based on some assumptions, to forcefully slow down or turn off some hardware to save power).
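
Just to make the first category concrete, manual optimization could take the form of a hint interface that applications call voluntarily. Nothing like the following exists in my OS yet; every name here is invented for the sake of the example.

    /* A purely hypothetical hint interface for "manual optimization":
     * applications call it voluntarily to tell the power manager how much
     * performance they actually need. None of these names exist anywhere. */

    enum power_hint {
        POWER_HINT_LOW_DEMAND,  /* "feel free to underclock while I'm running" */
        POWER_HINT_NORMAL,      /* default behaviour                           */
        POWER_HINT_FULL_SPEED   /* "I really need everything you've got"       */
    };

    static enum power_hint current_hint = POWER_HINT_NORMAL;

    /* In a real OS this would be a system call into the power manager;
     * here it merely records the application's wish. */
    void power_set_hint(enum power_hint hint)
    {
        current_hint = hint;
    }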

In this case, we’re considering the latter option, which is obviously the more difficult of the two, because good assumptions are hard to find.

Here’s a simple and well-known example; we’ve all seen the following scene at least once, and probably many times. Someone is giving an oral presentation with electronic slides, and suddenly the video projector’s screen turns dark blue. The speaker mumbles something, walks over to his laptop, and presses some key or touches the trackpad. The image comes back, and he continues his presentation.

Why does the display keep turning off? Because most OS manufacturers assume that if the user has not been interacting with the computer for some time, he’s away, and we can save power in a multitude of ways, most noticeably by turning off one of the biggest battery eaters: the screen. Obviously, this assumption is false in this case.

Not so long ago, it was also fairly common to see a computer’s screen keep turning off while the user was watching a movie. It still happens on some Linux distributions.

Now, one might say “hey, just keep the screen on when a full-screen application is running!”. It’s better, but still insufficient. First, because some people don’t watch movies in full screen, either because they are lazy or because they don’t know how to do it. Second, because there are cases where it may backfire: if the user leaves the computer with a game running but paused, the screen should be turned off.

We can imagine keeping the screen on when media is playing or when an application is frequently doing 2D/3D accelerated drawing. After all, we own our media and drawing frameworks, don’t we? It’s not that easy, though. As an example, Diablo II had an animated menu, so it wouldn’t stop drawing things when paused. And we still haven’t addressed the slide issue: when the computer is displaying a slide, it is idle in every way you can imagine.

Regarding the slide problem, we can decide that the computer’s screen should never be turned off when an external screen is plugged in. That’s a fair enough assumption: if an external screen is available, an electrical socket is nearby, and the user can plug his computer in if needed.

Then, what about Diablo II’s menu? Is there any way we can easily differentiate a running game from a paused game with spinning pentagrams? Well, that sounds difficult to do in a way that’s not game-specific, so let’s leave it there: it’s better to keep the screen on when it’s not needed than to keep turning it off when the user wants it to keep displaying things.
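
Put together, the assumptions discussed so far could combine into something like the following “may we blank the screen?” decision. It is just one possible combination of heuristics, with made-up structure and threshold, not a finished policy.

    #include <stdbool.h>

    /* Inputs the power manager is assumed to know about; how they would be
     * gathered is out of scope for this sketch. */
    struct display_state {
        unsigned seconds_since_input;   /* keyboard/mouse idle time             */
        bool     external_screen;       /* a projector or monitor is plugged in */
        bool     media_playing;         /* the media framework reports playback */
        bool     gpu_drawing_often;     /* frequent 2D/3D accelerated drawing   */
    };

    #define IDLE_TIMEOUT_SECONDS 300    /* arbitrary value for the example */

    /* Returns true when the screen may be blanked. Each extra condition
     * encodes one of the assumptions discussed above. */
    bool may_blank_screen(const struct display_state *s)
    {
        if (s->seconds_since_input < IDLE_TIMEOUT_SECONDS)
            return false;               /* user was recently active          */
        if (s->external_screen)
            return false;               /* slide problem: assume mains power */
        if (s->media_playing)
            return false;               /* movie playback, possibly windowed */
        if (s->gpu_drawing_often)
            return false;               /* still fooled by Diablo II's menu  */
        return true;
    }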

As you can see, making a proper, non-annoying automatic power management system is much more complicated than just saying “oh, if the mouse is still and the keyboard has not been pressed, the user is not there”. We should be very careful if we don’t want power management to become an annoyance which users want to disable or heavily weaken. And this is only the beginning.

Processing power and power management

The previous example was about screens, whose power management options are fairly trivial: either keep the screen on or turn it off. Some have attempted to introduce a more fine-grained power management model with a light sensor which detects ambient luminosity and adjusts screen brightness accordingly. It failed miserably, as it resulted in distracting screen brightness fluctuations for no obvious reason. Others have tried to introduce a three-step screen on -> screen dimmed -> screen off model, and it has failed too: as it turns out, people find a dimmed screen just as annoying as a black/blue one.

Now, consider a processor, CPU or GPU.

You can turn a CPU or GPU nearly off when you don’t need it. But people and OSes tend to need at least one CPU running most of the time, so this isn’t very practical. In many cases, we need a more fine-grained power management model. CPUs provide it in the form of architecture-dependent features which allow parts of them to be turned off or underclocked, resulting in a CPU which is still running, but with lower performance and power consumption.

Now, the obvious question to ask is: when can we decide to do that?

Mr. Obvious would say “well, I don’t mind my computer running slower as long as I can’t notice it”. Good, we’ve got a scope statement. Now, how do we do that exactly?

  • We might base our decisions on preconceived ideas, like “games need full processing power” (including FreeCell and Minesweeper) and “if something actually requires power, it will run in full screen” (yeah, that’s exactly how Adobe software works).
  • We might define performance criteria for a number of tasks. If they are not met, we raise processing power until they are. When these tasks are over, we lower processing power back to its earlier level.

Obviously, I’m an advocate of the latter option. Just because it’s the better one doesn’t mean it will be simple, though.

In the beginning, it’s easy to find performance criteria. As an example, multimedia playback should not skip. Ever. If, despite our multimedia infrastructure running on top of real-time processes, video playback still ends up skipping, it means that we don’t have enough processing power to decode the current stream and that it’s time to make our CPU a bit beefier.

When an application’s display is regularly updated (e.g. in a game), reaching 30 frames per second is a reasonable goal. A lot more is a waste, a lot less is a shame. So in that case too, a performance goal is easy to find.
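
Here is a rough sketch of what criteria-driven scaling could look like with the 30 fps goal as input. The speed steps, the factor of two, and cpu_set_speed_step() are all invented placeholders for whatever architecture-specific mechanism actually changes the clock.

    /* A sketch of criteria-driven frequency scaling using the 30 fps goal.
     * The speed steps, the factor of two, and cpu_set_speed_step() are all
     * invented placeholders for whatever architecture-specific mechanism
     * actually changes the clock. */

    #define TARGET_FPS 30
    #define MAX_STEP    5          /* 5 = full speed, 0 = slowest; arbitrary scale */

    static int current_step = 0;

    static void cpu_set_speed_step(int step)
    {
        /* the architecture-dependent magic would happen here */
        current_step = step;
    }

    /* Called periodically with the frame rate measured over the last second. */
    void adjust_cpu_for_framerate(int measured_fps)
    {
        if (measured_fps < TARGET_FPS && current_step < MAX_STEP)
            cpu_set_speed_step(current_step + 1);   /* a lot less is a shame */
        else if (measured_fps > 2 * TARGET_FPS && current_step > 0)
            cpu_set_speed_step(current_step - 1);   /* a lot more is a waste */
    }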

Applying performance criteria to device drivers is a bit more complex, but for human interface devices the 30 fps rule still applies. For a disc burner, the performance goal is not to fail at the task of burning the disc. For a printer driver, it should not take more than a few seconds to print an average page. And so on…

Until now, the sole possible issue was the inertia of the system: we would rather not have to wait until music playback skips before the OS does something. This part will need more work before being ready for prime time. In particular, we could imagine keeping around a “profile” of each application, so that such errors only occur once, after which the OS has memorized what it should do and gets it right the next time.
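
The profile idea could be as simple as remembering, for each application, the speed step it eventually turned out to need, and starting from there the next time it is launched. A toy sketch, with arbitrary names and sizes:

    #include <string.h>

    /* A toy per-application "performance profile" store: remember which speed
     * step an application ended up needing, so that next time we start from
     * there instead of rediscovering it by letting the music skip first. */

    #define MAX_PROFILES 64
    #define NAME_LEN     32

    struct app_profile {
        char name[NAME_LEN];
        int  needed_step;     /* last speed step that met the performance criteria */
    };

    static struct app_profile profiles[MAX_PROFILES];
    static int profile_count;

    /* Look up a remembered step; -1 means "no idea yet". */
    int profile_lookup(const char *app)
    {
        for (int i = 0; i < profile_count; i++)
            if (strcmp(profiles[i].name, app) == 0)
                return profiles[i].needed_step;
        return -1;
    }

    /* Record (or update) what this application turned out to need. */
    void profile_store(const char *app, int step)
    {
        for (int i = 0; i < profile_count; i++) {
            if (strcmp(profiles[i].name, app) == 0) {
                profiles[i].needed_step = step;
                return;
            }
        }
        if (profile_count < MAX_PROFILES) {
            strncpy(profiles[profile_count].name, app, NAME_LEN - 1);
            profiles[profile_count].name[NAME_LEN - 1] = '\0';
            profiles[profile_count].needed_step = step;
            profile_count++;
        }
    }

On the next launch of the same application, the power manager could consult profile_lookup() before the first frames are even rendered and jump straight to a step that worked last time.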

But now, before everything starts to sound a bit too easy, let me ask you one thing: I just wrote a physics simulation. It continuously does computations and outputs results to a text file from time to time. It never directly interacts with the user. At which speed should the OS let it run? (This also works with other CPU-bound tasks like applying a Gaussian blur in an image editor, rendering a complex 3D model, or computing digits of your favorite irrational number.)

The “fair” answer to this problem is to complete the task at full speed, in the shortest possible time. What would be the point of buying powerful hardware otherwise?

Now, let’s take a power-hungry background task, like doing automatic backups, updating software, indexing files in the user’s home folder, detecting new hardware, automatically converting the user’s music library from MP3 to Ogg Vorbis, or whatever else you might think of. From a technical point of view, these tasks are in no way different from the earlier ones. From the user’s point of view, on the other hand, they do not actually exist and thus should not lead to a major hit in battery life.

Things are starting to become more fun, aren’t they?

We now have to differentiate user-started tasks, which should run at full speed, from background tasks, which should have a minimal impact on power consumption. From a computational point of view, they are nearly identical. Well, I’d say that this is exactly the kind of madness we’re used to seeing in the world of scheduling…

Power-efficient scheduling

What is scheduling exactly? It’s the core of modern multitasking. Scheduling is about deciding which process to run, on which CPU, when, and for how long. Introduced because computers with a single CPU couldn’t realistically run several programs at the same time, it solved the problem by quickly switching from one program to another, just like cinema quickly switches from one still picture to another to fake motion.

But now that we have power management in mind, a new dimension arises: not only do we want to fake simultaneous program execution, we also want to save battery. The obvious way to save battery is to keep the computer idle instead of working. So shouldn’t we, in some cases, make sure that background processes are simply turned off, or run much less often, with the CPU staying idle in the meantime?

My idea is that after the “real-time” class of process priorities, whose purpose is to have processes run ahead of any ordinary process, we could introduce a “background” class. Those processes are just like normal processes, with two differences: they are not taken into account as power-hungry tasks by the power management system, and when there’s no other process around (i.e. the computer is “idle”), they only run intermittently instead of running all the time like they normally would. As an example, they could run for one second every three seconds. Since desktop computers spend only a small fraction of their time actually doing user-specified work, and most of it waiting for I/O and executing background tasks, I believe there’s some room for increased battery life there.
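
Here is a minimal sketch of that duty-cycling idea, seen from the scheduler’s point of view. The one-in-three ratio and the helper stubs are only illustrative, not an actual design.

    #include <stdbool.h>

    /* Sketch of the proposed "background" scheduling class: when nothing but
     * background processes is runnable, let them run for one second out of
     * every three and keep the CPU halted the rest of the time. The duty
     * cycle and the helper stubs are illustrative, not an actual design. */

    #define BG_RUN_MS    1000
    #define BG_PERIOD_MS 3000

    enum sched_class { SCHED_REALTIME, SCHED_NORMAL, SCHED_BACKGROUND };

    /* Stand-ins for kernel facilities this sketch assumes exist. */
    static unsigned long uptime_ms;                               /* the system clock        */
    static bool foreground_work_pending(void) { return false; }   /* anything else runnable? */

    /* Asked by the scheduler when only background-class processes are runnable:
     * run them now, or keep the CPU idle until the next slot? */
    bool background_may_run_now(void)
    {
        if (foreground_work_pending())
            return true;    /* the machine is not idle: compete normally */
        return (uptime_ms % BG_PERIOD_MS) < BG_RUN_MS;
    }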

Now, of course, this only applies to software which voluntarily registers itself as a “background” process. If someone has an idea about how to detect a background process versus a process which the user knows about and wants to run at full speed, I’m all ears…

To sum it up

Wow, this was a hellishly complicated post, so I think we need a little summary here.

  • As hardware keeps getting more powerful, the difference in power consumption between hardware running at full speed and hardware in its deepest power-saving mode keeps increasing. Therefore, to get a satisfactory battery life while keeping satisfactory performance, power management becomes more critical than ever.
  • Like any hardware resource, electrical power is easily wasted. Operating system manufacturers should think harder about which features they add if they want the computer to do less, and thus last longer on battery. In particular, they should approach features which tremendously increase power consumption, like GPU-accelerated UI rendering, with much more caution than they do now.
  • Managing the power consumption of third-party software is another major problem. We can’t assume that external software will tell the OS about its preferences in terms of power management, so we need to be cleverer and check what’s happening ourselves, in the light of some assumptions.
  • When designing such assumptions, we should challenge them against many use cases, otherwise problems will occur. If there’s a choice to make, favor design simplicity and the least user annoyance over raw power savings.
  • CPUs and GPUs are particularly challenging in that their power management methods are more complicated than an on/off switch and can heavily impair software performance. To address this, we should look for relevant and easy-to-(in)validate performance criteria, where possible, to guide the power management system. We should also keep “performance profiles” of our applications, so that the power management mechanism can learn from its errors and doesn’t repeat the same mistake each time we open the same application.
  • The performance criteria for computationally heavy user software (“run at the maximal possible speed”) and background processes (“don’t harm my battery!”) are contradictory. Therefore, we should mark background processes with some flag which helps the OS make power management decisions about them. Besides not increasing CPU/GPU speed when they’re running, an example could be to have them only run intermittently when nothing else is running.

That’s all for now. Thank you for reading!
