On power management
These days, computers tend to come with batteries included, in all senses of the expression. After all, why buy a bulky box that must stay in one place if you can do the same work on a device that can be carried around? However, this move towards increasing mobility and the associated desire for longer battery life have consequences for OS design, and in what’s hopefully my last theoretical post before I can get my home computer back, I would like to discuss those a little bit.
Fundamentals of power management
In computer chips, there is a tradeoff between performance and power consumption, and power savings can frequently be achieved by reducing hardware functionality. This can be done in several ways, the most common being to adjust processor clock rates and to turn off unused parts in as fine-grained a fashion as possible. In any case, nearly all computer hardware produced nowadays features a way to dramatically reduce its power consumption when it’s not in use.
However, before an OS can efficiently use such functionality, there are a few caveats that it must take care of. First, one has to define what hardware use consists of. Second, one has to decide when and how unused hardware should be put to sleep, considering that turning it back on later will take a certain amount of time and thus introduce unwanted latencies, and that excessive power-cycling may damage hardware and reduce its useful lifetime.
Defining hardware usage
Sample problem: screen power management
To illustrate why defining hardware usage can be problematic, let’s discuss screen power management. Mainstream computer display technologies, such as the LCD and OLED families, use electrical power in a very inefficient way because they sacrifice power efficiency in favor of other design criteria such as cost, color fidelity, or fast refresh rates. As a consequence, the screen is often the single most power-hungry component of a mobile computer, by a significant margin. To reduce a screen’s power consumption, an operating system has access to two power management controls: adjusting display brightness, or turning the screen off altogether.
There is only so much that can be done by playing with brightness, even with the help of ambient light sensors (that may or may not work well), and turning unused screens off thus remains a highly desirable endeavour. But how does an OS define an “unused screen”? Any kind of user-facing software displays something on the screen, and that’s often the main way it communicates with its users. So if we simply defined “screen usage” as “displaying something on the screen”, well, a modern OS should never turn the screen off. And that’s bad for power consumption.
So instead, modern operating systems often try to use assumptions that lie somewhere between unproven and plain wrong. Like this one: “Users will always react to software output by sending input in return, so if no user input has occurred for X minutes, we can safely assume that the user is not around and turn the screen off”. But everyone knows how that turns out in practice: having to periodically shake the mouse during any kind of passive computer use, such as oral presentations and movie-watching, is decidedly not such a good user experience.
So in the past few years, another approach has emerged: in situations where the OS is not able to accurately determine hardware usage status, the responsibility for power management should be partially offloaded to applications. That is, in essence, the core principle behind Android’s “wakelocks”: if an application needs a device to stay awake, it notifies the power management subsystem that it is using it, and said subsystem subsequently uses this information to make better-informed decisions.
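To make the application-assisted scheme described above a bit more concrete, here is a minimal sketch in Python. It is not Android’s actual API: the `PowerManager` class, its method names, and the timeout value are all hypothetical, chosen only to show how an input-based idle timeout and application-held locks can be combined.

```python
import time
from collections import defaultdict


class PowerManager:
    """Toy power manager: idle timeout plus application-held locks.

    All names here are hypothetical and only serve the illustration.
    """

    def __init__(self, idle_timeout_s=300.0, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_input = clock()
        # Device name -> set of applications that declared they use it.
        self._locks = defaultdict(set)

    def note_user_input(self):
        """Record that the user just pressed a key, moved the mouse, etc."""
        self._last_input = self._clock()

    def acquire(self, app, device):
        """An application declares that it needs `device` to stay awake."""
        self._locks[device].add(app)

    def release(self, app, device):
        """The application no longer needs `device`."""
        self._locks[device].discard(app)

    def may_sleep(self, device):
        """A device may sleep only if nobody holds it and the user is idle."""
        if self._locks[device]:
            return False
        idle_s = self._clock() - self._last_input
        return idle_s >= self.idle_timeout_s
```

With this, a movie player would call `acquire("player", "screen")` when playback starts and `release("player", "screen")` when it stops, and the screen would stay on in between no matter how long the user stays away from the keyboard.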
Towards responsible power management decisions
Applications can abuse such power management capabilities, though, either due to malice or poor coding, and dramatically impact device battery life by doing so. This can, in turn, lead users to inaccurately complain about a device having a “small battery” or a given OS being “bad at managing power”. To address this problem, OSs can use two very different strategies: either limit the extent to which applications can manage power, or make the user aware of the existence of resource hogs on his system so that he can either notify the developer of a problem or get rid of the offending software altogether.
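As a rough illustration of the second strategy, the OS could keep track of how long each application has prevented hardware from sleeping, and show the worst offenders to the user. The sketch below assumes a hypothetical log of `(app, acquire_time_s, release_time_s)` records; the function name and record format are mine, not those of any existing system.

```python
from collections import Counter


def rank_power_hogs(lock_events):
    """Total up how long each application kept hardware awake.

    `lock_events` is an iterable of (app, acquire_time_s, release_time_s)
    tuples. Returns (app, total_seconds) pairs, worst offender first.
    """
    held = Counter()
    for app, start_s, end_s in lock_events:
        if end_s < start_s:
            raise ValueError(f"{app}: released before being acquired")
        held[app] += end_s - start_s
    return held.most_common()


if __name__ == "__main__":
    events = [("player", 0, 100), ("chat", 10, 20), ("player", 200, 250)]
    for app, seconds in rank_power_hogs(events):
        print(f"{app}: kept hardware awake for {seconds} s")
```

A real implementation would probably weight these durations by how power-hungry each device is, but even a simple ranking like this gives users something concrete to report to a developer.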
Experience shows that these two approaches are complementary, and that trying to use only one of them is bound to fail. An OS which follows a purely authoritarian approach to power management will often have to restrict what third-party software can do to a ridiculously small extent, often preventing it from replacing bundled system software even when the latter is performing inadequately. Conversely, an OS which lets software do too much will often have a completely chaotic user experience, in which the device will exhibit highly inconsistent behaviours depending on what software is running on top of it.
This leads, in my opinion, to a call for a balanced solution: OSs should take great care to define a power management policy that is flexible enough that most software can use it without hassle, but sane enough that random third-party software cannot completely mess up the computer without explicit user consent. As I have already mentioned before in security discussions, the main criterion should be, in my opinion, that harmless software shouldn’t require special user attention. That is because constant OS nagging is not only annoying, but also known to dramatically reduce the power of the warnings that are being displayed.
The issue of waiting times
After having qualitatively defined how power should be managed, it is important to also look into more quantitative discussions, such as “how much time should the OS wait before starting to save power” or “when software can smoothly act on a power management parameter, such as screen brightness or CPU clock rates, what should it do?”
The reason why there should be waiting times at all in a power management system is that power management decisions do not come for free. As an example, when a computer screen is turned off, a user is temporarily deprived of the information that’s coming from running software, and has to take appropriate action (such as shaking a mouse or pressing a “lock” button) before he can be aware of what’s happening again.
Besides such direct user experience costs, there are also technical costs associated with power management, which can also indirectly hurt users: the hardware can take some time to be brought back up, as is the case with LCD screens and spinning storage media, and during this time operations can be carried out with an unexpectedly high latency. Power cycling can, as mentioned previously, damage hardware, but ironically can also come with an extra cost in terms of power use. As an example, the power management operation itself can consume power (as is the case when spinning a hard disk drive up or down), or it can force software to take actions that indirectly consume power (cached mass storage data should be written to the disk before it is turned off, but frequent serialization is more costly than infrequent serialization).
In the end, it appears that the “technical cost” of power management should be measured through reproducible tests, such as profiling a typical disk workload and seeing how much power it consumes with various power management algorithms. These tests should be carried out on multiple machines, to check whether all hardware behaves comparably or whether benchmarking must instead be carried out on each system at boot. “User costs”, on the other hand, should be measured through the traditional methodology of testing the thing with real users, or by using proven UX metrics like latency. User-accessible power management settings should then be added if statistically significant variability is observed in users’ preferences.
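One way to put a number on this trade-off is a break-even idle time: the idle duration beyond which putting a device to sleep saves more energy than the sleep and wake-up transitions cost. The sketch below is a deliberately simplified model of my own (it ignores how long the transitions themselves take), and the figures in the example are made up for illustration, not measured on real hardware.

```python
def break_even_idle_time(p_on_w, p_sleep_w, transition_energy_j):
    """Idle time (in seconds) beyond which sleeping saves energy.

    Staying on for T seconds costs p_on_w * T joules.
    Sleeping costs transition_energy_j + p_sleep_w * T joules.
    Both are equal when T = transition_energy_j / (p_on_w - p_sleep_w).
    """
    if p_on_w <= p_sleep_w:
        raise ValueError("the sleep state must draw less power than the on state")
    return transition_energy_j / (p_on_w - p_sleep_w)


if __name__ == "__main__":
    # Hypothetical disk: 5 W when spinning idle, 1 W when spun down,
    # 40 J spent in total on spinning down and back up.
    t = break_even_idle_time(p_on_w=5.0, p_sleep_w=1.0, transition_energy_j=40.0)
    print(f"Sleeping only pays off for idle periods longer than {t} s")
```

Under this model, any waiting time shorter than the break-even value risks wasting energy whenever the device is woken up again quickly, which is one technical argument for not putting hardware to sleep too eagerly.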
Managing “smooth” settings
Power management settings which can be smoothly played with, such as CPU clock rates, also represent a problem for OS designers, because it is difficult to figure out what the best strategy is when playing with them. How should one adapt the hardware performance to the software workload at hand? Here again, it seems to me that both technical and user experience constraints should come into play when making these kinds of decisions.
From a technical point of view, one should not forget that the relationship between performance (measured using an objective criterion such as CPU clock rates or drive spinning speed) and power consumption is often nonlinear. As an example, a CPU working at a reduced clock rate will generally consume more energy per clock tick than when working at full speed, since there is a constant power draw associated with keeping the CPU on. One can deduce from this that the best strategy for managing CPU-bound workloads, technically speaking, could well be to keep CPU cores working at full speed as long as there is work to do, and switch them back to a minimal power consumption mode as soon as they are done.
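The argument can be checked with a toy energy model. The sketch below assumes that an active CPU draws a constant static power plus a dynamic part proportional to its clock rate, which amounts to assuming a constant supply voltage. This is my simplification: real CPUs that lower their voltage along with their frequency can behave differently, and all numbers are invented for the example.

```python
def energy_for_window(freq_ghz, work_gcycles, window_s,
                      static_w, dynamic_w_per_ghz, sleep_w):
    """Energy (joules) to run a fixed workload, then sleep until window_s.

    Assumed model: active power = static_w + dynamic_w_per_ghz * freq_ghz.
    """
    busy_s = work_gcycles / freq_ghz
    if busy_s > window_s:
        raise ValueError("the work does not fit in the window at this clock rate")
    active_w = static_w + dynamic_w_per_ghz * freq_ghz
    return active_w * busy_s + sleep_w * (window_s - busy_s)


if __name__ == "__main__":
    common = dict(work_gcycles=1.0, window_s=2.0,
                  static_w=1.0, dynamic_w_per_ghz=1.0, sleep_w=0.1)
    full = energy_for_window(freq_ghz=1.0, **common)  # 1 s busy, 1 s asleep
    half = energy_for_window(freq_ghz=0.5, **common)  # 2 s busy, never asleep
    print(f"full speed: {full:.2f} J, half speed: {half:.2f} J")
```

In this model the dynamic energy spent on the work itself is the same at both clock rates, so the difference comes entirely from the static power: the slower run pays it for twice as long.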
From this, we infer a basic power management strategy that should be used on all hardware unless a more complex one is proven to be more effective: as long as computer hardware has work to do, it should process it at full speed so as to provide optimal performance and go back to sleep as soon as possible. Thus, intermediary power management modes between full performance and full sleep should only be used when it can be proven that extra performance won’t help the system to fall asleep more quickly, or when transitioning from a full power state to an asleep state after the workload is dealt with.
Now, why would we want a smooth transition in this latter case? First, because, as mentioned previously, transitions between various power management modes can be costly, and in a period of irregular hardware activity we may not want to pay the price of a full power cycle. Second, smooth transitions warn a user that power management functionality is being activated, and allow him to take action to avoid that if such an activation is unwanted (as discussed before, experience shows that power management heuristics can go awfully wrong sometimes). This latter consideration is why many modern operating systems smoothly decrease screen brightness before automatically turning the screen off, thus warning users that a screen blanking is imminent.
If we are to take the simplest case of a linear ramp from the highest hardware performance state to the lowest power consumption state (more complex schemes would have to be justified by proven technical and user experience concerns), we have yet to define how long such a ramp should last and when it should start. For power management settings which have no visible effect as long as no operation is running, such as CPU clock rates, one could envision directly turning the aforementioned wait into a more gradual move from an active state to a sleeping state. Other settings, such as screen brightness, directly impair user experience when they are tweaked, and should thus rather be held at the maximal performance value for an extended period of time and only be gradually turned off a few seconds before the end of the aforementioned waiting time.
Until computer hardware manufacturers come up with batteries that have a near infinite storage capacity, or stop shrinking batteries every time chip power consumption decreases so as to create devices slim enough to slit people’s throats, power management will be a serious concern in OS design. The main challenges of a power management subsystem are to find out when hardware is not in use, which can be done with help from software provided that it remains sufficiently well controlled, and to take appropriate action in such an event so as to optimize user experience, power savings, and hardware longevity.
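For the screen brightness case, such a “hold, then ramp” policy could look like the following sketch. The function name, its parameters, and the default values are hypothetical; the only thing taken from the discussion above is the shape of the curve, that is, full brightness for most of the waiting time followed by a short linear ramp down.

```python
def screen_brightness(idle_s, timeout_s, ramp_s, full=1.0, low=0.0):
    """Brightness to apply after `idle_s` seconds without user input.

    The screen stays at `full` until `ramp_s` seconds before `timeout_s`,
    then dims linearly towards `low`, and is off (0.0) from `timeout_s` on.
    """
    if not 0 < ramp_s <= timeout_s:
        raise ValueError("ramp_s must be positive and no longer than timeout_s")
    if idle_s >= timeout_s:
        return 0.0
    ramp_start_s = timeout_s - ramp_s
    if idle_s <= ramp_start_s:
        return full
    progress = (idle_s - ramp_start_s) / ramp_s  # goes from 0 to 1 on the ramp
    return full + (low - full) * progress


if __name__ == "__main__":
    for idle_s in (0, 290, 295, 299, 300):
        level = screen_brightness(idle_s, timeout_s=300, ramp_s=10)
        print(f"idle for {idle_s:3d} s -> brightness {level:.2f}")
```

For a setting like CPU clock rate, where the ramp has no visible effect, the same function could be used with `ramp_s` equal to `timeout_s`, which turns the whole wait into one gradual transition.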
Since most hardware performs less efficiently in low-power modes, gradual transitions from an active state to a full power saving state should only be used to improve user experience and better handle periods of irregular activity. Furthermore, as could be expected, it appears that a good power management subsystem may only be built through extensive testing, including both technical benchmarking and user-centric tests. And that’s all for today, so thank you for reading!