Are we using mass storage too much?

So, as I said before, I won’t be able to post as often. This also means, I assume, that I must make sure my posts are of better quality than they have been in the past few weeks. So this week, I’d like to discuss an issue which has been puzzling me for some months and which I think is often overlooked: what role should mass storage play in a personal computer OS?

Introduction: why do LiveCDs suck?

I think everyone reading this blog has probably used a Linux liveCD at some point, and felt how painful it is. Just booting the desktop operating system generally takes a matter of minutes. And that is only the start. Every single thing you try to do, from opening a menu to connecting to a Wi-Fi network, takes ages. Meanwhile, the live desktop becomes totally unresponsive, and your sole distraction for a few seconds is the horribly high-pitched sound of a CD drive spinning at full speed. It generally takes only a few minutes of using a liveCD before an average user starts to consider it the clumsiest way ever invented to run an operating system and files it away as something to be used either in an emergency (e.g. for recovery) or as an annoyingly slow operating system installation medium.

Why is that? Most of us are quick to perceive that it has something to do with the relatively low speed of CD drives. But few people go beyond that basic explanation and ask themselves why, exactly, their operating system needs to access the CD drive so often. I think this question matters.

Modern operating systems are, simply put, using mass storage media all the time. Right from the beginning, their kernel is loaded from one of them. During boot, they keep opening small config files, modules (for modular kernels), and executable files. Once the OS is up and running, each action of the user is likely to require more data to be fetched, be it because of config files which have not been loaded yet or (much more often) in order to load a program which the user wants to run. Some applications need to write data to a mass storage device in the background while they run. Even when the OS or a program is closed, mass storage is yet again needed, because some config files must be updated.

When running an installed operating system on a modern desktop computer, it is uncommon to run into speed problems because of this. That’s because modern desktop hard drives are quite fast. They are kept permanently on, spinning at full speed, and it only takes them a few milliseconds to reach a file. After that, data can be transferred to RAM at 30 MB/s and beyond, as long as files are kept contiguous. What this all means is that if software is reasonably small, loading it is a nearly instant process. The long “load times” which we sometimes experience are often due to software which is already loaded fetching a huge number of small files from the HDD and occasionally doing some number-crunching on them. The only time we notice how heavily modern operating systems use mass storage is at boot, because we have to wait around a minute with nothing else to do.

CD drives have different technical characteristics which totally change this game.

First, due partly to the filesystem which CDs use and mostly to the slow mechanical seeks of optical drives, looking for a file on a CD is a much lengthier process than it is on a hard drive. It may take several tenths of a second. That is, if the CD is spinning already, because between the state where a CD drive is off, silently waiting for an order, and the state where it is spinning and ready to fetch data, several seconds may elapse. Those seconds are very precious as far as responsiveness is concerned, as anything which takes more than a few seconds on a computer is generally perceived as slow by the user. And we lose them very often, because CD drives stop spinning very quickly once they have nothing left to do. Maybe it’s to preserve the CD itself, maybe it’s to minimize noise, I don’t know, but I think nothing can be done about it.

So in short, if we want responsive, usable liveCDs, we have to somehow make sure that the operating system does not access the CD very often. Preferably, every part of the operating system should be loaded from the CD into RAM at boot time, and the CD shouldn’t ever be needed afterwards, as sketched below. This is not infeasible: Puppy Linux, for example, works this way. On the other hand, it does require some extra work. If we want our operating system to run on a wide range of computers without wasting too much RAM, and if we want it to load in a reasonable time, we have to make sure it needs as little data as possible. Our example, Puppy Linux, fits in 150 MB (~20 seconds of load time) and used to need even less.
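
Here is a rough sketch of that copy-to-RAM idea, written in C purely for illustration. Puppy’s actual boot code is a set of shell scripts in its initrd, so this is not its real implementation; the mount points, the image name (system.sfs) and the 256 MB tmpfs size are assumptions of mine, and the program would have to run as root.

```c
/* Sketch of the "copy everything to RAM at boot" idea behind RAM-resident
 * live systems. Paths, image name and tmpfs size are illustrative only;
 * error handling is kept minimal for brevity. Must run as root. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mount.h>
#include <unistd.h>

static int copy_file(const char *src, const char *dst)
{
    char buf[1 << 16];
    ssize_t n = -1;
    int ret = -1;

    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (in >= 0 && out >= 0) {
        while ((n = read(in, buf, sizeof buf)) > 0)
            if (write(out, buf, n) != n)
                break;              /* short or failed write: give up */
        if (n == 0)
            ret = 0;                /* reached end of file cleanly */
    }
    if (in >= 0) close(in);
    if (out >= 0) close(out);
    return ret;
}

int main(void)
{
    /* 1. Create a RAM-backed filesystem big enough for the compressed
     *    OS image (the mount points must already exist). */
    if (mount("tmpfs", "/mnt/ram", "tmpfs", 0, "size=256m") != 0) {
        perror("mount tmpfs");
        return 1;
    }

    /* 2. Copy the compressed system image off the slow CD into RAM.
     *    After this point, the optical drive is no longer needed. */
    if (copy_file("/mnt/cdrom/system.sfs", "/mnt/ram/system.sfs") != 0) {
        perror("copy system image");
        return 1;
    }

    /* 3. A real live initrd would now loop-mount the image in RAM as the
     *    root filesystem and let the user eject the CD; omitted here. */
    return 0;
}
```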

At this point, we start to understand something: “live” desktop operating systems and mass storage are enemies. A live OS has to boot from mass storage, and needs it again if it stores config files or user data (à la Mandriva Flash), but the less it uses mass storage devices the better. So if we want good liveCDs, we have to make the underlying operating systems less dependent on mass storage.

On the other hand, there’s an objection you could easily raise: “CDs are so passé; is this still valid in the age of USB pen drives?”

Frankly, I don’t know. The last time I tried to boot a computer from a USB pen drive, it failed due to a buggy BIOS, but that was a few years ago and it’s quite possible that modern BIOSes can now reliably boot from USB. I would have to test that to check whether the experience is much better than with a liveCD.

However, one problem remains anyway: USB pen drives are much harder to use as an OS installation medium. First, because unlike a CD-R, there’s generally already data on them, which has to be backed up. Second, because programs which let you burn an ISO file to a CD without having to think for a second are legion, while similar tools for liveUSBs are much less common.

And even assuming we’re in a hypothetical future where putting an OS on a liveUSB is as easy as burning a liveCD/liveDVD, I think there would still be good reasons for OS developers to reduce their dependency on mass storage. Here they are…

Why would one want to use mass storage less?

Reliability

One of the OS-periment’s stated goals is to improve the reliability of personal computers, so it’s a good idea to investigate which parts of a modern computer are the weakest. As it turns out, the components which fail most frequently are (in no particular order): the screen backlight, RAM, fans, and mass storage. We can hardly do anything when the screen or the RAM is failing, because the operating system is generally not informed of it. Dying fans are also a sad fact of life: when an OS notices that components are overheating no matter how fast the fans spin, it can generally do nothing but shut the computer down to prevent hardware damage.

Mass storage, on the other hand, is an interesting case. As a thought experiment, imagine that the SATA cable of your hard drive becomes unplugged while you’re using your computer. How long do you think your operating system would last before becoming totally unusable, knowing that something as simple as opening a menu can make your HDD LED blink? My take on the subject is a few seconds at worst, a few minutes at best. The question is, why do we depend so much on a component which is known to last only 5 to 10 years (for HDDs) under current usage patterns?

We also have to wonder whether we can’t do something about this relatively short lifetime on the software side. Most hard drives can stop spinning when they’re not used, which reduces wear on the mechanical parts. The flash memories behind USB pen drives and SSDs are mostly damaged by writes. As far as I know, CD-Rs and RWs are among the only storage media which can lose data in a few years while sitting on a shelf (due to the decomposition of their organic dye layer). So while current hard drives last 5-10 years, couldn’t they last at least twice as long, without any technological breakthrough in hardware, simply by using them more cautiously?

Finally, another area of computer reliability is resistance to sudden power loss. There is exactly one reason why computers may stop working or start behaving strangely after a brutal shutdown: data was being written to the disk at that moment, and the write could not be completed. This is also the very reason why we have to go through the silly “safely remove” process with USB drives: without the user even knowing, the OS might be writing data there in the background, data so vital that simply removing the drive by pulling it out may cause various problems.

If we want computers to be more resistant to brutal storage media removal or failure, we must, again, use them less, in order to reduce the probability of device removal or power failure happening during a write.

Comfort and flexibility

While I’m mentioning the “safely remove”/”umount” procedure, I must also question its very existence: why do operating systems have to write things to our flash drives behind our backs, in a way that can make accidental removal harmful?

Sure, there are some nice things which should run in the background, like file indexing and thumbnail caching, but these are not critical services; we can live without them. If the software which does this is written well, such functions are failure-proof: if they are brutally interrupted right in the middle of a write by drive removal, the software can notice it the next time the drive is plugged in, simply remove the half-written file, and start over. The most primitive way to do this is to make sure a file’s data is safely on disk *before* its final filesystem entry is created (see the sketch below): if drive removal occurs during the write (which is by far the most likely moment if we are creating large files), the file will simply be absent from the filesystem and everything will be as if it had never existed. Some data has been lost, but since it was only a performance optimization, that’s totally okay. So I suspect that something else is happening in the background, something which we don’t really need and which could just as well be removed.
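
To make that concrete, here is a minimal sketch of the closest common userspace equivalent of that idea on a POSIX system: write and flush the data under a temporary name, then atomically rename it into place, so the final name only ever points at a complete file. This is not how any particular indexer or thumbnailer actually does it; the function name and the error handling are assumptions of mine.

```c
/* Minimal sketch of a crash/removal-tolerant cache write on a POSIX system.
 * The data is written and flushed under a temporary name first, and only
 * then renamed: the final name either points at a complete file or does
 * not exist at all, so a half-finished write simply disappears. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int write_cache_file(const char *final_path, const void *data, size_t len)
{
    char tmp_path[4096];
    snprintf(tmp_path, sizeof tmp_path, "%s.tmp", final_path);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* Write the whole payload under the temporary name and force it
     * onto the medium before it becomes visible under its real name. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    close(fd);

    /* rename() is atomic on POSIX: if the drive vanishes before this
     * point, the cache entry never existed and can be regenerated later. */
    return rename(tmp_path, final_path);
}
```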

Aside from getting rid of this “safely remove” annoyance and being able to unplug our external drives whenever we want without a second thought, another benefit of using mass storage less is the flexibility it gives live operating systems. Currently, when one uses a liveCD, the CD drive is forcefully kept closed, and on an average computer with exactly one CD drive there’s no way you can access an audio CD, a game, a movie, a backup, or whatever else might be written on a CD or DVD. If the live operating system could fit in RAM, on the other hand, it might well be possible to remove the now-useless liveCD from the drive and put whatever else you want in it, without the live OS crashing or forbidding you to do so. The same goes for liveUSBs: you might want to use that USB socket, so how about simply unplugging the system’s USB stick and putting something else in there, without any disaster happening?

Speed

In the current context of putting a third dimension on every single game concept and aiming to create the most beautiful game ever conceived, video games could hardly be made lighter. Textures, detailed 3D meshes, videos, high-quality sound effects, and music all cost disk space; there’s no way around that. Better compression can help (early 3D games used to require 4 or 5 CDs for something which would fit on a single one nowadays), but it does not come out of the labs that often.

Operating system software, on the other hand, does not use many multimedia resources. It’s mostly code, so it does not have to be big. But it tends to be. In fact, it seems that the lower-level the software is, the lower its ratio of useful features per MB becomes. Operating systems used to fit in less than 1 MB, whereas now something as simple as a printer or scanner driver already weighs hundreds of megabytes without doing much more than it used to. The textbook example of bloated low-level software is, of course, Windows Vista, which weighs 13 GB for what is, all in all, a basic OS package, including a few primitive programs which you generally replace with more capable ones as soon as you get a little experience. I’d go as far as saying that from a functional point of view, Vista doesn’t have much more to offer the average user in those 13 GB than Puppy has to offer in its mere 150 MB.

These 13 GB may not seem like much in the days of 1 TB hard drives, but they start to matter when the computer has to boot and load part of this bloated mess while the user is kept waiting forever, and when every single simple operation starts to feel unresponsive due to the amount of bloat slowing it down. Since operating systems are so big that they can’t be fully loaded at boot time, they resort to loading components when they are needed, which hurts responsiveness. On Windows or Linux, it’s not uncommon to wait for a menu to be displayed simply because its contents are still on the HDD, which happens to be busy at that moment.

In Windows Vista, boot and shutdown performance was so bad that Microsoft resorted to cheating, altering the behaviour of the power button so that the computer is put to sleep instead of being turned off. I won’t enumerate here all the reasons why this is an incredibly silly thing to do; suffice it to say that once your operating system takes so long to boot that computers basically need to be kept permanently on (albeit in a low-power state), eating power so that they never have to boot, there’s a big problem.

Power consumption

Speaking of power consumption, using mass storage less frequently may also help there. On a modern computer, there are only a few components which cannot be turned off or put into a low-power state without turning off the rest of the machine, and RAM is one of them. No matter what we do, we have to keep the RAM powered, otherwise, simply put, we lose data. This means that fully loading an operating system into RAM during boot will not, in any way, increase the running system’s power consumption.

On the contrary, it could reduce it, and to a great extent. Why? Because mass storage devices are, like GPUs, secondary devices which we can fully turn off when we don’t need them, and they are also a major offender in terms of power consumption. On laptops, modern operating systems tend to turn off HDDs as soon as possible, as battery life can benefit from this as much as it benefits from reduced display brightness (the screen backlight being, by far, the most power-hungry part of a laptop). Fully turning off the HDD after boot and never spinning it up again could thus well result in 30-45 minutes of extra battery life on an ordinary large notebook that lasts around 3-4 hours on battery, which is far from a negligible benefit.
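
As a side note, the mechanism itself is cheap: on Linux, asking a disk to spin down immediately (roughly what “hdparm -y” does) comes down to a single ioctl. The sketch below assumes the legacy HDIO_DRIVE_CMD interface and an example device path, and needs root; a real OS power manager would of course decide for itself when to issue it.

```c
/* Sketch: ask a disk to spin down right now, via the legacy Linux
 * HDIO_DRIVE_CMD ioctl. The device path is an example; must run as root. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/hdreg.h>

int main(void)
{
    int fd = open("/dev/sda", O_RDONLY | O_NONBLOCK);
    if (fd < 0) {
        perror("open /dev/sda");
        return 1;
    }

    /* ATA "STANDBY IMMEDIATE" (0xE0): stop the spindle until the next access. */
    unsigned char cmd[4] = { 0xE0, 0, 0, 0 };
    if (ioctl(fd, HDIO_DRIVE_CMD, cmd) != 0)
        perror("HDIO_DRIVE_CMD");

    close(fd);
    return 0;
}
```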

In short…

Well, my thoughts on the subject are still in their infancy after all this time, and I have yet to get them fully sorted out. But the more I think about it, the more I believe that current personal computer operating systems use mass storage more than they need to, and that there are many benefits to not being so dependent on HDDs, SSDs, and the like.

Mass storage media have become operating systems’ not-so-secret bloat storage place. When your operating system is so heavy that it could not reasonably be kept in RAM, you don’t question your engineering choices; you put the extra weight on the HDD and load it on demand. When your software uses too much RAM, you simply murder performance by creating “virtual RAM” (swap space) on the HDD. Since there’s so much space on HDDs nowadays, distributing a primitive set of applications like Windows on a DVD no longer shocks anyone. People only notice the damage through indirect measurements, like install times, boot times, and runtime responsiveness.

And so, ideas with great potential, like the liveCD, fall totally flat, simply because they rely on a slow mass storage medium.

I wonder if we should try to change this. But for it to work, we need something more finely tuned to each specific user than the usual gigantic CD/DVD with gigabytes of compressed stuff inside. This might require more work… What do you think?


One thought on “Are we using mass storage too much?”

  1. Corey Brenner February 4, 2011 / 8:15 am

    Hallelujah.

    I long for the day when I had DOS and PC GEOS running in a RAM disk because my hard disk died. It ran for weeks without interruption, while I wrote papers, cruised the local warez boards, etc. Luckily, I had upgraded to 8MB of RAM, because the porky OS and GUI took up a whopping 6MB of it. Also, luckily, the whole thing ran (WELL) in 640k of RAM.

    Those were the days. Everything since then has been an exercise in how disgustingly pig-bloated code can get.
