On filesystem and software management

This post can be considered a follow-up to my previous post on file management. It goes into more detail about the filesystem organization and everyday file and software management that I have in mind. If you are interested in what happens the first time software is run, or where temporary files should be stored, please read on!

First, here is a quick reminder of some things which were said last time and are of particular interest right now. I suggest a standard directory structure which follows this pattern:

  • /Applications: This folder contains sandboxed and independent applications, stored in OSX-like “bundles” (folders acting like files for non-knowledgeable entities) that group together code, data, and configuration, and sorted into well thought-out categories.
  • /Storage: This is where storage media (HDD partitions, ramdisks, external storage…) are mounted by default. These can be accessed through both device names and unique UUID-like identifiers, thanks to some symlink magic.
  • /System: This is where one finds system files, which are bundled with the OS and necessary for the everyday operation of both said OS and third-party applications.
  • /Users: This is where one finds users’ private folders, plus a “Global” folder for easy inter-user communication. Unlike on other OSs, the pollution of this directory is minimal: Desktop and Trash folders, maybe one hidden folder for per-user system service configuration, and that should be about it. The rest is user-created files.
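As a sketch, the top-level layout above could be checked programmatically. Everything here (the function name, the error message) is illustrative, not part of any real OS API:

```python
from pathlib import PurePosixPath

# The four standard top-level folders described above.
TOP_LEVEL = {"Applications", "Storage", "System", "Users"}

def classify(path):
    """Return which top-level domain an absolute path falls under,
    or raise if it lies outside the standard hierarchy."""
    parts = PurePosixPath(path).parts
    if len(parts) < 2 or parts[0] != "/" or parts[1] not in TOP_LEVEL:
        raise ValueError("outside the standard hierarchy: " + path)
    return parts[1]
```

A sandboxing layer could use such a check as a first, coarse filter before applying per-application rules.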

From this point, let’s talk in more detail about applications. As said above, they should be sandboxed from the outside world, in the sense that they should live in an isolated process environment that doesn’t allow them to interact with other applications, raw hardware peripherals, or user files without asking for permission first through an OS-standard mechanism. This idea stems from the observation that while legacy OSs were mainly designed to prevent users from harming each other and the OS install, the biggest evil on modern PCs is malicious and buggy software, not users. Hiding this fact behind the outdated and fuzzy notion of “admin privileges” will not help; what’s urgently needed is a limitation of the privileges which applications can get without a clear request to the user.

In the specific case of file management, sandboxing revolves around confining a program to a set of private files and folders and forbidding it to look at the filesystem beyond that without explicit user permission (through privilege elevation, passing of a file location as a command-line parameter, or standard file open/save dialogs). The easiest way to achieve this result in practice is to physically put all of a program’s files in a single application folder. This is part of what makes the concept of “application bundles” so attractive.

The content of such a bundle could be…

  • A “Code” folder, that contains read-only executable binaries and libraries
  • A “Data” folder, that contains readable and writable data files
  • A “Config” folder, that contains one “Global” subfolder for system-wide configuration and one subfolder per user who has saved configuration data for the application
  • A “System” folder, that contains metadata about the application (icon, main executable location, file associations, requested privileges…)
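A minimal sketch of how configuration lookup inside such a bundle could work, following the “Config”/“Global” layout just described (the function itself is hypothetical):

```python
from pathlib import PurePosixPath

def config_dir(bundle, user=None):
    """Resolve a bundle's configuration folder: the "Global" subfolder
    for system-wide settings, or one subfolder per user."""
    sub = "Global" if user is None else user
    return str(PurePosixPath(bundle) / "Config" / sub)
```

The point of the convention is that this resolution needs no registry or database: the bundle path and the user name fully determine where configuration lives.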

Now, some long-time UNIX users may be alarmed by this idea, and raise the following objections to this concept:

  • It makes it harder to separate programs from their configuration for efficient backup purposes
  • It requires the configuration folder hierarchy of all applications to be updated when users are added, deleted, or changed
  • It makes the configuration data of all users accessible by the application, which is a potential security issue

Let’s answer these concerns right now:

  • Assuming that data and configuration are perfectly transferable from one version of a program’s code to another is wishful thinking. It can be true if the developer follows good programming practices, but it cannot be guaranteed by the OS. Besides, software regularly disappears from the internet, and it can never be assumed that a binary that was installed on the OS yesterday will still be on the web tomorrow. For these reasons, program code, data, and configuration should preferably be backed up together.
  • In the context of home and mobile computing, which is what I mainly have in mind, external HDDs have grown big enough in the past few years that such a full backup can indeed be performed.
  • Storing most application-specific data in a single folder makes software removal more straightforward: it can be as simple as having the user delete the application’s folder, while a system service monitors the contents of the /Applications folder and makes sure to clean up associated OS management structures (privileges, file associations…) in such an event.
  • Applications only need to create configuration data for a new user the first time that said user opens them, not to maintain a full database of users. Also, in the context of home and mobile computing, users are deleted and renamed extremely rarely, so a full database regeneration may not be such a big deal.
  • The last concern (users accessing each other’s private configuration data through bugs in the application) illustrates the security paradigm shift represented by my approach. Here, we don’t care so much about the possibility that very knowledgeable users may peek into each other’s data. If a piece of private user data is so secret, it should be encrypted anyway in order to prevent offline access. What we care most about is the possibility for a rogue or buggy application to smash the configuration data of other applications. By moving configuration data into per-application isolated storage regions, we ensure that this cannot happen, save for full-scale file management security breaches.
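The lazy per-user configuration creation mentioned above could look like this minimal sketch (folder names follow the “Config” layout described earlier; everything else is hypothetical):

```python
import os

def ensure_user_config(bundle, user):
    """Create a user's configuration subfolder on their first launch of
    the application; no user database is maintained, the folder simply
    appears lazily and harmlessly survives repeated calls."""
    path = os.path.join(bundle, "Config", user)
    os.makedirs(path, exist_ok=True)
    return path
```

Deleting or renaming a user then reduces to a single folder rename or removal per application, which a background service could batch through /Applications.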

Now that this is said, let’s examine how these application bundles work from the point of view of everyday user experience: “So, I have just downloaded or extracted from a backup an application bundle, what should I do with it now?”

Installation should be performed in the most straightforward way, by double-clicking the bundle. In the best-case scenario, this would be the only action that the user has to perform before the application is automatically set up and copied to the /Applications folder. No drag and drop, no endless clicking on a “next” button, no root password to type. In more detail, here is what would happen:

  • OS services notice that no management structure is associated with this application, and call the software installer to take care of the installation.
  • If the application bundle is a compressed file, it is first silently unpacked to a temporary location, either on a ramdisk if there is enough space in RAM* or on the system drive otherwise.
  • If the application requests privileges above a user-adjustable “danger” threshold, one installation step asks the user to confirm granting those extra privileges. The action of each OS-defined privilege is clearly documented, and adding extra privileges to the list is in itself a privileged action.
  • If the application is associated with some file types, the OS checks if some of these file types are already associated with another application. If so, the next installation step is to select which file types the new application should become the default handler for. If not, the application becomes the default handler for all of them.
  • If the application wants to be run on every OS boot, the next installation step is to let the user choose whether they want that to happen, the default choice being “no”.
  • If the installer finds pre-existing configuration data in the application bundle, it asks how the data should be migrated to the new computer, defaults being to migrate OS-wide settings and to migrate user settings to users of the same name.
  • Once all these installation steps are performed, the application installer copies the application bundle to the /Applications folder, and adjusts system and application configuration to match the user’s choices. At the end of these operations, a message is transmitted to the user in some way, stating that the application has been copied to /Applications and is ready for use.
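The decision steps above can be sketched as a single installer routine. All field names, the `ask` callback, and the returned structure are hypothetical illustrations of the flow, not a real installer API:

```python
def install_steps(bundle, ask):
    """Walk the installation questions listed above. `bundle` is parsed
    bundle metadata; `ask(question, default)` queries the user and
    returns their answer."""
    decisions = {}
    if bundle.get("privileges_above_threshold"):
        decisions["grant_privileges"] = ask("Grant extra privileges?", False)
    contested = bundle.get("contested_file_types", [])
    if contested:
        # Only already-claimed file types need a decision; unclaimed
        # ones default to the new application.
        decisions["default_for"] = [t for t in contested
                                    if ask("Default handler for " + t + "?", False)]
    if bundle.get("wants_autostart"):
        decisions["autostart"] = ask("Run on every boot?", False)
    if bundle.get("has_config"):
        decisions["migrate_config"] = ask("Migrate bundled settings?", True)
    return decisions
```

Note that a bundle requesting nothing special asks the user nothing at all, which is the double-click-and-done best case.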

* I actually think that in many cases, a ramdisk is the best place to store a temporary file on a modern PC: it offers both high performance and guaranteed instantaneous deletion on OS reboot, even when the shutdown did not go well.
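The unpacking-location choice is a one-liner in practice. Mount points here are purely illustrative placeholders:

```python
def temp_unpack_dir(needed_bytes, ram_free_bytes,
                    ramdisk="/Storage/RamDisk", disk_tmp="/System/Temp"):
    """Pick where to unpack a compressed bundle: the ramdisk when it
    has room (fast, and wiped on reboot even after a bad shutdown),
    the system drive otherwise."""
    return ramdisk if needed_bytes <= ram_free_bytes else disk_tmp
```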

It is, in my opinion, not the job of the OS to manage per-application post-installation tasks, such as EULA validation. These can lead to an inhomogeneous application installation experience at best, and to the OS being accused of wrongdoing that it did not cause at worst. As such, applications should manage those themselves on first start. Also, I am against the existence of software interdependencies, such as shared libraries that are not part of the core OS, so I do not plan to take any steps on the OS side to help software that puts itself in such situations. Though I might change my mind on this one later, given some relevant use cases.

Anyway, our software is now installed, and we may use it the way we want. The next step in its lifecycle is updates, a necessary evil of modern computing. Now, I am sure that many of you have experienced the joy of software-specific updaters such as the ones of VLC, Adobe Flash and Acrobat, or Oracle Java, so let’s get this out of the way: I do not want to let this happen to my OS. Application-specific updaters mean needless duplication of effort and extra security issues, and are most of all a major source of user exasperation. Instead, the OS updater should also update individual applications, as can be observed in the repository-based software management systems that are flourishing on many OSs today.

A difference, though, is that I am neither Red Hat nor Apple, and don’t have the money, time, and megalomania it takes to manage a huge software repository containing almost every piece of software available on my OS. Instead, I want to follow a decentralized scheme in which each application downloads updates from its own set of repositories, so that developers may distribute their updates themselves (while still keeping the possibility to switch back to a repository system later if more favourable circumstances arise).
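A sketch of how the single OS updater could gather these per-application sources; the metadata field name "repositories" is a hypothetical stand-in for whatever the bundle’s “System” metadata would actually declare:

```python
def update_sources(installed):
    """Group installed applications by their declared repository URLs,
    so the one OS updater can poll each developer-run repository once
    even when several applications share it."""
    by_repo = {}
    for app, meta in installed.items():
        for url in meta.get("repositories", ()):
            by_repo.setdefault(url, []).append(app)
    return by_repo
```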

Being decentralized does not mean being outdated, though, and I would like the updater to support all the bells and whistles that one can expect from a modern program of this kind: simultaneous download and installation of independent packages, delta updates with a full-package fallback, the ability to update system services on the fly without an OS restart and to install the rest without a program restart, regulated network bandwidth usage, etc.
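The delta-with-fallback policy mentioned above reduces to a lookup with a default. The package tables here are illustrative, not a real package format:

```python
def pick_package(installed_version, latest_version, deltas, full_packages):
    """Choose between a delta update and the full-package fallback:
    take the delta when one exists for the installed version, the
    full package otherwise."""
    delta = deltas.get((installed_version, latest_version))
    if delta is not None:
        return ("delta", delta)
    return ("full", full_packages[latest_version])
```

This is also why repositories must always publish a full package alongside deltas: a user updating from a version the developer no longer builds deltas for would otherwise be stranded.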

As for software deletion, as said before, it should be as simple for the user as deleting the software from the /Applications folder. OS services should watch in some way over file management operations in this folder to monitor file deletions (to update associated management structures), but also other possible user actions such as an attempt to copy a file into the folder or to create stuff in it.
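The reconciliation step such a service would perform can be sketched as a set comparison. This is a polling sketch with hypothetical names; a real service would hook filesystem notifications rather than scan:

```python
def reconcile(registered, present):
    """Compare the OS's management records against the bundles actually
    in /Applications: missing bundles need their privileges and file
    associations cleaned up, new ones trigger installation."""
    return {"uninstall": sorted(registered - present),
            "install": sorted(present - registered)}
```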

And that’s it ! Hope you found this interesting, and in any case thank you for reading !

5 thoughts on “On filesystem and software management”

  1. Rudla Kudla February 4, 2012 / 11:08 am

    I was thinking about using torrent protocol for OS and software update. That solves the bandwidth problems both for you and for application developers.

    I believe you should consider supporting package dependencies. You will not always be able to implement all new technology in the OS as quickly as possible, and then some independent developer will do it. If there is no support for dependencies (read: shared libraries), the system will be unnecessarily bloated. For example on AmigaOS, there was a bunch of user-written libraries that were widely used by applications. Even if AmigaOS didn’t have a centralized app store, there was almost no problem with this, and I couldn’t imagine every application including the libraries. (Well I can, that’s what we have in Windows now).

  2. Hadrien February 6, 2012 / 10:22 am

    I was thinking about using torrent protocol for OS and software update. That solves the bandwidth problems both for you and for application developers.

    Indeed, P2P protocols could be a great idea for update distribution, if this OS reaches the critical mass it takes to make it work at some point.

    I believe that HTTP mirrors will always be needed, though, for the following use cases :
    -Hosting the software’s website.
    -Providing a file that specifies the location of the update on P2P networks and may easily be checked and updated.
    -Accounting for everyone who has a bandwidth-capped internet connection and cannot afford to seed (landline-based ISPs in the US, mobile data connections worldwide)
    -Accounting for everyone who has P2P protocols blocked on their internet connection (again, mobile ISPs are quite obtuse, but home routers’ firewalls are also often configured to block P2P and I can’t ask everyone to know how to deal with these things…)

    I believe you should consider supporting package dependencies. You will not always be able to implement all new technology in the OS as quickly as possible, and then some independent developer will do it. If there is no support for dependencies (read: shared libraries), the system will be unnecessarily bloated. For example on AmigaOS, there was a bunch of user-written libraries that were widely used by applications. Even if AmigaOS didn’t have a centralized app store, there was almost no problem with this, and I couldn’t imagine every application including the libraries. (Well I can, that’s what we have in Windows now).

    As far as I know, the list is more like Windows, Mac OS, Android, WebOS, iOS, PlaybookOS or whatever RIM’s version of QNX is called… But I don’t want to start a war of arguments from authority.

    My problem with third-party shared libs mostly revolves around the following:

    • Can shared libraries be made easy to deal with for users if a repository infrastructure is not available?
    • What happens when the library vendor ships a broken update?
    • What happens when the library vendor ships a compatibility-breaking update?
    • What happens when the library vendor discontinues the library?
    • If so much software uses a third-party library that sharing it is worth it, shouldn’t it be part of the OS?

    Linux distributions often end up partially maintaining the libraries and software in their repository, then fully maintaining them as they are discontinued. Apart from the huge developer workforce required, this situation frequently causes problems with software that is not distributed in the repository, which may naively depend on the latest version of a given shared library while distribution maintainers are either not done applying their patches to said version or are saving it for the next distro release. Is this situation, in which the distro-maintained repository becomes a vital and inescapable part of the ecosystem, a desirable outcome?

    All that being said, I do not plan to make it completely impossible to distribute shared components. I simply couldn’t, anyway. I may even do something for some very common shared component use cases, such as interpreted software (I can’t decently make the JDK, Python, and Mono part of the OS distribution, and statically including those in every program would be wasteful). But on the other hand, I think that every dependency on a shared component that is not part of the OS is a risk that can be avoided in many cases, and thus I would like developers to consider the decision carefully. As such, I do not want to make the thing easy either.

  3. Alfman February 27, 2012 / 6:04 am

    My opinion is unpopular, but Linux definitely has not solved the “dll hell” issue for all of us; it has simply moved it upstream so that it’s a problem for package maintainers to solve instead of blissful end users. That’s a nice, but limited, “win”. Developers who work on complex apps with numerous dependencies will realize how painful installing “bleeding edge” packages can be, because it can result in a cascading chain of broken & incompatible packages from the repository. Each package upgrade might in turn break another, such that we end up having to install & manage rather long dependency chains ourselves, with no clear path for reconciling packages in the future when the repository is updated.

    Typically the rebuttal is that developers should work in-tree with “unstable” repositories where all developers try to keep things roughly in sync. That’s a logical answer assuming we’re ok with breakages there, but I find the necessity of such a centralized model to be fundamentally troubling. With shared dlls, we’re mostly stuck with centralized repo maintainers who will fix all the dependency issues before users encounter them.

    I would like to see a more decentralized solution where “maintainers” take the role of a CA and developers are responsible for making packages that can work independently from each other.

  4. Hadrien February 27, 2012 / 10:40 am

    Pretty much the same here. I am not against the existence of a centralized repository that provides a nice overview of available software and a trusted/fast mirror, but I also think that decentralized distribution should not be a second-class citizen and that pieces of software should remain roughly independent of each other.
