There has been a lot of heat around the internet recently following Microsoft’s release of the “Consumer Preview” of Windows 8. As with the earlier “Developer Preview”, people almost unanimously dislike the schizophrenic separation between the newer, touch-optimized “Metro” interface on one side and the more traditional Windows 7-like desktop on the other. Most also agree that neither interface, taken alone, can provide a comfortable and full-featured user experience on both desktop/laptop and tablet form factors. In this article, I will discuss what, in my opinion, has gone wrong in Windows 8’s Metro design, and compare it to my own ideas for window management.
On device-specific APIs in a general-purpose OS
Those who have followed my OS design rants for some time should know that, as far as I’m concerned, “device-specific” is a swear word. Any operating system which claims to be versatile and suitable enough for the incredibly diverse world of personal computers should not take the easy path of designing for only one hardware architecture, one input device, one screen size/resolution, one kind of user, a small bunch of Systems-on-Chip, and so on. The reason is that OSs built for specific hardware or specific purposes have consistently failed the test of time. As hardware evolves, software that has tied itself to one specific evolutionary step struggles, and comes out either severely diminished or dead. The only way to reduce the likelihood of such an outcome is to make software that can readily adapt to predictable and less predictable future hardware evolution, with little harm done.
Not all software can be like that, of course. Much hardware, such as x86 GPUs or ARM SoCs, is designed so badly that drivers must be rewritten in depth each time a new chip family comes out, instead of just requiring a small patch for the new functionality. In the merciless world of enterprise software, underpaid workers are frequently asked to deliver under impossible deadlines, which forces developers to make lots of assumptions in their code in order to ship on time. And there is always some odd piece of software which truly requires hardware-specific functionality to work. My goal, however, is to use a reasonably general and future-proof option whenever possible.
Meanwhile, Windows 8 with a Metro interface blindly follows the current finger-driven touchscreen fad while pissing off keyboard, mouse, and stylus users, assumes that memory leaks will never happen by making it ridiculously hard to clean up OS state (“shutdown” being merely a variant of hibernation), does not even try to support background processing, has no serious hierarchy in its main menu, and so on… Like iOS, it is a perfectly purpose-built OS, ready to blow up as soon as the next generation of personal computers and user interfaces (neural interfaces? holographic displays? 3D sensors with haptic feedback?) arrives.
The specific issue of window management
One piece of UI which ends up badly butchered when people design OSs with a specific piece of hardware in mind is window management. Let’s start by stating a definition: on an OS with a WIMP graphical user interface that is capable of running third-party software, a window is a bitmap image with controls drawn on it, which serves as a dedicated display and input area for an individual application, and which may occupy a variable, user-controlled portion of the screen at a given point in time.
There are three main forms of window management available in today’s GUIs. The first is the overlapping window manager, which is used by most “desktop” OSs. It represents windows as a stack of resizable rectangles projected on the 2D screen, where windows are raised to the top of the stack by activating them with the pointer, and where only the window at the top of the stack may be directly operated. The second, which is used by most “mobile” OSs, is the full-screen window manager, in which windows occupy the screen in an all-or-nothing fashion and may only be switched using an Alt+Tab-like task switching mechanism. Finally, tiling window management offers something in between these two approaches, slicing the available screen area into a number of regions, each of which is dedicated to one fully displayed window.
Tiling is a very interesting compromise, because it combines the flexibility and versatility of displaying multiple windows on screen at the same time with the simplicity, elegance, and screen usage efficiency of a full-screen window manager. Many tiling window managers support launching software with its window maximized, then shrinking that window to make room for others when several applications need to be displayed at the same time, combining the best of both worlds. With this in mind, it is not so surprising that Microsoft have chosen this method of window management for Windows 8: in principle at least, tiling is simply the most versatile form of window management known today, and as such is exactly what a general-purpose OS needs.
However, talking about “tiling” itself is not enough. Just as there are several ways to slice a rectangular screen area into rectangular pieces, there are many ways of implementing tiling from both programmatic and UX points of view. The “flat” nature of tiling window managers raises interesting GUI design questions in such apparently trivial realms as keyboard and pointer focus, window creation, deletion and resizing, titlebar visibility, or synergy between widget toolkits and window management. As we are going to see, it is actually there, in the implementation, that the main problem with Windows 8’s window manager lies.
Building better tiling window managers
So, when designing a tiling window manager, what does one want? To give people the benefits of seeing multiple windows at the same time (or, more precisely, of being able to quickly switch mental focus between two windows), without the annoyances that overlapping windows bring. Use cases for seeing multiple windows at the same time include easy documentation use in complex software, content comparison and transfer, monitoring a “minor” task while working on a “major” one, and checking what a UI notification is about without completely leaving what you are doing.
Windows 8’s window manager, limited to showing two windows at the same time in a fixed 25%:75% horizontal split, addresses exactly one of these use cases, which is minor task monitoring. A highly uneven split is unsuitable for documentation consultation, content comparison, and content transfer. And since Microsoft want you to use your 24″ screen in full-screen mode and pull windows out of nowhere on the left side of your screen, it is unlikely (although unconfirmed so far) that something as advanced as spawning a new window from a notification will be possible.
How would I address all of these issues myself?
Getting a minimal amount of control
Well, first, I am truly not fond of the gesture-controlled chromeless interfaces that are all the rage nowadays in Cupertino and Redmond. I can understand the need for them on a phone form factor, where extreme screen real estate constraints call for extreme UI design, but on screens of 7″ and more, forcing mysterious, invisible and hardware-specific controls on users quickly gets ridiculous. What exactly is so bad about having permanently visible OS controls that give quick access to task switching and status information, yet can still fade out of the screen for the sake of immersion when needed (e.g. on small screens, or at the user’s or the application’s request)?
Here is a quick visual overview of how it can work:
Adding and removing tiles
Once a minimal level of UI firepower is available, it is possible to envision tiling arbitrary numbers of windows using a consistent and familiar workflow. This is because tiling is, in essence, akin to showing a selection of open windows on screen. So if the core UI already provides a mechanism for selecting and concurrently handling multiple objects on screen, we can easily adapt it to selecting several windows for simultaneous display.
As it turns out, there are such mechanisms in the realm of file management.
- On keyboard+mouse and keyboard+stylus combos, users are able to add files to a selection by holding the Ctrl key and picking them either one by one or in a “rubberband” fashion. The action is fully reversible by clicking a second time while the Ctrl key is still pressed.
- On purely pointer-driven interfaces, a “Select” contextual command is generally used. Some UIs manage this by performing a secondary click on each object that is to be selected and choosing a “Select” option in the context menu, whereas others prefer to use a “Selection mode” in which the effect of primary clicks on objects is modified to act as a multiple selection method until a “Done” switch is pressed. When more screen real estate is available, putting checkboxes next to everything becomes possible, but remains quite ugly.
- Purely keyboard-driven interfaces tend to use similar mechanisms to pointer-driven ones, in that they emulate hovering using arrow keys and use a keystroke to select whatever is being hovered. Some also use a “Selection mode” mechanism in which the action of the primary “click” is temporarily modified.
So, how can we adapt these workflows to window management? For keyboard+pointer, it’s quite easy: just keep the Ctrl+click mechanism, which works just fine, and use it on task-managing UI elements. When only a pointer is available, a secondary click followed by “Select” or “Add Tile” on a tab or in the task manager could well be a valid option. Keyboard-driven interfaces are trickier, but one could imagine adding a Ctrl modifier to the regular task switching mechanism to enter a “Tile management mode” where one can browse through running tasks and add or remove tiles (kind of like switching keyboard focus to the task manager).
Once there is a mechanism for adding and removing tiles on the screen, one should think about how these are presented to the user. Here, one needs to start with a simple and predictable screen real estate allocation algorithm, then make it able to handle users resizing tiles and moving them around. As far as I’m concerned, the automatic tiling algorithm of choice on a horizontal screen would be to…
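To make the Ctrl+click idea a bit more concrete, here is a minimal Python sketch of the selection logic, assuming a window manager that keeps an ordered list of the windows currently tiled on screen. The class and method names (`TileSelection`, `click`) are hypothetical, and the choices that a plain click shows a window alone and that the last tile cannot be removed are my own assumptions, not part of any existing window manager.

```python
class TileSelection:
    """Hypothetical model of which windows are currently tiled on screen."""

    def __init__(self):
        self.tiled = []  # window ids, in left-to-right order

    def click(self, window_id, ctrl=False):
        if not ctrl:
            # Plain click on a task: show that window alone, full-screen.
            self.tiled = [window_id]
        elif window_id in self.tiled:
            # Ctrl+click on an already tiled window removes its tile,
            # except for the last one (the screen is never left empty).
            if len(self.tiled) > 1:
                self.tiled.remove(window_id)
        else:
            # Ctrl+click on another window adds it as a new tile on the left.
            self.tiled.insert(0, window_id)


selection = TileSelection()
selection.click("editor")
selection.click("browser", ctrl=True)
print(selection.tiled)  # ['browser', 'editor']
selection.click("browser", ctrl=True)
print(selection.tiled)  # ['editor']
```

Just like with files, the second Ctrl+click undoes the first one, so the whole workflow stays reversible.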
- Insert full-height vertical tiles on the left until a minimal window width is reached
- Then insert a full-width horizontal tile at the bottom and start over
Let’s explain why I like this solution:
- Modern screens have significantly more horizontal screen real estate than vertical screen real estate, while reading and other forms of visual parsing (which are really the #1 or #2 thing we do when we use our computers) work best on vertical columns of text. To compensate for this mismatch between hardware capabilities and user needs, it is a good idea to favour vertical tiles.
- Since I envision a scheme in which task management mechanisms follow the left screen edge whereas application controls follow the right screen edge, inserting tiles on the left minimizes user disruption in the simplest 2-tile configuration on a large screen: the controls of the initial tile can remain where they were if they still have enough room. Also, inserting tiles on the side of the task management panel allows for an elegant visual metaphor in which tiles come out of the task manager, whereas tiles coming from the right would just come out of nowhere.
- Although software GUIs should be able to transparently handle having their window change from a vertical to a horizontal shape and vice versa, if only to handle those shiny accelerometer-driven devices we see nowadays, very narrow window form factors are rarely good. If we consider reading again, very thin columns of text tend to be every bit as eye-straining to read as very long lines, because your eyes are jumping lines all the time. They also tend to look very ugly on the right side (for left-aligned text) or in the middle (for justified text), as word wrapping reaches its limits. This is why a horizontal tile must still be created from time to time.
This algorithm can also be adapted to vertical (portrait) screens by performing horizontal cuts by default. To put it simply, one starts by splitting the longest screen dimension, then splits the shortest one when necessary.
As an additional note, the automatic tiling algorithm does not have to slice lines of windows using a uniform split that gives each window an equal share of the available height or width. This is a fair default when no clue is given, but if windows have such properties as “preferred widths/heights/aspect ratios” or “widget load” (which could be specified by the widget toolkit on window creation), then the tiling algorithm may use this data to slice its tiles in a more relevant fashion. And in case several windows have previously been tiled together, the user’s earlier preferences should also be recalled. Speaking of which…
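The automatic tiling rules above can be sketched in a few lines of Python. This is only one possible reading of them, under stated assumptions: windows are passed in opening order, `min_len` is a hypothetical minimum tile length in pixels, and both the lines and the tiles within a line share space equally. The same code handles portrait screens by always cutting the longest dimension first.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float


def auto_tile(windows, screen_w, screen_h, min_len):
    """Hypothetical automatic layout for `windows`, listed in opening order.

    Tiles are first cut along the longest screen dimension; once another cut
    would make tiles shorter than `min_len`, a new full-length line is started
    (at the bottom on a landscape screen, on the right on a portrait one).
    """
    if not windows:
        return {}
    landscape = screen_w >= screen_h
    long_len, short_len = (screen_w, screen_h) if landscape else (screen_h, screen_w)
    lines = [[]]  # rows of tiles on landscape screens, columns on portrait ones
    for win in windows:
        current = lines[-1]
        if current and long_len / (len(current) + 1) < min_len:
            lines.append([win])     # current line is full: start a new one
        else:
            current.insert(0, win)  # new tiles enter on the left (top on portrait)
    layout = {}
    thickness = short_len / len(lines)  # assumption: lines share space equally
    for i, line in enumerate(lines):
        tile_len = long_len / len(line)  # assumption: uniform split within a line
        for j, win in enumerate(line):
            along, across = j * tile_len, i * thickness
            if landscape:
                layout[win] = Rect(along, across, tile_len, thickness)
            else:
                layout[win] = Rect(across, along, thickness, tile_len)
    return layout


for name, rect in auto_tile(["mail", "editor", "docs", "music"], 1920, 1080, 600).items():
    print(name, rect)
```

With four windows on a 1920×1080 screen and a 600-pixel minimum, the first three end up side by side in the top row and the fourth gets a full-width row at the bottom.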
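Here is a minimal sketch of such a non-uniform split for a single line of tiles, assuming that toolkit hints and remembered user preferences can both be reduced to relative weights. The function name and the weight representation are hypothetical; a real implementation would also need to respect minimum sizes.

```python
def split_line(length, tiles, hints=None, remembered=None):
    """Share `length` pixels among the tiles of one line.

    `hints` maps a tile to a relative weight (e.g. derived from a preferred
    width or a "widget load" reported by the toolkit); tiles without a hint
    get weight 1.0, which gives the uniform default. `remembered` maps a set
    of windows the user already tiled together to the weights they chose.
    """
    hints = hints or {}
    remembered = remembered or {}
    saved = remembered.get(frozenset(tiles))
    if saved is not None:
        weights = [saved[t] for t in tiles]  # user preferences win over hints
    else:
        weights = [hints.get(t, 1.0) for t in tiles]
    total = sum(weights)
    return [length * w / total for w in weights]


print(split_line(1200, ["docs", "editor"]))                         # [600.0, 600.0]
print(split_line(1200, ["docs", "editor"], hints={"editor": 2.0}))  # [400.0, 800.0]
```

Giving remembered preferences priority over toolkit hints is a design choice: what the user explicitly set last time is a stronger signal than what an application asks for.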
While having a good automatic tiling implementation is of crucial importance in order to make the feature feel seamless during everyday use (just like any other form of “sane defaults”), it is not enough. Fine-tuned adjustments will always be needed. Let’s explore some of them.
The most basic adjustment that can be made to a tiled window setup is resizing tiles. Since on a tiling window manager enlarging a tile comes at the expense of another tile’s screen real estate, resizing a tile is in effect equivalent to moving the boundary between two tiles, and can be implemented as such. In practice, the workflows for resizing a window in an overlapping window manager (dragging its edges) can be reused, except that this time, two windows are resized at the same time.
External window edges could also count as a resizing target, in order to allow for the scenario pictured below:
To conclude, it should be noted that to make it easier to obtain regular layouts when there are 4 or more tiles on screen, window edges should snap slightly to other windows’ edges as they cross them during a drag. Making such snapping effects noticeable and useful but not annoying is pure implementation polishing magic, especially when one takes into account that snapping must be easily reversible, which is hard to achieve in this context since snapping is not a user-initiated action.
Besides resizing, it is also important for users to be able to move tiles around. And admittedly, although tiles can still be moved around either by dragging their titlebar or by pressing a modifier key and dragging the tile by its middle, this step gets a bit more complex on a tiling window manager than on an overlapping one. Here is why.
On a tiling window manager, tiles are not moved independently of their neighbours. They can only change position if other tiles are moved and resized in a fashion that leaves empty space in the requested position. In effect, this splits tile motion algorithms into two categories: the simple case of tile motion that follows a regular line of tiles, and the more complex cases. Let’s explore both.
When tile motion follows a regular row of tiles, it can simply be performed by successively swapping tiles along the line, kind of like in a sliding puzzle where the empty square would be the tile which we are moving. The only thing which could prove difficult in this context would be finding a proper dragging threshold for performing the swap, and that is an implementation detail.
Other varieties of tile motion can prove significantly more difficult to design, though, and require deeper thought to be integrated into clean and unambiguous user workflows. As an example, in the tile arrangement above, what should it mean for the window manager if a user grabs the “media player” tile in the middle and drags it upwards? It could either mean that he wants it to become a vertical tile on the top row, placed between the top-left tile and the top-right tile in a single row, or that he wants to swap the positions of the media player and the top row.
In practice, in this case, disambiguation could be introduced by using a threshold effect: for a “small” amount of dragging, it is understood that the user wants to achieve vertical tiling, whereas for a “large” amount of dragging, it is understood that the user wants to swap the bottom tile with the two top tiles. However, this means that the window manager now has to manage three levels of dragging threshold: one for actually engaging motion, one for performing vertical tiling, and one for swapping tiles. Making such a large number of thresholds work will require a significant minimum tile size, along with hours of threshold fine-tuning, but is probably doable.
The same logic could be used when dragging, say, the top-left tile downwards: for a small amount of dragging, the top-right tile becomes horizontal and the top-left tile ends up on the same row as the media player, whereas for a larger amount of dragging, the window manager creates a stack of horizontal tiles.
One last scenario involves having three tiles stacked in parallel, and dragging one of them across the rest of the stack. I will not draw it because my drawing tool is (voluntarily) too coarse-grained for that, but suffice it to say that handling such a scenario fully while staying consistent with the previous rules requires five levels of dragging threshold: one for engaging motion, one for performing tiling in the direction orthogonal to the dragging direction within the first neighbour, one for swapping the tile with its next neighbour, one for performing orthogonal tiling within the second neighbour, and one for putting the tile at the end of the stack.
The purpose of this last scenario is to show that if we want to support every kind of window motion in the tiled space, then tile motion along anything but a line can quickly become complex on crowded screens. This means that it can be worthwhile to either not support some of the possible motions, or encourage users to adopt highly regular tilings, for example through a snapping mechanism like the one evoked in the resizing part.
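Treating a resize as a boundary move makes the bookkeeping simple, as this minimal Python sketch shows. It assumes tile sizes along one line are stored as a list of pixel lengths; the function name and the `min_size` parameter are hypothetical.

```python
def move_boundary(sizes, i, delta, min_size):
    """Move the boundary between tile i and tile i + 1 of a line by `delta`.

    Growing one tile shrinks its neighbour by the same amount, so the line
    always keeps its total length. `delta` is clamped so that neither tile
    becomes smaller than the hypothetical `min_size`.
    """
    sizes = list(sizes)
    delta = max(delta, min_size - sizes[i])      # don't shrink tile i too much
    delta = min(delta, sizes[i + 1] - min_size)  # don't shrink tile i + 1 too much
    sizes[i] += delta
    sizes[i + 1] -= delta
    return sizes


print(move_boundary([640, 640, 640], 0, 100, 200))   # [740, 540, 640]
print(move_boundary([640, 640, 640], 1, 1000, 200))  # [640, 1080, 200]
```

In the second call, the drag is stopped when the rightmost tile reaches its minimum size instead of pushing the boundary off the screen.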
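One way to keep snapping reversible, sketched here in Python under my own assumptions, is to never store the snapped state: the drawn edge position is recomputed from the raw pointer position on every motion event. The function name and the `radius` parameter are hypothetical.

```python
def snapped_position(raw, other_edges, radius):
    """Where to draw a dragged edge when the pointer is at `raw`.

    The edge sticks to the nearest edge of another tile if it is within
    `radius` pixels. Because the result is recomputed from the raw pointer
    position on every motion event, dragging a bit further always unsnaps it,
    which keeps the effect reversible.
    """
    nearest = min(other_edges, key=lambda e: abs(e - raw), default=None)
    if nearest is not None and abs(nearest - raw) <= radius:
        return nearest
    return raw


edges = [640, 1280]
print([snapped_position(x, edges, 8) for x in (620, 634, 646, 660)])  # [620, 640, 640, 660]
```

The snapping radius is exactly the kind of parameter that would need the “polishing magic” mentioned above: too small and nobody notices it, too large and edges feel sticky.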
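The sliding-puzzle behaviour can be sketched as follows in Python. It assumes a single line of tiles described by their order and pixel sizes, and a hypothetical swap threshold expressed as a fraction of the neighbour’s size (half of it by default); finding the right value is precisely the implementation detail mentioned above.

```python
def drag_in_line(order, sizes, index, offset, threshold=0.5):
    """Slide the tile at `index` along its line by `offset` pixels.

    Positive offsets move towards the end of the line. Each time the
    remaining offset reaches `threshold` times the next neighbour's size,
    the dragged tile swaps places with that neighbour, like in a sliding
    puzzle. Returns the new order and the new index of the dragged tile.
    """
    order, sizes = list(order), list(sizes)
    step = 1 if offset > 0 else -1
    remaining = abs(offset)
    while 0 <= index + step < len(order):
        neighbour = index + step
        if remaining < threshold * sizes[neighbour]:
            break
        # The dragged tile takes the neighbour's slot, so its offset is now
        # measured from a slot that is one neighbour further along.
        remaining -= sizes[neighbour]
        order[index], order[neighbour] = order[neighbour], order[index]
        sizes[index], sizes[neighbour] = sizes[neighbour], sizes[index]
        index = neighbour
    return order, index


print(drag_in_line(["docs", "editor", "mail"], [640, 640, 640], 0, 400))
```

Here the “docs” tile is dragged 400 pixels to the right, which is more than half of its neighbour’s width, so it swaps places with “editor” once and stops there.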
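The three-threshold rule for dragging the bottom tile upwards boils down to a small classifier. This Python sketch assumes the drag distance is measured in pixels from the start of the gesture; the function name, action names, and the threshold values in the usage line are all hypothetical.

```python
def classify_vertical_drag(distance, engage, tile_into_row, swap_rows):
    """Hypothetical disambiguation for dragging the bottom tile upwards.

    `distance` is how far (in pixels) the tile has been dragged; the three
    thresholds must be strictly increasing.
    """
    if not engage < tile_into_row < swap_rows:
        raise ValueError("thresholds must be strictly increasing")
    if distance < engage:
        return "none"           # too small: treat it as a click or pointer jitter
    if distance < tile_into_row:
        return "move"           # motion engaged, but the layout is not changed yet
    if distance < swap_rows:
        return "tile_into_row"  # become a vertical tile within the top row
    return "swap_rows"          # swap places with the whole top row


print([classify_vertical_drag(d, 10, 80, 220) for d in (5, 40, 150, 300)])
# ['none', 'move', 'tile_into_row', 'swap_rows']
```

Since all three zones must fit within the height of the tiles being dragged over, this also illustrates why a significant minimum tile size is needed.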
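With five thresholds, a chain of `if` statements becomes tedious, and a sorted list of thresholds searched with Python’s standard `bisect` module reads better. This is a sketch under my own assumptions: the action names are hypothetical, and the distance is measured along the stack from the start of the drag.

```python
from bisect import bisect_right

# Hypothetical action names, one per zone delimited by the five thresholds.
ACTIONS = (
    "none",                   # below the first threshold: motion not engaged
    "move",                   # motion engaged, layout unchanged
    "tile_into_neighbour_1",  # orthogonal tiling within the first neighbour
    "swap_with_neighbour_1",  # swap places with the next neighbour
    "tile_into_neighbour_2",  # orthogonal tiling within the second neighbour
    "move_to_end_of_stack",   # go past the second neighbour
)


def classify_stack_drag(distance, thresholds):
    """Map a drag distance across a three-tile stack to an action.

    `thresholds` holds the five strictly increasing distances at which each
    new action takes over from the previous one.
    """
    thresholds = list(thresholds)
    if len(thresholds) != len(ACTIONS) - 1 or thresholds != sorted(set(thresholds)):
        raise ValueError("expected five strictly increasing thresholds")
    # bisect_right counts how many thresholds `distance` has reached.
    return ACTIONS[bisect_right(thresholds, distance)]


thresholds = [10, 60, 140, 260, 380]
print([classify_stack_drag(d, thresholds) for d in (5, 100, 200, 400)])
# ['none', 'tile_into_neighbour_1', 'swap_with_neighbour_1', 'move_to_end_of_stack']
```

Laying the zones out like this makes the cost visible: all five distances have to fit within the widths of the tiles being crossed, which is hard to guarantee on a crowded screen.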
To conclude, let’s go back to the multiple windows use cases mentioned earlier :
- Easy documentation use in complex software -> Check (one application window + one browser window in a free vertical split)
- Content comparison and transfer -> Check (two content windows in an even split, which is what this UI does by default when tiling two equivalent windows)
- Monitoring a “minor” task while working on a “major” one -> Check (two windows in an uneven split, which can either be determined automatically by the UI or set by the user)
- Checking what a UI notification is about without completely leaving what you are doing -> Check (provided we get some help from the notification system, it should be easy to open new windows in a new tile under some circumstances)
In conclusion, this form of tiling seems to offer a universal approach to window management, similar in versatility to an overlapping window manager but with the comfort and screen usage efficiency of a tiling window manager. However, to preserve this comfort, care must be taken to engineer the usability of this system so that it does not become exceedingly complex when showing large numbers of tiles (more than three in one direction of space and two in the other). One possible answer to this problem would be to help the user make regular, grid-like tile arrangements, which are much easier to handle, through a “snapping” mechanism. Another way to simplify tile handling would be to remove some of the possible motion paths for tiles in order to make dragging gestures less ambiguous. These considerations still have to receive a fair amount of care before the tiling model presented here is ready for prime time.
Thanks for reading!