It always feels infuriating to try an OS or cross-platform technology that you know well and like on a new device, only to discover that everything has suddenly become awkward to use or stopped working altogether. In this blog post, I will explore this phenomenon in the context of graphical user interfaces (GUIs), attempt to explain the main reasons why software scales up or down so badly when moving from one device to another, and propose solutions to minimize these effects on this OS.
What changes from one device to another?
Available screen real estate
Many people have tried this experiment: take some toolbar-heavy software like Word 2003, OpenOffice, or a rotten install of Internet Explorer. Run it on a 10″ netbook. Observe how little space is left for content. Clearly, the software you're using has trouble making optimal use of the available screen space. You can experience similar effects with large settings dialogs that spread below the lower screen boundary, virtual machines that suggest overly large screen resolutions to guest OSs and force you to pan across the virtual screen, websites that override mobile browsers' ability to reflow text with a default paragraph width that is unsuitable for comfortable reading on a 3″ screen, and so on.
So far, a constant aspect of the problem has been that the aforementioned software has an overly rigid, well-defined UI layout that cannot cope with a reduction of screen real estate. But the most naive approaches to flexible UI layout, typically using percentages of available screen real estate as pioneered by web technologies (“set the width of that element to XX% of that of the web browser”), also have serious shortcomings, this time when the screen size increases. How many times have you browsed an old website on your new widescreen computer only to discover that text paragraphs now stretch across the whole width of the screen in a fashion that is extremely painful to read? How many times have you explored an e-commerce website only to discover that the “add to cart” button drifts miles away from the article it refers to as soon as screen real estate grows?
At this point, it is clear that adaptation to available screen real estate is not a simple problem that can be solved by trivial means. Some UI design nihilists actually believe that it is the job of software designers to take care of this problem: that they have to think about each screen size on which their software might be used, including future ones, and use a mix of today's primitive layout functionality and conditional instructions to get the job done. I hope that the remainder of this article will convincingly demonstrate that such barbarity is, in most cases, not necessary.
Screen pixel density
As consumers gain interest in smaller and more expensive computer form factors, screen manufacturers get more room to experiment with technical compromises. One such compromise is the number of pixels per inch of a screen: increasing pixel density at constant size is prohibitively expensive on relatively large screens, but much less of a big deal on smaller ones. This is why many of today's netbooks, tablets and cellphones have displays that feel dramatically crisper and more readable than a regular large LCD monitor. But this advance sadly comes at a price, too: each time you install Windows or another desktop OS on one of those shiny high-density screens, you feel that pretty much every control on the screen becomes smaller, harder to distinguish and target. Why is that?
The problem here is that the main length unit used by current UI toolkits is the pixel. But as mentioned above, due to the recent evolution of computing hardware, the size of a pixel is no longer a constant. Pixels are getting smaller, with densities that vary from 96 ppi (pixels per inch) on a cheap PC monitor to more than 300 ppi on some high-end cellphones.
So, what should software do? Obviously, rely on a length unit that does not depend on the underlying display technology. Every desktop OS released since 2000 or so can use monitor-provided information to determine how many pixels make up a centimeter or an inch, and a sensible UI toolkit should only use raw pixels for high-precision operations that require knowledge of a screen's pixel structure (such as drawing a thin white border around a dark button to make it visible on a dark background). As an aside, even real-world length measurements are not a good fit for, say, text input fields, because character size is typically something that can be adjusted in system settings by people with poor sight, and the UI should still behave properly even if the size of characters is, say, increased by a factor of 4. But more on that later.
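To make the unit conversion concrete, here is a minimal sketch of how a toolkit could translate device-independent lengths into pixels, assuming the OS exposes the display's pixel density (the function name is hypothetical):

```python
# Sketch: converting a physical length to pixels, assuming the OS
# reports the display's density in pixels per inch (ppi).

def cm_to_px(length_cm: float, ppi: float) -> int:
    """Convert a length in centimeters to pixels for a display
    of the given pixel density."""
    inches = length_cm / 2.54  # 1 inch = 2.54 cm
    return round(inches * ppi)

# The same 1 cm button needs roughly 3x more pixels on a 300 ppi
# phone screen than on a 96 ppi desktop monitor.
desktop_px = cm_to_px(1.0, 96)   # ≈ 38 px
phone_px   = cm_to_px(1.0, 300)  # ≈ 118 px
```

A pixel-based toolkit that hard-codes "38 px" would render a comically tiny button on the phone; specifying "1 cm" keeps the physical size constant.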
Input methods
This is the last big thing that changes from one device to another, from a UI design point of view. With current design workflows, a UI that works well with a mouse and a keyboard may not work well when controlled with a keyboard alone, a laptop's trackpad+keyboard combination, a finger-based touchscreen, or a pen tablet. For pen input, it is also worth noting that best UI design practices may change significantly depending on whether the pen is directly in contact with the screen or the interaction is mediated by a separate “graphics tablet” device.
Nothing could be a better example in this category than Windows 7 running on a touchscreen-based computer (which Microsoft actually claimed to be a supported use case when this OS was released). Buttons are too small, text selection and manipulation are extremely painful, and your hand keeps hiding important information. While cellphones, pen-driven tablets, and desktop computers share extremely close UI design practices (let's face it, WIMP is currently not going anywhere), there is still a need to take into account the individual characteristics of each mode of user-computer interaction, and currently this is the job of the UI designer, who obviously doesn't have the time to consider every device in the universe when designing an application.
What would these characteristics be? To cover all the aforementioned hardware, one could separate absolute pointers (pen tablets, touchscreens) from relative pointers (keyboards, trackpads, mice). For absolute pointers, important characteristics would be the pointing precision (a pen can be targeted with millimeter precision if needed, but a touchscreen will never do better than the ~cm² area of a fingertip), the accessibility of the various parts of the screen (on most touchscreen devices, the bottom and sides of the screen are more accessible than the top and center), and whether or not pointing at something means hiding part of the screen with the user's hand.
For relative pointers, it may also be important to take into account the range of screen real estate that is easily accessible around the pointer (with a trackpad, much more than with a mouse, pointing precision drops sharply with distance, requiring strenuous trajectory corrections for distant objects; with a keyboard, each move across an interface costs a number of directional-pad presses). For both, it may also be important to consider the presence or absence of “hover” interaction (positioning the pointer above an object without selecting it to get extra information about it), fast and discoverable scrolling/zooming shortcuts, and so on.
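The characteristics above could be captured in a small capability record that the toolkit queries before laying anything out. This is only an illustrative sketch; all field names and values are assumptions, not an existing API:

```python
# Sketch: a hypothetical per-device capability record a UI toolkit
# could consult at layout time. Field names are illustrative.

from dataclasses import dataclass

@dataclass
class PointerProfile:
    absolute: bool         # pens/touchscreens vs. mice/trackpads
    precision_mm: float    # radius the user can comfortably target
    occludes_screen: bool  # does pointing hide part of the display?
    supports_hover: bool   # can it hover over objects without selecting?

MOUSE  = PointerProfile(absolute=False, precision_mm=1.0,
                        occludes_screen=False, supports_hover=True)
PEN    = PointerProfile(absolute=True,  precision_mm=1.0,
                        occludes_screen=True,  supports_hover=True)
FINGER = PointerProfile(absolute=True,  precision_mm=9.0,
                        occludes_screen=True,  supports_hover=False)
```

With such records, "enlarge hit targets", "avoid hover-only affordances", and "keep controls away from occluded areas" become toolkit-side rules rather than per-application special cases.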
How could these differences be abstracted away?
The “reinvent the world” approach
At this point, it should be clear that current tools and approaches to UI design sorely lack generality, and that it is their inability to acknowledge the technical differences between various personal computers that makes current UIs unsuitable for cross-device portability. It should also be noted that the same techniques could be used to make UIs adapt themselves to a wider range of users, although that is a much more complicated problem to tackle. So in one way or another, a more flexible approach to UI design should be proposed, along with the tools necessary to achieve it.
The most radical option would be to design user interfaces at the most basic level of human-computer interaction, then let UI designers give more and more details about the interactions they specifically target. As an example, for a basic clock application, a developer would begin by asking the computer to express a regularly updated “Time is HH:MM” textual information; then, in the specific case of visual interaction, the developer would ask for “Time is” to have weaker emphasis than “HH:MM”; then he could specify the font and color used to display the time itself, or fully override the information display for visual interactions and show something like a vector analog clock…
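The clock example could be sketched as follows. The application only declares information and emphasis; a backend (visual, audio, braille…) decides how to present it. All class and function names here are hypothetical:

```python
# Sketch of the "bottom-up" idea: the application describes what to
# convey, not how to draw it. Names are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class Fragment:
    text: str
    emphasis: str  # "weak" | "normal" | "strong"

def describe_time(hh: int, mm: int) -> list:
    """Device-agnostic description: the label matters less than the time."""
    return [Fragment("Time is", "weak"),
            Fragment(f"{hh:02d}:{mm:02d}", "strong")]

def render_text(fragments) -> str:
    """A trivial textual backend; a visual backend could instead map
    emphasis to font size and color, or draw an analog clock."""
    return " ".join(f.text for f in fragments)
```

A speech backend would read the same fragments aloud, stressing the strong one; nothing in the application itself assumes a screen.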
Characteristics of this “bottom-up” approach to UI design are that:
- It’s general. Every user, even one who is blind or has an even more unusual disability, will be able to use the software to some extent. Moreover, such a UI toolkit can easily be tuned to adapt itself to users with various disabilities when specifically developing for them.
- It’s powerful. Developers have a truly universal view of user-computer interactions and do not need to frame their mind within the boundaries of existing hardware’s limitations. The UI toolkit knows what is to be displayed, what is the available space, what is the relative importance of things, and it can show as much content and chrome as is deemed reasonable, no more, no less.
- It’s unconventional. Developers must re-learn UI design and can only reuse a small amount of their existing knowledge.
- It’s very complex. This UI design methodology is command-oriented and opposed to graphical design by its very nature, so it bypasses the human brain’s ability to visualize things and asks developers to design interactions at a very abstract level. While teachers would probably say that this is a very good thing, it is like devising language-agnostic algorithms before writing code: who, except in very mathematical realms of CS like cryptography, actually does that once the algorithms exam is over? Implementing the UI toolkit itself will also, obviously, prove quite difficult.
The evolutionary approach
At the other extreme, we could observe that we already have a lot of UI designers out there, with a given set of knowledge and habits. So we should probably try to capitalize on that instead of reinventing the wheel, and only bring the changes we see as strictly necessary without touching the rest.
In this approach, we would start by examining existing UI design tools, such as Qt Designer and Delphi’s form builder, and consider how their workflow could be altered to make them more device-agnostic. As mentioned above, the pixel density issue could easily be addressed by defining widget sizes in centimeters instead of pixels, a trick which even mainstream OSs are starting to adopt. However, tackling the problem of varying display sizes and input resolutions will be trickier, because it involves dealing with changing information display abilities.
Let’s study the structure of existing UIs, in the hope of finding constants that can be exploited to simplify our problem. Due to the mono-tasking nature of the human brain, most of our software is built around a central interactive content area surrounded by the tools that allow more advanced interaction with the content: the “chrome”. Depending on the kind of software at work, chrome can take up more or less of the interface. For software centered on content consumption, like web browsers, PDF readers or video players, a very small amount of chrome (~10% of the screen area for a vertical interface) is desirable. For software dedicated to content creation, like graphics editors, IDEs, or office suites, chrome becomes a vital part of the workflow and is allowed to take a larger share of the screen if necessary. And in the extreme case of settings panels, there can be no content in the traditional sense of the term, only chrome. Globally, for most interfaces, it appears that there is a notion of a “maximal acceptable share of chrome in the interface”, beyond which software starts to feel unusable and cluttered unless controls are automatically hidden away, or even removed if the hiding spots themselves are full.
The reason why content is generally considered a good adjustment variable to accommodate varying screen sizes is that it scales up and down much better than chrome. Justified scrolling and zooming is an acceptable form of interaction with content, whereas with chrome it feels weird and unusable and is much better replaced by clever use of hierarchical menus. But even content scalability requires some work. For a simple example, reading comfort requires text columns to fit within the width of the screen and to stay below a certain absolute width (~15 cm on a laptop or desktop monitor). Font size must never become an adjustment variable, but its sweet spot is lower on small screens that are close to the eyes than on big screens far away from the user, so it should never be directly specified by the UI designer. In content areas consisting of a list with multiple columns, the number of acceptable columns of details obviously changes with screen area. And finally, for some software like mail clients, content is stored hierarchically, and whether one can give a full view of the content hierarchy or only show one part of it at a time depends on the amount of available screen real estate.
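The text-column rule above can be sketched in a few lines: use the full window width, but clamp to a readable absolute width. The ~15 cm figure is the article's rough estimate, not a standard:

```python
# Sketch: clamping a text column to a readable absolute width,
# regardless of how wide the window grows.

def column_width_px(window_px: int, ppi: float,
                    max_cm: float = 15.0) -> int:
    """Use the available window width, but never exceed a comfortable
    absolute reading width (default ~15 cm)."""
    max_px = round(max_cm / 2.54 * ppi)  # convert cm -> px for this display
    return min(window_px, max_px)

# On a 96 ppi widescreen, even a 1920 px window yields a ~567 px column;
# on a narrow netbook the column simply fills the window.
```

This is exactly the behavior the old full-width websites from the earlier section get wrong: they scale with the window but never clamp.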
Automatic chrome adjustment may, surprisingly, be more straightforward to implement. Chrome is mostly organized in docks, toolbars, shelves, ribbons, or whatever other name glorified group boxes can get. When there’s too little space, some chrome must be removed from the immediate sight of the user. Choosing what to remove can be done by giving a numerical importance to each control (which can also be used to organize them automatically). Choosing how to remove them can be done by organizing controls in a hierarchy of groups at UI design time: if a group of icons has been deemed clutter, it is simply folded into a combobox-like structure. Tabs can also be used to visually combine unrelated sets of controls, as many graphics editors do. While I have so far described the limitation of the chrome area as a very rigid process (once chrome takes up more than x% of the interface, chrome is removed), it is also possible to imagine more elastic processes (chrome is removed more or less quickly depending on its importance; important controls may be kept longer than unimportant ones if their usefulness compensates for their screen area usage)… Many things are possible.
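The importance-driven folding described above could look something like this minimal sketch. The data layout (name, importance, width) and the greedy strategy are illustrative assumptions:

```python
# Sketch: importance-driven chrome folding. When chrome would exceed
# its space budget, whole groups are folded away, least important first.

def fold_chrome(groups, budget_px):
    """groups: list of (name, importance, width_px) tuples.
    Returns (visible, folded) lists of group names: the most important
    groups stay visible until the budget is exhausted."""
    ordered = sorted(groups, key=lambda g: g[1], reverse=True)
    visible, folded, used = [], [], 0
    for name, _importance, width in ordered:
        if used + width <= budget_px:
            visible.append(name)
            used += width
        else:
            folded.append(name)  # collapses into a combobox-like widget
    return visible, folded

visible, folded = fold_chrome(
    [("draw", 10, 120), ("zoom", 5, 80), ("history", 3, 80)],
    budget_px=200)
# "draw" (120 px) and "zoom" (80 px) fill the budget; "history" folds.
```

An elastic variant would replace the hard budget with a cost function trading importance against area, as suggested above.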
The last scalability problem that must be tackled is the variability of input methods. Here, I think that standard UI controls and layouts would be a developer’s best friend, which means that I consider the scalability of software that fundamentally needs nonstandard controls to work well, such as games, to be an unsolvable problem. Well, strategy games have always sucked on anything but the keyboard+mouse combination anyway. More seriously, here are some examples of things which, I believe, only standard UI widgets can do well in a straightforward fashion:
- Distinguish what is interactive from what isn’t. As an example, when moving from a pointer with high input resolution (mouse, pen tablet) to a pointer with low input resolution (touchpads, touchscreen), buttons and other interactive stuff need to be enlarged, but static blocks of text and pictures can stay as is.
- Relocate controls where it’s most comfortable to deal with them. While owners of mice and pen tablets pretty much don’t care about this, it becomes much more important as soon as you have trackpads or touchscreens in mind. In the former case, a user can only precisely move the pointer within a certain range around its current position, while in the latter case only some edges of the screen may be reached without strenuous arm movement. Also, as the aspect ratio of screens keeps growing, there end up being preferred screen edges for control positioning.
- Address the issues of hovering and visibility. Since the appearance of finger-based touchscreens, one can no longer assume that a user will be able to quickly tell what is interactive, or to read descriptions of what each button does before clicking/tapping it. Worse yet, by their very nature, these input devices force users to hide the content they’re interacting with behind a hand. I’ve ranted about touchscreens elsewhere, but since they are apparently here to stay, UI toolkits must learn to deal with their limitations, for example by allowing easy text selection even though the selected text is not visible while a finger rests on it (this is typically done using a “magnifier” visual overlay).
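The first point above, enlarging only interactive elements, is simple enough to sketch. The widget kinds and the "twice the pointing precision" rule of thumb are assumptions for illustration:

```python
# Sketch: when input precision drops (mouse -> finger), only hit
# targets grow; static text and pictures keep their natural size.

INTERACTIVE = {"button", "slider", "checkbox", "combobox"}

def layout_size_mm(widget_kind: str, natural_mm: float,
                   pointer_precision_mm: float) -> float:
    """Enlarge interactive widgets to at least twice the pointer's
    precision; leave non-interactive elements untouched."""
    if widget_kind in INTERACTIVE:
        return max(natural_mm, 2.0 * pointer_precision_mm)
    return natural_mm  # labels, images: no enlargement needed

# A 6 mm button grows to 18 mm for a finger (~9 mm precision) but
# stays at 6 mm for a mouse (~1 mm precision); a label never grows.
```

Because the toolkit, not the application, applies this rule, the same UI description serves both a desktop and a touchscreen build.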
All in all, I believe that a UI toolkit which addresses all these cross-device scalability issues without totally reinventing GUI design is feasible, as long as developers agree to set only a limited number of constraints on their UI and let the OS’s GUI layers choose the rest in an efficient way. Such an evolutionary approach to scalable UIs would have the following characteristics:
- Easier to deal with. By definition, this approach offers the most comfortable transition path for developers who are used to traditional pixel-based UI design and want to get started with more flexible approaches. For newcomers, this approach also has the potential to be more fun, as it doesn’t carry any complex interaction formalism.
- Hard to design. The beauty of “reinvent the world” approaches is that you can ditch every bad habit of today’s UI design and start over with a clean slate. If you aim at fixing the existing stuff while keeping a maximal level of familiarity with existing development tools, on the other hand, there is a complex recycling step involved, in order to “map” existing UI design workflows onto the more modern device-independent toolkit.
- Purely graphical. Current GUI design is fundamentally based on the assumption that the user has at least one partially functional eye, and consequently this approach has to make that assumption too. The situation of people with poor sight (color blindness, presbyopia) can certainly be improved by enforcing OS-wide color schemes (as opposed to application-specific ones), but fully blind people will still require specific UIs and software.
- Somewhat hardware-specific. Who knows, maybe tomorrow we’ll invent new input devices that allow for direct display of, and pointing at, objects in 3D space. In the case of such a breaking evolution in input devices, the API would have to be extended to adapt to the new use cases. It should be noted, however, that software compatibility would not be broken by such an event, as long as applications rely on standard UI widgets that can themselves be patched.
At this point, I think you can already guess that I lean towards this latter approach. However, I also expect some persistent criticism from the UI nihilists I mentioned earlier. “Sure”, they’ll say, “you can try to get close to current workflows, but in the end developers will still have a lot to re-learn, it will be very complicated, and it would be much simpler for them (and you!) to just get an iPad, hack iPad-specific code for it, and admit that the era of the general-purpose PC has come to an end and that we must now all become servants of great tech companies for the most fundamental services”. Well, I say, just see for yourself. Here is an example of the kind of workflow which I’d like to get in the end.
When a basic image editor meets device independence
Let’s say that we want to code a simple image editor for basic sketching and photo editing, in the spirit of MS Paint. So we fire up our favorite IDE, create a new project, and get to the UI design tools. Initially, what we see is an empty window, equipped with the standard system widgets for reasonably large form factors: a main menu, a tab bar, and a global close button that is used to close the application. On the platform we are coding on, the conventional location of these controls is at the top.
For an image editor, the content area is easy to deal with from a UI design point of view: we just put a scrollable and zoomable canvas widget on it, and the rest will have to be managed by our code (it is, after all, its primary purpose). But then we want to add some controls, so we select a toolbox in our GUI design tool and click anywhere in the content area. The empty toolbox is automatically positioned according to the preferred location of the device we’re coding on. After that, we may set its maximal acceptable width as a percentage of the window area, by dragging the edge of the toolbox as is usually done for resizing controls.
Now that we have this toolbox, we want to fill it! We add a few buttons with radio-button behavior: selection, zoom in and out, pencil, and eraser. This basic feature set is sufficient to introduce and test our paint engine; we can add more features later if we want to.
Then we start working a bit on the paint engine itself. At the moment, since we do not have tool options yet, we just draw in black on a white canvas with a round brush. The eraser is the same as the pencil, except that it writes in white. It is wise to set the diameter of that brush, which will later become the default brush diameter, to a device-independent quantity: the pointing resolution of our input device. This way, pen tablet users will automatically get a precise brush of a few mm², whereas touchscreen users will get a cm²-sized brush that is better adapted to the limitations of their device. We’ll also have to implement a selection mechanism and associate it with the system’s standard “magnifier” mechanism for dealing with what is under the user’s hand. In addition, we’ll have to connect the zoom in and out tools to the associated functionality in the canvas. I’ll skip that part, as it is not very interesting from a UI design point of view.
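The brush-sizing rule described above could be sketched as follows; the function name is hypothetical, and the pointing resolutions are the rough figures used earlier in this post:

```python
# Sketch: deriving the default brush diameter from the input device's
# pointing resolution instead of hard-coding a pixel count.

def default_brush_px(pointing_resolution_mm: float, ppi: float) -> int:
    """Default brush diameter = the device's pointing resolution,
    converted to pixels for the current display (1 inch = 25.4 mm)."""
    return max(1, round(pointing_resolution_mm / 25.4 * ppi))

pen_brush    = default_brush_px(1.0, 96)  # a few pixels: precise strokes
finger_brush = default_brush_px(9.0, 96)  # a ~cm-wide brush for touch
```

The same application code thus yields a fine pen brush and a fat finger brush without a single device-specific branch.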
Then we say, “All that is fine and good, but how do I create and open pictures? Or print them? Or export them when I want to make a compressed copy of my bitmap without it becoming the working copy?”. So we start designing that part, which belongs in the application’s main menu. Thankfully, we are working on a productivity application, which is a very common use case, so our IDE provides us with a “productivity main menu” template which does the job perfectly. And since the GUI toolkit is very well done, the amount of extra coding we have to do to associate these file management controls with the opening of new tabs in the tab bar, each with its own content area and tool state, is minimal.
Then we start to think, “Okay, all that is fine and good, but our feature set really feels quite limited at this point. Who would use software which can only draw black lines of constant thickness on a white canvas? What about undo/redo functionality?”. So we want to add two things: undo and redo tools on the one hand, and tool options (brush thickness and color) on the other. First, we add undo and redo tools in the most naive fashion, and what we get is this:
Hmmm… Not quite getting there, are we? The toolbox has grown to accommodate the increasing number of controls inside, but in the process it has suddenly become an unusable mess, in what could be called the “GIMP effect”. So, what happened? Well, we didn’t group our controls together yet. Grouping controls has several advantages, one of which is that it introduces a proximity constraint: the GUI toolkit does its best to make sure that in the final rendered UI, the grouped controls end up next to each other. Let’s now make three groups: one for the zoom tools, one for the drawing and erasing tools, and one for history functions (undo and redo). What we get is this:
That result is significantly better, and will do for now. The gap next to the selection tool hurts my sense of aesthetics, though, so if we stopped here, I’d probably add a cropping tool or an elliptic selection tool around there at some point in the future, to restore some symmetry to this UI design. But for now, let’s add tool options to our UI. To this end, we add a second toolbox, exactly as we did before.
By default, the GUI toolkit tries to centralize all tools in a single region of the screen rather than spreading them all over the application’s window, which would potentially involve longer pointer trips and reduce usability. In this case, it made room for the second toolbox by folding groups of controls in the first one. The exact operation of folded groups varies depending on the input hardware: if a form of secondary click is available, it may be used to choose a tool within the group; otherwise the group behaves like a regular combobox. This latter option is rather cumbersome in practice, so if the UI designer has explicitly specified that some controls must remain easily accessible, the toolkit may choose to pack all the remaining controls into a single “more…” combobox, or even hide them altogether.
Now, all we have to do is write code that dynamically fills this combobox with the options of the currently selected tool, using a Qt-like “form layout” based on (label, control) pairs.
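In the spirit of Qt's QFormLayout, that dynamic fill could look like the sketch below. The tool names, control kinds, and the idea of describing rows as plain pairs are all illustrative assumptions standing in for real toolkit widgets:

```python
# Sketch: the options area is rebuilt from (label, control) pairs
# whenever the selected tool changes. Strings stand in for widgets.

TOOL_OPTIONS = {
    "pencil": [("Thickness", "slider"), ("Color", "color_picker")],
    "eraser": [("Thickness", "slider")],
    "zoom":   [("Step", "spinbox")],
    "select": [],  # no options yet
}

def build_options_form(tool: str):
    """Return the form rows to show for a tool; a real toolkit would
    instantiate widgets here instead of returning strings."""
    return TOOL_OPTIONS.get(tool, [])
```

Because the toolkit owns the form's actual geometry, the same option list renders as a compact two-column form on a desktop and as larger stacked rows on a touchscreen.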
And that’s it for a basic image editing application. So, in all honesty, is this GUI design methodology so foreign compared to currently existing practice? And yet it didn’t need to rely on device-specific control sizes and positioning.