To anyone who was waiting for BSU 5 (which, let's spoil it, will be about file management and virtual file systems): bad news! This new article will instead be a revised version of the last one, whose plans have changed a lot since its publication, thanks to Brendan's feedback.
Going further into the "long-term thoughts" part of this summer update, I'll now bring up some long-term ideas about how I'd like to do graphics in the early days of this OS' graphic stack, with a framebuffer-based backend to begin with, even though I obviously keep the option to do things otherwise later. I'll notably explain what motivates this choice, what the dirty implications are, and how I plan to live with them.
Ode to the joy of modern graphics
Graphics are a mess, both on IBM compatibles and on ARM devices. Every GPU manufacturer in this world has forgotten what the word "standards" means, and the only thing which allows user-space software to run consistently across heterogeneous hardware is abstraction layers implemented in software. And when I say "consistently"… Well… You know why consoles and Apple iDevices are so popular among game developers compared to desktop PCs and Android devices respectively, don't you?
When you begin in the world of alternative OSs, you are pretty much alone with your keyboard and your chair. As such, there is strictly no way you'll have the manpower it takes to write a range of hardware-specific drivers for every GPU in existence, and to rewrite them each time a new chipset family comes out because manufacturers enjoy breaking stuff for the fun of it. Even mature codebases like Linux, with thousands of developers, don't manage to get it right, so the logical conclusion is that it's simply impossible without having the manufacturer write the driver for you. And even that won't save you, considering that according to Microsoft's statistics, manufacturer-provided GPU drivers are the main source of crashes on Windows, the OS they are mainly developed for.
As such, there is a need for GPU-independent abstractions, which allow one to draw stuff on screen and retrieve information about said screen in a standard way, even if the full power of the GPU is not available. Thankfully, such a thing exists in the x86 world, in the form of the VESA BIOS Extensions (VBE) for all BIOS-compatible computers sold today, and the equivalent UEFI functionality.
VESA BIOS Extensions
The VESA BIOS Extensions (VBE) are a set of BIOS services that you are guaranteed to find on any desktop or laptop sold today and for many years to come. They essentially provide two useful functionalities:
- Switching pixel resolutions of the main screen and giving software access to an abstract version of the screen, called a linear framebuffer, which is basically a giant bitmap of known format, with extra padding between pixels and between scanlines.
- Getting the EDID information of said screen, which basically tells you everything about it, including its physical size and the pixel resolutions it supports.
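To give an idea of what the framebuffer side looks like, here is a sketch of the mode information block returned by VBE's real-mode function 4F01h, with only the fields relevant to a linear framebuffer spelled out. The grouping and field names here are mine; the offsets follow the VBE 3.0 spec:

```c
#include <stdint.h>

/* Excerpt of the 256-byte VBE ModeInfoBlock (function 4F01h) */
struct vbe_mode_info {
    uint16_t attributes;        /* bit 7 set => linear framebuffer available */
    uint8_t  legacy[14];        /* banked-window fields, unused with a LFB */
    uint16_t pitch;             /* bytes per scanline (may exceed width*bpp/8) */
    uint16_t width;             /* horizontal resolution, in pixels */
    uint16_t height;            /* vertical resolution, in pixels */
    uint8_t  layout[3];         /* character cell size, plane count */
    uint8_t  bpp;               /* bits per pixel */
    uint8_t  color_info[14];    /* banking, memory model, channel bit masks */
    uint32_t framebuffer;       /* physical address of the linear framebuffer */
    uint8_t  reserved[212];     /* rest of the 256-byte block */
} __attribute__((packed));
```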
The other side of the coin is…
- 16-bit code: Like most other BIOS services, VBE functions require you to switch your computer back to its startup state or to emulate their real-mode code. Emulation is complex and mode switching is only reasonable at boot time, so this forces a specific design on us: the bootstrap component has to do the mode-switching job, then transmit the EDID and framebuffer information to the OS through some standard means (see the sketch after this list).
- Poor manufacturer support: Although VBE has kept up with the evolution of screen technology (even supporting stereoscopic "3D" displays in its latest edition), GPU manufacturers have not kept up with the evolution of VBE. The main consequence is that on most hardware, only 4:3 screen resolutions are available, which on modern 16:9 and 16:10 screens means sub-optimal graphics and distortion due to a wrong pixel aspect ratio, unless some sort of software correction is applied.
- No multiple displays: VBE only supports displaying data on one screen at a time, and that generally happens to be the main computer screen. Supporting things such as video projectors and multi-head setups requires dedicated video drivers.
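As for the standard means mentioned above, a handoff structure from bootstrap to kernel could look like the following sketch. All names and the exact layout are illustrative, not a defined standard:

```c
#include <stdint.h>

/* Hypothetical handoff: everything the bootstrap component passes to the
 * kernel after setting a VBE mode and fetching the EDID, so that the OS
 * itself never has to touch 16-bit BIOS code. */
struct boot_video_info {
    uint64_t framebuffer_addr;  /* physical address of the linear framebuffer */
    uint32_t pitch;             /* bytes per scanline, padding included */
    uint32_t width;             /* visible pixels per scanline */
    uint32_t height;            /* number of scanlines */
    uint8_t  bpp;               /* bits per pixel */
    uint8_t  red_pos;           /* bit position of each color channel... */
    uint8_t  green_pos;
    uint8_t  blue_pos;
    uint8_t  edid[128];         /* raw EDID 1.x block of the attached display */
};
```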
Equivalent UEFI functionality
The PC BIOS is a piece of software which strongly shows its age, and many big industry players are fed up with it. A replacement has hence been devised, in the form of the Unified Extensible Firmware Interface (UEFI), which is basically a cleaner, more powerful and modern, well-standardized, but also weirdly licensed and incompatible version of most BIOS services. Deployment is currently under way, at a snail's pace, but by the time this OS is released, UEFI could actually be commonplace.
UEFI is based on the notion of protocols. My current mental model of the concept is that a number of virtual or physical EFI-compatible devices (bus controllers, GPUs' frame buffers, displays, the CMOS clock, etc.) register themselves in the EFI database as being able to handle a number of protocols, which are themselves standard firmware services. One may look up the EFI database for devices supporting a specific protocol, then use that protocol to interact with the device. UEFI also provides facilities to identify a device found this way within the system, by providing the information necessary to look it up on the PCI bus or in the ACPI tables.
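To make this more concrete, here is roughly what such a lookup looks like inside an EFI application's entry function, assuming EDK2-style headers and taking the Graphics Output Protocol (discussed just below) as an example. gBS is the pointer to the boot services table:

```c
/* Ask the EFI database for every handle (device) that supports a given
 * protocol, then fetch that protocol's interface from the first one. */
EFI_HANDLE *handles;
UINTN handle_count;
EFI_GRAPHICS_OUTPUT_PROTOCOL *gop;
EFI_STATUS status;

status = gBS->LocateHandleBuffer(ByProtocol, &gEfiGraphicsOutputProtocolGuid,
                                 NULL, &handle_count, &handles);
if (!EFI_ERROR(status) && handle_count > 0) {
    /* One handle exists per independent video output; take the first */
    status = gBS->HandleProtocol(handles[0], &gEfiGraphicsOutputProtocolGuid,
                                 (VOID **)&gop);
}
```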
Among the protocols of UEFI, two are especially of interest to us.
- The Graphics Output Protocol (EFI_GRAPHICS_OUTPUT_PROTOCOL) is provided by each compatible frame buffer (that is, each independent video output of a GPU) within the system. It provides information on the video modes supported by both the frame buffer and the display (as opposed to VBE, which gives information about the frame buffer and the display separately), allows one to set such video modes, and gives access to a linear framebuffer similar to the one provided by VBE.
- The EDID_ACTIVE_PROTOCOL protocol is provided by each active display, allowing one to retrieve its EDID without caring whether said EDID comes directly from the display itself or from the UEFI-compliant firmware (which can optionally override the display’s values).
An internal hierarchy of parent and child handles allows one to relate a display to the GPU which outputs to it through mechanisms which I’ve not figured out yet, and a naming system (EFI_COMPONENT_NAME2_PROTOCOL) helps the user figure out what each EFI device and display is.
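Continuing the sketch above, once we have a Graphics Output Protocol interface in hand, choosing a video mode and locating the linear framebuffer would go roughly like this. mode_matches_native() is a hypothetical helper of ours that compares a mode against the native resolution found in the EDID:

```c
/* Enumerate the modes supported by both frame buffer and display,
 * pick one, then fetch the linear framebuffer's location and size */
EFI_GRAPHICS_OUTPUT_MODE_INFORMATION *info;
UINTN info_size;
UINT32 chosen = 0;

for (UINT32 i = 0; i < gop->Mode->MaxMode; i++) {
    if (EFI_ERROR(gop->QueryMode(gop, i, &info_size, &info)))
        continue;
    if (mode_matches_native(info)) {   /* hypothetical EDID comparison */
        chosen = i;
        break;
    }
}
gop->SetMode(gop, chosen);

EFI_PHYSICAL_ADDRESS fb_base = gop->Mode->FrameBufferBase;
UINTN fb_size = gop->Mode->FrameBufferSize;
```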
Characteristics of the UEFI graphic output architecture are:
- Only works at boot time: This is for a different reason than VBE, though. BIOS services are always there, but since they are written in 16-bit code we can't easily execute them. In the EFI world, on the other hand, there is an explicitly defined boundary between "boot-time services" and "run-time services", and once the OS has declared that boot time is over, all boot-time services are stopped and freed from memory (a sketch of the resulting handoff follows this list). This may be a problem, as the linear framebuffer should probably remain mapped in memory, but nothing in the spec guarantees this. Testing is required there…
- Uncertainty: Apart from the problem above, another potential issue is that nothing in the EFI spec forces graphics adapters to implement the Graphics Output Protocol. EFI only specifies how it should be implemented, if it is implemented at all. Yet another potential issue is that UEFI-compliant drivers are not forced to provide software with direct access to the hardware, and may instead only provide a blitting function. This is a problem because, as stated above, the driver which provides that blitting function goes the way of the dodo once boot time is over, so the blitting function would obviously not be accessible afterwards.
- More modern hardware support: As stated before, EFI's Graphics Output Protocol supports multiple displays well, and only advertises the modes which are supported by the connected displays. The spec also attempts to make EFI firmwares less limited than their VBE counterparts in the realm of supported resolutions, by asking that integrated graphics controllers support the native resolution of the host display, which makes it difficult for IGP firmwares *not to support* widescreen framebuffers. The spec does give them a way out, though, in the form of the vague option to implement "A mode required in a platform design guide" instead. Let's hope that not too many EFI firmwares will choose this latter option and just put 640x480x8bit as a requirement in the platform design guide…
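Here is a sketch of what that boot-time boundary implies in practice: everything we still need from the GOP gets copied into our own structure (the hypothetical boot_video_info from earlier) before ExitBootServices() is called, after which the protocol interface, and any Blt() function it provided, is gone. map_key would come from a preceding GetMemoryMap() call:

```c
/* Copy out everything we still need: once boot services are terminated,
 * only the raw framebuffer address can survive, and only if the firmware
 * keeps it mapped. */
struct boot_video_info video;

video.framebuffer_addr = gop->Mode->FrameBufferBase;
video.width  = gop->Mode->Info->HorizontalResolution;
video.height = gop->Mode->Info->VerticalResolution;
video.pitch  = gop->Mode->Info->PixelsPerScanLine * 4;  /* assuming 32bpp */

gBS->ExitBootServices(image_handle, map_key);
/* From this point on, gop and gBS must not be touched anymore */
```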
To sum it up, UEFI offers a framebuffer abstraction that's potentially much better than VBE's in every way, but that's not available on current hardware and emulators, and that can go wrong in a number of ways depending on manufacturers' interpretation of the EFI spec. Still, it's a good effort already, and maybe the UEFI mafia will have me join their secret society of my own free will after all…
Some graphics stack ideas
Now that we've seen what kind of graphics hardware we'll build drivers on to begin with, let's see what kind of graphics stack we could build on top of that. Please note that layers do not equal processes, although they make good candidates for isolated components: two layers may be united in one single process (for example by implementing one of them as a library), one layer may be split into several processes (the desktop shell, typically), etc.
Now that this is said, let’s explore the current graphic stack concept, from its bottom to its top.
Layer 1 : Video driver
The role of this device-dependent layer is to abstract away the differences between heterogeneous hardware interfaces behind a small number of OS-defined, hardware-independent interfaces. Drivers should be kept lean, with OS-defined interfaces that remain relatively close to the bare metal, as every piece of functionality that lives within drivers has to be rewritten in every single driver. That causes reliability issues, effort duplication, and platform inconsistency, which, as any Linux OpenGL user can attest, is not a good thing.
From a "bare metal" point of view, there are two families of hardware which this OS will support. On one side, there are framebuffers, like those mentioned above, which are used for software-rendered 2D graphics. On the other side, there are full-blown GPUs, for the hardware which gets the luxury of native drivers; those are designed for hardware-accelerated raster 3D graphics and have a very complex internal structure.
My plan for managing this dichotomy is to have two different system abstractions at this level.
One is for framebuffers. It is very simple, as framebuffers are already pretty clean hardware abstractions. The abstract OS framebuffer would probably mostly differ from actual hardware framebuffers by having a consistent 32-bit RGB pixel layout, if the software rendering stack can afford the performance hit of the screen-sized pixel conversions that this implies. Run-time display resolution changes will not be required of framebuffer drivers, as they have limited use and it seems that none of the modern framebuffers supports them cleanly. If there is a need for resolution upscaling, it will be done in software, as software upscaling has better quality anyway.
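A minimal sketch of what such an OS framebuffer abstraction could look like, with all names being hypothetical:

```c
#include <stdint.h>

/* One of these per video output. The fixed 32-bit RGB layout is enforced
 * by the driver, which does pixel conversion internally if the hardware
 * format differs. */
struct os_framebuffer {
    uint32_t *pixels;       /* always 32-bit 0x00RRGGBB, by convention */
    uint32_t  width;        /* visible pixels per scanline */
    uint32_t  height;       /* number of scanlines */
    uint32_t  pitch;        /* pixels (not bytes) per scanline, padding included */
    /* Optional driver hook called after drawing, for hardware that needs
     * an explicit copy or conversion pass from a shadow buffer */
    void (*flush)(struct os_framebuffer *fb,
                  uint32_t x, uint32_t y, uint32_t w, uint32_t h);
};
```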
The other abstraction is for the native drivers of GPUs. The abstract hardware has a structure close to… well… an abstract GPU. Linux drivers attempt to do this by having the driver controlled with OpenGL commands, but OpenGL has proven to be so high-level that this is a very bad idea. So what I suggest instead is to do what Linux engineers now want to do, and use the Gallium3D GPU abstraction, which is closer to the bare metal and, as a nice bonus, allows us to reuse native drivers from other OSs. OpenGL functionality will then be provided as a layer on top of the Gallium3D driver, typically a shared system library.
Layer 2 : Primitive rendering
This layer, which sits on top of each driver, draws on screen the primitive shapes that are used within the GUI. To start with, I'd probably use the catalogue of common basic 2D shapes found in image editors: pixels, lines, rectangles, gradients, text, etc. The initial implementation will work on top of framebuffers, since that's all we'll have initially. Then at some point, support for rendering these same primitives with OpenGL will be added, allowing the GUI to work on top of native drivers without requiring those to masquerade as 2D framebuffers.
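As an example, a rectangle fill on top of the framebuffer abstraction sketched in the previous layer could look like this (clipping against the target surface is omitted for brevity):

```c
#include <stdint.h>

/* Fill an opaque rectangle; assumes it already fits within the surface */
static void fill_rect(struct os_framebuffer *fb,
                      uint32_t x, uint32_t y,
                      uint32_t w, uint32_t h, uint32_t color)
{
    for (uint32_t row = 0; row < h; row++) {
        /* pitch is in pixels, so scanline math stays in uint32_t units */
        uint32_t *line = fb->pixels + (uint64_t)(y + row) * fb->pitch + x;
        for (uint32_t col = 0; col < w; col++)
            line[col] = color;
    }
    if (fb->flush)
        fb->flush(fb, x, y, w, h);   /* let the driver push the change */
}
```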
This layer also abstracts away the various color spaces of video output devices, in order to reach a consistent visual look on every device which has ICC color profiles installed. To this end, the primitive renderer internally works with a hardware-agnostic color space such as CIE Lab or CIE XYZ (I tend to lean towards the former, as it offers perceptual uniformity, which means nicely uniform color mixing). Ideally, it would only do the conversion to RGB at the last moment, for perfect visual uniformity across devices, but the expensive nature of CIE-to-RGB conversions may force some approximations in this area.
One exciting idea, which was suggested by Brendan along with the one above, is to also use a "material system" where the screen is rendered as lit from front and behind, and the color of every object on screen is defined by its light transmission and reflection properties, just like the color of real-world objects is. In a hypothetical future where native drivers would be commonplace and the computational power available for eye candy would have no limit, 3D primitives could also be added, and this material system could be effortlessly extended into a full raytracing system managing refractive properties, specularity, diffraction, etc.
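For reference, here is what that last-moment conversion step amounts to, going from CIE XYZ to nonlinear sRGB with the standard matrix and transfer function: one 3x3 matrix product plus a pow() per channel, which is exactly the kind of cost that motivates approximations (clamping of out-of-gamut values is omitted here):

```c
#include <math.h>

/* sRGB transfer function, applied to each linear channel */
static double srgb_gamma(double c)
{
    return (c <= 0.0031308) ? 12.92 * c
                            : 1.055 * pow(c, 1.0 / 2.4) - 0.055;
}

/* CIE XYZ (D65 white point, normalized) to nonlinear sRGB */
static void xyz_to_srgb(double x, double y, double z,
                        double *r, double *g, double *b)
{
    /* Linear RGB, from the standard XYZ-to-sRGB matrix */
    *r = srgb_gamma( 3.2406 * x - 1.5372 * y - 0.4986 * z);
    *g = srgb_gamma(-0.9689 * x + 1.8758 * y + 0.0415 * z);
    *b = srgb_gamma( 0.0557 * x - 0.2040 * y + 1.0570 * z);
}
```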
Layer 3 : Window manager
At this point, we manage each display separately. Although this workflow may be desirable when some displays are totally unrelated, as when a video projector is plugged into a laptop, there are other situations where users actually want several screens to be handled by the system as one single continuous drawing surface, as in so-called "multi-head" setups. For multi-screen support, we thus want to create a number of abstract "virtual screens" which do not necessarily have a 1:1 mapping to their physical counterparts.
Conversely, proper software isolation practice would require that we give such an isolated "virtual screen" to each of our GUI processes, so that they don't mess with each other's UI. That abstraction sounds much less exotic once one puts its regular name on it: windows. Several virtual screens within the boundaries of one (or more) physical screens.
I don't know yet whether those two abstractions should be kept separate, as they sound very much related to each other. In both cases, we make software use virtual coordinates in its drawing jobs, then hook and adjust these drawing jobs to make them fit the reality of physical screens (see the sketch below). For now, let's put both of these abstractions under the common name of "window manager".
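A minimal sketch of that common mechanism, with purely hypothetical structures: a drawing job expressed in window-local coordinates gets translated to virtual-screen coordinates, then clipped against a physical screen's bounds:

```c
struct rect   { int x, y, w, h; };
struct window { struct rect frame; /* position on the virtual screen */ };

/* Translate a window-local rectangle to screen coordinates, then clip
 * it against the physical screen. Returns 0 if nothing remains visible. */
static int to_screen(const struct window *win, const struct rect *local,
                     const struct rect *screen, struct rect *out)
{
    int x0 = win->frame.x + local->x, y0 = win->frame.y + local->y;
    int x1 = x0 + local->w, y1 = y0 + local->h;

    if (x0 < screen->x) x0 = screen->x;
    if (y0 < screen->y) y0 = screen->y;
    if (x1 > screen->x + screen->w) x1 = screen->x + screen->w;
    if (y1 > screen->y + screen->h) y1 = screen->y + screen->h;

    if (x1 <= x0 || y1 <= y0)
        return 0;   /* fully off-screen */
    *out = (struct rect){ x0, y0, x1 - x0, y1 - y0 };
    return 1;
}
```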
Layer 4 : Widget toolkit
There is a level above the window abstraction and below applications where standard system controls are drawn. In my opinion, this is also the level where resolution independence, that is, independence from the display's output resolution and from the input device's precision, should be implemented. Why is that?
First, because input resolution independence requires relatively high-level knowledge to be achieved, as it only affects input controls (a button would be resized, but a picture or a PDF document would stay the way it is). This knowledge is not available before the widget toolkit level, unless we choose to "taint" the graphics output of primitive rendering with metadata about the input/non-input status of what we draw. Abstractions leaking this way would be a form of spaghetti code, as it would make input event management more entangled with graphics output and harder to service independently.
The second reason why I think that resolution independence should be implemented at the widget level is that resolution independence, the input variant in particular, makes the size of things random and unpredictable for anything above the resolution independence layer itself. So if I put resolution independence at the primitive rendering level, some widgets will always end up being rendered outside of an application's window (and thus become invisible) at some point, in a conflict for space that should definitely be managed by the widget toolkit. One solution would be to have the primitive rendering code communicate with the widget rendering code to determine whether a widget would be visible or not, but that's both cumbersome and inelegant.
As for the drawbacks of this approach, it makes it impossible to build a smooth multi-head setup between two screens of different I/O resolution. However, I'm not sure that putting totally unrelated screens side by side and asking that they display a consistent picture is a valid use case. If it is only the color response which differs, I can understand, but if physical screen size and DPI change too, that's pretty much a no-no. A way to ease the pain would perhaps be to put input resolution independence in the widget toolkit and output resolution independence in the primitive rendering engine, but splitting resolution independence like that sounds quite inelegant, so I'd first like to know whether it is really necessary or useful.
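For what it's worth, here is what the output half of resolution independence boils down to at the widget level, assuming hypothetical names: widget dimensions are specified in physical units and converted to pixels using the screen size advertised by the EDID:

```c
#include <stdint.h>

/* What the widget toolkit needs to know about a display */
struct display_info {
    uint32_t width_px, height_px;   /* current video mode */
    uint32_t width_mm, height_mm;   /* physical size, from the EDID */
};

/* Convert a widget dimension from millimeters to pixels */
static uint32_t mm_to_px(const struct display_info *d, double mm)
{
    double px_per_mm = (double)d->width_px / (double)d->width_mm;
    return (uint32_t)(mm * px_per_mm + 0.5);   /* round to nearest pixel */
}
```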
Layer 5 : Applications/Desktop shell
Above the widget toolkit layer, all system GUI abstractions for developers are available and applications can run, sending their resolution-independent GUI spec sheets to the lower layers. One special application (or, perhaps, set of applications) will be the desktop shell, which groups together the global system controls that allow one to switch tasks, close programs, open new software, see status information and notifications, perhaps log in too, etc…