Big summer update 4 – On graphics and VBE

In a way, I’m glad I have a few remaining topics for this summer update. That’s because publishing my preliminary RPC test results has sparked an unexpected but very interesting conversation with Brendan about the theoretical foundations of this IPC mechanism, which I thought were now a given. If his criticism turns out to be legitimate, I’m ready to give up on some of my killer features and spend a few more months in the design stage. But hey… It’s not as if I weren’t ready to spend a few more months designing stuff if it means a more polished final product. That’s OS development we’re talking about, after all… If I wanted instant gratification with a minimal amount of effort, I would be developing fart apps for mobile devices.

Going further into the “long-term thoughts” part of this summer update, I’ll now bring up some long-term ideas about how I’d like to do graphics in the early days of this OS’s graphics stack, using the VESA BIOS Extensions, even though I’ll obviously keep the option to do things otherwise later. I’ll notably explain what motivates this choice, what the dirty implications are, and how I plan to live with them. I’ll also add some design thoughts that are not VESA-specific and could well survive a possible later switch to a GPU-based architecture.

Ode to the joy of modern graphics

Graphics are a mess, both on IBM compatibles and on ARM hardware. Every GPU manufacturer in this world has forgotten what the word “standards” means, and the only thing which allows user-space software to run consistently across a range of hardware is abstraction layers implemented in software. And when I say “consistently”… Well… You know why consoles and Apple iDevices are so popular among game developers as compared to desktop PCs and Android devices respectively, don’t you ?

The situation is least bad in the x86 world, because this architecture comes from a time when hardware manufacturers actually cared about low-level developers and DRM crap didn’t exist, a time when hardware specs were opened up and it was common to write software on the bare metal. These days are certainly long gone, but that doesn’t mean there aren’t some useful remains around. One of those remains is a culture of openness, which leads Intel and AMD to consistently release development manuals for the hardware they produce (leaving only NVidia as the black sheep there). Another one is legacy standards, like the VESA BIOS Extensions (VBE).

When you begin in the world of alternative OSs, you are pretty much alone with your keyboard and your chair. As such, there is strictly no way you’ll ever have the manpower it takes to write a range of hardware-specific drivers for every GPU in existence, and rewrite them each time a new chipset family comes out because manufacturers enjoy breaking stuff for the fun of it. Even mature codebases like Linux, with thousands of developers, don’t manage to get it right. It’s simply impossible without having the manufacturer write the driver for you, and even that won’t save you, considering that according to Microsoft’s statistics, manufacturer-provided GPU drivers are the main source of crashes on Windows, the OS which they are mainly developed for.

As such, you want to avoid relying on GPU-specific drivers as long as possible, and wait until your OS is mature enough that it can attract the workforce it takes to (vainly attempt to) handle this hell. This is where legacy standards come into play.

What VBE does, what its limitations are, and why hobby OSdevers like me want it anyway

As hopelessly disorganized as the world of low-level computer graphics may be, there actually are some standards bodies within it. The most prominent and ancient of those is VESA, the Video Electronics Standards Association. They are the ones behind the DisplayPort monitor connector and the DDC/EDID protocols which modern graphics hardware and operating systems use to exchange information about computer displays. But they’ve also created a technological wonder for hobby OSdevers : the VESA BIOS Extensions (VBE). These are standard BIOS features, supported by pretty much every PC GPU out there, which notably bring the following attractive features to the table :

  • Detect pretty much everything about screens which are attached to the computer
  • Switch the main screen to a high-resolution graphics display mode and draw stuff on it on a per-pixel basis, with access to fancy functionality such as triple buffering.

That’s huge news, because that’s what most OSdevers want when they begin to design a GUI. After all, the “computational” parts of GPUs are only good for very specific applications (gaming, heavy 3D rendering…) that hobby OSs are unlikely to attract early in their life. For normal GUI operation, all we want is a gigantic buffer to draw stuff in, and VBE provides just that.
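
To make this slightly more concrete, here is a minimal sketch of what talking to VBE involves : ask the BIOS for information about a video mode, check that it provides a linear framebuffer, then switch to it. The real_mode_int10() helper (and the way its low-memory buffer is handled) is an assumption about code that is not shown here ; the structure layout and function numbers come from the VBE specification.

    #include <cstdint>

    // Assumed helper: runs INT 10h with these register values in real mode (via a
    // v86 monitor or an x86 emulator) and copies the BIOS's answer back into the
    // buffer that es:di pointed to. Its implementation is not shown.
    struct RealModeRegs { uint16_t ax, bx, cx, dx, es, di; };
    void real_mode_int10(RealModeRegs& regs, void* low_memory_buffer);

    #pragma pack(push, 1)
    struct VbeModeInfo {                 // partial VBE ModeInfoBlock layout
        uint16_t attributes;             // bit 7 set = linear framebuffer available
        uint8_t  window_fields[14];      // legacy banked-window fields, unused here
        uint16_t bytes_per_scanline;     // offset 0x10
        uint16_t width, height;          // offsets 0x12 and 0x14
        uint8_t  char_size_and_planes[3];
        uint8_t  bits_per_pixel;         // offset 0x19
        uint8_t  more_fields[14];
        uint32_t framebuffer_address;    // offset 0x28: physical address of the LFB
        uint8_t  reserved[212];
    };
    #pragma pack(pop)

    // Query mode information (VBE function 0x4F01), then set the mode with the
    // "use linear framebuffer" bit (VBE function 0x4F02, bit 14 of BX).
    bool switch_to_mode(uint16_t mode, VbeModeInfo& info) {
        RealModeRegs regs{};
        regs.ax = 0x4F01;
        regs.cx = mode;
        real_mode_int10(regs, &info);
        if (regs.ax != 0x004F) return false;               // 0x004F = success
        if (!(info.attributes & (1u << 7))) return false;  // no linear framebuffer
        regs = {};
        regs.ax = 0x4F02;
        regs.bx = static_cast<uint16_t>(mode | (1u << 14));
        real_mode_int10(regs, nullptr);
        return regs.ax == 0x004F;
    }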

So, where’s the catch ? Well, VBE is growing old, and with age come the following issues :

  • 16-bit code : Like most BIOS functionality, VBE works best when the processor is running in the super-legacy 16-bit “real” mode. Newer versions of the standard offer a hackish 32-bit interface to this 16-bit code, but all in all the logic remains that of real mode, and it still won’t run in 64-bit mode, which all modern processors should be running in. What this means is that accessing VBE functionality requires either temporarily switching the CPU back to real mode (only acceptable in the early stages of kernel boot) or emulating the real-mode code of VBE on top of the running 64-bit kernel (possible at any time, but it requires some work). Without the ability to execute real-mode code, all you can do is draw on an already initialized display.
  • Poor manufacturer support : VBE has adapted itself to the spread of wide screens by adopting a model where video chip manufacturers themselves specify which video modes (resolution, refresh rate, bits per pixel) are available in hardware. However, manufacturers have not kept up with this innovation, because by that point the “one driver per device” model had become the norm and they didn’t feel like doing their homework of properly supporting old standards. What this means is that frequently, only 4:3 video modes and very low-res wide video modes are available, and software must make a tough choice between a very low-res display and a distorted display which requires software adjustments before circles stop looking like ellipses (see the sketch after this list).
  • No multiple displays : Historically, VBE only supported computers with a single display. At some point, multiple displays became common enough that VESA couldn’t ignore them any more. But instead of adding “extended” video mode manipulation commands that would let a developer control the video mode of each screen, VESA chose the following “solution” : controlling one display at a time, while advertising the video modes of both. I can’t stress enough how horribly ugly this solution is, but for now suffice it to say that VBE doesn’t work well with multiple displays.
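
By the way, compensating for the distorted case in software is not very expensive : once EDID has told us the physical dimensions of the panel, we know how non-square the pixels of a stretched 4:3 mode are, and everything that gets drawn can be pre-scaled accordingly. A small sketch of the idea (the numbers in main() are made up) :

    #include <cstdio>

    struct AspectFix { double x_scale; };   // multiply all widths and x radii by this

    AspectFix aspect_fix(int mode_w, int mode_h, double screen_mm_w, double screen_mm_h) {
        // Physical size of one framebuffer pixel once the panel stretches the 4:3
        // image over its whole 16:9 or 16:10 surface.
        double pixel_mm_w = screen_mm_w / mode_w;
        double pixel_mm_h = screen_mm_h / mode_h;
        // Pixels end up wider than they are tall, so shrink x by the inverse ratio.
        return { pixel_mm_h / pixel_mm_w };
    }

    int main() {
        // A 1024x768 mode displayed full-screen on a 16:10 panel of 520 x 325 mm.
        AspectFix fix = aspect_fix(1024, 768, 520.0, 325.0);
        // A circle with a 100-pixel vertical radius should be drawn about 83 pixels
        // wide to still look round: x_scale is ~0.833 here.
        std::printf("x scale = %.3f\n", fix.x_scale);
        return 0;
    }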

So, why do hobby OSdevers want VBE anyway, in spite of all of its flaws ? Because it works all the time, and provides both a nice graphics driver to begin with and a nice fallback driver for when more sophisticated stuff goes mad. And it will go mad, because a small alternative OS will never have the workforce or the influence on manufacturers it takes to properly keep drivers up to date with the evolution of hardware. Heck, NVidia doesn’t even disclose its specs, and all of the Linux world’s graphics teams are still not enough to get a GPU-based graphics stack that doesn’t feel very rough around the edges.

Some graphics stack ideas

Now, all this talk about VESA is nice and all, but which video drivers are initially used is close to an implementation detail, because a good driver interface abstracts away the hardware or standard being used to draw things on screen anyway. So what if we talked about the kind of graphics stack this could all take place in ?

Layer 1 : Graphics driver

At the bottom of the stack is the graphics hardware. It essentially provides us with two things : a way to display video frames on the screen (the framebuffer), and a range of high-performance hardware tools for bitmap and 3D graphics manipulation. As we have seen previously, everyone can access a significant part of the former functionality in a standard way, but the latter requires dedicated drivers.

Operating systems which can afford the luxury of dedicated drivers for each piece of hardware may assume their presence and treat both kinds of functionality as facets of a unified device. Windows and Mac OS X can, some Linux distros think they can. We definitely can’t, so we have to ask that drivers provide two standard interfaces : a mandatory one for exchanging information about the screen, setting video modes, and displaying video frames provided in a standard format (kind of like the Linux framebuffer), and an optional one for all the accelerated stuff (OpenGL is an option, but it makes drivers heavy and causes lots of code duplication ; Gallium3D could be a better candidate there).
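
To give an idea of what I have in mind, here is a rough sketch of these two interfaces, with every name invented for the sake of illustration. Only the first one is mandatory ; the VESA driver would implement it and nothing else.

    #include <cstddef>
    #include <cstdint>

    struct VideoMode { uint32_t width, height, bits_per_pixel; };

    // Mandatory interface: screen and mode information, plus dumb frame display.
    class FramebufferDevice {
    public:
        virtual ~FramebufferDevice() = default;
        virtual size_t    mode_count() const = 0;
        virtual VideoMode mode(size_t index) const = 0;
        virtual bool      set_mode(size_t index) = 0;
        // Frames arrive in one standard pixel format; the driver converts if needed.
        virtual void      present(const uint32_t* argb_frame) = 0;
    };

    // Optional interface: accelerated drawing. Deliberately vague here, but
    // something Gallium3D-shaped (command buffers close to the hardware) rather
    // than a full OpenGL implementation inside every driver.
    class AcceleratedDevice {
    public:
        virtual ~AcceleratedDevice() = default;
        virtual bool submit_command_buffer(const void* commands, size_t size) = 0;
    };

    // The VESA driver only implements the mandatory half.
    class VesaDriver final : public FramebufferDevice {
        // ... the VBE mode list and the linear framebuffer mapping would live here ...
    public:
        size_t    mode_count() const override { return 0; }
        VideoMode mode(size_t) const override { return {}; }
        bool      set_mode(size_t) override   { return false; }
        void      present(const uint32_t*) override {}
    };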

All services that are required for the system to work properly must work when only the former interface is available. Non-critical stuff, like 3D games, may depend on the latter. In the future, the non-accelerated VESA driver might also emulate the latter interface, if there is demand for it, but it must be stressed that it will always be dog slow and that anything a bit complicated won’t work well on top of it. Try running some GPU-accelerated applications with direct rendering disabled on Linux to see what it might be like.

Layer 2 : Window manager

Processes, including system ones, share a number of screens (typically one) on which their graphical output is displayed. And as in human societies, whenever something is shared in the computer world, system arbitration is required so that no one abuses their rights to the shared object. This is why the window abstraction exists : a set of private bitmaps that a running program may use to display its visual output without having to worry about what other programs are doing in their own corners.

The job of the window manager is to combine all these private windows into a single screen image, centrally arbitrating things like who is displayed, who is on top of whom, etc. It also does the reverse conversion, determining which window an input event belongs to and handing it to the widget toolkit. It does not take care of the window controls (e.g. the close button), which are part of the desktop shell instead (see the relevant entry below for a longer explanation).
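
The core of that job is conceptually small. Here is a sketch of both directions (compositing and hit testing), with invented types and none of the clipping and damage-tracking subtleties :

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Rect {
        int x, y, w, h;
        bool contains(int px, int py) const {
            return px >= x && px < x + w && py >= y && py < y + h;
        }
    };
    struct Window {
        int z_order;                 // higher = closer to the user
        Rect frame;                  // position and size on the screen
        const uint32_t* pixels;      // the program's private bitmap
        uint64_t owner;              // process to notify about input events
    };
    struct Screen { int width, height; std::vector<uint32_t> pixels; };

    // Combine all private window bitmaps into a single screen image,
    // painting back to front so that higher z-orders end up on top.
    void composite(const std::vector<Window>& windows, Screen& screen) {
        std::vector<Window> sorted = windows;
        std::sort(sorted.begin(), sorted.end(),
                  [](const Window& a, const Window& b) { return a.z_order < b.z_order; });
        for (const Window& win : sorted)
            for (int y = 0; y < win.frame.h; ++y)
                for (int x = 0; x < win.frame.w; ++x) {
                    int sx = win.frame.x + x, sy = win.frame.y + y;
                    if (sx >= 0 && sx < screen.width && sy >= 0 && sy < screen.height)
                        screen.pixels[sy * screen.width + sx] = win.pixels[y * win.frame.w + x];
                }
    }

    // The reverse direction: find which (topmost) window an input event belongs to.
    const Window* hit_test(const std::vector<Window>& windows, int x, int y) {
        const Window* best = nullptr;
        for (const Window& win : windows)
            if (win.frame.contains(x, y) && (!best || win.z_order > best->z_order))
                best = &win;
        return best;
    }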

Layer 3 : Widget toolkit

The widget toolkit’s purpose is to manage standard UI controls, like buttons, canvases, or tabs. Give it a window, the incoming input events, and a resolution-independent spec sheet describing which controls are going to be put on it, and it will manage all their “reflex reactions” (hover effects, popup menu appearance and disappearance…) and dispatch application-managed events to the relevant application (e.g. if a button is clicked, the widget toolkit will manage the button’s sunken appearance and notify the application that button X has been clicked through a standardized IPC message, which I currently envision as a remote call).
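
As a sketch of how the button example could work, with rpc_notify_clicked() standing in for the actual remote call mechanism (which is not designed yet) :

    #include <cstdint>
    #include <vector>

    // Assumed IPC stub: tells the owning process that widget 'widget_id' was clicked.
    void rpc_notify_clicked(uint64_t app_channel, uint32_t widget_id);

    struct Button { uint32_t id; int x, y, w, h; bool pressed = false; };
    struct MouseEvent { int x, y; bool button_down; };

    class WidgetToolkit {
        std::vector<Button> buttons_;
        uint64_t app_channel_;
    public:
        explicit WidgetToolkit(uint64_t app_channel) : app_channel_(app_channel) {}

        uint32_t add_button(int x, int y, int w, int h) {
            uint32_t id = static_cast<uint32_t>(buttons_.size());
            buttons_.push_back({id, x, y, w, h});
            return id;
        }

        void handle_mouse(const MouseEvent& ev) {
            for (Button& b : buttons_) {
                bool inside = ev.x >= b.x && ev.x < b.x + b.w &&
                              ev.y >= b.y && ev.y < b.y + b.h;
                if (inside && ev.button_down) {
                    b.pressed = true;            // reflex reaction: redraw as sunken
                } else if (inside && b.pressed && !ev.button_down) {
                    b.pressed = false;           // released over the button: a click
                    rpc_notify_clicked(app_channel_, b.id);
                } else if (!ev.button_down) {
                    b.pressed = false;           // released elsewhere: cancel the press
                }
            }
        }
    };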

In circumstances which require it (e.g. games), applications may of course still manage their UI themselves. A typical way to do this would be to use only a canvas that fills the window, and redirect all UI events associated with it to the application.

Layer 4 : Desktop shell and applications

The previous components were abstractions for developers, whereas the desktop shell is the abstraction for users. It is basically what gives users control over graphical applications. Task switchers, application launchers, and global window controls belong to this component, which can actually be split into several processes in the implementation.

Applications also run on top of the widget toolkit and the window manager. The main difference between them and the desktop shell is one of security permissions : the desktop shell can do a few things which user software may only dream about, like keeping its windows on top of everything else, closing other user programs, switching tasks…

An aside : Do we really need to change resolution at run time ?

This is something I’ve been wondering about for some time : could the graphics codebase be simplified by removing support for changing the screen resolution at run time and replacing it with software-based upscaling methods ? The idea is that…

  • Only cathode-ray (CRT) screens could, due to the way they work, cleanly display a lower resolution than their native one. Modern LCD- and OLED-based screens need to run an upscaling algorithm internally in order to display low-resolution graphics.
  • The upscaling capabilities of most screens are, simply put, terrible compared to what can be done in software. Check the full-screen output of modern video players (VLC, Windows Media Player, Flash-based players…) displaying a low-resolution video to see what modern upscaling looks like : blurry ? Sure. A horrible-looking heap of pixelated visual data where nothing looks clean ? Not so much. The same goes for modern video game emulators : they get pretty close to the output you’d get from a low-resolution CRT screen, and still run smoothly while doing so (see the sketch right after this list for the simplest form this can take).
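
Here is that sketch : bilinear filtering, the dumbest useful filter. Real video players use smarter ones, but even this avoids the blocky output of many built-in monitor scalers.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    struct Frame { int w, h; std::vector<uint32_t> px; };   // 0xAARRGGBB pixels

    // Linear interpolation of two packed pixels, channel by channel.
    static uint32_t lerp_px(uint32_t a, uint32_t b, float t) {
        uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            float ca = float((a >> shift) & 0xFF), cb = float((b >> shift) & 0xFF);
            out |= uint32_t(ca + (cb - ca) * t + 0.5f) << shift;
        }
        return out;
    }

    Frame upscale_bilinear(const Frame& src, int dst_w, int dst_h) {
        Frame dst{dst_w, dst_h, std::vector<uint32_t>(size_t(dst_w) * dst_h)};
        for (int y = 0; y < dst_h; ++y) {
            float sy = (y + 0.5f) * src.h / dst_h - 0.5f;
            int   yi = int(std::floor(sy));
            float ty = sy - yi;
            int y0 = std::clamp(yi, 0, src.h - 1), y1 = std::clamp(yi + 1, 0, src.h - 1);
            for (int x = 0; x < dst_w; ++x) {
                float sx = (x + 0.5f) * src.w / dst_w - 0.5f;
                int   xi = int(std::floor(sx));
                float tx = sx - xi;
                int x0 = std::clamp(xi, 0, src.w - 1), x1 = std::clamp(xi + 1, 0, src.w - 1);
                uint32_t top = lerp_px(src.px[y0 * src.w + x0], src.px[y0 * src.w + x1], tx);
                uint32_t bot = lerp_px(src.px[y1 * src.w + x0], src.px[y1 * src.w + x1], tx);
                dst.px[size_t(y) * dst_w + x] = lerp_px(top, bot, ty);
            }
        }
        return dst;
    }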

This prompted me to wonder why, exactly, people lower the resolution of their computer screens. As far as I can tell, the following reasons exist :

  • Making stuff bigger when you have bad sight. This use case becomes obsolete with resolution-independent GUIs.
  • Playing old games which can only run at a lower resolution. Well, TOSP won’t have “old games”, but if it did, they would certainly run fast enough on modern hardware to make the overhead of software upscaling bearable.
  • Playing new games when your computer doesn’t have the hardware it takes to run them at full resolution.

The last one is a tough one. If the game already runs badly, you probably don’t want to slow it down further by adding an expensive software graphics operation to the mix. However, a heavy game in the modern sense is a game with heavy 3D graphics, which requires a GPU to run, so if the user can run this game at all we can assume that a fully capable GPU driver is around. The question is : how expensive is good upscaling if we can hardware-accelerate it ? Isn’t upscaling somewhat of a triviality for a modern GPU ?

Well, I guess I won’t have my answer until I get to the graphics implementation, which is a long way away, but I thought it was potentially an interesting track to follow.


27 thoughts on “Big summer update 4 – On graphics and VBE”

  1. Brendan August 23, 2011 / 5:48 am

    Hi,

    There’s a few things worth mentioning here.

    The first is EFI/UEFI. For EFI there is no VBE, and the functions for finding information about video modes and setting up a video mode are part of “boot-time services” and not part of “run-time services”. This means that (on EFI machines) without a native video driver you’d setup a video mode during boot and you can’t switch modes after boot.

    For VBE, most video cards only offer “4:3” modes, while most monitors are wide-screen LCD with a “16:9” or “16:10” native/preferred resolution. This means that often you can’t use VBE to setup the monitors native/preferred resolution and therefore can’t avoid the up-scaling/down-scaling capabilities of most screens.

    If you do a lot of extra work to support switching video modes after boot (using VBE and a full real mode emulator or something), then it won’t work for most people (who will be using EFI by the time your OS is usable), and when it does work (older machines) people won’t be happy anyway (no 2D/3D acceleration, no gamma control, etc). Basically I’m not convinced being able to change resolutions after boot (without a native video driver) is worth the hassle.

    The other thing you may be overlooking is resolution independence. To allow for 2D/3D acceleration (if/when supported), the video driver has to do most of the drawing. If the video driver has to do most of the drawing, then why does anything else (GUI, applications) need to know which resolution (and colour depth) the video driver happens to be using?

    Resolution independence is also important to solve some multi-monitor problems. For example, imagine there’s 2 monitors where both monitors are using different resolutions and different colour depths/pixel formats. You’re writing an application; and the left half of your application is on one monitor and the right half of your application is on the other monitor – which resolution and colour depth does your application use?

    Software (GUI, applications, etc) should create some sort of “video script” that describes what to draw (and not pixel data); lower levels should work on those scripts (and not pixel data), and the video driver should do all rendering.

    For example, your application might create a script that says “draw a blue rectangle from virtual coordinates (123, 456) to virtual coordinates (7654, 4567)” and then send that script to the GUI. The GUI might scale it up or down to suit the application’s window size and then add offsets for the window’s position (e.g. “draw a blue rectangle from virtual coordinates (1012.3, 1045.6) to (1765.4, 1456.7)”). Then the GUI would add window decoration, etc; and maybe combine this with other video scripts to create a larger video script that includes the desktop/background, other windows, etc. Then something at a lower level would receive this larger script and do clipping and more scaling before sending the video script/s to the video driver/s. For example it might split the GUI’s “video script” into 2 separate scripts (for 2 different monitors), and the application’s blue rectangle might be split in half (and become 2 “half rectangles”, one for each video driver). After all of that, the video driver/s draw the scripts.
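
    As a concrete sketch (in C++, with an invented and deliberately tiny command set), such a script and the transformations that each layer applies to it might look like this:

        #include <algorithm>
        #include <cstdint>
        #include <vector>

        struct Color { uint8_t r, g, b; };

        struct DrawCommand {
            enum class Kind { Rectangle /*, Line, Text, Texture, ... */ } kind;
            float x0, y0, x1, y1;                 // virtual coordinates
            Color color;
        };

        using VideoScript = std::vector<DrawCommand>;

        // What a GUI layer might do before forwarding an application's script:
        // scale it to the window size and offset it to the window position.
        VideoScript place_in_window(const VideoScript& app_script,
                                    float scale, float win_x, float win_y) {
            VideoScript out = app_script;
            for (DrawCommand& cmd : out) {
                cmd.x0 = cmd.x0 * scale + win_x;  cmd.y0 = cmd.y0 * scale + win_y;
                cmd.x1 = cmd.x1 * scale + win_x;  cmd.y1 = cmd.y1 * scale + win_y;
            }
            return out;
        }

        // What a lower level might do for two side-by-side monitors: clip the
        // script against each monitor's share of the virtual screen.
        VideoScript clip_to(const VideoScript& script, float left, float right) {
            VideoScript out;
            for (DrawCommand cmd : script)
                if (cmd.x1 > left && cmd.x0 < right) {
                    cmd.x0 = std::max(cmd.x0, left);
                    cmd.x1 = std::min(cmd.x1, right);
                    out.push_back(cmd);
                }
            return out;
        }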

    The other thing I should probably mention here is that each layer can use the exact same protocol/language for those “video scripts”. This makes it far more flexible. For example, if an application is running in full screen mode, then its “video script” could bypass the GUI layer and go directly to the layer below that. You could even have (for e.g.) a web browser running in full screen mode (e.g. as an internet kiosk) where no GUI is present at all. You could also have nested GUIs (e.g. application runs in a window and sends its “video script” to a GUI, which itself runs in a window and sends its “video script” to the root GUI). Then there’s (the equivalent of) “Object Linking and Embedding”, where an application runs inside another application.

    Of course it should be obvious that it’s stopped being purely video, and become a more complete protocol that covers a lot more (video, keyboard, mouse, sound, etc), where the layer above the video driver/s also communicates with the drivers for other devices.

    With this in mind, the graphics stack becomes “applications -> GUI -> virtual screen manager -> drivers” (with variations like “applications -> virtual screen manager -> drivers” and “applications -> GUI -> GUI -> virtual screen manager -> drivers” and “applications -> applications -> GUI -> virtual screen manager -> drivers”). I’m not too sure where “Widget toolkit” really belongs (it probably should be a library that’s used by applications and GUIs, and not a separate layer in its own right).

    Cheers,

    Brendan

  2. Brendan August 23, 2011 / 8:02 am

    Hi again.

    My previous post barely mentioned “colour depth/pixel format independence”, but this is a complex subject on its own, so I thought I’d come back and do a brief introduction on some of the details.

    To begin, there’s more than one device that deals with colour – printers, scanners and cameras also deal with colour. It would be nice to have a standardized representation of colour for all types of devices, rather than something different for each different type of device.

    Video cards and monitors almost always use some variation of “RGB” (and hopefully “sRGB” which is a specific/standard variation of “RGB”). All variations of “RGB” aren’t adequate – they aren’t able to represent certain colours (like rich cyans and greens). Most printers use some variation of CMYK, but (just like RGB) CMYK isn’t adequate either, as they also can’t represent some colours. Some printers use a more advanced representation of colour (like hexachrome with 6 primary colours) to get around that, but even that doesn’t completely cover all possible colours.

    Then there’s the intensity of light. You’d want to be able to represent a range from “absolutely dark” to “brighter than the sun”, but RGB and CMYK systems typically stop at “white”, and tend not to have very precise representation of dark colours. This means that if you want to make an RGB or CMYK image lighter or darker you get poor results due to the lack of accuracy in the source data.

    So, the first step in developing a standardized representation of colour is to abandon the ways that existing (electronic) systems represent colour. The only piece of hardware that actually matters is the human eye.

    The human eye consists of three types of “cones” plus one type of “rods”; all of which respond to different frequency ranges of light (in an unfortunately non-linear way). The cones respond to higher frequency light, medium frequency light and low frequency light. The rods respond to a wider range for frequencies of light, but saturate easily (and are therefore only really useful for low light conditions – in normal lighting conditions the cones dominate). The eye’s response to light has been studied (mostly by the International Commission on Illumination or “CIE”), and the end result is the “CIE_XYZ” colour space.

    The “CIE_XYZ” colour space is able to represent all colours. The X, Y and Z values represent the eye’s cones. The eye’s rods aren’t represented (and don’t seem to be included in the research at all), but the eye’s rod response can be approximated from the X, Y and Z values (potentially including a hue shift towards “blue” when X, Y and Z are small). However, “CIE_XYZ” isn’t so easy to use (and can have similar problems with representing a wide range of “intensity” as RGB and CMYK). For this reason there are variations of “CIE_XYZ”, like “CIE_LAB” and “CIE_LUV”.

    The next thing to understand is that the only thing that has colour is light. “Colour” is the eye’s response to (a spectrum of) light. Everything else (“materials”) doesn’t have colour – instead, materials affect light. For example, imagine a piece of paper that reflects all frequencies of light – under a white light the paper looks white, under a green light the paper looks green, and in the dark the paper looks black. What properties do materials have?

    If you’re modelling the physics of light, materials are very complex – more complex than what a computer can handle in real time. That approach is impractical. Instead, we can choose a subset of properties that are practical and use that as the basis of the model. For example, we could say that a material has a spectrum of frequencies of light that it reflects, a spectrum of frequencies of light that it allows to pass through, and diffusion (for both reflected light and light that passes through it).

    How do you represent “a spectrum of light”? To do that you have to have something like a big table (for e.g. containing every frequency of visible light in steps of 100 Hz or something). It’s a massive nightmare. Fortunately a “spectrum of light” maps to a specific response in the eye which can be described by “X, Y and Z”; and many different “spectrums of light” can map to exactly the same response in the eye. Basically, (to simplify this and avoid going into complicated theory) it’s enough for the system to only deal with 3 frequencies of light; or more accurately, three imaginary/virtual frequencies of light that correspond to X, Y and Z.

    So, taking that into account; materials could have an “XYZ of light” that they reflect, an “XYZ of light” that they allow to pass through (and possibly 2 types of diffusion); and therefore a material can be represented by a set of 6 (or maybe 8) values.

    To actually display this, you’d have to have an (XYZ) light source, where rays of light are affected by any/all materials they intersect. The end result of the rendering is to generate an array of “XYZ” values which can then be converted into the device’s internal representation of colour (e.g. RGB or CMYK or whatever). For something like a normal application, you’d simplify this a lot by assuming the light source is the camera and that the light source is “white”, and by making sure that all materials are the same distance from the camera (or at least parallel to the camera). With these assumptions, rendering ends up being “draw everything in order” (e.g. from front to back). Without these assumptions things get complicated (you end up requiring something like ray tracing).

    Cheers,

    Brendan

  3. Hadrien August 23, 2011 / 9:03 am

    The first is EFI/UEFI. For EFI there is no VBE, and the functions for finding information about video modes and setting up a video mode are part of “boot-time services” and not part of “run-time services”. This means that (on EFI machines) without a native video driver you’d setup a video mode during boot and you can’t switch modes after boot.

    Sounds like one more argument against video mode changes at run time, indeed… But I wonder : how can EFI possibly distinguish between boot time and run time in OS code ? Do OSs have to explicitly tell the firmware to switch between the two ? And if so, what would be the point of doing that ?

    For VBE, most video cards only offer “4:3” modes, while most monitors are wide-screen LCD with a “16:9” or “16:10” native/preferred resolution. This means that often you can’t use VBE to setup the monitors native/preferred resolution and therefore can’t avoid the up-scaling/down-scaling capabilities of most screens.

    Great, ugly rendering as a must-have feature… VBE can be used to retrieve EDID though, can this be used to notice the aspect ratio mismatch and compensate for it when drawing, so that at least circles look like circles ? Or would this be too expensive ?

    If you do a lot of extra work to support switching video modes after boot (using VBE and a full real mode emulator or something), then it won’t work for most people (who will be using EFI by the time your OS is usable)

    Sure, computers will be using EFI under the hood, but BIOS emulation is here to stay for a very long time. I mean, it’s 2011 and PC-compatibles still boot in real mode… So if useful BIOS features are to remain available, why should I directly call EFI, which is poorly supported by emulators and doesn’t bring much useful stuff to the table ?

    To prepare for the disaster that a full switch to EFI would be, I guess the best option is to use what’s available now, but write the code so that it can be painlessly scrapped and rewritten for the next big thing in the future.

    Note that I’m not defending mode switching at run time, but OTOH VBE has another run-time feature which I’d be quite sad to drop and am ready to use/write a real mode emulator for : triple buffering.

    , and when it does work (older machines) people won’t be happy anyway (no 2D/3D acceleration, no gamma control, etc).

    Yeah, VBE is not perfect, but it’s basically the best that alternative OS developers can get without much hassle, unless I’m missing something.

    Basically I’m not convinced being able to change resolutions after boot (without a native video driver) is worth the hassle.

    Me neither. Avoiding tearing, on the other hand…

    The other thing you may be overlooking is resolution independence. To allow for 2D/3D acceleration (if/when supported), the video driver has to do most of the drawing. If the video driver has to do most of the drawing, then why does anything else (GUI, applications) need to know which resolution (and colour depth) the video driver happens to be using?

    Well, this is the part which makes my head hurt. A drunk roommate waking me up at 3 am may have played a role in that too, though.

    We can’t make the driver draw everything using a unified interface, because 2D/3D acceleration and non-accelerated graphics have conflicting needs. Accelerated graphics need abstractions that are very close to the bare metal to work efficiently (OpenGL, D3D, Gallium3D), whereas non-accelerated graphics stacks want to avoid spending a single quantum of CPU time emulating the crazy internals of a GPU (otherwise, you get the world-infamous performance of Mesa’s software renderer). Both are needed. So in my opinion, the best option is to have drivers support a non-accelerated interface where software sends (or directly draws, using a single blitting command) a large screen-sized bitmap, and the driver only does the pixel format conversion and displaying job. For HW-accelerated stuff, there would be a separate accelerated interface, which uses whatever abstraction is most suitable for accelerated graphics.

    Basic GUI needs to run everywhere, so it has to use the non-accelerated interface. This in itself requires that it knows screen size and pixel resolution, in order to generate a bitmap of the right size that the driver can draw at optimal speed.

    Resolution independence is also important to solve some multi-monitor problems. For example, imagine there’s 2 monitors where both monitors are using different resolutions and different colour depths/pixel formats. You’re writing an application; and the left half of your application is on one monitor and the right half of your application is on the other monitor – which resolution and colour depth does your application use?

    Only the window manager has to know about the existence of multiple drivers and screen topology, as for applications above it’s only about drawing on their private window bitmap that will then get sliced and blitted on the multiple screens by the window manager.

    Software (GUI, applications, etc) should create some sort of “video script” that describes what to draw (and not pixel data); lower levels should work on those scripts (and not pixel data), and the video driver should do all rendering.

    If that video script is some OS-specific language that implements whatever developers think is relevant, then it’s heading down a very dangerous path which can go wrong in a big variety of ways. If you suggest using a well-known industry- or community-supported language for drawing stuff (OpenGL, Direct3D, Gallium3D…), they are all designed for GPU-accelerated rendering nowadays, and again software rendering is already slow enough without the hassle of GPU emulation on top of it.

    For example, your application might create a script that says “draw a blue rectangle from virtual coordinates (123, 456) to virtual coordinates (7654, 4567)” and then send that script to the GUI. The GUI might scale it up or down to suit the application’s window size and then add offsets for the window’s position (e.g. “draw a blue rectangle from virtual coordinates (1012.3, 1045.6) to (1765.4, 1456.7)”).

    Adding offsets is, in the stack I suggest, the job of the window manager component. However, resolution independence is not only about window size, it’s also about managing variations in output resolution (what are the physical x and y dimensions of a pixel ?) and input resolution (does the user use a precise pointing device like a mouse, or an imprecise pointing device like a finger-based touchscreen ?). In the end, size adjustments can literally push some controls *outside* of a window, controls which would thus not be drawn and would become unusable without the drawing application knowing. For these reasons, I think that I/O resolution independence should be managed at the widget toolkit level, not at the lowest levels of the GUI stack (video language interpreter). Stuff which directly accesses the graphics hardware, like games, will use its own resolution independence libraries as usual.

    Then the GUI would add window decoration, etc; and maybe combine this with other video scripts to create a larger video script that includes the desktop/background, other windows, etc. Then something at a lower level would receive this larger script and do clipping and more scaling before sending the video script/s to the video driver/s. For example it might split the GUI’s “video script” into 2 separate scripts (for 2 different monitors), and the application’s blue rectangle might be split in half (and become 2 “half rectangles”, one for each video driver). After all of that, the video driver/s draw the scripts.

    Ouch. My head hurts again, but I think that basically, our ideas have a lot in common, except that you prefer to use a scripting language in pipelines (the Unix way) and to describe every single component of the graphics stack but the driver with the vague “GUI” term, whereas I prefer to think about modularization right now, before I’ve created another bloated X Window System without understanding what has happened. If you think it’s not so, please try again to explain why in the light of my answers above.

    The other thing I should probably mention here is that each layer can use the exact same protocol/language for those “video scripts”. This makes it far more flexible. For example, if an application is running in full screen mode, then its “video script” could bypass the GUI layer and go directly to the layer below that.

    In my opinion, if you’re so desperate about performance that you’re going to bypass the GUI layer, you might as well talk directly to the hardware driver, in its “native” language (be it OpenGL code or whatever), without the overhead of a video script interpreter, like hardware-accelerated games and video players do on most OSs.

    You could even have (for e.g.) a web browser running in full screen mode (e.g. as an internet kiosk) where no GUI is present at all.

    There is still a GUI : the web browser’s. Maybe you’re talking about not having GUI system controls like a desktop shell or window decorations, but you can already do this in my model (remove or hide the desktop shell, and leave one window which takes up the whole screen for the running app).

    You could also have nested GUIs (e.g. application runs in a window and sends its “video script” to a GUI, which itself runs in a window and sends its “video script” to the root GUI). Then there’s (the equivalent of) “Object Linking and Embedding”, where an application runs inside another application.

    I’ve met OLE in office suites, and a variant of it as embedded web browser plug-ins, and I’m not convinced of its relevance, as it always looks like a big, poorly integrated hack. But if you really want that, I guess you can use a shared bitmap between the host and the guest, which the host uses as a canvas widget and the guest as its window. Some event redirection will be needed as well, but for sandboxing reasons you probably don’t want the guest application to hook all host events anyway.

    Of course it should be obvious that it’s stopped being purely video, and become a more complete protocol that covers a lot more (video, keyboard, mouse, sound, etc), where the layer above the video driver/s also communicates with the drivers for other devices.

    Noticed that, but I’m already guilty of throwing input events in the mix myself so you’re forgiven :)

    With this in mind, the graphics stack becomes “applications -> GUI -> virtual screen manager -> drivers” (with variations like “applications -> virtual screen manager -> drivers” and “applications -> GUI -> GUI -> virtual screen manager -> drivers” and “applications -> applications -> GUI -> virtual screen manager -> drivers”). I’m not too sure where “Widget toolkit” really belongs (it probably should be a library that’s used by applications and GUIs, and not a separate layer in its own right).

    There are pros and cons to each approach. It’s pretty much like the kernel threads vs user threads debate, in fact, in the sense that it is an issue of system awareness. Is the system aware of the existence of GUI widgets ? Does it have to be ?

    In my opinion, the benefits of handling widgets through an isolated system component are that…

    • The system can do everything to achieve a responsive GUI, unlike untrusted user applications which are trapped within the realm of their ridiculously tiny privileges
    • We can also trust the widget toolkit for a number of privileged operations (like file open dialogs), whereas non-isolated library code is not to be trusted.
    • System-wide changes to all GUIs, like input/output resolution changes or theming, can be managed much more cleanly
    • We consistently use IPC as the way to give orders to system components, instead of using IPC somewhere and libraries somewhere else.

    The drawbacks being that…

    • It’s slower
    • And that’s all I can think of right now
  4. Hadrien August 23, 2011 / 12:03 pm

    Thanks a lot for this piece about color representations ! I’ve been dealing with so-called “color profiles” for some time, without being able to truly grasp what they were about. My guess was that they dealt with how the screen’s RGB values translated into “real” colors, and that if your screen’s profile and your printer’s profile mismatched, a sort of image filter was automatically generated to make sure the output has the expected look. But that was about it.

    I like the optical theory which you suggest (considering on-screen objects as light reflectors/transmitters) a lot. I only see two potential problems with it. The first one is a potential dependency on unknown information. Only professional people know the relationship between their screen’s RGB values and real retina response data. If the average guy ends up using a (wrong !) “default profile” for his screen, aren’t we just recreating RGB in a different and incompatible way ?

    Now, you can well argue that human eye-based color abstractions will never appear if no one introduces them to begin with. And this I can accept. My second problem is one of cost. The way I see it, drawing an RGB pixel on an RGB framebuffer is three MOVBs (“put the R value in the R box”), or even better a MOVL if you have 32-bit color depth. Whereas if you use CIE_XYZ, the pixel drawing workflow becomes more computationally intensive…

    • Start from the CIE coordinates of your backlight’s color, which can optionally be position-dependent if you want to get into non-uniform lighting
    • Multiply by the absorption/reflection matrix of the desired “material” (which is hopefully diagonal, so that optimizations can be applied there)
    • Multiply by the screen-specific CIE-to-RGB transformation matrix
    • Truncate and clip the RGB coordinates to get integer values that fit in a byte
    • Move these RGB coordinates to the framebuffer

    Wouldn’t this be quite expensive to do for a mere software renderer ? If I could guarantee that a GPU is available, I wouldn’t worry so much since they are very good at that kind of massive matrix manipulation, but here, I feel a bit more shy…

    Another thing about cost : I guess that these CIE coordinates are floating-point numbers. This means 32 bits per color component for single precision, or 64 bits per component for double precision. Which means that bitmaps would be respectively 3 times or 6 times as large (or more if you compare with 24-bit, but 24-bit is ugly). Isn’t this a problem ?

  5. Brendan August 23, 2011 / 6:58 pm

    Hi,

    For EFI, the OS (typically the OS’s boot code) asks EFI to switch from boot-time to run-time. This marks a kind of transfer of ownership of the computer’s resources. For example, before calling “exit_boot_services()” the firmware owns all the hardware and does memory management, etc; and after “exit_boot_services()” is called the OS owns all the hardware and does memory management, etc. During “exit_boot_services()” the EFI firmware also unloads a lot of stuff (drivers, etc) to free resources. An OS can’t switch back from “run-time” to “boot-time” (without rebooting).
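
    As a sketch of what this looks like on the video side (using EDK2-style names for the Graphics Output Protocol, which is how EFI exposes video modes and the framebuffer), a boot loader would do something like the following before exit_boot_services(); the framebuffer address it records stays usable afterwards:

        #include <Uefi.h>
        #include <Library/UefiBootServicesTableLib.h>   // provides gBS
        #include <Protocol/GraphicsOutput.h>

        typedef struct {
            EFI_PHYSICAL_ADDRESS Base;   // linear framebuffer, still usable at run time
            UINT32 Width, Height, PixelsPerScanLine;
        } BOOT_FRAMEBUFFER;

        EFI_STATUS CaptureFramebuffer(BOOT_FRAMEBUFFER *Out) {
            EFI_GRAPHICS_OUTPUT_PROTOCOL *Gop;
            EFI_STATUS Status = gBS->LocateProtocol(&gEfiGraphicsOutputProtocolGuid,
                                                    NULL, (VOID **)&Gop);
            if (EFI_ERROR(Status)) return Status;

            // A real loader would enumerate Gop->Mode->MaxMode modes with
            // Gop->QueryMode() and pick the monitor's preferred one with Gop->SetMode().

            Out->Base              = Gop->Mode->FrameBufferBase;
            Out->Width             = Gop->Mode->Info->HorizontalResolution;
            Out->Height            = Gop->Mode->Info->VerticalResolution;
            Out->PixelsPerScanLine = Gop->Mode->Info->PixelsPerScanLine;
            return EFI_SUCCESS;
        }

        // After this, the loader retrieves the memory map, calls
        // gBS->ExitBootServices(ImageHandle, MapKey), and hands BOOT_FRAMEBUFFER over
        // to the kernel; no further mode changes are possible without a native driver.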

    You can get EDID (with both VBE and EFI), and you can use that to correct aspect ratio. Of course this is another reason for “resolution independence” (it’s very easy to correct aspect ratio when everything outside the video driver uses virtual coords).

    While BIOS emulation is currently possible, it should probably be treated as deprecated from this point on (as all major OSs already support EFI, and allowing BIOS compatibility to continue has little benefit). Don’t forget that it’s probably going to take you around 10 years to write the OS, and by the time it’s “finished” there will be people using computers that have never even heard of “BIOS” (just like there’s people around now that have never seen a floppy disk, audio tape or dial-up modem).

    For future-proofing, for my project I have a “Boot Abstraction Layer” (BAL). The boot code depends on the firmware and passes information to the BAL. The BAL (and kernel and the rest of the OS) only depends on the information passed to the BAL (and doesn’t know/care what the firmware was).

    I’ve never had trouble with tearing, etc. If you blit data to display memory fast enough (especially if you avoid unnecessary writes – e.g. use dirty rectangles or something) it’s unnoticeable unless you’re playing a game or something with a high frame rate (and if you are playing a game or something with a high frame rate then you really do need 2D/3D acceleration and a native driver).

    You can have a unified video driver interface. It just means that the “raw framebuffer” driver has to do everything in software (and ancient low-end video cards may need to do some things in software too). Some features can also be “optionally supported” – e.g. if the video hardware (or driver) doesn’t support something like volumetric fog or shaders, then the application/s could ask if it’s supported before using it (so that it needn’t be supported in software by the driver).

    2D is just “3D with depth = 0” – there’s nothing too special about it. Font data should be offloaded onto a “font engine”, where the video driver asks the font engine to convert a (Unicode) string into a texture (the video driver only needs to be able to draw textured polygons).

    There’s also a whole pile of caching that the video driver should do. The “video script” should probably be a hierarchy of canvases, where each canvas is cached by the driver (and doesn’t need to be redrawn if the corresponding part of the script hasn’t changed). Note: for native video drivers the canvases would probably be cached in the video card’s memory where possible, so that fast/accelerated “display memory to display memory” blits can be used.

    Applications do not need an “unaccelerated, raw pixel” video interface – that’s just silly (more accurately, it’s a myth that originates from poor design from the past). For performance it doesn’t matter who does the drawing (someone still has to do it) but it means that you’re pushing large amounts of data around (rather than much smaller “video scripts”). It’s also a bad idea to duplicate the same drawing code in all applications (and force applications to care about which pixel format/s, which resolution/s, etc it has to support).

    “Only the window manager has to know about the existence of multiple drivers and screen topology, as for applications above it’s only about drawing on their private window bitmap that will then get sliced and blitted on the multiple screens by the window manager.” You didn’t think about this enough! :-)

    Basically (for the “multi-monitor” example) your application has to generate pixel data for the wrong resolution and wrong colour depth, and then some 20 MiB of pixel data is sent to the window manager. The window manager has to split every horizontal line of pixel data in half and then send 10 MiB to one video driver. The other 10 MiB of data has to be rescaled (and rescaling pixel data isn’t “fast”, especially if you want good results) and converted to a different colour depth before sending the remaining 10 MiB of data to the other video driver. That is going to be very expensive (in comparison to doing the drawing at the right resolution and the right colour depth in the first place, which can’t be done by an application that only generates one buffer of pixel data).

    I (deliberately) didn’t say what the “video script” would contain. It could be pure OpenGL – you call functions to position the camera, draw textured polygons, etc (and the library just adds these commands to a script) and then you call “glFlush()” (and the library sends the script to the video driver). I personally wouldn’t use OpenGL alone though (it lacks support for things like font data rendering, and nothing says the video driver’s interface has to match the functions in the OpenGL library). It’s your OS, so feel free to decide the details yourself. :)

    Rendering 3D graphics (with textured polygons, etc) in software isn’t that fast (although it doesn’t necessarily need to be too slow either, if you can throw many CPUs and SSE at it). Rendering 2D graphics in software can easily be fast enough though.

    There is one thing I haven’t mentioned yet. If the video driver is doing the rendering, then the video driver can adjust the quality of the rendering to meet some sort of deadline. For example, for textured polygons that are small it could just use an “average” colour and ignore the texture, the “far” clipping plane could be made closer (and distant polygons skipped), lighting could be simplified (no shadows, no reflections), etc. A hardware accelerated video driver on a fast machine could produce 60 extremely detailed frames every second; and software rendering on a slow machine might produce 30 very simple frames every second; and the application/s, etc don’t need to know/care how detailed the frames are (or how fast the video is).

    Of course if the video driver draws a frame without much detail and then has nothing to do, then it could spend more time redrawing that same frame in higher detail. For example, you might get a low quality version of the frame within 50 ms, then (if nothing has changed) you get a better quality version after 150 ms, and then (if it still hasn’t changed) you get a detailed version of the frame after 1000 ms. If it’s done right most people won’t even notice the low quality versions of frames when doing things like office work. Then there’s screen-shots (take a copy of the script and texture data, then spend as much time as you like creating an image with extremely high detail).

    For me, “GUI” is responsible for things like window decoration (and window size and position, and minimize/maximize, etc), the desktop, the task bar, application menu, handling “alt +tab” and which window has focus, allowing applications to be started, themes, etc. The “virtual screen manager” is responsible for controlling which devices are used by which GUIs (potentially including support for multiple virtual desktops where each virtual desktop runs a different GUI); and also handles user login and maybe some global things (like power management and screen savers). Your “window manager” seems to be a “less flexible” combination of both of these separate concepts. ;-)

    – Brendan

  6. Brendan August 23, 2011 / 8:35 pm

    Hi,

    If the user sets up their monitor wrong, then the colour reproduction will be wrong too. That’s the user’s problem.

    If the system is completely unable to handle colours that RGB can’t represent, then that is the system’s problem. Worst case is only supporting colours that all devices can represent – no rich cyan (because of RGB limitations); no bright reds, violets or greens because CMYK can’t handle them, etc (let’s hope the user doesn’t have a black and white printer!).

    To understand this, here’s a picture: http://www.thinkcamera.com/news/images/PS-2-colorspace.jpg

    The strange outer shape is all colours the eye can see. The triangles within it are the colours that a few different RGB systems and one CMYK system can represent. See how severely crippled RGB and CMYK are?

    Now tell me, for “8-bit per component” RGB, what value do you use for “10 times brighter than white”? White itself is probably 0xFFFFFF, so 10 times brighter than white would be…? Imagine you’ve got a picture of the sun in the sky in one window, but on top of that is another (grey) window that happens to be translucent – 10% of the light from behind the translucent window shows through. If you try to pretend that 10 times brighter than white is the same as “plain white” (saturation), then when you draw the translucent window on top you end up with dark grey when you should’ve ended up with white.

    Conversion from CIE_XYZ to RGB is about 3 steps. The first step is matrix multiplication (to convert XYZ into RGB while also adjusting the “white point” to suit the monitor). The second step is clipping the values to a specific range (0 to 1). The third step is gamma correction. Depending on how you do it; for 1024 * 768 it might take about 20 ms on one 2 Ghz CPU (but it’s easy to do in parallel on multiple CPUs; and you’d hopefully be using dirty rectangles or something to avoid updating the entire screen most of the time anyway). Don’t forget that converting “RGB with one whitepoint” into “RGB with a different whitepoint” is the same amount of work.
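
    As a sketch, the per-pixel part of that conversion (using the standard sRGB/D65 matrix and gamma curve) is roughly:

        #include <algorithm>
        #include <cmath>
        #include <cstdint>

        struct XYZ  { float x, y, z; };
        struct RGB8 { uint8_t r, g, b; };

        static float srgb_gamma(float c) {     // linear -> sRGB transfer curve
            return c <= 0.0031308f ? 12.92f * c
                                   : 1.055f * std::pow(c, 1.0f / 2.4f) - 0.055f;
        }

        RGB8 xyz_to_srgb(XYZ c) {
            // Step 1: matrix multiplication (sRGB primaries, D65 white point).
            float r =  3.2406f * c.x - 1.5372f * c.y - 0.4986f * c.z;
            float g = -0.9689f * c.x + 1.8758f * c.y + 0.0415f * c.z;
            float b =  0.0557f * c.x - 0.2040f * c.y + 1.0570f * c.z;
            // Step 2: clip to the 0..1 range the monitor can actually produce.
            r = std::clamp(r, 0.0f, 1.0f);
            g = std::clamp(g, 0.0f, 1.0f);
            b = std::clamp(b, 0.0f, 1.0f);
            // Step 3: gamma correction, then quantisation to 8 bits per component.
            return { uint8_t(srgb_gamma(r) * 255.0f + 0.5f),
                     uint8_t(srgb_gamma(g) * 255.0f + 0.5f),
                     uint8_t(srgb_gamma(b) * 255.0f + 0.5f) };
        }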

    Of course that also assumes you’re doing the conversion from CIE_XYZ to RGB as late as possible in the rendering. You don’t need to do it that way. Instead you could do the conversion as soon as possible. For example, if the “video script” says something like “draw a blue rectangle” you convert “blue” into RGB first, then draw a rectangle. The same goes for textures – convert them to RGB once when the textures are loaded (and not each time the textures are used). You still need to limit the RGB values to a certain range and do gamma correction, but you’d probably halve the overhead (about 10 ms on one 2 Ghz CPU?).

    An (unsigned) floating point number requires 2 bits – one bit for the significand and one bit for the exponent. You probably want to use more bits than that (but you’re not limited to IEEE 754’s floating point formats). How about a 10-bit unsigned floating point value (with a 3 bit significand and a 7-bit exponent) so you can pack 3 of them into a 32-bit dword?
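
    Packing and unpacking such a format is just a matter of shifts and masks; as a sketch (the exponent bias and the handling of zero are arbitrary choices here):

        #include <cmath>
        #include <cstdint>

        // Layout of one 10-bit component: [eeeeeee][sss], a 7-bit exponent biased by 63
        // and a 3-bit significand with an implicit leading 1 (exponent 0 means "zero").
        float decode10(uint32_t v) {
            uint32_t exponent    = (v >> 3) & 0x7F;
            uint32_t significand = v & 0x07;
            if (exponent == 0) return 0.0f;
            float mantissa = 1.0f + significand / 8.0f;       // 1.sss in binary
            return std::ldexp(mantissa, int(exponent) - 63);  // mantissa * 2^(e - 63)
        }

        // Three 10-bit components (X, Y, Z) squeezed into one 32-bit word.
        uint32_t pack_xyz10(uint32_t x10, uint32_t y10, uint32_t z10) {
            return (x10 & 0x3FF) | ((y10 & 0x3FF) << 10) | ((z10 & 0x3FF) << 20);
        }

        void unpack_xyz10(uint32_t packed, float& x, float& y, float& z) {
            x = decode10(packed & 0x3FF);
            y = decode10((packed >> 10) & 0x3FF);
            z = decode10((packed >> 20) & 0x3FF);
        }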

    – Brendan

  7. Hadrien August 25, 2011 / 8:56 pm

    Hi again !

    For EFI, the OS (typically the OS’s boot code) asks EFI to switch from boot-time to run-time. This marks a kind of transfer of ownership of the computer’s resources. For example, before calling “exit_boot_services()” the firmware owns all the hardware and does memory management, etc; and after “exit_boot_services()” is called the OS owns all the hardware and does memory management, etc. During “exit_boot_services()” the EFI firmware also unloads a lot of stuff (drivers, etc) to free resources. An OS can’t switch back from “run-time” to “boot-time” (without rebooting).

    Some more questions on EFI graphics, as I have to admit that the UEFI spec drowned me a little bit when I tried to read it (why oh why don’t we make simple specs anymore ?) :

    • There are quite a lot of “protocols” in the UEFI spec, so I guess that hardware implementations do not have to support all of them. Is there any guarantee that resolution changing capabilities will be there in EFI implementations, or is it a hit-and-miss thing until EFI becomes sufficiently widespread that some implementations become de facto standards ?
    • Does EFI, in your experience, tend to provide a more complete set of resolutions than VBE ? (better wide resolution support from EFI GPU firmwares)
    • Is there such a thing as a permanent access to the linear frame buffer with EFI, or are you unable to draw anything without a dedicated video driver once “boot-time” mode has been left ?

    You can get EDID (with both VBE and EFI), and you can use that to correct aspect ratio. Of course this is another reason for “resolution independence” (it’s very easy to correct aspect ratio when everything outside the video driver uses virtual coords).

    Preaching to the choir there. I was all for resolution independence even before you mentioned it ; where we disagree is the level at which it should be implemented (widget level for me, video driver level for you).

    While BIOS emulation is currently possible, it should probably be treated as deprecated from this point on (as all major OSs already support EFI, and allowing BIOS compatibility to continue has little benefit). Don’t forget that it’s probably going to take you around 10 years to write the OS, and by the time it’s “finished” there will be people using computers that have never even heard of “BIOS” (just like there’s people around now that have never seen a floppy disk, audio tape or dial-up modem).

    Again, most people these days have never heard of DOS, and yet their PC is designed to boot it provided that a floppy drive is plugged in… But I see your point about 10 years. Sometimes, I wonder if this isn’t even optimistic… Getting stuff right in OSdeving just takes awful lots of time. If I drop this project at some point, it will probably be due to frustration at the lack of results :)

    For future-proofing, for my project I have a “Boot Abstraction Layer” (BAL). The boot code depends on the firmware and passes information to the BAL. The BAL (and kernel and the rest of the OS) only depends on the information passed to the BAL (and doesn’t know/care what the firmware was).

    I do pretty much the same thing, except that this stuff has a different name (bootstrap code or component here). Its goal is to do “anything fully arch-specific that the system needs before booting”. I’ve thought about putting most of the VESA initialization stuff there too, considering how arch-specific it is.

    I’ve never had trouble with tearing, etc. If you blit data to display memory fast enough (especially if you avoid unnecessary writes – e.g. use dirty rectangles or something) it’s unnoticeable unless you’re playing a game or something with a high frame rate (and if you are playing a game or something with a high frame rate then you really do need 2D/3D acceleration and a native driver).

    I stand corrected here, then.

    You can have a unified video driver interface. It just means that the “raw framebuffer” driver has to do everything in software (and ancient low-end video cards may need to do some things in software too). Some features can also be “optionally supported” – e.g. if the video hardware (or driver) doesn’t support something like volumetric fog or shaders, then the application/s could ask if it’s supported before using it (so that it needn’t be supported in software by the driver).

    2D is just “3D with depth = 0” – there’s nothing too special about it. Font data should be offloaded onto a “font engine”, where the video driver asks the font engine to convert a (Unicode) string into a texture (the video driver only needs to be able to draw textured polygons).

    I don’t agree there. Modern 3D graphics have certainly not reached the beautiful simplicity of putting colored pixels on a multi-dimensional grid, either individually or through blitting, like you can do with good 2D libraries such as SDL. I’ve read that John Carmack was working on something like that for idTech 6, but he also said in the same interview that the hardware for it does not exist yet.

    So instead of this, we deal with the thousands of performance hacks of rasterization, and each new version of Direct3D and OpenGL showcases some new GPU functionality designed to provide a single-digit performance increase in some specific situations. This is what native video drivers have to deal with, and this is one of the reasons why FOSS ones are so far behind.

    I can’t see how emulating shaders in a framebuffer driver, just so as to get a unified interface that would need no major rewrite if someday native drivers are everywhere, would be a good idea. Better to just admit that we can’t get native drivers right now, in my opinion, and unite the majority of drawing code inside a single codebase, library or process, that can easily get hardware-accelerated.

    There’s also a whole pile of caching that the video driver should do. The “video script” should probably be a hierarchy of canvases, where each canvas is cached by the driver (and doesn’t need to be redrawn if the corresponding part of the script hasn’t changed). Note: for native video drivers the canvases would probably be cached in the video card’s memory where possible, so that fast/accelerated “display memory to display memory” blits can be used.

    Well, not talking about caching yet, but I agree that applications which work with native video drivers should be able to allocate video memory.

    Applications do not need an “unaccelerated, raw pixel” video interface – that’s just silly (more accurately, it’s a myth that originates from poor design from the past). For performance it doesn’t matter who does the drawing (someone still has to do it) but it means that you’re pushing large amounts of data around (rather than much smaller “video scripts”). It’s also a bad idea to duplicate the same drawing code in all applications (and force applications to care about which pixel format/s, which resolution/s, etc it has to support).

    Duplicating a video script interpreter inside all video drivers is not much better, in my opinion. The Linux graphic stack traditionally had each video driver bundle its own OpenGL interpreter, and the predictable result was that nowadays, the quality of your OpenGL implementation strongly depends on which driver you use, which goes pretty much against the initial purpose of abstract video interfaces. This is why I think that Gallium3D is such a good idea and hope that it’ll be successful : it’s a bare-metal abstraction of the hardware, so there are only so many things that driver implementers can mess up…

    Anyway, I don’t plan to duplicate drawing code everywhere, but rather to put it somewhere above the framebuffer/GPU abstraction layer of the driver. My initial idea was to put it above the multiscreen/window abstraction, in the “Widget toolkit” component. But, as you argue just below, I now agree that this is a bad idea after all…

    “Only the window manager has to know about the existence of multiple drivers and screen topology, as for applications above it’s only about drawing on their private window bitmap that will then get sliced and blitted on the multiple screens by the window manager.” You didn’t think about this enough! :-)

    Basically (for the “multi-monitor” example) your application has to generate pixel data for the wrong resolution and wrong colour depth, and then some 20 MiB of pixel data is sent to the window manager. The window manager has to split every horizontal line of pixel data in half and then send 10 MiB to one video driver. The other 10 MiB of data has to be rescaled (and rescaling pixel data isn’t “fast”, especially if you want good results) and converted to a different colour depth before sending the remaining 10 MiB of data to the other video driver. That is going to be very expensive (in comparison to doing the drawing at the right resolution and the right colour depth in the first place, which can’t be done by an application that only generates one buffer of pixel data).

    So instead, perhaps what I could do would be to split that stuff in two, the rendering on one side and resolution-independent widgets on the other side, and have…

    Framebuffer/GPU abstraction -> Primitive renderer -> Window abstraction/Multiscreen independence -> Widget engine/Resolution independence -> Desktop shell/Applications.

    In that case, you can imagine a typical workflow like this :

    • The application says to the widget engine something along the lines of “draw me a button of 1 x 3 grid units, and if there isn’t enough room for everything then it has priority level 4”
    • The widget engine looks up its theme data, I/O resolutions, and instructions from the application. From this it determines that a button is defined by a gradient of specific pixel sizes and key material properties (that can just be reflection/transmission properties, or something more), on top of each other. It asks the window manager to have that gradient drawn.
    • The window manager converts the window-relative coordinates to absolute screen coordinates, notices that there’s a screen boundary in the middle of the gradient, and splits it in two before asking the primitive renderer to draw the gradient halves on each screen (see the sketch just after this list)
    • The primitive renderer does the CIE to RGB conversion for each screen and draws on each screen’s virtual framebuffer
    • The video driver does the displaying job.
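
    Here is the sketch promised in the window manager step above: converting a window-relative rectangle to absolute coordinates, then clipping it against each screen so that one draw request is issued per screen it crosses. The types and the submit_to_renderer() stub are hypothetical placeholders for the real IPC.

        #include <stdio.h>

        struct rect { int x, y, w, h; };

        /* Hypothetical stand-in for the IPC call to the primitive renderer. */
        static void submit_to_renderer(int screen, struct rect part)
        {
            printf("screen %d: draw gradient slice at (%d,%d), %dx%d\n",
                   screen, part.x, part.y, part.w, part.h);
        }

        /* Clip rectangle a against a screen; returns 1 and fills *out if they overlap. */
        static int intersect(struct rect a, struct rect scr, struct rect *out)
        {
            int x0 = a.x > scr.x ? a.x : scr.x;
            int y0 = a.y > scr.y ? a.y : scr.y;
            int x1 = a.x + a.w < scr.x + scr.w ? a.x + a.w : scr.x + scr.w;
            int y1 = a.y + a.h < scr.y + scr.h ? a.y + a.h : scr.y + scr.h;
            if (x1 <= x0 || y1 <= y0) return 0;
            out->x = x0; out->y = y0; out->w = x1 - x0; out->h = y1 - y0;
            return 1;
        }

        /* Translate a window-relative rectangle and split it across the screens. */
        void draw_on_screens(struct rect r, int win_x, int win_y,
                             const struct rect *screens, int nscreens)
        {
            struct rect absolute = { r.x + win_x, r.y + win_y, r.w, r.h };
            for (int i = 0; i < nscreens; i++) {
                struct rect part;
                if (intersect(absolute, screens[i], &part))
                    submit_to_renderer(i, part);
            }
        }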

    This stack gets a bit long for my taste, but IPC is cheap (we’ve been talking about something in the order of the µs earlier, whereas optimal GUI response time is 10ms, which is 10000 times more), and that’s the price you pay for wanting to modularize something as complex as a resolution-independent GUI I guess. Seems it would work.

    I (deliberately) didn’t say what the “video script” would contain. It could be pure OpenGL – you call functions to position the camera, draw textured polygons, etc (and the library just adds these commands to a script) and then you call the “glFlush()” (and the library sends the script to the video driver). I personally wouldn’t use OpenGL alone though (it lacks support for things like font data rendering, and nothing says the video driver’s interface has to match the functions in the OpenGL library). It’s your OS, so feel free to decide the details yourself.. :)

    Well, see above…

    Rendering 3D graphics (with textured polygons, etc) in software isn’t that fast (although it doesn’t necessarily need to be too slow either, if you can throw many CPUs and SSE at it). Rendering 2D graphics in software can easily be fast enough though.

    Good, this is reassuring :)

    There is one thing I haven’t mentioned yet. If the video driver is doing the rendering, then the video driver can adjust the quality of the rendering to meet some sort of deadline. For example, for textured polygons that are small it could just use an “average” colour and ignore the texture, the “far” clipping plane could be made closer (and distant polygons skipped), lighting could be simplified (no shadows, no reflections), etc. A hardware accelerated video driver on a fast machine could produce 60 extremely detailed frames every second; and software rendering on a slow machine might produce 30 very simple frames every second; and the application/s, etc don’t need to know/care how detailed the frames are (or how fast the video is).

    This can also be done with the “renderer” component described below, I think : the primitive renderer can quickly evaluate the speed of each piece of graphics hardware under its jurisdiction by running a “speed test” for each of them at run time, and adjust graphical quality accordingly. For this to work, the graphics primitives should be relatively high-level though : there are only so many ways to optimize the rendering of a pixel or a line…

    Of course if the video driver draws a frame without much detail and then has nothing to do, then it could spend more time redrawing that same frame in higher detail. For example, you might get a low quality version of the frame within 50 ms, then (if nothing has changed) you get a better quality version after 150 ms, and then (if it still hasn’t changed) you get a detailed version of the frame after 1000 ms. If it’s done right most people won’t even notice the low quality versions of frames when doing things like office work. Then there’s screen-shots (take a copy of the script and texture data, then spend as much time as you like creating an image with extremely high detail).
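
    As an illustration of that “refine while idle” idea, here is a minimal sketch where a frame is first rendered cheaply to meet the deadline and then redrawn in increasing detail as long as the scene stays unchanged. The scene structure and the render()/scene_changed() functions are hypothetical stand-ins for the real driver interface.

        #include <stdbool.h>

        enum quality { Q_LOW, Q_MEDIUM, Q_HIGH };

        struct scene { int version, rendered_version; };   /* hypothetical stand-in */

        static bool scene_changed(const struct scene *s)   /* "did the script change?" */
        {
            return s->version != s->rendered_version;
        }

        static void render(struct scene *s, enum quality q)
        {
            (void)q;   /* a real driver would trade detail against time here */
            s->rendered_version = s->version;
        }

        void present_frame(struct scene *s)
        {
            enum quality q = Q_LOW;
            render(s, q);                      /* cheap version, shown within the deadline */
            while (q < Q_HIGH && !scene_changed(s)) {
                q = (enum quality)(q + 1);     /* idle and unchanged: redraw in more detail */
                render(s, q);
            }
        }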

    This means either caching video scripts or sending a “redraw” message all the way to the top of the stack to get a new copy of it. I think the latter is needed, though, in the event of a graphics driver crash leading to an unknown amount of screen corruption.

    For me, “GUI” is responsible for things like window decoration (and window size and position, and minimize/maximize, etc), the desktop, the task bar, application menu, handling “alt +tab” and which window has focus, allowing applications to be started, themes, etc. The “virtual screen manager” is responsible for controlling which devices are used by which GUIs (potentially including support for multiple virtual desktops where each virtual desktop runs a different GUI); and also handles user login and maybe some global things (like power management and screen savers). Your “window manager” seems to be a “less flexible” combination of both of these separate concepts. ;-)

    My idea was to put the developer abstraction (a continuous window to draw on) in the “window manager” part and the user abstractions (global system controls) in the “desktop shell” part. It should be noted that a GUI for controlling a system is relatively complex, so perhaps the desktop shell functionality could be modularized and provided by a myriad of independent privileged processes.

    Anyhow, for the use cases you quote…

    • Window decoration : Desktop shell (admittedly awkward for the “one decoration per window” model that is currently the norm, but I’m thinking of a global window control model where it makes more sense)
    • The desktop : Desktop shell
    • Task bar : Undoubtedly desktop shell
    • Application menu : Global, high-level controls -> Desktop shell
    • Alt+Tab and window focus : Desktop shell.
    • Allowing applications to be started : Not quite sure what this is, sounds like something lower-level than GUI to me.
    • Which devices are used by which GUI : Several GUIs running at once ? Sounds beyond the scope of what I want to achieve, unless you can suggest a good use case where someone would want to do that, that can not be addressed by a full-screen window.
    • User login : Either desktop shell, or perhaps some “login manager” full-screen software which then gives control to the desktop shell, like on Linux.
    • Power management : Not been thinking about this a lot yet, although I have some ideas about PM flowing around from time to time. I’d probably hand it either to the driver or a similarly low-level system service, as it is something low level and highly hardware-dependent. Don’t know yet. Not put enough thoughts into this…
  8. Hadrien August 25, 2011 / 9:21 pm

    If the user sets up their monitor wrong, then the colour reproduction will be wrong too. That’s the user’s problem.

    If the system is completely unable to handle colours that RGB can’t represent, then that is the system’s problem. Worst case is only supporting colours that all devices can represent – no rich cyan (because of RGB limitations); no bright reds, violets or greens because CMYK can’t handle them, etc (let’s hope the user doesn’t have a black and white printer!).

    To understand this, here’s a picture: http://www.thinkcamera.com/news/images/PS-2-colorspace.jpg

    The strange outer shape is all colours the eye can see. The triangles within it are the colours that a few different RGB systems and one CMYK system can represent. See how severely crippled RGB and CMYK are?

    Now tell me, for “8-bit per component” RGB, what value do you use for “10 times brighter than white”? White itself is probably 0xFFFFFF, so 10 times brighter than white would be…? Imagine you’ve got a picture of the sun in the sky in one window, but on top of that is another (grey) window that happens to be translucent – 10% of the light from behind the translucent window shows through. If you try to pretend that 10 times brighter than white is the same as “plain white” (saturation), then when you draw the translucent window on top you end up with dark grey when you should’ve ended up with white.
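
    A tiny numeric illustration of that saturation problem, assuming intensities are stored linearly with 1.0 meaning “plain white” and a translucent window that lets 10% of the light behind it through:

        #include <stdio.h>

        int main(void)
        {
            float sun = 10.0f;            /* 10 times brighter than white */
            float transmission = 0.10f;   /* 10% of the light shows through */

            float clamp_first = (sun > 1.0f ? 1.0f : sun) * transmission;  /* 0.10: dark grey */
            float clamp_last  = sun * transmission;                        /* 1.00: white */
            if (clamp_last > 1.0f) clamp_last = 1.0f;

            printf("clamp before compositing: %.2f\n", clamp_first);
            printf("clamp after compositing:  %.2f\n", clamp_last);
            return 0;
        }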

    Points taken. That floating-point CIE stuff definitely sounds interesting.

    Conversion from CIE_XYZ to RGB is about 3 steps. The first step is matrix multiplication (to convert XYZ into RGB while also adjusting the “white point” to suit the monitor). The second step is clipping the values to a specific range (0 to 1). The third step is gamma correction. Depending on how you do it; for 1024 * 768 it might take about 20 ms on one 2 GHz CPU (but it’s easy to do in parallel on multiple CPUs; and you’d hopefully be using dirty rectangles or something to avoid updating the entire screen most of the time anyway). Don’t forget that converting “RGB with one whitepoint” into “RGB with a different whitepoint” is the same amount of work.
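
    As a sketch of those three steps, here is what the conversion could look like for the common sRGB/D65 case, using the usually published matrix and gamma curve. A real renderer would take both from the monitor’s colour profile rather than hard-coding them.

        #include <math.h>

        static float clamp01(float c) { return c < 0.0f ? 0.0f : (c > 1.0f ? 1.0f : c); }

        static float srgb_gamma(float c)   /* linear -> gamma-encoded sRGB */
        {
            return (c <= 0.0031308f) ? 12.92f * c
                                     : 1.055f * powf(c, 1.0f / 2.4f) - 0.055f;
        }

        void xyz_to_srgb(float X, float Y, float Z, float *r, float *g, float *b)
        {
            /* Step 1: matrix multiplication (XYZ -> linear RGB, D65 white point) */
            float R =  3.2406f * X - 1.5372f * Y - 0.4986f * Z;
            float G = -0.9689f * X + 1.8758f * Y + 0.0415f * Z;
            float B =  0.0557f * X - 0.2040f * Y + 1.0570f * Z;
            /* Step 2: clip to the displayable range */
            R = clamp01(R); G = clamp01(G); B = clamp01(B);
            /* Step 3: gamma correction */
            *r = srgb_gamma(R); *g = srgb_gamma(G); *b = srgb_gamma(B);
        }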

    Ah, gamma correction, a gift from the CRT age… Is this stuff really still relevant ? I’ve not seen a gamma correction setting outside of a graphics editor’s effects for ages now. And avoiding the nonlinear operation in the workflow would certainly speed up calculations and allow for more nice non-accelerated tricks.

    Anyway, if it can be done that fast, I think I’m sold :)

    Of course that also assumes you’re doing the conversion from CIE_XYZ to RGB as late as possible in the rendering. You don’t need to do it that way. Instead you could do the conversion as soon as possible. For example, if the “video script” says something like “draw a blue rectangle” you convert “blue” into RGB first, then draw a rectangle. The same goes for textures – convert them to RGB once when the textures are loaded (and not each time the textures are used). You still need to limit the RGB values to a certain range and do gamma correction, but you’d probably halve the overhead (about 10 ms on one 2 GHz CPU?).

    Yeah, with the “primitive renderer” component described above, that’s the way I’d do it.

    An (unsigned) floating point number requires 2 bits – one bit for the significand and one bit for the exponent. You probably want to use more bits than that (but you’re not limited to IEEE 754’s floating point formats). How about a 10-bit unsigned floating point value (with a 3 bit significand and a 7-bit exponent) so you can pack 3 of them into a 32-bit dword?

    If I don’t use standard floating point formats, how can I have the processor’s FPU play with them ? Isn’t software floating point emulation supposed to be super slow ?

  9. Brendan August 26, 2011 / 10:18 am

    As far as I understand, for EFI/UEFI there’s 2 protocols for graphics. The first is the Universal Graphics Adapter protocol (UGA), which is an older protocol used for “EFI version 1” and Apple machines. The second protocol is the Graphics Output Protocol (GOP) which replaced UGA and is used for “UEFI version 2” and later. There is also a (separate) protocol for obtaining EDID information.

    In both cases, during boot you can setup a video mode (via. boot services) and get a framebuffer that you can use; and you can continue using that framebuffer after you call “exit_boot_services()” to switch to run-time mode (but can no longer use UGA or GOP after calling “exit_boot_services()”). For VBE, I’d take the same approach – basically, boot code sets up a framebuffer, and after boot the rest of the OS uses the framebuffer (without caring if UGA, GOP or VBE was used to set it up).

    I haven’t actually used the graphics protocols yet, so I’m not too sure how complete the range of supported video modes is for any specific video card. I’d be pre-emptively pessimistic though (it’s better to have low expectations and then be pleasantly surprised than to have high expectations and be disappointed… ;-).

    When I said “2D is just 3D with depth = 0” what I meant was that if you’ve already got 3D, 2D is very little extra work. The opposite isn’t true – if you’ve only got support for 2D then you’ve probably only got partial support for 2D (no 2D polygons, no 2D texture rotation, no 2D texture scaling, etc); and you probably need to redesign/rewrite to add 3D. Think of it like this: The first part of a 3D pipeline is projecting the 3D onto a 2D plane (which can be skipped/simplified for 2D graphics); and the second part of the pipeline is drawing textured 2D polygons (possibly with a Z-buffer or something).

    Think about a typical GUI, where the desktop is at the back, the window that currently has focus is at the front (just behind the mouse pointer), and other windows are somewhere between the front and the back. It does have depth, and therefore it is (a limited form of) 3D. The only difference is that all textures are parallel to the viewing plane and orthographic projection is used instead of perspective projection. For a “2D” GUI, it would be possible to make use of the “a form of 3D” fact and add things like transparency/translucency, lighting (with specular highlights, etc) and shadow. For a very simple example, have a good look at the way shadows are done in this old screenshot: http://forum.osdev.org/download/file.php?id=1195 – notice how the shadows enhance the illusion of depth.

    Now think of something like “texture = draw_script(script_that_can_reference_other_textures);”. You might have a script to draw a radio button which is just “fill the texture with a transparent background and draw a small black circle on it”. The resulting radio button texture might be included in an application’s script that says “draw a big white rectangle, and then draw 6 of those radio button textures on it”. The application’s texture might be included in a larger window texture that says “draw some borders and a window title, then draw the application’s texture inside it”. This window texture might be included in the script for a GUI texture that says “copy the background image texture, then draw the task bar texture onto it, then draw the window texture on it rotated slightly clockwise”. It’s a hierarchy of textures. Now imagine a 3D game that has a script that says “copy this part of the sky texture, draw a bunch of textured polygons, then draw a cube rotated like this, then draw the GUI’s texture on one side of that rotated cube”.

    Now imagine the user moves the application’s window somewhere else. The scripts for the radio button texture, the application’s texture and the window’s texture didn’t change, so they don’t need to be redrawn (the previously used/cached texture data can be reused). The GUI’s script did change (as a texture is being drawn in a different place now) so it needs to be redrawn. The 3D game’s texture also needs to be redrawn (even though it’s script is the same) because it includes a texture that did change.
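
    A minimal sketch of that caching rule, assuming each canvas keeps a “my script changed” flag plus the list of canvases its script references; a canvas is redrawn only if its own script changed or if something it references was redrawn. The canvas type and the draw_script() stub are hypothetical.

        #include <stdbool.h>
        #include <stddef.h>

        struct canvas {
            bool            script_changed;   /* set when this node's own script is edited */
            struct canvas **children;         /* canvases referenced by this script */
            size_t          nchildren;
            /* the cached texture data would live here */
        };

        static void draw_script(struct canvas *c) { (void)c; /* would re-render c's texture */ }

        /* Returns true if this canvas had to be redrawn. */
        bool update(struct canvas *c)
        {
            bool needs_redraw = c->script_changed;
            for (size_t i = 0; i < c->nchildren; i++)
                if (update(c->children[i]))
                    needs_redraw = true;       /* a referenced texture changed */
            if (needs_redraw) {
                draw_script(c);                /* everything else reuses its cached texture */
                c->script_changed = false;
            }
            return needs_redraw;
        }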

    The video card might have the ability to use hardware acceleration to draw small black circles. In that case, the video driver might tell the video card’s hardware to draw the black circle instead of doing it itself (in software). Most video cards are capable of filling rectangles with a colour (including a transparent colour), and most are capable of copying data from one texture to another. Most video cards are also capable of drawing textured 3D polygons. Basically, for most video cards everything in that entire example could be hardware accelerated (except for maybe the “draw a black circle” part), where (almost) everything is drawn by the video card’s hardware (and not drawn by the application, not drawn by the video driver itself, and not drawn by any software in between).

    Of course not all video cards support all things. Maybe the video card is old and can’t handle transparent texture data. In that case, the video driver might need to copy the radio button texture into the application’s texture (but everything else would still be hardware accelerated). Not all video cards have the same amount memory either. Maybe some of those textures can’t be cached in the video card’s RAM and have to be cached in the computer’s RAM instead, or not cached at all and redrawn each time. That’s fine – the video driver can easily figure out which textures to cache and where (and do it dynamically, without pestering an unknown number of pieces of software scattered throughout the graphics stack).

    Finally; each video driver could export a “convert script/s to texture” service. You could also do the “convert script/s to texture” service in software. The computer I’m using now has 2 video cards and 16 logical CPUs, so if I ask the first video card to draw a very complex scene on the first monitor, then the driver for the first video card could split the “hierarchical tree of scripts” into 3 parts, and the rendering could be done on both video cards and all CPUs in parallel. Think about that for a little while, then…

    Let’s take that one step further. Imagine 2 different users sharing the same crusty old 66 MHz Pentium that has two ancient/crappy video cards, where the first user happens to be playing something like Crysis at a resolution of 1024*768 in extremely high detail on one monitor, while the other user is enjoying a Quake deathmatch in one window while watching a video in another window on the other monitor. Could this be possible? What if there’s a distributed system where video rendering is offloaded to any/all “convert script/s to texture” services on the LAN?

    Parallel (and even distributed) rendering using as much hardware acceleration as the hardware is capable of, things like 3D GUIs (I’m sure you’ve seen demos from the Project Looking Glass – if not, google now!) and special compositing effects, resolution and colour depth independence (including “infinite window zoom” without the “chunky pixels” effect), full multi-monitor support, automated (per frame) detail adjustment, etc; all without anything outside the video driver/s caring about more than simple scripts. It’s all relatively easy to do, as long as the video driver is responsible for all the rendering. As soon as something outside the video driver starts messing about with raw pixels the entire thing falls into a steaming heap of “too hard”. ;-)

    – Brendan

  10. Hadrien August 26, 2011 / 10:31 am

    Well, been reading a bit about CIE and ICC profiles, and it’s time to double-check some of my previous implicit assumptions.

    So far, I’ve more or less assumed that the relation between CIE and RGB was linear. Looking at the ICC specs, this is obviously not true : the relationship can apparently be almost arbitrary, and in the worst case it boils down to lookup tables, with linear interpolation only used to save space. I have my answer about gamma, too : so-called “B curves” seem to *still* be around in 2011. Sigh. Damn optoelectronics engineers :)

    What this also means is that gradient rendering has just seen a jump in complexity. For perfect device-independent rendering, gradients need to be defined in the CIE color space. Let’s forget for a minute that CIE doesn’t have the same mixing properties as RGB and that a linear gradient in CIE will potentially have a totally different look : if the relationship between CIE and RGB is essentially arbitrary, each line of the gradient must have its color calculated independently, with all the table lookups that implies.

    Does this all change your statements concerning performance ?

  11. Brendan August 26, 2011 / 11:21 am

    Gamma correction is still very relevant. The RGB pixels you’re used to are actually “RGB with gamma”. There’s a good reason for this – it gives better results for small values. For example, for 8-bit per colour, the values 0x00, 0x01, 0x02 with gamma represent intensities that are much closer together than the values 0x00, 0x01, 0x02 without gamma.
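
    A quick numeric check of that claim, approximating the sRGB curve with a plain power-law gamma of 2.2 and decoding the three smallest 8-bit codes to linear intensity:

        #include <math.h>
        #include <stdio.h>

        int main(void)
        {
            for (int code = 0; code <= 2; code++) {
                float with_gamma    = powf(code / 255.0f, 2.2f);  /* gamma-encoded code, decoded */
                float without_gamma = code / 255.0f;              /* linear encoding             */
                printf("0x%02X: %.7f linear with gamma, %.7f without\n",
                       code, with_gamma, without_gamma);
            }
            return 0;
        }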

    Floating point emulation is slow. For various “tiny floating point” encodings you’d convert to integer, float or double, then do your calculation/s, then convert back to “tiny floating point” or store the result as 8 bit integers (e.g. RGB components) or leave it as float for later.

    For example, for the “3 bit significand and a 7-bit exponent” example you’d be able to use a 1024 entry lookup table to convert the “10-bit floating point” into a normal float or double. The opposite (converting floats or doubles back into “10-bit floating point”) involves manipulating the binary representation of a float or double to extract the most significant significand bits and exponent.

    If the exponent is smaller (e.g. “4-bit significand, 4-bit exponent”) you’d be able to convert to/from fixed point integer relatively easily too (e.g. “uint32_t value = (significand | 0x10) << exponent").
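
    For illustration, here is a hedged sketch of packing and unpacking such a 10-bit format (3-bit significand, 7-bit exponent), assuming an exponent bias of 63 picked arbitrarily for the example; in practice the unpacking would be replaced by the 1024-entry lookup table mentioned above.

        #include <math.h>
        #include <stdint.h>

        #define EXP_BIAS 63

        /* value = (1 + sig/8) * 2^(exp - bias); the all-zero pattern means 0 */
        static float unpack10(uint16_t v)
        {
            unsigned sig = v & 0x7;            /* low 3 bits  */
            unsigned exp = (v >> 3) & 0x7F;    /* high 7 bits */
            if (v == 0) return 0.0f;
            return ldexpf(1.0f + sig / 8.0f, (int)exp - EXP_BIAS);
        }

        /* Extract the exponent and the top 3 significand bits of a normal float */
        static uint16_t pack10(float f)
        {
            int exp;
            if (f <= 0.0f) return 0;
            float m = frexpf(f, &exp);                            /* m in [0.5, 1)      */
            unsigned sig = (unsigned)((m * 2.0f - 1.0f) * 8.0f);  /* drop the leading 1 */
            int e = exp - 1 + EXP_BIAS;
            if (e < 0) return 0;                                  /* underflow -> 0     */
            if (e > 127) { e = 127; sig = 7; }                    /* clamp to the max   */
            return (uint16_t)((e << 3) | (sig & 0x7));
        }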

    It would get messy if you want full support for negative numbers, sNaN and qNaN, infinities, rounding modes, exceptions, etc (but other than support for "zero" you probably won't need any of that).

    Also, depending on what you're doing you could invent something entirely different, specifically for that purpose. For example, you could have a 4 bit intensity significand, a 5 bit intensity exponent, a 4 bit "ratio of green to red+blue" and a 3-bit "ratio of red to blue"; and use that as your own 16-bit "LAB" encoding. You could even do something like what JPG does, and have (for e.g.) 4 intensity values for 4 adjacent pixels that share the same hue data (e.g. I1, I2, I3, I4, shared A, shared B).

    Mostly all I'm saying is that (for storage) you're not necessarily restricted to the data types that are supported natively by compilers/CPUs; and if you really want to pack the most data into the least space there's plenty of alternatives.

    – Brendan

  12. Hadrien August 26, 2011 / 11:42 am

    As far as I understand, for EFI/UEFI there’s 2 protocols for graphics. The first is the Universal Graphics Adapter protocol (UGA), which is an older protocol used for “EFI version 1” and Apple machines. The second protocol is the Graphics Output Protocol (GOP) which replaced UGA and is used for “UEFI version 2” and later. There is also a (separate) protocol for obtaining EDID information.

    In both cases, during boot you can setup a video mode (via. boot services) and get a framebuffer that you can use; and you can continue using that framebuffer after you call “exit_boot_services()” to switch to run-time mode (but can no longer use UGA or GOP after calling “exit_boot_services()”). For VBE, I’d take the same approach – basically, boot code sets up a framebuffer, and after boot the rest of the OS uses the framebuffer (without caring if UGA, GOP or VBE was used to set it up).

    Well, I would’ve liked to know if implementing GOP was mandatory for UEFI implementations. But googling about GOP showed me that it’s not, and even if GOP is actually there, it doesn’t matter, because implementations do not have to provide an easily accessible framebuffer anyway. I hate modern standard bodies…

    Sources :
    http://wiki.phoenix.com/wiki/index.php/EFI_GRAPHICS_OUTPUT_PROTOCOL
    http://wiki.phoenix.com/wiki/index.php/EFI_GRAPHICS_OUTPUT_MODE_INFORMATION

    Anyway, I guess that I can indeed create a frame buffer abstraction that works with both VESA and UEFI, and panic the bootstrap code if none is available… (Not supporting EFI v1. EFI is currently a niche, and EFI v1 is already deprecated, so it would be basically supporting a niche within a niche)

    I haven’t actually used the graphics protocols yet, so I’m not too sure how complete the range of supported video modes is for any specific video card. I’d be pre-emptively pessimistic though (it’s better to have low expectations and then be pleasantly surprised than to have high expectations and be disappointed… ;-).

    Re : I hate modern standard bodies… How much does it cost them to write “this functionality, if provided, must at least be useful and reflect all hardware video modes” ?

    When I said “2D is just 3D with depth = 0” what I meant was that if you’ve already got 3D, 2D is very little extra work. The opposite isn’t true – if you’ve only got support for 2D then you’ve probably only got partial support for 2D (no 2D polygons, no 2D texture rotation, no 2D texture scaling, etc); and you probably need to redesign/rewrite to add 3D. Think of it like this: The first part of a 3D pipeline is projecting the 3D onto a 2D plane (which can be skipped/simplified for 2D graphics); and the second part of the pipeline is drawing textured 2D polygons (possibly with a Z-buffer or something).

    That is a given, but why would I want to put the complexity of 3D rendering in my desktop to begin with ? Implementing 2D graphics on top of 3D hardware is easy, implementing 2D graphics on top of 2D hardware is easy, implementing 3D graphics on top of 2D hardware is hard, so a 2D graphic stack sounds like the “sweet spot” there. You seem to provide some explanations, though, so moving on…

    Think about a typical GUI, where the desktop is at the back, the window that currently has focus is at the front (just behind the mouse pointer), and other windows are somewhere between the front and the back. It does have depth, and therefore it is (a limited form of) 3D. The only difference is that all textures are parallel to the viewing plane and orthographic projection is used instead of perspective projection. For a “2D” GUI, it would be possible to make use of the “a form of 3D” fact and add things like transparency/translucency, lighting (with specular highlights, etc) and shadow. For a very simple example, have a good look at the way shadows are done in this old screenshot: http://forum.osdev.org/download/file.php?id=1195 – notice how the shadows enhance the illusion of depth.

    A matter of taste, I guess, as I have a totally different feeling towards this. Myself, I find the core idea of rendering a 3D interface on a 2D screen manipulated with a 2D pointing device to be a cumbersome hack, and would like to put a tiled GUI in my OS where everything that’s displayed on screen is on a single plane. I also tend to disable shadows whenever I encounter them, as I find them either useless (when they are subtle) or distracting (when they are not subtle).

    I’ve met other people who feel like you about them, though, so again I think this is heavily a matter of taste. The thing is, this is my OS, so I’ll first develop for people who have a taste similar to mine, and then will think about 3D interfaces for incompatible version #1 (aka “the second system”) if there is really strong demand for them.

    Now think of something like “texture = draw_script(script_that_can_reference_other_textures);”. You might have a script to draw a radio button which is just “fill the texture with a transparent background and draw a small black circle on it”. The resulting radio button texture might be included in an application’s script that says “draw a big white rectangle, and then draw 6 of those radio button textures on it”. The application’s texture might be included in a larger window texture that says “draw some borders and a window title, then draw the application’s texture inside it”. This window texture might be included in the script for a GUI texture that says “copy the background image texture, then draw the task bar texture onto it, then draw the window texture on it rotated slightly clockwise”. It’s a hierarchy of textures. Now imagine a 3D game that has a script that says “copy this part of the sky texture, draw a bunch of textured polygons, then draw a cube rotated like this, then draw the GUI’s texture on one side of that rotated cube”.

    Well, I think it is fine for applications which mainly deal with 3D (like video games) to try to implement 2D graphics on top of it in order to have a single, consistent abstraction. But for stuff which almost exclusively deals with 2D, like a desktop shell, I think that going through 3D rendering is a needlessly cumbersome step, especially considering how limited the 3D that can be rendered without hardware acceleration is (in my experience, even a non-trivial rotation angle would be too much).

    Now imagine the user moves the application’s window somewhere else. The scripts for the radio button texture, the application’s texture and the window’s texture didn’t change, so they don’t need to be redrawn (the previously used/cached texture data can be reused). The GUI’s script did change (as a texture is being drawn in a different place now) so it needs to be redrawn. The 3D game’s texture also needs to be redrawn (even though it’s script is the same) because it includes a texture that did change.

    The video card might have the ability to use hardware acceleration to draw small black circles. In that case, the video driver might tell the video card’s hardware to draw the black circle instead of doing it itself (in software). Most video cards are capable of filling rectangles with a colour (including a transparent colour), and most are capable of copying data from one texture to another. Most video cards are also capable of drawing textured 3D polygons. Basically, for most video cards everything in that entire example could be hardware accelerated (except for maybe the “draw a black circle” part), where (almost) everything is drawn by the video card’s hardware (and not drawn by the application, not drawn by the video driver itself, and not drawn by any software in between).

    Of course not all video cards support all things. Maybe the video card is old and can’t handle transparent texture data. In that case, the video driver might need to copy the radio button texture into the application’s texture (but everything else would still be hardware accelerated). Not all video cards have the same amount memory either. Maybe some of those textures can’t be cached in the video card’s RAM and have to be cached in the computer’s RAM instead, or not cached at all and redrawn each time. That’s fine – the video driver can easily figure out which textures to cache and where (and do it dynamically, without pestering an unknown number of pieces of software scattered throughout the graphics stack).

    I think I know where our views strongly diverge. You act in the optimistic perspective that native video drivers will be available and of good quality, and that one must design stuff around them because, well, GPUs are everywhere so better use them. I act in the pessimistic perspective that native video drivers will never work well, or at least not until a very long time (how old is Linux’s direct rendering infrastructure now ?), and that the availability of accelerated graphics is just a specific option that, while it must be taken account for, is not to be made a core assumption of the system and designed around.

    Finally; each video driver could export a “convert script/s to texture” service. You could also do the “convert script/s to texture” service in software. The computer I’m using now has 2 video cards and 16 logical CPUs, so if I ask the first video card to draw a very complex scene on the first monitor, then the driver for the first video card could split the “hierarchical tree of scripts” into 3 parts, and the rendering could be done on both video cards and all CPUs in parallel. Think about that for a little while, then…

    I do not exclude the possibility of offloading some “primitive renderer” operations to GPUs when it’s possible, and I certainly am for using multiple CPU cores. I just think that GPUs can render 2D primitives reasonably well without a need to optimize specifically for their intricacies. After all, if a CPU-powered software renderer can do it, it should be a piece of cake for a GPU-powered renderer even if the code is suboptimal, right ?

    Let’s take that one step further. Imagine 2 different users sharing the same crusty old 66 MHz Pentium that has two ancient/crappy video cards, where the first user happens to be playing something like Crysis at a resolution of 1024*768 in extremely high detail on one monitor, while the other user is enjoying a Quake deathmatch in one window while watching a video in another window on the other monitor. Could this be possible? What if there’s a distributed system where video rendering is offloaded to any/all “convert script/s to texture” services on the LAN?

    Hmmm… There we get a bit outside of the limitations I voluntarily put on my design to keep my candid illusion that it is doable by one person in a reasonable time frame. I design for single-user operation (more precisely, “one user at a time”), and mostly local operation (even though I’m open to using distributed computing for non-critical tasks outside core system services, like backing up computer A on computer B, where nobody cares if the backup fails once because computer B has crashed).

    Parallel (and even distributed) rendering using as much hardware acceleration as the hardware is capable of, things like 3D GUIs (I’m sure you’ve seen demos from the Project Looking Glass – if not, google now!) and special compositing effects, resolution and colour depth independence (including “infinite window zoom” without the “chunky pixels” effect), full multi-monitor support, automated (per frame) detail adjustment, etc;

    • I’ve said above what I think of 3D GUIs. For the short version : shiny but impractical.
    • Pretty compositing effects : You can totally have some primitive graphic operations that will be reported as unavailable and ignored if the hardware power for them is not available, like distortion, rotation, embossing and specularity…
    • Resolution and color depth independence : For resolution, I’ve already stated why I think this belongs at the widget level. For color depth, I don’t see what the problem is with putting CIE-to-RGB conversions above the graphics driver, given that graphics hardware itself only speaks RGB and separate color profiles will be needed anyway.
    • Zoom : What for, apart from being cool-looking ? If text/icons/whatever else are not usable because the user has bad sight, then everything on the screen should be made bigger, and this can be done by putting support for it in the widget engine. Zooming stuff which is not a standard system widget without chunky pixels will always require per-application support anyway.
    • Multi-monitor : Either you want a “continuous screen” abstraction, and this is the job of the window manager, or you want to have screens considered as separate entities, and I guess support for this can be implemented in the window manager too, can’t it ? What’s missing ?
    • Detail adjustment : Can be done at the primitive rendering level, unless I’m missing something

    all without anything outside the video driver/s caring about more than simple scripts. It’s all relatively easy to do, as long as the video driver is responsible for all the rendering.

    Do you hate video driver manufacturers so much that you want them to implement super high-level system abstractions, so that every other system component can believe that computer graphics hardware fluently manipulates vector graphics and write sub-optimal code based on that ? Do you hate users so much that you want a large, critical part of the operating system to be written in the low-quality, hardware-dependent codebase of video drivers ? Do you hate yourself so much that you want to inflict upon yourself the consequences of this choice, on the day when you decide to release the much improved “video script v2” specification and discover that it will take ages until video drivers catch up and you can actually use it ?

    As soon as something outside the video driver starts messing about with raw pixels the entire thing falls into a steaming heap of “too hard”. ;-)

    I don’t see why yet.

  13. Hadrien August 26, 2011 / 11:51 am

    Gamma correction is still very relevant. The RGB pixels you’re used to are actually “RGB with gamma”. There’s a good reason for this – it gives better results for small values. For example, for 8-bit per colour, the values 0x00, 0x01, 0x02 with gamma represent intensities that are much closer together than the values 0x00, 0x01, 0x02 without gamma.

    How come we don’t have light emitters that follow the response of the human eye yet, after all this time ? :)

    Anyway, about floating point formats, if I sum it up, using these “small” FP values is a form of lossy compression, like monochrome bitmaps or JPEG pictures : useful for storage, useless for real-time calculations. Well, it’s worth taking anyway ^^

  14. Brendan August 26, 2011 / 11:58 am

    Hi,

    For perceptual uniformity (e.g. smooth gradients), RGB isn’t really good with or without gamma (although with gamma is slightly better than without). CIE XYZ is equally “not good” (probably equivalent to RGB without gamma). As far as I know, CIE_LAB is the best colour space/encoding to use for that purpose.
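
    For reference, a sketch of the usual XYZ-to-CIELAB conversion with the commonly published formulas and a D65 reference white; a real implementation would take the white point from the output device’s profile instead.

        #include <math.h>

        static float lab_f(float t)
        {
            const float d = 6.0f / 29.0f;
            return (t > d * d * d) ? cbrtf(t) : t / (3.0f * d * d) + 4.0f / 29.0f;
        }

        void xyz_to_lab(float X, float Y, float Z, float *L, float *a, float *b)
        {
            /* D65 reference white, normalised so that Yn = 1 */
            const float Xn = 0.95047f, Yn = 1.0f, Zn = 1.08883f;
            float fx = lab_f(X / Xn), fy = lab_f(Y / Yn), fz = lab_f(Z / Zn);
            *L = 116.0f * fy - 16.0f;   /* lightness, 0..100           */
            *a = 500.0f * (fx - fy);    /* green (-) to red (+) axis   */
            *b = 200.0f * (fy - fz);    /* blue (-) to yellow (+) axis */
        }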

    I’ve lost track of my statements about performance. I think originally I was mostly only considering conversion from CIE_XYZ to RGB alone, but since then we’ve been looking at gamma and conversion/s between alternative data encodings (and maybe some sort of LAB) on top of that.

    In general (for all pieces of non-trivial software), some compromise (e.g. between features, correctness, performance and development time) is to be expected. With that in mind it might be a good idea to decide some design details and benchmark things like conversions, to better estimate viability. For example, to get adequate performance you might (or might not) need to do conversion to RGB before rendering (and lose some correctness), rather than doing rendering in CIE_XYZ or CIE_LAB and then converting to RGB at the last step.

    – Brendan

  15. Hadrien August 26, 2011 / 1:27 pm

    Well, looking at the ICC spec, CIE_XYZ and CIELAB seem to play a symmetrical role in it. Looking at the Wikipedia article on Lab, it seems that the conversion between both is simple (or, at least, well-known). Apparently, XYZ is a sort of de facto standard that is purely based on physical phenomena, whereas Lab is more convenient for perceptual uniformity. Therefore, perhaps Lab would be a better fit for internal rendering calculations, as it makes everything nicely linear ? Can’t tell.

    Anyway, performance tweaks may indeed be needed, guess I have to wait until I reach implementation phase. It’s crazy to imagine that I could reach the limits of a modern CPU simply by drawing standard GUI controls in a non-native color space, though.

  16. Brendan August 26, 2011 / 1:54 pm

    Hi,

    You’re right about our different approaches (designing for what is easy vs. designing for what could/should be).

    Ironically, I used to be a bit like you. I remember a long time ago someone asked me if I planned to support SMP. I explained to them that SMP was a niche, and the price of SMP machines was far too high (back then you could buy a pair of complete single-CPU machines for about the same price as one dual-CPU machine), and therefore it was mostly a waste of time, and that I wasn’t going to bother with SMP.

    I’ve lost track of the number of rewrites I’ve done since. I learnt the hard way that the only way to avoid rewrites caused by design limitations is to avoid design limitations.

    Imagine if all of your software expected direct access to a pixel buffer. Let’s say that some time in the future you want to change to a “video driver is responsible for all rendering, hopefully with hardware acceleration” model. How would you go about introducing that change? Would you rip out the old interface and replace it with the new interface, so that all software breaks until it’s rewritten? Would you create the new interface, and then have legacy support for the old interface for a few years to give people a chance to rewrite their software? Would you keep the old interface for a decade? Would you keep it around forever?

    This isn’t just a silly example though…

    Did you know that GDI+ continues to rely on software rendering in Windows 7, and that Microsoft shifted to (hardware accelerated 2D) WDDI and Direct2D about 5 years ago? Do you think it’s funny that GDI/GDI+ is probably going to hang around like a stale fart stinking up Windows for the next decade?

    Of course I’m not planning to write full hardware accelerated video drivers – I’ll be hard pressed just to get “no native driver” software rendering done. I am planning to design a system such that in future, pre-existing software could take advantage of full hardware accelerated video drivers if they’re ever written. If I don’t design the system in this way, then nobody will ever have much reason to bother writing full hardware accelerated video drivers (as no software would exist to use them), and even if they did bother it’d take too much effort to rewrite all the existing software.

    Now have a look at current 3D monitor prices. The “polarised stereoscopic” stuff is about the same price as a normal monitor. What do you think is going to happen with 3D displays in the next 10 years? By the time your OS is ready, people are going to be using “touch free 3D” input devices (like Microsoft’s Kinect) and full 3D displays. Do you want to spend 5 years writing an OS, and then give up and redesign it from scratch before you get half way?

    For “zoom”, take a look at this: http://en.wikipedia.org/wiki/Zooming_user_interface

    Anyone writing a modern video driver (for any OS) has to support hardware accelerated 2D/3D. Increasing the video driver writer’s hassles by also making them deal with an additional obsolete “pixel buffer” API doesn’t make any sense to me, given that every decent OS has abandoned those APIs.

    Note: for Apple, the ancient QuickDraw API used a pixel buffer approach, and it got deprecated and replaced with Quartz which doesn’t use the pixel buffer approach; for Windows, GDI used a pixel buffer approach and it got replaced by WDDM which doesn’t; for X I’m not sure what it did (I suspect it was a messed up hybrid of pixel and non-pixel based approaches, as “ugly mess” is typically the right assumption when X is involved) but the new push is towards using OpenGL instead. On top of that there are modern languages like Java, which never had any pixel based graphics API from the start.

    – Brendan

  17. Hadrien August 26, 2011 / 4:07 pm

    You’re right about our different approaches (designing for what is easy vs. designing for what could/should be).

    As an aside, we’re very much sounding like pro and anti-abortion people sometimes :) One calls himself pro-choice, implying that the other is anti-choice, while the other calls himself pro-life, implying that the other is anti-life. Such is the power of words…

    Ironically, I used to be a bit like you. I remember a long time ago someone asked me if I planned to support SMP. I explained to them that SMP was a niche, and the price of SMP machines was far too high (back then you could buy a pair of complete single-CPU machines for about the same price as one dual-CPU machine), and therefore it was mostly a waste of time, and that I wasn’t going to bother with SMP.

    I’ve lost track of the number of rewrites I’ve done since. I learnt the hard way that the only way to avoid rewrites caused by design limitations is to avoid design limitations.

    I can see your point there. However, I also believe that you can’t fully avoid design limitations, and the rewrites they bring, if you want to get something out of the door at some point. The development of Windows Vista and Apple’s “Copland” are, in my opinion, good examples of what happens when developers try to put everything and the kitchen sink into a software release, without saying “okay, this will be for R2 if there’s demand for it at some point”.

    There is a good balance to reach. I don’t know where it is, but I believe that it’s neither “quick and dirty” nor “no design limitations”, but something in-between that is more complex, and defined by stuff such as goals, use cases, etc.

    Imagine if all of your software expected direct access to a pixel buffer. Let’s say that some time in the future you want to change to a “video driver is responsible for all rendering, hopefully with hardware acceleration” model. How would you go about introducing that change? Would you rip out the old interface and replace it with the new interface, so that all software breaks until it’s rewritten? Would you create the new interface, and then have legacy support for the old interface for a few years to give people a chance to rewrite their software? Would you keep the old interface for a decade? Would you keep it around forever?

    Guess that this is where having control over your software ecosystem, the Apple way, is useful :) Anyhow, here’s how I’d go about it.

    • Come up with the new video driver model. Make it so that video drivers can support both the new and the old version. Ask all video driver developers to develop “dual-stack drivers”, and provide a “test suite” for them.
    • While waiting for the new video drivers to mature, write a version of the graphic stack which can work on top of them. Make the system decide at boot time which one to use depending on which drivers are available.
    • By the time drivers have matured, everything which uses the system’s graphic stack (so 99% of software which would use the system’s rendering primitives at all) gets to use it freely.
    • Once all known drivers are compatible with the new stack, deprecate the old one. All development tools which I have control on now generate errors when compiling code which uses the old stack. Next version of the OS includes these new development tools. Make public announcements.
    • Once the most commonly used software (whose developers we have personally contacted) is ready, remove the need to develop dual-stack drivers and make the new stack a requirement. As time passes, driver developers will gradually drop the old abstractions on their own in order to remove cruft from their code.

    Compatibility breakage sucks, I agree. But it is sometimes necessary. Not in this case, thankfully, as driver-side rendering is a horrible mistake which Linux has experimented with for us alternative OSdevers :)

    This isn’t just a silly example though…

    Did you know that GDI+ continues to rely on software rendering in Windows 7, and that Microsoft shifted to (hardware accelerated 2D) WDDI and Direct2D about 5 years ago? Do you think it’s funny that GDI/GDI+ is probably going to hang around like a stale fart stinking up Windows for the next decade?

    So ? For most software’s UI, software rendering is obviously good enough, otherwise Microsoft would push developers harder. The real problem here is that Microsoft have not made their GUI toolkit abstract enough that it can work with both software and hardware rendering. We agree there. I’ve suggested a way it can work with my OS. So what is your issue ?

    Of course I’m not planning to write full hardware accelerated video drivers – I’ll be hard pressed just to get “no native driver” software rendering done. I am planning to design a system such that in future, pre-existing software could take advantage of full hardware accelerated video drivers if they’re ever written. If I don’t design the system in this way, then nobody will ever have much reason to bother writing full hardware accelerated video drivers (as no software would exist to use them), and even if they did bother it’d take too much effort to rewrite all the existing software.

    We partly agree there. The “primitive renderer” component which I’ve described above can work on top of a hardware-accelerated driver as well as a purely software one.

    Where I disagree is that I think existing software will not work better just because it’s hardware accelerated. Developers wrote their software within the boundaries of software rendering capabilities, and there is thus no need for hardware rendering to run that software well. The point of such an architecture, in my opinion, is rather to make deprecation of the 2D software renderer easier in a highly hypothetical future where I could do that. No, what can push the development of HW-accelerated drivers, in my opinion, is the wish by some devs to see *new kinds* of software (3D games, 3D modelling) emerge.

    Now have a look at current 3D monitor prices. The “polarised stereoscopic” stuff is about the same price as a normal monitor. What do you think is going to happen with 3D displays in the next 10 years?

    It’s not going to go very far until they get rid of those glasses on larger screens. At that point, it will probably make its way to consumer PCs, like widescreens and glossy displays did, because people will want to watch 3D movies and play 3D games. But a 3D office suite or file explorer ? Not so much, except for subtle eye candy. Stereoscopic effects are known to cause eye strain and headaches when used over extended periods of time, not to mention the current estimate that 10% of the world population (a group which I belong to, by the way) won’t see the difference at all.

    By the time your OS is ready, people are going to be using “touch free 3D” input devices (like Microsoft’s Kinect)

    No. The Kinect way to touch-free 3D is like voice command : it has its place for specific use cases, but it is impractical as a general-purpose interface. Honestly, would you want to do wild gestures and Hitlerian salutes in front of your computer when coding or editing videos ? Or browsing files ?

    and full 3D displays.

    This I’m more interested in. If you’re just talking about stereoscopic displays, you already know what I think of it, but if you’re talking about a device that’s actually capable of displaying 3D objects, either by faking it with eye tracking or using holograms… Now THAT’s a bold claim. Not sure it would change the way we use computers for productive purposes, but that would certainly be a little technical revolution.

    Do you want to spend 5 years writing an OS, and then give up and redesign it from scratch before you get half way?

    I would give up as well if I had to spend those 50 years it takes for a single person to write the perfect OS. With a simple, clean design, I may be lucky and not give up. With “everything-and-the-kitchen-sink”, I know that I will give up to begin with. Better choose the biggest probability of success :)

    For “zoom”, take a look at this: http://en.wikipedia.org/wiki/Zooming_user_interface

    Nice indeed ! Wish well to anyone implementing that in a general-purpose fashion, though, designing it so that it is usable and enjoyable is going to be crazily painful. I think that GUIs are good enough myself, so I’m not going there, but it’s like managed code OSs : an interesting experiment for someone else to work on..

    Anyone writing a modern video driver (for any OS) has to support hardware accelerated 2D/3D. Increasing the video driver writer’s hassles by also making them deal with an additional obsolete “pixel buffer” API doesn’t make any sense to me, given that every decent OS has abandoned those APIs.

    Good point. I forgot to mention that with the aforementioned “primitive renderer”-based architecture, video drivers don’t *need* to provide a 2D interface anymore. As soon as the renderer gets a hw-accelerated backend, which can be provided at any time in the future, it doesn’t care about the framebuffer anymore.

    As a result, the job is actually simpler than in your solution : if you’re a lazy video driver dev who wants to hide your precious framebuffer, what you do is contribute code to the primitive rendering service so that it can work with GPU-accelerated rendering, and then no video driver developer (not only the one who did the work) needs a compatibility 2D interface anymore. Simple, efficient, future-proof.

    Note: for Apple, the ancient QuickDraw API used a pixel buffer approach, and it got deprecated and replaced with Quartz which doesn’t use the pixel buffer approach; for Windows, GDI used a pixel buffer approach and it got replaced by WDDM which doesn’t; for X I’m not sure what it did (I suspect it was a messed up hybrid of pixel and non-pixel based approaches, as “ugly mess” is typically the right assumption when X is involved) but the new push is towards using OpenGL instead. On top of that there are modern languages like Java, which never had any pixel based graphics API from the start.

    Again, I agree with the principle that the rendering component should not intrinsically depend on the existence of a pixel buffer. It is the case in the architecture I suggest. Now, I must add that all decent graphics APIs still have such a thing as a canvas that applications can draw on, only updated to fit modern use cases.

    About Java, I’m not sure it is the perfect example of a clean graphics stack, or a clean standard library for that matter, but I’m curious : how does Powder Game, where controls are nonstandard and every powder grain is exactly 1 px wide, manage to work without pixel-based drawing ?

  18. Hadrien August 26, 2011 / 6:34 pm

    Well, we’ve been discussing quite a lot here, and I’ve been changing my plans significantly thanks to your input, so I think that I’m going to do a new version of this article with the current plans, to make things clearer. Reading through this comment section starts to get difficult :)

  19. Brendan August 27, 2011 / 2:32 pm

    Hi,

    For design; I’d distinguish between the design of public interfaces and the design (and implementation) of normal code (that inevitably ends up relying on those public interfaces); where “public interfaces” is things like APIs, protocols, etc that lots of different programmers may need to deal with.

    Public interfaces are difficult to change later (as lots of different pieces of code, potentially written by many different people, end up relying on them), and therefore public interfaces should be designed well and designed to be as “future proof” as possible (including covering any/all features that might be desired in future). The design and implementation of normal code doesn’t matter much, as it’s a lot easier to change (and a lot easier to have many different competing/alternative implementations).

    For a simple example, something like a networking protocol for email really needs to be designed well (taking into account all use cases, etc), but the actual code for any specific email client could be a badly designed mess that barely works and it wouldn’t matter too much (as people can write better email clients or improve the existing client).

    An OS could be thought of as a set of public interfaces, where the same applies – the design of those public interfaces is very important, but the actual code doesn’t matter much. For example, you could design a video driver interface that takes into account *everything*, but then only actually implement one crappy driver that only supports part of the video driver interface (due to time constraints or whatever) and that’s fine. Alternatively you could design a crappy video driver interface (due to time constraints or whatever) and spend ages implementing the best video driver you can; even if that code is excellent, the first option (good interface, crappy driver) would still be far better.

    You could even go a step further: an OS is a set of formal specifications (that define public interfaces, etc), and an OS doesn’t actually need to have any code at all (only an implementation of the OS would need code, not the OS itself). A good OS with a bad implementation is superior to a bad OS with a good implementation.

    I wouldn’t agree that Linux is an example of “driver-side rendering is an horrible mistake”. Instead, I would agree that Linux is an example of “herding cats rarely works”. The chance of getting a clean and consistent design that’s implemented well by all parties is virtually zero when there’s that many completely different groups (OpenGL, Xorg, KDE, Gnome, Intel, AMD, NVidia, kernel developers, etc) each with completely different goals and no clear leadership to guide them. Under these circumstances, if driver-side rendering “sort of works sometimes by accident” then it’s a miraculous success rather than a horrible mistake. :-)

    “Where I disagree is that I think existing software will not work better because it’s hardware accelerated. Developers wrote their software within the boundaries of software rendering capabilities, and there is thus no need for hardware rendering to run that software well.”.

    This is one place we disagree. If developers wrote their software within the boundaries of software rendering, then there’s been a design failure (regardless of whether hardware acceleration helps them or not). Developers should write their software within the boundaries of a well designed video interface that has been designed to ensure that hardware acceleration (if/when available) will help (and developers shouldn’t be writing software within the boundaries of software rendering to begin with). If the design failure was made 20 years ago (e.g. by Microsoft and Apple) then it’s excusable (as predicting the future is hard). If the design failure is made by anyone today then that’s not excusable (as predicting the past is easy, and history suggests that 2D/3D acceleration will be ubiquitous by about 1995, and “software rendering” will be obsolete by the year 2005).

    Of course by now I think our designs are mostly similar anyway. Mine could be described as “(basic mode setting + hardware acceleration + primitive renderer) + (various stuff on top)” while yours has become “(basic mode setting + hardware acceleration) + (primitive renderer + various stuff on top)”. The only main difference is where the “primitive renderer” is – in the video driver in my case, and just above the video driver in your case. The end result of this difference is that you’ve got an extra “public interface” (between the video driver and primitive renderer) that has to be designed well (and thoroughly documented, e.g. as a formal specification) which may or may not become a limitation for video driver writers; and I’ve got potential code duplication (which is a problem that can be solved easily in a variety of ways – either by providing a “software renderer” service, or providing an open source reference driver that driver writers can “cut & paste”, or having a shared library that can be used by any video drivers that want it, etc).

    About that Powder Game. I think I found the one you mean, and to be honest I’m not too sure exactly how it works. There’s only really 2 alternatives though – using the native Java graphics API and drawing very small rectangles (where each rectangle happens to be about the same size as a pixel), or using an OpenGL library and modifying a texture (where a “texel” happens to be about the same size as a pixel). In both cases, the code that does the drawing wouldn’t work with pixels (only rectangles or texels).

    – Brendan

  20. Hadrien August 28, 2011 / 11:51 am

    I strongly agree that in the realm of design, some things are more difficult to change later and require more care than others. I’d just extend this notion beyond directly-accessed programmatic interfaces, by also putting the interfaces users directly interact with into the mix. No one loves controls that randomly move and disappear, ever-changing keyboard shortcuts, etc. A good UI is one which you can use without needing to think about it, and which feels the same but better after an update.

    I wouldn’t agree that Linux is an example of “driver-side rendering is an horrible mistake”. Instead, I would agree that Linux is an example of “herding cats rarely works”. The chance of getting a clean and consistent design that’s implemented well by all parties is virtually zero when there’s that many completely different groups (OpenGL, Xorg, KDE, Gnome, Intel, AMD, NVidia, kernel developers, etc) each with completely different goals and no clear leadership to guide them. Under these circumstances, if driver-side rendering “sort of works sometimes by accident” then it’s a miraculous success rather than a horrible mistake. :-)

    I think this is also true of the Linux ecosystem. It is not dictatorial enough to work :) The problem is, isn’t anything GPU-related a cat-herding job? Every feature you want put in a GPU driver requires the cooperation of a number of diverging interests (the spec designer, if it’s not you, and every single native video driver manufacturer). This is why, in my opinion, one should make drivers do as little as possible, offering only the basic abstraction it takes to manipulate them all in a consistent way, and leave the real work to higher-level components. Fewer things in drivers means fewer things that can go wrong because unrelated manufacturers take unrelated decisions based on unrelated interests.

    “Where I disagree is that I think existing software will not work better because it’s hardware accelerated. Developers wrote their software within the boundaries of software rendering capabilities, and there is thus no need for hardware rendering to run that software well.”.

    This is one place we disagree. If developers wrote their software within the boundaries of software rendering, then there’s been a design failure (regardless of whether hardware acceleration helps them or not). Developers should write their software within the boundaries of a well designed video interface that has been designed to ensure that hardware acceleration (if/when available) will help (and developers shouldn’t be writing software within the boundaries of software rendering to begin with).

    But in the end they do, because they write and test their software on real-world machines, with real-world capabilities. So they set expectations and use the interfaces in a manner that works with those machines.

    To put it with a simple example: if no one can run a 3D game, no one will write a 3D game, even if there’s a programmatic interface for it. Which is why there’s a chicken-and-egg problem between 3D software and 3D drivers, which I think you’ve mentioned yourself earlier in this discussion.

    Now, I agree that it’s possible to *enhance* the appearance or behaviour of existing software by switching to hw-accelerated rendering in the backend. As an example, if software uses standard system widgets, and the widgets are redone in glassy 3D with refractive effects, then the eye candy is free even for software which was developed before the availability of hw acceleration. And scrolling a bitmap will get smoother if the system stack gets hardware-accelerated. But that does not bring groundbreaking changes to the way the software works, or the way people use it. It will be an extra comfort, not a revolution. Truly revolutionary software will have to be developed with hardware acceleration explicitly in mind.

    For another example: if GTK gets hardware-accelerated drawing primitives, GIMP’s buttons will be rendered with 3D acceleration, and you can put all the gimmicky eye candy you want on top of that. But the nontrivial, actually useful work, which is GPU-accelerating the image editing itself, requires a rewrite of GIMP’s image manipulation backend, which is implemented above the OS layers.

    Of course by now I think our designs are mostly similar anyway. Mine could be described as “(basic mode setting + hardware acceleration + primitive renderer) + (various stuff on top)” while yours has become “(basic mode setting + hardware acceleration) + (primitive renderer + various stuff on top)”. The only main difference is where the “primitive renderer” is – in the video driver in my case, and just above the video driver in your case. The end result of this difference is that you’ve got an extra “public interface” (between the video driver and primitive renderer) that has to be designed well (and thoroughly documented, e.g. as a formal specification) which may or may not become a limitation for video driver writers; and I’ve got potential code duplication (which is a problem that can be solved easily in a variety of ways – either by providing a “software renderer” service, or providing an open source reference driver that driver writers can “cut & paste”, or having a shared library that can be used by any video drivers that want it, etc).

    If you provide a reference implementation or a library that is easily portable from hardware A to hardware B, you need to abstract away the hardware-dependent part. Ergo, you need a hardware-independent abstraction on which your drawing code is based. If you have such an abstraction at hand, your solution becomes basically equivalent to mine, unless I’ve missed something.

    (Again, a “layer” is not necessarily a process, although I prefer that option. It may also be implemented as a shared library. The only thing that matters is that layers have independent, easily replaceable implementations, with only a stable interface between them.)

  21. Brendan August 28, 2011 / 5:40 pm

    Hi,

    Let’s look at a hypothetical video driver interface – something detailed enough to actually be usable rather than some abstract high level view of it.

    For the sake of simplicity I’ll use the word “application” to mean “whatever uses the video driver” (e.g. the primitive renderer, window manager, virtual screen manager or something). I’ll also use “colour” when I really mean “the set of attributes that determine how something interacts with light”.

    At the lowest levels, the video driver should probably work on textures, where each texture has a “texture handle” and a default colour (that’s used when the texture data isn’t available or when the texture is so small/far away that the texture data isn’t needed). On top of that a texture has one or more (think mipmaps) sets of texture data (width in texels, height in texels, pointer to texel data).

    So to begin with you’ll want a few ways for applications to ask the video driver to create a raw texture. I like the idea of applications being able to ask the video driver to load a file containing raw texture data (so that the application itself needn’t do much and there’s no double handling) – something like “video.loadTextureFile(textureHandle, filename, defaultColour);”. Of course the texture data would originally be in a standardized format and the video driver might (or might not) convert the data into a device dependant format when it’s first created; and the video driver could/would also recalculate the caller’s “default colour” from the actual texture data, create mipmaps if/when needed, etc. You’d want to allow the application to dynamically generate texture data and pass it to the video driver too, so you want something for that – maybe “video.loadTexture(textureHandle, width, height, textureData, defaultColour);”. For every way of creating a texture, if an existing texture handle is reused then the new texture replaces the old texture; but you’d want a way to explicitly free a texture like “video.freeTextureFile(textureHandle);”.
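
    Written down as code, that part of the interface might look something like this (every type and name is only illustrative; the formal specification would pin down the details):

        // Illustrative transcription only; the real spec would pin down every detail.
        #include <cstdint>
        #include <string>

        using TextureHandle = uint32_t;
        struct Colour { uint8_t r, g, b, a; };

        class VideoDriver {
        public:
            virtual ~VideoDriver() {}

            // Load raw texture data from a file in the standardized format; the driver
            // may convert it to a device-dependent format, build mipmaps, and
            // recalculate the default colour from the actual texels.
            virtual void loadTextureFile(TextureHandle handle,
                                         const std::string& filename,
                                         Colour defaultColour) = 0;

            // Upload dynamically generated texture data (width * height texels).
            virtual void loadTexture(TextureHandle handle,
                                     uint32_t width, uint32_t height,
                                     const uint32_t* textureData,
                                     Colour defaultColour) = 0;

            // Reusing an existing handle replaces the old texture; this frees one.
            virtual void freeTextureFile(TextureHandle handle) = 0;
        };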

    You’d also want to be able to create a texture from a Unicode string (where internally, the video driver might ask a generic font engine outside the driver to do the actual conversion). This might be “video.createUnicodeTexture(textureHandle, fontType, unicodeString, aspectRatio, foregroundColour, backgroundColour);” where the aspect ratio is used by the video driver to determine an appropriate width and height for the resulting texture.

    Next you’d want applications to be able to send a list of commands to create a texture. Something like “video.createTexture(commandScript, defaultColour);”. You’d need commands to draw fixed colour polygons, shaded polygons and textured polygons; and you’d need this for both 2D and 3D. That’s 6 commands. The textured polygons are where other textures are included in the new texture. For 3D polygons you’d want to be able to set/modify a projection matrix and camera position, so that’s another 2 commands. You’d also want some command to tell the video driver which texture is the final texture that should be drawn on the screen (where sending a new script for the “screen texture” causes the screen to be redrawn). Lastly, a command to set what happens near the “far” clipping plane would be a good idea (to be used for “things in the distance are obscured by white fog, or darkness, or..” – everyone has seen games use that as a way to hide the clipping at the rendering distance limit).
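
    As a list of opcodes, the script part boils down to roughly this (again, purely illustrative):

        // Purely illustrative opcode list for the "create texture from a script" call.
        #include <cstdint>

        enum class ScriptOp : uint8_t {
            DrawFlatPoly2D,      // fixed-colour 2D polygon
            DrawShadedPoly2D,    // shaded 2D polygon
            DrawTexturedPoly2D,  // 2D polygon textured with another texture handle
            DrawFlatPoly3D,      // same three again, for 3D polygons
            DrawShadedPoly3D,
            DrawTexturedPoly3D,
            SetProjection,       // set/modify the projection matrix
            SetCamera,           // set the camera position
            SetScreenTexture,    // mark the final texture; resending redraws the screen
            SetFarPlaneEffect    // fog/darkness near the far clipping plane
        };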

    You’d want some way for the application to determine which extra/optional features the video driver supports – something like “capabilities = video.getCapabilities(void);”. A way to identify the driver would be a good idea too (e.g. “identification = video.getIdentification(void);”).

    I haven’t forgotten the basic “get/set video mode” functionality, it’s simply not needed due to resolution independence. Instead, I’m going to add a function that applications use to determine the aspect ratio of the screen (e.g. “screenAspectRatio = video.getScreenAspectRatio(void);”). The video driver can automatically determine which video mode it feels like using (possibly based on a little benchmarking during driver initialisation, possibly just using the monitor’s preferred/native resolution, possibly taking into account some user preferences from somewhere).

    At this point, you’ve got everything you need for normal office work (even including CAD!) and the video driver interface is only 9 functions and about 9 “script commands”. Let’s call that “core functionality”. Anything beyond this is optional (and possibly not supported by a software renderer or ancient low-end cards).

    Optional features/capabilities would include lighting – a script command to create a light source with certain properties (colour of light, strength of light, direction of light?) and another script command to set the ambient light. If this feature isn’t supported or if the application doesn’t use it, then the video driver just assumes “no light sources, ambient light is daylight”. Note: the video driver itself could/should be responsible for using things like shaders; and I personally wouldn’t want to allow application/s to mess with any sort of “shader language” themselves (as it’s too hard to avoid software that requires a specific video card in that case).

    Volumetric fog would be another optional feature (one more script command).

    There’s also support for various movie formats (MPEG, H.264, Theora, whatever). I’d probably try to do these in a “texture is a movie” way but I can imagine that getting messy for a lot of reasons.

    Another optional feature/capability I’d want to at least consider is voxel rendering. For example, rather than only supporting “2D array of texels” you could also (optionally) support “3D array of voxels”. This would add a bunch of new functions and script commands, and you’d probably want to come up with some sort of “octree-like” structure for representing the voxel data. Some research and experimentation might be a good idea here.

    You’d probably also want to allow non-standard vendor commands. This provides people with a way to experiment, and provides you with a possible source of new ideas to add into future versions of the video driver interface standard.

    For some features (like sub-pixel rendering/anti-aliasing, support for reflections and specular highlights, etc) the application doesn’t really need to know or care if the video driver supports them or not – the video driver can just use them if/when it wants to.

    There’s probably a few things I’ve forgotten, but that mostly covers all graphics requirements.

    But that’s not all.

    One possible optional feature has nothing to do with graphics/video. If the video card has lots of RAM and the video driver doesn’t think all of that RAM will be needed for graphics, then it could allow that RAM to be used by the OS for swap space. This includes situations where there’s a spare/unused video card in the system (with no monitor attached), and situations where an older machine has been “pimped out” with an over-powered modern video card (I’ve seen a 166MHz Pentium computer with 64 MiB of RAM equipped with a 512 MiB video card that was probably worth more than the computer before – scary stuff).

    There’s also the “script to texture service” I mentioned once before, where the video driver offers to do generic rendering to other software. This could be a very nice feature (and not just for things like render farms).

    Finally, there’s GPGPU. The video driver offers a “general purpose processing” service, where other software provides some sort of standardised byte-code that it wants executed and the video driver compiles the byte-code into something that will run on the GPU and executes it. The same “general purpose processing” service could be offered by other types of drivers (e.g. physics accelerator cards, normal CPUs, etc). I’m not too sure how you’d want to go about this; but I’d do it such that drivers provide the service to the kernel/OS, and the kernel/OS controls things like scheduling (who uses which device/s when).
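
    A driver could expose that service through something as small as this (purely a sketch; the byte-code format itself is the hard part):

        // Sketch only: the "general purpose processing" service a GPU (or physics
        // card, or plain CPU) driver could register with the kernel/OS.
        #include <cstdint>
        #include <vector>

        using JobId = uint64_t;

        class ComputeProvider {
        public:
            virtual ~ComputeProvider() {}
            // The driver compiles the standardized byte-code for its own device and
            // queues the job; the kernel/OS decides who runs what, and when.
            virtual JobId submit(const std::vector<uint8_t>& byteCode,
                                 const std::vector<uint8_t>& inputData) = 0;
            // Retrieve the results once the job has completed (or been aborted).
            virtual bool collect(JobId job, std::vector<uint8_t>& outputData) = 0;
        };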

    Ok, let’s say you write up something like this as a strict/formal “video driver interface” specification. Is it dictatorial enough to work? Someone writing a video driver only needs to comply with your formal “video driver interface” specification, and someone writing code that talks directly to video driver/s only needs to comply with your “video driver interface” specification. There are no decisions anyone can make that affect the interface itself; and therefore the entire “herding cats” problem is gone.

    Now imagine that you’ve created the strict/formal specification, and written a software renderer that handles all the “core functionality”. The textured polygon part might be a little slow (but honestly, it might not be if it’s done right, especially on modern “multi-core with SIMD” systems, especially if you’re doing the “use the default colour rather than the texture data if you need to save time” trick, and especially if you’re not doing any lighting calculations). Then imagine you write some 3D games. You’d want to choose something where “no support for lighting” isn’t a problem – maybe a 3D SimCity clone or something. Eventually (hopefully maybe one day) someone sees your 3D SimCity clone (and “reduced quality” software rendering) and decides it’d be awesome in high detail with full hardware acceleration, and they download your “video driver interface” specification and write a full featured video driver for a specific video card. Suddenly your 3D SimCity game is humming along with all the fancy bells and whistles. The entire “chicken and egg” problem is gone, simply because you’ve got a formal specification that everyone can comply with (and a software renderer).

    Now imagine you just want to write a text editor. The video driver interface handles “fixed colour” 2D polygons and “Unicode to texture”, so you start with a script that tells the video driver to create a texture, fill it with “white”, then super-impose another texture (from the Unicode text) on top of it (where the Unicode text came from some text file the text editor is being used to edit). Then you add some code to ask the video driver to load some textures from files (for icons, buttons, etc) and add commands to your script to draw a menu, position some icons, etc. Of course maybe you’re using a widget library, and the widget library creates a few scripts/textures that the text editor includes (using a few “draw a textured polygon containing the widget texture” commands) in its own script. Your text editor just sends this script/s to its parent (the GUI?) which has its own script (that says “draw a textured polygon containing the text editor’s texture” and a bunch of other commands), then sends all these scripts to its parent (which sends them to its parent, which…) until the video driver (or more correctly, video drivers) get a hierarchical tree of scripts to draw.
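
    The text editor’s contribution to that tree could be as small as this (hypothetical helper types, only to show the shape of such a script):

        // Hypothetical helper types, only to show the shape of such a script.
        #include <string>
        #include <vector>

        struct Command { std::string op; std::vector<std::string> args; };
        using Script = std::vector<Command>;

        Script buildEditorScript(const std::string& documentText) {
            Script s;
            s.push_back({"fillFlatPoly2D",     {"editorTexture", "white"}});
            // The driver hands the text to a generic font engine behind the scenes.
            s.push_back({"unicodeToTexture",   {"textTexture", "someFont", documentText}});
            s.push_back({"drawTexturedPoly2D", {"editorTexture", "textTexture"}});
            // Menu, icons and widgets would be further textured polygons, taken from
            // the widget library's own scripts; the GUI then nests this whole script
            // into its own, and so on up to the video driver(s).
            return s;
        }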

    Now imagine the user wants a screen shot. The exact same “hierarchical tree of scripts to draw” (and textures files, etc) get sent to the printer driver, and the printer driver renders it all in CMYK (maybe with extremely high detail – processing time isn’t much of an issue here) and prints it out on some paper.

    Did you notice what happened there? A large part of the “video driver interface” is actually a device independent rendering interface; which is used by lots of other types of devices, and also happens to be used throughout the entire “graphics stack”.

    Finally, let’s say you’ve got a good, clean, consistent design (where almost all software never needs to care about what actually happens with its graphics output), that takes into account device independence (where everything from printers to holographic displays can be supported), that is more flexible than anyone could ever want (where “scripts” from various pieces of software can be nested and transformed in any number of ways throughout the entire “graphics stack”), and is also extensible (in ways that avoid backward compatibility problems). Then someone says they just want a 2D API where they can diddle about with RGB pixels in a “pixel buffer”. What would you tell them? :-)

    – Brendan

  22. Hadrien September 4, 2011 / 7:30 pm

    (Sorry for the delay, I was turned into a bit of a zombie by my recent Sweden-to-France trip)

    About your video driver interface, I admit that it would work. I’m not sure, however, how comfortable I am with the idea of a spec where video drivers have very little in the “must support” category and a lot in the “should support” category. My dual-spec idea has the advantage that whether you have native drivers or not, you always get the most out of your device: the “framebuffer” spec includes anything a framebuffer can do, and the “GPU” spec includes anything a modern GPU can do. If the hardware supports something, drivers must expose it. This means no half-baked drivers. And nothing prevents drivers from still including proprietary extensions if they really want to…

    One possible optional feature has nothing to do with graphics/video. If the video card has lots of RAM and the video driver doesn’t think all of that RAM will be needed for graphics, then it could allow that RAM to be used by the OS for swap space. This includes situations where there’s a spare/unused video card in the system (with no monitor attached), and situations where an older machine has been “pimped out” with an over-powered modern video card (I’ve seen a 166MHz Pentium computer with 64 MiB of RAM equipped with a 512 MiB video card that was probably worth more than the computer before – scary stuff).

    Well, software should certainly be able to allocate and manipulate VRAM for GPU-accelerated rendering anyway. But swapping into it… Hmmm… I’m not sure how good an idea that is. It means that memory management, a core kernel service, must interact with the GPU driver, which is an optional and partly untrusted user-mode component… Wouldn’t that hurt the cleanliness of the kernel design a little bit?

    There’s also the “script to texture service” I mentioned once before, where the video driver offers to do generic rendering to other software. This could be a very nice feature (and not just for things like render farms).

    That’s not specific to driver-side rendering; I could just as well add that feature to the primitive rendering layer, by giving it a backend that renders to a user-provided bitmap (just as it can render to a GPU or a framebuffer) and exposing an API for it.

    Finally, there’s GPGPU. The video driver offers a “general purpose processing” service, where other software provides some sort of standardised byte-code that it wants executed and the video driver compiles the byte-code into something that will run on the GPU and executes it. The same “general purpose processing” service could be offered by other types of drivers (e.g. physics accelerator cards, normal CPUs, etc). I’m not too sure how you’d want to go about this; but I’d do it such that drivers provide the service to the kernel/OS, and the kernel/OS controls things like scheduling (who uses which device/s when).

    Well, good question. I’m not even sure whether the OS should care about that, or whether it should be the job of third-party components, as is currently the case in most OSs (anything but OS X, AFAIK).

    You raise a good point about scheduling, though. If GPGPU is allowed (and with a bare-metal GPU driver interface, there’s no reason why it shouldn’t be), there has to be a way to make sure that one piece of GPU-accelerated software doesn’t block another. How should this be implemented, though? Perhaps the kernel could provide an optional “scheduling service” that signals all processing-unit drivers when a context switch affecting the GPU’s queue occurs, so that they can perform a form of context switch of their own (clean up state, start buffering the orders of process A and executing the orders of process B).

    The limitation is that a badly written GPU driver can crash or hang GPGPU software, which I think is fair.
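
    Something like this, maybe (nothing is designed yet, the names are just made up to illustrate the idea):

        // Made-up names, only to illustrate the "scheduling service" idea above.
        #include <cstdint>

        using ProcessId = uint32_t;

        class ProcessingUnitDriver {
        public:
            virtual ~ProcessingUnitDriver() {}
            // Called by the kernel's optional scheduling service: stop executing the
            // orders of `from`, buffer anything new it sends, and start executing the
            // orders of `to` instead.
            virtual void onContextSwitch(ProcessId from, ProcessId to) = 0;
        };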

    Now imagine that you’ve created the strict/formal specification, and written a software renderer that handles all the “core functionality”. The textured polygon part might be a little slow (but honestly, it might not be if it’s done right, especially on modern “multi-core with SIMD” systems, especially if you’re doing the “use the default colour rather than the texture data if you need to save time” trick, and especially if you’re not doing any lighting calculations). Then imagine you write some 3D games. You’d want to choose something where “no support for lighting” isn’t a problem – maybe a 3D SimCity clone or something. Eventually (hopefully maybe one day) someone sees your 3D SimCity clone (and “reduced quality” software rendering) and decides it’d be awesome in high detail with full hardware acceleration, and they download your “video driver interface” specification and write a full featured video driver for a specific video card. Suddenly your 3D SimCity game is humming along with all the fancy bells and whistles. The entire “chicken and egg” problem is gone, simply because you’ve got a formal specification that everyone can comply with (and a software renderer).

    Except for one thing: why would one design high-detail textures/polygons if the vast majority of users won’t see anything but the ugly software-rendered version?

    What if high-quality rendering becomes critical to the gameplay? (E.g. in Trine, if glow/HDR is not enabled, XP potions become harder to find; in a physics-based game, GPU acceleration may be critical to proper physics.)

    Now imagine the user wants a screen shot. The exact same “hierarchical tree of scripts to draw” (and textures files, etc) get sent to the printer driver, and the printer driver renders it all in CMYK (maybe with extremely high detail – processing time isn’t much of an issue here) and prints it out on some paper.

    Did you notice what happened there? A large part of the “video driver interface” is actually a device independent rendering interface; which is used by lots of other types of devices, and also happens to be used throughout the entire “graphics stack”.

    It is not necessarily as much of a good thing as it sounds. Imagine that you have a video script which uses volumetric fog (which you mentioned as an optional part of your video driver interface). If you send that to a printer driver, it will say “okay, a printer can’t render this” and just ignore the command. If there’s a drawing layer above drivers, it can render the volumetric fog on an available GPU, convert the image to the printer’s color space, and send the result to the printer. The printer driver has to worry about nothing in the process: it just receives a high-resolution image, which it knows how to print.

    Finally, let’s say you’ve got a good, clean, consistent design (where almost all software never needs to care about what actually happens with its graphics output), that takes into account device independence (where everything from printers to holographic displays can be supported), that is more flexible than anyone could ever want (where “scripts” from various pieces of software can be nested and transformed in any number of ways throughout the entire “graphics stack”), and is also extensible (in ways that avoid backward compatibility problems). Then someone says they just want a 2D API where they can diddle about with RGB pixels in a “pixel buffer”. What would you tell them? :-)

    So you want to ban SDL and all modern game development toolkits from your OS, knowing that gaming is an area where color accuracy matters only slightly and rendering speed matters a lot? Sounds like a risky bet :-)

  23. Brendan September 5, 2011 / 2:37 am

    Hi,

    For swap space, I’d have “swap providers”. The kernel’s virtual memory manager tracks a list of these swap providers (and their details), and when a page needs to be stored on swap it decides which swap provider to use (e.g. the fastest one with free space) and tells it to store the page. The swap provider stores the page and returns a “page ID”, which the virtual memory manager uses to retrieve that page later. It’s not this simple (e.g. a swap provider can be put into “prepare for offline” mode, and redundancy would be an option); but basically there’s some abstraction involved so that it’s easy for the kernel to use a swap partition or a swap file on any file system, or use anything else for swap space (including video memory and remote computers/network). You’re right about “trusting a partly untrusted user-mode component”, but the kernel must trust something sooner or later, and at the end of the day it’s the computer’s owner/administrator that would need to decide what they want to trust.

    For GPGPU scheduling; in general “one finished job and one unstarted job” is better than “two half-finished jobs”. Because there’s no IO involved (GPGPU tasks don’t block waiting for disk, etc) I’d end up with a form of batch processing (when a GPU becomes free, the highest priority job is started on that GPU and runs until it completes) with no context switching of any kind (other than when jobs complete or are aborted/discarded due to using more than some maximum amount of time).

    “Why would one design high detail textures/polygons if the vast majority of users won’t see anything but the ugly software rendered version ?”. Have you seen some of the offline renderers for Minecraft? The graphics in the game are (by necessity) relatively low-quality, but the exact same graphics data rendered with advanced lighting (shadows, reflections) and other things (anti-aliasing) can be stunning in comparison. Now imagine something like Crysis – my Windows/gaming machine can’t run Crysis at 1920 * 1600 with high detail in real-time (but there’s no reason it couldn’t spend 3 hours rendering a 4096 * 2048 image in “insanely high detail” from the exact same data). Now consider drawing 1000 white lines (with random start and end co-ords) on a black background. The fastest approach is Bresenham’s line algorithm, where some pixels are pure white and the rest remain pure black. A higher quality (but much slower) approach is to calculate the percentage of each pixel that should be white and generate “perfectly anti-aliased” shades of grey (note: this is harder than it sounds if you want the pixels around line intersections to be perfect). Basically, to get acceptable real-time performance you need to sacrifice detail/quality, and when you no longer care about real-time performance you no longer need to sacrifice any detail/quality and can therefore get much better graphics from the exact same data.
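
    (For reference, the fast-and-ugly option really is just a handful of integer operations per pixel; the classic Bresenham loop below, writing into a plain in-memory buffer, is essentially the whole algorithm, and the anti-aliased version is where all the extra work goes.)

        // Classic integer Bresenham: each pixel is either pure white or left black.
        #include <cstdint>
        #include <cstdlib>
        #include <vector>

        void drawLine(std::vector<uint32_t>& buffer, int width,
                      int x0, int y0, int x1, int y1, uint32_t colour)
        {
            int dx = std::abs(x1 - x0), sx = (x0 < x1) ? 1 : -1;
            int dy = -std::abs(y1 - y0), sy = (y0 < y1) ? 1 : -1;
            int err = dx + dy;
            for (;;) {
                buffer[static_cast<std::size_t>(y0) * width + x0] = colour;
                if (x0 == x1 && y0 == y1) break;
                int e2 = 2 * err;
                if (e2 >= dy) { err += dy; x0 += sx; }   // step along x
                if (e2 <= dx) { err += dx; y0 += sy; }   // step along y
            }
        }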

    I can’t prevent poor software (things like games that fail on low-end cards, games that require physics acceleration but try to work when there is none, and printers that treat “offline rendering” as “real-time rendering” and skip volumetric fog). Nobody else can either. Mostly, the point is to design things in such a way that good implementations are possible.

    I’m not interested in upholding the status quo or making compromises for the sake of compatibility. For my project I’m using a “redesign the world” approach (where everything must be redesigned to suit, and where even basic things like plain text files won’t be allowed to exist).

    Cheers,

    Brendan

  24. Hadrien September 5, 2011 / 10:50 am

    For swap space, I’d have “swap providers”. The kernel’s virtual memory manager tracks a list of these swap providers (and their details), and when a page needs to be stored on swap it decides which swap provider to use (e.g. the fastest one with free space) and tells it to store the page. The swap provider stores the page and returns a “page ID”, which the virtual memory manager uses to retrieve that page later. It’s not this simple (e.g. a swap provider can be put into “prepare for offline” mode, and redundancy would be an option); but basically there’s some abstraction involved so that it’s easy for the kernel to use a swap partition or a swap file on any file system, or use anything else for swap space (including video memory and remote computers/network). You’re right about “trusting a partly untrusted user-mode component”, but the kernel must trust something sooner or later, and at the end of the day it’s the computer’s owner/administrator that would need to decide what they want to trust.

    So you’d manage swap as an out-of-kernel service with a minimal kernel-side interface (the kernel can send pages to a swapping device and retrieve them back), where the sysadmin can choose to enable swapping for any storage medium on the system in an opt-in fashion? Sounds like an interesting way to go about it; I’ll have to consider it when I start to work on swapping…
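
    If I go down that road, the kernel-side view of a provider could be about this small (hypothetical sketch, nothing final):

        // Hypothetical sketch of a minimal swap provider interface; nothing final.
        #include <cstddef>
        #include <cstdint>

        using PageId = uint64_t;

        class SwapProvider {
        public:
            virtual ~SwapProvider() {}
            // Store one page; the returned PageId is how the virtual memory manager
            // asks for that page back later.
            virtual PageId storePage(const void* pageData, std::size_t pageSize) = 0;
            virtual void   loadPage(PageId id, void* pageData, std::size_t pageSize) = 0;
            virtual void   freePage(PageId id) = 0;
            // "Prepare for offline": stop accepting new pages so the kernel can
            // migrate the existing ones elsewhere.
            virtual void   prepareForOffline() = 0;
        };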

    For GPGPU scheduling; in general “one finished job and one unstarted job” is better than “two half-finished jobs”. Because there’s no IO involved (GPGPU tasks don’t block waiting for disk, etc) I’d end up with a form of batch processing (when a GPU becomes free, the highest priority job is started on that GPU and runs until it completes) with no context switching of any kind (other than when jobs complete or are aborted/discarded due to using more than some maximum amount of time).

    Won’t there be a problem if you mix scheduling algorithms for different devices? As an example, imagine that the CPU has a round-robin-ish algorithm and the GPU a batch algorithm. When a process sends a GPGPU task to the appropriate system service and there’s no free GPU, what happens? Is it blocked, in violation of the system’s general asynchronous philosophy? Or does the “GPGPU server” have to silently buffer its orders in an ever-growing heap block, without any guarantee that a GPU will be free within hours (if there’s a 3D rendering task running)?

    Have you seen some of the offline renderers for Minecraft? The graphics in the game are (by necessity) relatively low-quality, but the exact same graphics data rendered with advanced lighting (shadows, reflections) and other things (anti-aliasing) can be stunning in comparison. Now imagine something like Crysis – my Windows/gaming machine can’t run Crysis at 1920 * 1600 with high detail in real-time (but there’s no reason it couldn’t spend 3 hours rendering a 4096 * 2048 image in “insanely high detail” from the exact same data). Now consider drawing 1000 white lines (with random start and end co-ords) on a black background. The fastest approach is Bresenham’s line algorithm, where some pixels are pure white and the rest remain pure black. A higher quality (but much slower) approach is to calculate the percentage of each pixel that should be white and generate “perfectly anti-aliased” shades of grey (note: this is harder than it sounds if you want the pixels around line intersections to be perfect). Basically, to get acceptable real-time performance you need to sacrifice detail/quality, and when you no longer care about real-time performance you no longer need to sacrifice any detail/quality and can therefore get much better graphics from the exact same data.

    What about the weight of said high-quality data, then? As an example, Carmack recently stated that all the uncompressed texture data for Rage combined weighed 1 terabyte. If you have the aim of making the game perfectly future-proof, you’d have to say “Okay, at some point in the future we’ll probably have 320 dpi TV screens and neural interfaces that plug directly into the optic nerve, so we can’t afford to apply strong JPEG compression and downscaling to this. We’ll rather have to use something lossless like PNG, and since no support exists today for releasing a game that weighs hundreds of gigabytes, we’ll sell it on expensive USB3 (or, better, Thunderbolt) hard drives that people can buy in stores or have sent to them by mail. Users will plug this hard drive into their computer, and will maybe also have to deal with a “bootstrap” DVD/BR for computers which do not support loading a game from a hard drive, like video game consoles. Aside from this cumbersome setup, the game would also run like crap and/or look like something from the 90s, because of the intensive use of USB bandwidth and the amount of CPU/GPU power used for real-time texture downscaling. The positive side is that in 10 years, when computers are capable of things we can only imagine right now, this game will finally run as intended on the computers of the few remaining nostalgic players. However, by then, it will still look extremely dated, because gaming hardware will also have evolved into something we can’t imagine right now, allowing for new kinds of games that we couldn’t design right now even if we wanted to. Sounds like a fine plan, right?”

    This example is certainly a bit extreme, but it goes to show that future-proofing (and more generally, tweaking a game so that it runs on a wide range of machines) is not a purely positive action that you should spend every free hour of spare development time on. It has upsides and downsides, and is part of the grand scheme of game development compromises. For some games it will be an attractive choice, for others something to avoid, and for a third category it simply won’t make sense at all. I think this decision, and the work it entails, is best left to game developers.

    I can’t prevent poor software (things like games that fail on low-end cards, games that require physics acceleration but try to work when there is none, and printers that treat “offline rendering” as “real-time rendering” and skip volumetric fog). Nobody else can either. Mostly, the point is to design things in such a way that good implementations are possible.

    Games that fail on low-end cards: again, you seem not to acknowledge that there is a compromise there, which sometimes makes failing the best option. Let’s picture ourselves in the highly hypothetical situation where Intel releases, today, a low-end GPU which only supports a relatively old OpenGL release (2.1 or 3.0). If you develop a game that’s a tech demo showcasing the power of modern GPUs (like most Crytek and Bethesda games), then you probably want to base your engine on the latest and greatest version of OpenGL (4.2). But if you also want to run that tech demo on Intel GPUs, you’d have to maintain two different backends for your game engine, one for the new OpenGL and one for the old one, and manage the transition between the two. That steals a part of your development time, which means that the final tech demo will be less impressive in some way. All that to make the game run on office computers, where it will still look dull and dated anyway.

    On the other hand, I agree that games that fail to check that the hardware meets their requirements before running are probably just badly coded. It must take a few hours of coding at most to write such a check if the APIs don’t offer a quick way to do it, and it significantly improves both user experience and tech support. I can’t see a situation in which it would be a bad idea.

    Printers that treat “offline rendering” as “real-time rendering” and skip volumetric fog: the problem is, why should a printer driver care about GPU features like volumetric fog in the first place? Even if it’s only to render them in software, I mean.

    I understand that you worry about good implementations, and I obviously agree that good implementations should be possible. But I think that between “possible” and “actually there”, there is a gap where the OS can make implementation errors less likely. As an example, you mention “games that require physics acceleration but try to work when there is none”. One way to avoid this situation would be to require that before doing GPGPU work (such as physics acceleration), a game acquires a “GPGPU handle”. If it tries to do that when no GPGPU capabilities are available, the program is terminated with a clean system message explaining what’s going on.
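
    On the application side, I picture something like this (invented names, of course):

        // Invented names, only to illustrate the "acquire a GPGPU handle first" idea.
        #include <cstdio>
        #include <cstdlib>

        struct GpgpuHandle;                   // opaque, owned by the OS service
        // Provided by the (hypothetical) OS: returns a null pointer, or terminates
        // the process with a clean system message, when no GPGPU capability exists.
        GpgpuHandle* gpgpu_acquire_handle();

        void startPhysicsEngine() {
            GpgpuHandle* gpu = gpgpu_acquire_handle();
            if (gpu == nullptr) {
                // Fail early and clearly instead of pretending acceleration exists.
                std::fprintf(stderr, "No GPGPU capability available, aborting.\n");
                std::exit(EXIT_FAILURE);
            }
            // ... submit physics kernels through the handle ...
        }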

    I’m not interested in upholding the status quo or making compromises for the sake of compatibility. For my project I’m using a “redesign the world” approach (where everything must be redesigned to suit, and where even basic things like plain text files won’t be allowed to exist).

    Let’s see where this leads you :) I don’t understand what the problem with plain text files is myself, but if you can come up with something better and just as easy to use…

  25. Brendan September 6, 2011 / 4:54 pm

    Hi,

    Using different scheduling algorithms for CPU vs. GPGPU wouldn’t be much of a problem, as the GPGPU stuff is very “special purpose” anyway. The kernel would queue GPGPU jobs (no need for normal threads to block when submitting a job). There’s plenty of ways to prevent the queue from getting too large – refusing to accept new jobs if the queue is too full, offloading jobs to CPUs, temporarily storing jobs on disk/swap, asking threads to cancel their jobs, etc.

    For future-proofing; 20 years ago we had textured 3D polygons projected onto a 2D plane using OpenGL (e.g. ID’s Quake), and now we’ve advanced all the way to textured 3D polygons projected onto a 2D plane using OpenGL (e.g. ID’s Rage). That’s not what I’d consider a radical change in the fundamental approach.

    It would’ve been possible to create a “3D polygons projected onto a 2D plane” engine back in 1980 that could support as many polygons and as much texture data as the hardware can handle; and while the number of polygons and amount of texture data would’ve increased (due to improvements in hardware), the basic engine would’ve remained exactly the same.

    Of course more polygons and larger textures isn’t the only thing that changed in this time. They’ve also added things like volumetric fog, advanced lighting and shadow, motion blur, etc. Over time, all of these things could’ve been added to the basic “3D polygons projected onto a 2D plane” engine as optional features without breaking backward or forward compatibility.

    This is the type of future-proofing I’m talking about – an API where the basic functionality doesn’t change and optional extensions can be added in future. I’m not talking about making ID’s Rage run on an ancient 80486 or anything.

    If a game generates data for advanced features (for the latest and greatest hardware); then why would a games developer care if the video driver ignores most of the data for advanced features and only bothers drawing basic textured polygons? It’s the end user’s problem if they can’t get all the advanced features, not the game developer’s problem.

    For existing video APIs, it doesn’t work like that though. If you try to use an unsupported feature your software will probably crash; which forces game developers to write special code for different situations; which results in game developers failing to support a wide range of hardware (especially low-end stuff); which results in consumers buying latest-release games and getting very annoyed when they find out their “new” computer won’t run the game (and then getting more annoyed when they find out they can’t return the game because they opened the packaging and the CD itself isn’t faulty). It’s stupid.

    For printers, why shouldn’t a printer driver render what it’s told to render? It’s understandable for a video driver trying to get 60 frames per second because it may not be able to render things fast enough; but when there’s no time limits there’s no justifiable excuse (other than “feature not supported yet”).

    For plain text files; see if you can write a utility to display the contents of a plain text file (like “cat” on Unix or “type” on DOS or whatever). Sounds easy doesn’t it? Your utility should support all of the different character encodings (UTF8, UTF32-BE, UTF32-LE, ASCII, all the “code page xxx” ASCII extensions, etc) and auto-detect the encoding correctly in all cases. It should also auto-detect the intended tab width and which characters are used for “end of line”. Does it still sound easy?

    Now see if you can invent something better than “plain text”. I’d start with “UTF-8 only” and with 0x0A as the only end of line character; then I’d ban some control characters (0x00, 0x0D, vertical tab, bell, delete, byte order mark, etc). I’d give it a minimal header, starting with a standardized “file size” field and a CRC checksum (so file corruption can be easily detected) and a standardized “file type” field that all file formats would have (so you can tell if it definitely is a text file). Then I’d add a “tab width” field in the header so that programmers won’t be too scared to use tab characters.
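
    In struct form, such a header could be as small as this (field sizes and order picked arbitrarily for illustration; the formal spec would fix them):

        // Arbitrary, illustrative layout; a formal spec would fix sizes and order.
        #include <cstdint>

        #pragma pack(push, 1)
        struct StandardTextFileHeader {
            uint32_t fileType;   // standardized "file type" field shared by all formats
            uint64_t fileSize;   // total file size, so truncation is easy to detect
            uint32_t crc32;      // checksum of the payload, to detect corruption
            uint8_t  tabWidth;   // intended tab width, so tabs aren't scary anymore
        };
        #pragma pack(pop)
        // Payload after the header: UTF-8 only, 0x0A as the only end-of-line
        // character, with the banned control characters (0x00, 0x0D, BOM, ...) absent.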

    Cheers,

    Brendan

  26. Hadrien September 6, 2011 / 9:49 pm

    This is the type of future-proofing I’m talking about – an API where the basic functionality doesn’t change and optional extensions can be added in future. I’m not talking about making ID’s Rage run on an ancient 80486 or anything.

    I believe this is more or less how OpenGL works. Basic functionality is only altered when the API’s designers find out about a major design flaw in it (like when Khronos deprecated immediate mode in OpenGL 3.0); otherwise, knowledge of old revisions of OpenGL can easily be applied to newer revisions of the library.

    If a game generates data for advanced features (for the latest and greatest hardware); then why would a games developer care if the video driver ignores most of the data for advanced features and only bothers drawing basic textured polygons? It’s the end user’s problem if they can’t get all the advanced features, not the game developer’s problem.

    But then, when does an advanced feature stop being advanced and become mainstream? When can game devs actually *base* their game engine on some new GPU trick, instead of always keeping a workaround to make sure that the game stays playable on older GPUs?

    For existing video APIs, it doesn’t work like that though. If you try to use an unsupported feature your software will probably crash; which forces game developers to write special code for different situations; which results in game developers failing to support a wide range of hardware (especially low-end stuff); which results in consumers buying latest-release games and getting very annoyed when they find out their “new” computer won’t run the game (and then getting more annoyed when they find out they can’t return the game because they opened the packaging and the CD itself isn’t faulty). It’s stupid.

    Which is why demos and databases like CanYouRunIt are gold and should be provided by game developers themselves. But you know, I think current OpenGL implementations already behave the way you describe: when games don’t check for the availability of the proper extensions before starting up, the extended commands are simply ignored, which results in random visual glitches. Sometimes it doesn’t matter, it’s just bad-looking fire/smoke effects or something like that. Sometimes it affects the gameplay (as an example, disabling motion blur in a modern FPS during networked play would basically amount to cheating) or makes the game totally unplayable. I think what would be best is a way for developers to quickly figure out which extensions they rely on, and to artificially disable them in software to see what the game looks like without them. From that point, they can figure out which set of extensions is vital for the game to run properly, and make sure that the game crashes if those extensions are not present.
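
    The check itself is only a few lines with today’s APIs; against a legacy OpenGL context it’s roughly this (a core-profile context would enumerate with glGetStringi instead, and real code should match full extension names rather than substrings):

        // Rough extension check against a legacy OpenGL context (pre-3.0 style).
        // Naive substring matching; shown only to illustrate how cheap the check is.
        #include <GL/gl.h>
        #include <cstring>

        bool hasExtension(const char* name) {
            const char* all = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
            return all != nullptr && std::strstr(all, name) != nullptr;
        }

        // e.g.: if (!hasExtension("GL_ARB_framebuffer_object")) { /* refuse to start */ }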

    For printers, why shouldn’t a printer driver render what it’s told to render? It’s understandable for a video driver trying to get 60 frames per second because it may not be able to render things fast enough; but when there’s no time limits there’s no justifiable excuse (other than “feature not supported yet”).

    My problem here is not with rendering speed, but with driver complexity. According to Tanenbaum, driver code statistically contains about three times as many flaws as application code, so you probably want to keep those fellows as simple as possible. Asking a printer driver to care about GPU features like volumetric fog does not match my vision of keeping things simple, which would be better served by unifying distinct imaging devices under a common drawing interface at a layer above the drivers.

    But as you say, there are drawbacks to this approach too (need to create more standard interfaces), so I understand that you could see things another way.

    For plain text files; see if you can write a utility to display the contents of a plain text file (like “cat” on Unix or “type” on DOS or whatever). Sounds easy doesn’t it? Your utility should support all of the different character encodings (UTF8, UTF32-BE, UTF32-LE, ASCII, all the “code page xxx” ASCII extensions, etc) and auto-detect the encoding correctly in all cases. It should also auto-detect the intended tab width and which characters are used for “end of line”. Does it still sound easy?

    Now see if you can invent something better than “plain text”. I’d start with “UTF-8 only” and with 0x0A as the only end of line character; then I’d ban some control characters (0x00, 0x0D, vertical tab, bell, delete, byte order mark, etc). I’d give it a minimal header, starting with a standardized “file size” field and a CRC checksum (so file corruption can be easily detected) and a standardized “file type” field that all file formats would have (so you can tell if it definitely is a text file). Then I’d add a “tab width” field in the header so that programmers won’t be too scared to use tab characters.

    Okay, for a second I thought you were allergic to the idea that developers can directly manipulate a file at the byte level. If what you advocate is just setting a standard for how text should be stored in files (a standard that’s hopefully at least respected by the OS itself), that’s fine by me.

    I’m less comfortable with your header idea, on the other hand. I don’t like file headers; in my opinion, they make things needlessly difficult for users of other OSs who deal with files created by your OS. I think OSs should do their best to make their filesystem-related features non-invasive and invisible to other OSs which do not have the feature, whenever reasonably possible. The CRC checksum and file size sound like things that would be better located in filesystem metadata, which other OSs don’t see if they don’t know it’s there.

    Now, in some contexts I’m also against filesystem metadata for critical OS features, because it makes you rely on the range of filesystems which support it. But that’s another story, and filesystem metadata can always be emulated with a big database (preferably well taken care of and backed up multiple times) that’s located somewhere on the disk and loaded by the OS when the disk is plugged in.

  27. Brendan September 7, 2011 / 5:52 am

    Hi,

    Most of the games I’ve seen that require certain features that aren’t supported either pop up a dialog box and refuse to start, crash (in the “blue screen of death” way) or render abstract trash. This isn’t the behaviour I want. Something with a huge amount of special effects (e.g. Crysis 2, Rage) should run fine regardless of hardware; even if there’s no lighting, no shadows, no motion blur, etc; and even if it runs at 3 frames per second. If a game developers wants to prevent the game from running on low-end hardware, then they should have to do more work than normal (rather than doing more work than normal to allow a game to run on low-end hardware).

    For (some types of) multi-player games, the objective is to have fun. If you treat it as a competition to see who is the most skillful, then you’d have to make sure all players have the same network latency, the same frame rate, the same graphics features, the same resolution, etc. There’s only 2 ways to do that – force players to use the same hardware (which won’t work well for PCs), or artificially cripple things for most players to match the “worst case” (so if someone with a slow computer and bad network connection joins the game, everyone else’s computers drop back to “poor performance” mode to match it). Neither of these options are desirable; which means “fair” isn’t practical.

    I was planning to have “required features” and “optional features”. Optional features will always be optional and will never be promoted to “required”, partly because that’s impossible to do without causing backward compatibility problems (e.g. older video drivers that don’t support the new required features because they weren’t required).

    I’m not sure where Tanenbaum got his statistics from (or what he’s including as a flaw). He’s a micro-kernel advocate though, and I’d assume his objective is to convince people to shift device drivers to user space (where catching bugs and debugging is easier); and that he wasn’t suggesting that drivers should be less complex (or that applications aren’t buggy enough!). If you really want the least complexity, then you should probably stick with ASCII characters for printing, and “text mode single monitor” video. :-)

    Note: When I said “printer driver is responsible for doing the rendering”, this doesn’t necessarily mean that the printer driver does the rendering itself – it can mean that the printer driver is responsible for ensuring some other software does the rendering on the printer driver’s behalf. For example, you could have a generic “video script to texture” service that printer drivers use.

    For my project, I’m not just setting a standard for how text should be stored in files, but setting a standard for how text must be stored in files. If someone wants to open a plain text file from Windows or something, then that file has to be converted to “standard text file” format before any application will be able to touch it. To mitigate the hassles for end users, the VFS will handle the conversion semi-automatically. For example, the application asks the VFS “open file foo as standard text file” and the VFS tries to find a suitable file conversion module and convert it to “standard text” without the application knowing or caring what the original file “foo” was. People using other OSs will be using the wrong OS, and therefore won’t be my problem.

    File system metadata fails as soon as files are transferred between computers (e.g. stored on legacy file systems like ISO9660, transferred via FTP or HTTP, etc). Data in a header doesn’t suffer from this problem.

    Now try to think of files or file formats that don’t have a header. There’s “plain text” (obvious) and a few “flat binary” executables for rare/special purposes. That’s all I can think of. That means there must be thousands of different file formats that have headers, that everyone is already using on a wide variety of different OSs without any problems.

    Cheers,

    Brendan
