Everyone has encountered the phenomenon of screen tearing at some point, which shows up as random horizontal discontinuities in a computer's video output. If you think you never have, here's an example of an image with artificially enhanced tearing, and an example of one without, which is sure to revive some bad video playback memories. In this post, I'm going to discuss why it happens, how it can sometimes be avoided, and an interesting fallback method to reduce its effect in situations where it can't be.
Why tearing happens
While pretty much everyone around here has probably dealt with tearing at some point, what you may not know is that at the core of screen tearing lies nothing but a good old race condition. Basically, when a moving picture is displayed on screen, software continuously blits new pictures into video memory, whose contents are in turn regularly displayed on screen by the GPU. However, there is no thread synchronization mechanism in place to ensure that both events cannot happen at the same time. Consequently, the GPU will regularly display half-blitted pictures on the screen, resulting in the jagged picture effect shown earlier.
If the software-side picture drawing routines and the GPU-side screen update routines operated at exactly the same rate, moving pictures would exhibit a static jagged line, corresponding to the position the software blit has reached at the time the screen is updated. Most of the time, they don't, so the line between what has been updated and what hasn't is constantly moving, both figuratively and literally.
Avoiding tearing entirely
Solutions against screen tearing exist. The first one is vertical synchronization, which revolves around the GPU notifying software in some way of the periods when it's not updating the screen. That is not a very good solution, because it gives software relatively little time to draw a picture. It is, however, remarkably easy to implement in hardware (just redirect a screen update clock to a CPU interrupt line), so GPU manufacturers like to leave you with that. And that's assuming they bother to actually wire the clock up to an interrupt; sometimes software has to constantly poll the video hardware until it gets the VSync green light, wasting precious CPU cycles.
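To make this concrete, here's a minimal sketch of what such polling can look like on plain VGA-compatible hardware, where bit 3 of Input Status Register #1 (I/O port 0x3DA) is set during vertical retrace. The x86 port I/O helper and the bare-metal setting are assumptions; other hardware exposes the equivalent status bit elsewhere.

```c
/* Minimal VSync polling sketch for VGA-compatible hardware (assumes x86
 * port I/O; the register and bit differ on other video hardware). */
#include <stdint.h>

static inline uint8_t inb(uint16_t port)
{
    uint8_t value;
    __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
    return value;
}

static void wait_for_vsync(void)
{
    /* If a retrace is already in progress, wait for it to end... */
    while (inb(0x3DA) & 0x08) { }
    /* ...then wait for the next one to begin, so the caller gets a full
     * blanking interval to draw in. */
    while (!(inb(0x3DA) & 0x08)) { }
}
```

Every cycle spent spinning in those loops is a cycle not spent doing useful work, which is exactly the CPU-hogging problem mentioned above.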
The second solution is multiple buffering, which revolves around having at least two video memory buffers and a pointer that can be set to point to either of them. In this scenario, software draws into one buffer while the screen is updated from the other, then the buffer pointer is flipped and the next screen update uses the new image. Using more than two buffers allows software to keep drawing when it has finished one picture but the screen update isn't over yet, allowing continuous blitter operation and thus better display responsiveness.
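As a rough sketch of what the flip looks like from the software side, here's a double-buffered setup in C. The set_scanout_address() call is a hypothetical placeholder for whatever GPU-specific register write actually retargets the display engine, and the resolution is arbitrary.

```c
/* Conceptual double buffering sketch: draw into the back buffer, then
 * flip so the GPU scans out the freshly drawn picture. */
#include <stdint.h>
#include <string.h>

#define WIDTH  640
#define HEIGHT 480

static uint32_t framebuffers[2][WIDTH * HEIGHT];
static int back = 0;                      /* buffer software draws into */

/* Hypothetical GPU-specific call that points the display at a buffer. */
extern void set_scanout_address(const uint32_t *buffer);

void present_frame(const uint32_t *new_picture)
{
    /* Draw into the back buffer while the front one is still on screen. */
    memcpy(framebuffers[back], new_picture, sizeof(framebuffers[back]));

    /* Flip: the back buffer becomes the one the GPU displays. */
    set_scanout_address(framebuffers[back]);
    back ^= 1;
}
```

With triple buffering, software would simply cycle through three buffers instead of two, so it never has to wait for a flip to complete before starting the next picture.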
Multiple buffering is by far the best solution against tearing, but it requires that GPU manufacturers stop hardcoding memory addresses in their hardware and learn the use of pointers. For someone who cannot even follow a standards document, that's an incredibly hard task, and so they generally end up either giving up or implementing it in a nonstandard way. Because although you may not know it, multiple buffering really is a key area of GPU differentiation that couldn't possibly be implemented the same way as the neighbouring manufacturer does it. Yeah, right…
Noise as a fallback path
So in effect, there are three possible situations:
- We have a reliable multiple buffering mechanism in place for the current GPU, and we know how to use it: everything is fine, no tearing will occur. This will rarely happen, and when it does, it will likely require GPU-specific drivers.
- We have a vertical synchronization mechanism in place, we know how to use it, and we can draw video frames fast enough: no tearing will occur, but we may have to hog the CPU for polling purposes, which is not always acceptable.
- We have neither multiple buffering nor VSync paired with a fast enough blitter: tearing is bound to occur at some point. This is likely to be the common case for a hobby OS project like TOSP.
In the latter two cases, we may decide that some amount of screen artifacts is acceptable after all. But it would be preferable for that artifact to be more subtle than regular tearing's jagged lines. Here, I'll show that by playing with the persistence of human vision, this can be done, provided you can afford a higher CPU or GPU cost at the video memory blitting step.
To understand why, we have to go back to the question of what makes screen tearing so blatantly obvious to the eye. The reason is that our eyes are extremely good at shape analysis, and one of the easiest shapes for them to pick out is a jagged line, which is precisely what tearing produces and thus what makes the effect so visible. My claim, therefore, is that if tearing artifacts were replaced by a fully random mix of pixels from the old and new pictures, the effect would be more subtle, especially for pictures that each last at most 1/60th of a second.
To convince yourself of that, compare the previous tearing simulation with a variant where tearing is replaced by random noise in the updated screen region. Obviously, there still are artifacts, but of a different kind, similar to those of vintage video tapes, which in my opinion works better on moving pictures. Note that for the above examples, the rate at which your computer is able to display animated gif frames will directly determine how smooth the noise-induced blur looks. You may want to display these pictures at smaller sizes, or even save them and open them in other software that can display animated gifs. If I had more time, I'd build proper videos, but for now I don't have the will to learn how to do that.
Producing noise in practice
The way tearing can be replaced by noise depends on how the system picture blitter works. If the fastest way to blit a picture into video memory is to fill individual pixels or small chunks thereof, then the noise effect displayed earlier is trivially achieved by randomizing the order in which new pixels are drawn when a picture is blitted. This will have some performance cost, since a lookup table has to be traversed and computer memories work faster when accessed sequentially, but it shouldn't incur too much overhead considering how slow communication between the CPU and video memory is to begin with.
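Here's a minimal sketch of such a randomized blit in C, assuming a linear framebuffer of 32-bit pixels. The lookup table is just a shuffled permutation of pixel indices built once at startup; using rand() as the noise source is an illustrative choice, not a requirement.

```c
/* Blit pixels in a randomized order so that a mid-blit screen refresh
 * shows a random mix of old and new pixels instead of a jagged line. */
#include <stdint.h>
#include <stdlib.h>
#include <stddef.h>

static size_t *blit_order;                /* shuffled pixel index table */

void init_blit_order(size_t pixel_count)
{
    blit_order = malloc(pixel_count * sizeof(*blit_order));
    for (size_t i = 0; i < pixel_count; i++)
        blit_order[i] = i;

    /* Fisher-Yates shuffle of the index table */
    for (size_t i = pixel_count; i > 1; i--) {
        size_t j = (size_t)rand() % i;    /* 0 <= j < i */
        size_t tmp = blit_order[i - 1];
        blit_order[i - 1] = blit_order[j];
        blit_order[j] = tmp;
    }
}

void blit_randomized(volatile uint32_t *video_memory,
                     const uint32_t *picture,
                     size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; i++) {
        size_t p = blit_order[i];
        video_memory[p] = picture[p];
    }
}
```

A real implementation would probably shuffle small blocks of pixels rather than single pixels, to keep at least some write combining, but the principle stays the same.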
Note that in this scenario, each individual frame displayed by the screen will contain a different, slowly varying fraction of the initial and final pictures, so the actual effect achieved will be more subtle than constant random noise, in turn making shapes significantly more recognizable. Here's an example, to be compared with the random noise scenario (again, try it at smaller sizes and/or with a standalone gif viewer). If we know the screen refresh rate, we could deliberately run video memory updates at a slightly lower or higher rate so as to control how fast the displayed fraction of new vs old picture changes. This will require some fine-tuning, though, because if this fraction changes too slowly, it will result in visible variations of the noise contrast over time.
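As a back-of-the-envelope illustration of that tuning (the numbers here are made up for the sake of the example): the old/new mix fraction drifts at the beat frequency between the refresh rate and the blit rate, so the offset between the two directly sets how long one full contrast cycle lasts.

```c
/* Illustrative arithmetic only: how long one full cycle of the old/new
 * mix fraction lasts for a given refresh rate and blit rate offset. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double screen_hz = 60.0;              /* assumed display refresh rate */
    double blit_hz   = 59.0;              /* deliberately offset blit rate */
    double beat_hz   = fabs(screen_hz - blit_hz);

    printf("Mix fraction cycles once every %.2f s\n", 1.0 / beat_hz);
    return 0;
}
```

At a 1 Hz offset the fraction sweeps once per second, which is probably slow enough for the contrast variation to be noticeable; a larger offset would hide it better, at the cost of a blit rate further away from the refresh rate.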
If dedicated GPU drivers are available, however, or if you are dealing with a computer that has a really smart firmware, there can be an integrated blitter that is much faster at committing whole pictures to video memory than individual pixels. In this case, the random noise effect can be approximated through the use of several intermediary frames per picture, each frame representing a different mix between the picture that was initially displayed and the picture that is now going to be displayed. Such intermediary pictures can be computed really quickly through the use of pre-computed noisy binary masks. The number of intermediary frames should probably depend on how fast the GPU can actually do the blitting: the more intermediary frames, the less noticeable tearing within them will be, but the higher the cost.
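Here's a sketch of what building one such intermediary frame could look like, assuming 32-bit pixels, precomputed per-pixel 0/1 masks with a growing share of ones from one mask to the next, and a hypothetical hardware_blit() call standing in for the fast whole-frame copy to video memory.

```c
/* Build an intermediary frame by mixing the old and new pictures
 * according to a precomputed binary mask (1 = take the new pixel). */
#include <stdint.h>
#include <stddef.h>

void build_intermediary_frame(uint32_t *out,
                              const uint32_t *old_picture,
                              const uint32_t *new_picture,
                              const uint8_t *mask,
                              size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; i++)
        out[i] = mask[i] ? new_picture[i] : old_picture[i];
}

/* Hypothetical usage, handing each step to the fast hardware blitter:
 *
 *   for (int step = 0; step < STEPS; step++) {
 *       build_intermediary_frame(scratch, old, new, masks[step], count);
 *       hardware_blit(scratch);    // GPU-specific whole-frame copy
 *   }
 */
```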