Master of Science Thesis
Department of Computer Science
Lund Institute of Technology
Real-time Video Effects
Using Programmable Graphics Cards
Videoeffekter i realtid med programmerbara grafikkort
Klas Skogmar
Supervisor:
Lennart Ohlsson
Abstract
This thesis examines the use of modern consumer graphics cards for real-time manipulation of high-resolution video.
A demo program was developed that color corrects video on the graphics card instead of on the processor, demonstrating what this method can do for image-specific tasks.
Other programs were used to test the method's performance when rendering video to the screen, and to measure the transfer rate between the graphics card and the processor.
My conclusion is that the approach has great potential, but the transfer speed from the graphics card back to main memory has to improve. This is a software issue that is already being resolved through driver updates; driver updates alone have improved performance by several hundred percent.
The technique is already capable of improving the visualization of the modifications being made. This is in itself a sufficient reason to build programs that use the graphics card instead of the processor.
Contents
Real-time Video Effects
Using Programmable Graphics Cards
Abstract
1. Introduction
Background
Emergence of high resolution broadcasts
Higher dynamic range
Modifying video in real-time
Why graphics cards?
The problems
2. Programmable graphics cards
3. Effects on the graphics card
Color correction
Masking
Color keying
Compositing
Transitions
Painting
3D-effects
Limitations
4. High resolution challenges
Disc space
Transfer speeds
Resolution limitations
Using the graphics card
5. Working practice
Scandvision Interview
6. Existing technology
Dedicated hardware
OpenGL's imaging subset
DirectX 9
QuickTime
ATI's technologies
Nvidia's technologies
CinePaint
7. Building media tools
Why create a demo program?
Building media tools using DirectShow and graphics cards
Accessing the graphics card
Transform filters
Connecting it with a graphical user interface
DirectShow filter
The MFC GUI
Testing the program
8. Performance analysis
ATI's demo program
OpenEXR's EXRDisplay
Render to memory
Matt Craighead's test program
Test results
9. Conclusions
The potential of using the graphics card
The future
Appendix A – Formats
OpenEXR
DPX/Cineon
TIFF
DV
Digi-beta
HDTV
Appendix B – Graphics cards
Older cards
Nvidia's Geforce FX
ATI's R300
Bibliography
Books:
Papers:
Internet:
1. Introduction
This thesis will show how high-resolution, high dynamic range video can be modified in real-time using the new generation of programmable graphics cards.
Background
The need for computer aid in editing and color correcting film and video has been obvious since it first became possible. Today there are relatively cheap real-time solutions for DV resolutions (720x576 PAL / 720x480 NTSC). For HDTV, the picture is different: real-time solutions for editing and correcting HDTV cost around 200,000-300,000 USD. This thesis will try to determine whether the new high quality consumer graphics cards, with programmable pixel and vertex shaders, are able to manipulate HD video in real-time or near real-time. It will also summarize the research in the area and try to foresee some future developments.
Before 2002, 2D image manipulation had to be done by dedicated hardware. Dedicated hardware made the solutions very specific, and new hardware had to be bought whenever the needs exceeded the current installation. When programmable graphics cards appeared in 2002, they seemed like a solution that could deliver speed through hardware, while still serving as a general programmable platform.
For the technique to be useful there must be real speed advantages in using the graphics card compared to the processor. Test data will tell whether the technique is valuable today, or whether it still belongs to the future. As part of the master's thesis a demo program was developed for testing purposes, and to show what can be done with a DirectShow filter that uses the graphics card for manipulating pixel values. The result that is analyzed is the time it takes to get a frame from memory, upload it to the graphics card and fetch it back to main memory. This time is compared with the time obtained when using the processor.
Other times, such as how long it takes to read and write frames from and to the hard drive, are another potential problem. This thesis assumes that they are not a bottleneck. Hard drive performance is covered elsewhere, but the thesis still covers some basic storage requirements for a future low-end video editing station.
For graphics card usage to become a success, it not only needs to be faster than the processor; it should also be able to do all the currently available effects. Examples of effects that are common today are color correction, color keying, masking, transitions, compositing, filters and painting. Generally these are quite easy to implement on the graphics card as well. In chapter 3 many effects are analyzed with respect to their implementation on the graphics card.
Emergence of high resolution broadcasts
High resolution is needed when the final product will be shown at high resolution, that is, in cinemas or on HDTV, or when modifications are going to be made to the source video.
Since television broadcasting began in the fifties, the resolution of television has stayed the same. When color TV transmissions replaced black and white, the standard stayed compatible by using one black and white channel and two color channels. Television standards differ between regions: the dominating standards are America's NTSC (National Television Standards Committee) and Europe's PAL, while some countries have chosen an alternative, SECAM. All current television standards are interlaced[1]. PAL has a slightly higher vertical resolution, approximately 720x576 compared to NTSC's 720x486[2], but fewer fields per second (50 fields/s compared to NTSC's 60 fields/s).
To compare the resolution of TV with that of the film used to record movies, you have to compare the resolution of the film when scanned. The most common format[3] for recording movies is 35 mm (full aperture), which is usually scanned at either 4K (4096 pixels in width) or 2K (half of that, 2048 pixels). The height then depends on the aspect ratio, which is usually 1.85:1 or 2.35:1. This is around 3 to 16 times the resolution used in today's TVs.
HDTV is emerging as a new standard for television[4]. It supports several resolutions and both interlaced and progressive scan. The highest HDTV resolution is very similar to the “half” resolution used when scanning 35 mm[5], and is thus sufficient for display on cinema-sized, high resolution displays[6].
Higher dynamic range
Today, there is an increasing interest in the movie production industry in programs and hardware that can handle HDR (high dynamic range) images. HDR images contain more information than can be displayed or perceived; this limit is around 8 bits per color. The extra bits can later be used to change the exposure of the images in post-production, which limits the need for re-scanning the film. Traditional film has a higher dynamic range than the 8 bits that are common in image manipulation programs like Photoshop. Film's sensitivity is also non-linear: it is more sensitive to intensity changes in the very bright and very dark parts of images, which means that you can usually distinguish color differences in an image even if it is taken directly into the sun, for example. This also means that digital camera equipment will need the extra bits of information, since with digital capture it is impossible to go back and rescan sequences for changes. Digital sensors also have more contrast than film.
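As a minimal sketch of how such extra bits can be used, the following HLSL pixel shader adjusts the exposure of a floating-point HDR frame before mapping it down to the displayable 8-bit range. The sampler name, the Exposure constant and the simple gamma curve are assumptions for this illustration, not part of the thesis's demo program.

    // Sketch: exposure adjustment of an HDR frame (HLSL, ps_2_0).
    // Assumes the frame is stored in a floating-point texture and
    // that 'Exposure' is set by the application, in f-stops.

    sampler2D HdrFrame : register(s0); // HDR source frame
    float Exposure;                    // exposure offset in stops

    float4 main(float2 uv : TEXCOORD0) : COLOR
    {
        // Fetch the high dynamic range pixel value.
        float3 c = tex2D(HdrFrame, uv).rgb;

        // One stop doubles the light, so scale by 2^Exposure.
        c *= pow(2.0, Exposure);

        // Map to the displayable range with a simple gamma curve
        // (a stand-in for a real tone-mapping operator).
        c = saturate(pow(c, 1.0 / 2.2));

        return float4(c, 1.0);
    }

Because the pixel values carry more than 8 bits of information, this exposure change can recover detail in shadows or highlights that a clipped 8-bit image would have lost.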
Digital image sensors have the potential to become even better than film, since for digital image sensors “a high dynamic range is not in contradiction with a high sensitivity”[7]. But film still has a much greater tolerance for overexposure. To address the dynamic range problems of digital sensors, the Japanese company Fuji has developed a new digital image sensor called SuperCCD SR. It alternates sensitive photodiodes, intended to capture shadows and midtones, with less sensitive ones that capture the bright areas. The technology will be launched in consumer cameras in early summer. If it works, the concept will soon be used in the movie industry, where cameras cost several hundred times as much and high dynamic range is very important.
[Figure: differences between film's sensitivity curve and a CCD's]
Modifying video in real-time
When manipulating images, there is a need for instant feedback. In some cases the industry actually still uses analogue editing stations, because they can display transitions immediately. Using tapes, it is also easy to track frames, and the response times are negligible.
If the computer is not powerful enough to render the whole image in real-time, most programs offer a preview mode that shows changes on a small part of the image, called the area of interest (AOI). This is a good way of giving instant feedback, but it also limits the overall impression of how the changes will affect the image. This is especially important when dealing with video, since the preview consists of a clip rather than a single picture. Because many pictures are involved, the rendering needs to be done on the entire image at the frame rate (usually 24-30 fps[8]). This means that each image must be loaded, rendered and stored in less than about 33-42 ms (1/30 to 1/24 of a second).
Another way of increasing the speed is to work with a low resolution copy – a proxy[9]. The proxy can be edited and modified with ease, and when the work is finished all edits and algorithms are applied to the full resolution original.
Why graphics cards?
The idea behind using graphics cards is that they are built to do brute force calculations in parallel, on a per-pixel basis[10]. They are also very cheap, since they are produced in large quantities. If the graphics card can be used, then many programs like Photoshop could start moving some functionality to it. This would mean faster programs with more functionality, without having to pay for expensive hardware.
Recent additions to the programmability have increased the flexibility even further. This means that these graphics cards can give hardware accelerated performance of almost any operation. What can be done is covered in the next section.
The problems
The following problems need to be addressed:
· How can graphics cards be used for displaying and altering video?
· What kinds of effects are suitable for a 3D environment?
· Is it possible to speed up effects by using the graphics card?
· How are the transfer rates affecting the system?
· How much disc space is required and how does this affect the system?
2. Programmable graphics cards
Today graphics cards have even more transistors than processors[11]. This led the graphics card manufacturer Nvidia to introduce the term GPU, an abbreviation for Graphics Processing Unit, to place the chip at the same level as the processor, the CPU. The reason for the growing number of transistors in graphics cards is that the manufacturers have added more flexibility to them; nowadays the cards can do more than just the traditional 3D pipeline.
This new flexibility opens up new possibilities in 3D, but also for doing image manipulations. In 3D games the programmability of the cards makes it possible to add many new features, for example bump-mapping, displacement mapping, toon-shading and dynamic 3D world environments.
Although the technology was created to enhance 3D, it can be used for 2D as well. This is accomplished by applying an image as a texture to a polygon that covers the display area. When the texture is rendered to the screen, the programmable pipeline can be used to modify the image.
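A minimal HLSL sketch of this trick might look as follows. The screen-covering quad itself is set up by the application, and the VideoFrame sampler name is an assumption made for this example.

    // Sketch of the "2D via 3D" trick: the application draws one quad
    // covering the screen, and these shaders put the video frame on it
    // unchanged. 'VideoFrame' is an assumed sampler name.

    struct VSOutput
    {
        float4 pos : POSITION;
        float2 uv  : TEXCOORD0;
    };

    // Vertex shader: pass the quad's corners straight through.
    VSOutput vs_main(float4 pos : POSITION, float2 uv : TEXCOORD0)
    {
        VSOutput o;
        o.pos = pos; // corners already in clip space (-1..1)
        o.uv  = uv;  // texture coordinates (0..1) across the screen
        return o;
    }

    sampler2D VideoFrame : register(s0); // the frame, uploaded as a texture

    // Pixel shader: fetch the frame's color for this pixel, unmodified.
    float4 ps_main(float2 uv : TEXCOORD0) : COLOR
    {
        return tex2D(VideoFrame, uv);
    }

Replacing the pass-through pixel shader with one that alters the fetched color is what turns this rendering trick into an image manipulation tool.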
The programmable pipeline consists of two steps: first vertex shaders, then pixel shaders. Vertices are the corners of the polygons. When each polygon is rendered, each pixel's color value is fetched from the polygon's texture. The pixel shader can then do a computation for each pixel, like merging it with another texture, subtracting a constant from a color component or multiplying it with a transformation matrix.
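As an illustration, the three per-pixel operations just mentioned could look like this in an HLSL pixel shader; the sampler and constant names (Frame, Overlay, BlendAmount, Offset, ColorMatrix) are made up for the sketch.

    // Illustrative HLSL pixel shader combining the three example
    // operations above. All names are assumptions for this sketch.

    sampler2D Frame   : register(s0); // the video frame
    sampler2D Overlay : register(s1); // a second texture to merge with

    float    BlendAmount; // 0 = frame only, 1 = overlay only
    float3   Offset;      // constant subtracted per color component
    float4x4 ColorMatrix; // e.g. a saturation or color-balance matrix

    float4 main(float2 uv : TEXCOORD0) : COLOR
    {
        float4 c = tex2D(Frame, uv);

        // 1. Merge with another texture.
        c = lerp(c, tex2D(Overlay, uv), BlendAmount);

        // 2. Subtract a constant from the color components.
        c.rgb -= Offset;

        // 3. Multiply with a transformation matrix.
        c = mul(c, ColorMatrix);

        return saturate(c);
    }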
Programmable GPUs started with Nvidia's Geforce 3, and since then all but the cheapest graphics cards have had vertex and pixel shaders. There are usually several parallel pipelines for the shaders, which lets the cards handle even more data simultaneously.
The first versions of pixel shaders supported only a very limited set of operations: only 8 color operations and 8 texture operations could be performed on each pixel, with no support for conditional blocks[12]. Recent versions have many more operations, more bits for each color, and added support for conditional blocks.
3. Effects on the graphics card
There are some differences that have to be considered when doing effects on graphics cards compared to doing them on the processor. The graphics card is more limited, since only a limited number of operations can be performed in one pass. There are also some restrictions on conditional jumps, although recent cards can do much here as well. In fact, the languages for programming the graphics card now resemble those for programming the processor: Cg[13] and HLSL[14] are very similar to C. This makes it easy to port code written for the processor to the graphics card.
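To suggest how C-like these languages are, here is a small HLSL function, written for this text, that could have been lifted almost unchanged from a C implementation of a per-pixel brightness/contrast adjustment; the function and parameter names are illustrative.

    // An HLSL helper that reads almost exactly like its C counterpart.
    // In C this would run in a loop over all pixels; on the GPU the
    // pixel shader below is instead invoked once per pixel, in parallel.

    float3 brightness_contrast(float3 color, float brightness, float contrast)
    {
        color = color + brightness;               // shift all components
        color = (color - 0.5) * contrast + 0.5;   // scale around mid-gray
        return color;
    }

    sampler2D Frame : register(s0);
    float Brightness; // e.g. -0.1 .. 0.1
    float Contrast;   // e.g.  0.8 .. 1.2

    float4 main(float2 uv : TEXCOORD0) : COLOR
    {
        float3 c = tex2D(Frame, uv).rgb;
        return float4(saturate(brightness_contrast(c, Brightness, Contrast)), 1.0);
    }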