In the mid-1990s, video scaling was an unknown technology, and line doublers and quadruplers were the standard methods used to convert TV video to computer video format for display on high-resolution LCD and DLP projectors. Line doublers and quadruplers provided a better end result and were viewed as indispensable.
But a few years later, when video scaling technology was first introduced, it offered a new way to think about converting TV video to computer video format. Rather than generating an output that is dependent on the input format, video scalers produce a converted image in a wide range of resolutions and refresh rates, completely independent of the original, incoming TV video format. The end result is superior image quality over doublers and quadruplers, and the clearest, crispest, most brilliant video for display on LCD, DLP projectors and plasma displays.
Like line doublers and quadruplers, video scalers combine the information in the odd and even fields of an incoming video signal into a combined, non-interlaced picture. But video scalers also use sophisticated processing algorithms to manipulate the image, changing its resolution, refresh rate, and even aspect ratio, to exactly match the desired output specifications. Rather than generating "odd-ball" resolutions of 483 or 966 lines, video scalers provide converted output in standard resolutions like 640 x 480, 800 x 600, and as high as 1280 x 1024, all at a variety of refresh rates.
Today, video scaling has become the preferred method in the Professional A/V and Home Theater markets for increasing the resolution of traditional video images. Now its own well-respected category of product, video scaling is recognized for its many advantages over line doubling and quadrupling. In fact, the technology has become so mainstream and accepted that many display manufacturers have begun integrating scaling capabilities right into their units.
As a result, the need for external scalers, once thought to be a critical add-on when designing a top-quality display system, has become "less obvious." After all, if a projector or display can internally scale video input to match its own native resolution (horizontal and vertical pixel count), many customers, dealers and system integrators have a hard time justifying the added expense of purchasing or specifying an external scaler.
But this position may need to be rethought, because a good scaler is more than just a fancy "upconverter," designed to change resolutions from lower to higher. It is also a sophisticated video processor, with the ability to make difficult adjustments for motion, conversion from film to video and changing aspect ratios. Without these capabilities, simple "scaling" of an image may result in less visible flicker or horizontal lines, but true, film-like quality cannot be achieved.
With the introduction of HDTV, a whole new set of situations arises in which proper video scaling AND processing becomes critical for creating a top-quality viewing experience. This article will help explain Inverse 3:2 Pulldown, Anamorphic Scaling and other confusing concepts associated with advanced video scaling.
Many challenges arise in the conversion process from standard TV video to non-interlaced, high-resolution, computer-quality or HDTV-quality video. Moving objects are one of those challenges. The issue arises not because of the motion itself, but because of the way motion is captured in traditional interlaced video, and what happens to it when the video is converted to a non-interlaced format.
Each frame of traditional video is made up of two fields: odd lines and even lines. When a frame is "painted" onto a TV or video screen, first the odd lines appear, from left to right, top to bottom, and then the even lines appear in a second pass, again from left to right, top to bottom, 1/60th of a second later. This is not only the way video is displayed; it is also the way it is recorded. In other words, if a video camera records the motion of a flying bird, in any given frame of the video, the bird is located at a slightly different location in the odd field than in the even field.
Now imagine what happens when the odd lines and even lines get combined into a single frame, as they do in the process of video scaling.
The edges of the bird are going to appear staggered, or jagged. (This phenomenon is often referred to as "the jaggies.") It is the video processing function of a scaler to smooth out these edges, so that within each full frame of video, the bird appears at one distinct location. At the same time, the scaler must also make sure that, as a result of the video processing, the edges of the bird do not appear blurred, that its motion across the screen remains smooth and realistic, and that other, static images within the frame (roof tops, trees) are not affected by any manipulations done to the moving image (the bird). A good video scaler uses a variety of video processing and "motion compensation" techniques to achieve the desired effect.
Static Mesh Processing
Static Mesh Processing is the most basic type of conversion from interlaced to non-interlaced video. When employing static mesh processing, a scaler merges the odd and even fields of a video frame into a single, combined image without any regard for movement or discrepancies between the odd and even fields. When applied to static images, this type of processing generates the crispest details and eliminates any jitter, particularly with thin horizontal lines. However, it does nothing to eliminate "the jaggies." As a result, static mesh processing is most effectively used when combined with other types of video processing and selectively applied only to those parts of the video frame that show little or no motion.
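The merging step described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: each field is a list of scan lines, each scan line is a list of pixel values, both fields have the same number of lines, and the function name weave_fields is hypothetical rather than taken from any scaler.

```python
def weave_fields(odd_field, even_field):
    """Interleave two fields into one non-interlaced frame (static mesh).

    odd_field holds lines 1, 3, 5, ... and even_field holds lines 2, 4, 6, ...
    No attempt is made to detect or correct motion between the fields.
    """
    if len(odd_field) != len(even_field):
        raise ValueError("fields must have the same number of lines")
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(list(odd_line))
        frame.append(list(even_line))
    return frame


# A bright vertical bar (value 9) that moved one pixel to the right
# between the odd field and the even field.
odd = [[0, 9, 0, 0], [0, 9, 0, 0]]
even = [[0, 0, 9, 0], [0, 0, 9, 0]]
frame = weave_fields(odd, even)
for line in frame:
    print(line)
```

In the printed frame the bar alternates between two horizontal positions on successive lines, which is the staggered edge the text calls "the jaggies." For a bar that did not move, the same function would reproduce it with full vertical detail.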
Vertical Temporal Processing
Vertical Temporal (VT) processing is a technique intended for use when processing moving images. As in static mesh processing, a scaler employing VT processing combines the odd and even fields into a single frame. However, using the "bird" example, as the fields are combined, the scaler averages together the points that make up the bird's jagged border and creates a new, smooth edge for the moving bird, midway between where it appears in the odd and even fields. While a good quality scaler is able to do this with minimal blurring or loss of detail, there is obviously some compromising made to the original video signal. Therefore, as with static mesh processing, it is best when VT processing is applied selectively. A scaler that effectively combines static mesh and VT processing, using each where most appropriate, will provide the most satisfactory picture quality overall.
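As a rough illustration of applying the two techniques selectively, the sketch below weaves pixels that have not changed since the previous field of the same type and averages the odd and even values where a change is detected. It is a deliberately crude stand-in for VT processing, not a description of any actual scaler: the function name, the per-pixel motion test, and the threshold value are all assumptions made for the example.

```python
def adaptive_deinterlace(odd, even, prev_odd, prev_even, threshold=4):
    """Weave static pixels; average the two fields where motion is detected.

    Each argument is a field: a list of scan lines, each a list of pixel
    values. A pixel counts as "moving" if either field changed by more
    than `threshold` compared with the previous field of the same type.
    """
    frame = []
    for i in range(len(odd)):
        top, bottom = [], []
        for x in range(len(odd[i])):
            moving = (abs(odd[i][x] - prev_odd[i][x]) > threshold
                      or abs(even[i][x] - prev_even[i][x]) > threshold)
            if moving:
                # Place the edge midway between the two fields.
                blended = (odd[i][x] + even[i][x]) / 2
                top.append(blended)
                bottom.append(blended)
            else:
                # Static area: keep both fields untouched (static mesh).
                top.append(odd[i][x])
                bottom.append(even[i][x])
        frame.append(top)
        frame.append(bottom)
    return frame


# Column 0 is static fine detail; a bar (value 9) moves one pixel per field.
prev_odd = [[7, 9, 0, 0, 0, 0]]
prev_even = [[1, 0, 9, 0, 0, 0]]
odd = [[7, 0, 0, 9, 0, 0]]
even = [[1, 0, 0, 0, 9, 0]]
frame = adaptive_deinterlace(odd, even, prev_odd, prev_even)
for line in frame:
    print(line)
```

In the result, the static detail in column 0 keeps its distinct odd and even values, while the moving bar appears at the same position on both output lines at reduced intensity. That softening is the compromise the text mentions, and it is why the blend is limited to areas where motion is found.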
Adaptive Frame Processing for Video that Originated as Film (Inverse 3:2 and 2:2 Pulldown)
Perhaps the most popular application for high-resolution displays is creating a "theater-like" experience for viewing movies recorded to videotape or DVD. The conversion of film to standard video creates a unique set of conditions for which scalers with top-quality motion compensation processing can be particularly helpful. This is because traditional film has many properties that are qualitatively different from standard video (NTSC or PAL). First of all, unlike video, each frame of film represents a unique moment in time. There are no separate "odd and even" fields, nor is there any "scanning" of the image left to right, top to bottom. Each frame is a static, complete snapshot of what is happening at a specific moment.
Secondly, film is recorded and played back at a different speed than video. A movie camera shoots 24 frames per second, and the film plays back at a speed of 24 Hz. By comparison, there are 30 frames (60 fields) per second in NTSC video and 25 frames (50 fields) per second in PAL video. Thirdly, film is generally shot at a different aspect ratio than TV video. Movies shot for the theater are generally in a wide screen format while TV screens and traditional video sources have a more square-shaped, 4:3 aspect ratio.
Converting Film to Video:
The following drawing represents 8 frames of film. Each frame, A through H, depicts a precise moment in time at which the image in the frame is captured.
When converting film to a standard "video" image, such as on videotape, the most obvious way to imagine doing it would be to convert each frame of film to two fields of video, made up of odd and even lines. The two fields, when combined, create full frames of video, each one corresponding to one of the original frames of film.
Converting film to video in this manner would be possible, but a problem would arise when you went to play back the videotape. The movie would run way too fast. This is because the original film was recorded at 24 frames per second, while NTSC video plays at 30 frames per second. This means that one hour of the original movie would play back in only 48 minutes! (In Europe and other places that use the PAL standard, the problem would not be as noticeable. PAL video runs at a speed of 25 frames per second, so one hour of original film would play back on videotape in about 57½ minutes.) Clearly, the difference in speed between original film and NTSC is not acceptable. Therefore, an additional step, beyond the process shown in the previous diagram, must occur in order to create a satisfactory conversion from film to video.
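The running times quoted above follow directly from the ratio of the two frame rates. A minimal sketch of the arithmetic, using the article's rounded rates of 24, 25 and 30 frames per second (the function name is hypothetical):

```python
def playback_minutes(film_minutes, film_fps, video_fps):
    """Running time when every film frame becomes exactly one video frame."""
    return film_minutes * film_fps / video_fps


ntsc_minutes = playback_minutes(60, film_fps=24, video_fps=30)
pal_minutes = playback_minutes(60, film_fps=24, video_fps=25)
print(f"NTSC: {ntsc_minutes:.1f} minutes, PAL: {pal_minutes:.1f} minutes")
```

This gives 48 minutes for NTSC and 57.6 minutes for PAL, which is the "about 57½ minutes" figure above.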
Basically, you need to calculate how many fields of video you should create from each original film frame so that when the video plays back at its correct speed of 30 frames per second (60 fields per second), the movie appears to run at the correct speed. That can be calculated as follows:
1) If the original film runs at a speed of 24 frames per second, then each film frame is intended to display for 1/24th of a second, or .041666 seconds. (1 second/24 frames = .041666)
2) In NTSC video, which runs at a speed of 30 frames per second, each frame of video displays for 1/30th of a second, or .0333 seconds. Furthermore, since each frame is made up of two fields, you can calculate that each field appears for half that amount of time, or 1/60th of a second (.01666 seconds).
3) To calculate how many fields of video should be used to represent each frame of film, we simply divide 1/24th (the duration of each film frame) by 1/60th (the duration of each video field). 1/24 ÷ 1/60 = 60 ÷ 24 = 2.5, or .041666 ÷ .01666 = 2.5
So, in order to make the original film appear to run at the correct speed when converted to NTSC video, you need to make 2½ fields of video for each frame of film.
This, unfortunately, is not possible. There is no such thing as a half field of video. However, what you can do instead is create 5 fields of video for each 2 frames of film. This can be done by creating 3 fields from the first film frame, and then 2 fields from the next film frame, and just repeating this process over and over.
This process of converting one frame of film to 3 fields of video and the next film frame to 2 fields of video is known as "3:2 Pulldown." For a video scaler to do the best possible job of scaling and processing video that was created via this method, the scaler should offer a processing technique called "Inverse 3:2 Pulldown." Inverse 3:2 Pulldown, sometimes also referred to as an "adaptive frame mode," is designed specifically to address the properties unique to NTSC video originating from film.
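The 2½-fields-per-frame figure and the alternating 3, 2, 3, 2 pattern can be written out as a short Python sketch. Film frames are represented only by letters, and the odd/even identity of each field is ignored for simplicity; both function names are hypothetical.

```python
from fractions import Fraction


def fields_per_film_frame(film_fps=24, field_rate=60):
    """Video fields needed per film frame to preserve running time."""
    return Fraction(field_rate, film_fps)


def pulldown_32(film_frames):
    """Repeat film frames as 3, 2, 3, 2, ... video fields (3:2 Pulldown)."""
    fields = []
    for index, frame in enumerate(film_frames):
        repeats = 3 if index % 2 == 0 else 2
        fields.extend([frame] * repeats)
    return fields


print(fields_per_film_frame())          # 5/2, i.e. 2.5 fields per frame
fields = pulldown_32("ABCD")
video_frames = [tuple(fields[i:i + 2]) for i in range(0, len(fields), 2)]
print(fields)
print(video_frames)
```

Four film frames become ten fields, or five video frames: (A, A), (A, B), (B, C), (C, C), (D, D). The second and third video frames each mix two different film frames, which matters for the detection step described next.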
How Does A Scaler Know When To Apply Inverse 3:2 Pulldown Processing?
A video scaler that offers Inverse 3:2 Pulldown must first be able to detect that the video it is processing originated from film. It does this by looking for certain patterns in the way the odd and even fields of video "match up." Let's take a closer look at the fields of video as they would be generated using 3:2 Pulldown.
For video frames 1, 4 and 5, the odd and even fields comprising these frames are identical to each other, originating from the same frame of film. Video frames 2 and 3, by contrast, have different information appearing in their odd and even fields, as each field originated from a different frame of film. When a video scaler detects this repeated, 5-frame pattern of "same, different, different, same, same..." in the source video it is processing, it knows it is looking at video resulting from 3:2 Pulldown. In these instances, the scaler performs best by applying an Inverse 3:2 Pulldown technique.
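The detection idea can be illustrated with a small Python sketch. It assumes each field is tagged with a label for the picture it came from, so that two fields "match" when their labels are equal; a real scaler would have to compare pixel data with some tolerance for noise. The function and constant names are hypothetical.

```python
CADENCE = ["same", "different", "different", "same", "same"]


def frame_pattern(fields):
    """Label each video frame by whether its two fields match."""
    pattern = []
    for i in range(0, len(fields) - 1, 2):
        pattern.append("same" if fields[i] == fields[i + 1] else "different")
    return pattern


def looks_like_32_pulldown(fields):
    """True if the frames repeat the 5-frame cadence, starting at any phase."""
    pattern = frame_pattern(fields)
    if len(pattern) < len(CADENCE):
        return False
    for offset in range(len(CADENCE)):
        if all(pattern[i] == CADENCE[(i + offset) % len(CADENCE)]
               for i in range(len(pattern))):
            return True
    return False


film_source = list("AAABBCCCDDEEEFFGGGHH")   # 8 film frames after 3:2 Pulldown
video_source = list("ABCDEFGHIJKLMNOPQRST")  # every field is a new picture
print(frame_pattern(film_source)[:5])
print(looks_like_32_pulldown(film_source), looks_like_32_pulldown(video_source))
```

The film-originated sequence yields the "same, different, different, same, same" pattern and is flagged; the sequence in which every field differs is not. Checking every starting offset allows for the scaler beginning to watch partway through the 5-frame cycle.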
How Does Inverse 3:2 Pulldown Work?
Inverse 3:2 Pulldown is actually a combination of static mesh and vertical temporal processing techniques. As previously explained, static mesh is most effective when processing relatively still images, while vertical temporal processing works best when applied to fields (or portions of fields) that show motion, detected as differences in the placement of objects from the prior field. What constitutes "Inverse 3:2 Pulldown" is the way these techniques are combined for application to 3:2 Pulldown source material.
When employing Inverse 3:2 Pulldown, static mesh would be applied for frames 1, 4 and 5, while vertical temporal processing would be applied for frames 2 and 3 (unless no motion was present). Similarly, the scaler would also look at differences between fields comprising adjacent frames, and apply the correct processing technique as appropriate. For example, when the second field of frame 1 and the first field of frame 2 are identical, static mesh would be applied at this point. However, if the second field of frame 4 and the first field of frame 5 are different from each other, vertical temporal processing might be necessary for this transition.
Obviously, this is a very simplified explanation of how Inverse 3:2 Pulldown works, but it addresses the basic principles of the process. In short, for original film material to appear as film-like as possible when viewed from a video source, a scaler that offers Inverse 3:2 Pulldown or an "adaptive frame mode" plays a critical role.
CHANGING ASPECT RATIOS
In addition to providing superior motion compensation processing, video scalers also help to maximize the benefits that can be derived from viewing video in a widescreen format. Let's quickly review the basics of aspect ratios, both for video and film (as much of the video we watch originates as film).
TV Aspect Ratios
Traditional TVs have an aspect ratio of 4:3. In other words, the screen has a width of 4 units and a height of 3 units.
These TVs provide the best quality viewing experience when watched from a distance of 8 times the height of the screen. This is the minimum distance from the screen at which horizontal lines and flicker, inherent in NTSC and PAL video, become virtually undetectable. So, for example, when watching a standard 27 inch TV which has a screen height of approximately 16.2 inches, you should sit almost 11 feet from the TV in order for your eyes to perceive the highest quality picture. (16.2 inches x 8 = 129.6 inches = 10.8 feet.)
By contrast, an HDTV display has an aspect ratio of 16:9.
The origins of this aspect ratio date back to post WWII, when the Japanese began to develop HDTV technology. The goal of this project was to develop a TV standard that would provide viewers with a more lifelike, sensory immersing experience. Part of the solution involved the creation of a wider screen standard that would result in a broader viewing field. The 16:9 shape was ultimately chosen because it provided viewers with a 30 degree field of vision when sitting at a distance of three screen heights away from the TV. (Three screen heights, as opposed to eight, was intended to be the ideal distance for viewing HDTV, based on the higher resolution which eliminates flicker and visible horizontal lines.)
This 30 degree field of vision contrasts with only a 10 degree field provided by standard 4:3 TVs at the 8x screen height viewing distance. This information may seem a bit extraneous at this point in this guide, but its relevance will become apparent during our later discussion on anamorphic scaling.
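The viewing-distance and field-of-vision figures above can be checked with basic geometry. The sketch below assumes a flat screen viewed head-on from its center line and uses hypothetical function names; it is a rough check, not the method used by the developers of HDTV.

```python
import math


def screen_height(diagonal, aspect_w, aspect_h):
    """Height of a screen with the given diagonal and aspect ratio."""
    return diagonal * aspect_h / math.hypot(aspect_w, aspect_h)


def horizontal_view_angle(aspect_w, aspect_h, distance_in_heights):
    """Horizontal angle (degrees) the screen fills at the given distance."""
    half_width = (aspect_w / aspect_h) / 2   # measured in screen heights
    return math.degrees(2 * math.atan(half_width / distance_in_heights))


height_27 = screen_height(27, 4, 3)          # inches
distance_ft = height_27 * 8 / 12             # 8 screen heights, in feet
hdtv_angle = horizontal_view_angle(16, 9, 3)
sdtv_angle = horizontal_view_angle(4, 3, 8)
print(f"27-inch 4:3 screen: {height_27:.1f} in high, sit {distance_ft:.1f} ft away")
print(f"16:9 at 3 heights: {hdtv_angle:.1f} degrees")
print(f"4:3 at 8 heights: {sdtv_angle:.1f} degrees")
```

Under these assumptions the 27-inch example reproduces the 16.2 inch height and 10.8 foot distance, and the two angles come out at roughly 33 and 9.5 degrees, in line with the rounded 30 and 10 degree figures quoted above.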
Aspect Ratio of Film
Unlike TV, there are no set standard aspect ratios that film must conform to, although there are certain sizes that are most often used.
Prior to the early 1950s, most movies were filmed at an aspect ratio of 1.37:1, which is very close to the 4:3 (1.33:1) aspect ratio of standard television. In fact, it is very likely that during the years in which television technology was first developed, the 4:3 aspect ratio of the "little screen" was modeled after what was then the shape of the popular "silver screen." However, today's cinematic productions are most often filmed in a widescreen format. The most common aspect ratio in use today for film is 1.85:1, but there are other, frequently used ratios as well, including 1.66:1 and 2.35:1.
Obviously, films recorded to video can be better displayed on new, widescreen HDTV displays that come closer to approximating the films' original aspect ratio. However, new HDTV displays do not exactly match the proportions of original film. (HDTV's 16:9 screen shape is equal to an aspect ratio of approximately 1.78:1.) Furthermore, when making any transfer of film to video, traditional NTSC and PAL standards are still used, even if the video will ultimately be displayed in a widescreen format. In other words, film converted to video must be stored within the structure of a 4:3 video frame, even if the video will ultimately be displayed on a 16:9 screen. These incompatibility issues can be eased through the use of a video scaler.
Converting Between Aspect Ratios
There are many different conversion and display permutations that must be addressed in a thorough discussion of this topic. Let's look at each of them individually.
Displaying Widescreen Images on a 4:3 Screen:
Until the majority of us replace our current televisions with new, widescreen models, this will continue to be the most common conversion challenge. By now, we are all accustomed to viewing some videos, TV shows and even commercials in the widescreen format. In order to fit a widescreen image on a 4:3 screen, two black bands appear along the top and bottom of the screen, effectively reducing the height of the viewable image so that a widescreen aspect ratio is achieved. This technique is called letterboxing. Whether the desired aspect ratio is 16:9, 1.85:1 or even the very wide 2.35:1, the same black banding technique is used. The only change is that the height of the bands increases proportionally as the aspect ratio of the viewable image becomes "wider."
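How much of the screen the black bands occupy depends only on the two aspect ratios. A minimal sketch, with a hypothetical function name, that expresses the image and each band as a fraction of the 4:3 screen's height:

```python
def letterbox_fractions(image_aspect, screen_aspect=4 / 3):
    """Return (image height, height of each band) as fractions of the screen."""
    if image_aspect < screen_aspect:
        raise ValueError("image is not wider than the screen")
    image_fraction = screen_aspect / image_aspect
    bar_fraction = (1 - image_fraction) / 2
    return image_fraction, bar_fraction


for name, aspect in [("16:9", 16 / 9), ("1.85:1", 1.85), ("2.35:1", 2.35)]:
    image, bar = letterbox_fractions(aspect)
    print(f"{name}: image uses {image:.1%} of the height, each band {bar:.1%}")
```

For a 16:9 image the picture occupies three quarters of the screen height, leaving one eighth for each band; the bands grow as the image gets wider, as described above.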
Displaying 4:3 Images on a 16:9 Display:
Just as horizontal black bars must be used for displaying widescreen images on a 4:3 screen, vertical black bars must be used on the sides of the screen to display 4:3 images on a widescreen display. Otherwise, the image would appear unnaturally stretched horizontally to fill the screen.
As the majority of source material currently available for viewing, including most TV shows and videotapes, is still provided in a 4:3 format, widescreen displays generally have a built-in feature that allows them to display 4:3 video in this manner.
Displaying Widescreen Images on a 16:9 Display:
Displaying widescreen video on a widescreen display would seem to be pretty basic. However, in reality, this is the most confusing conversion challenge with the most variables to consider.
First of all, remember that the 16:9 widescreen aspect ratio of new HDTV displays does not exactly match the most common film ratios of 1.66:1, 1.85:1 or 2.35:1. So, in a technique similar to the letterboxing discussed earlier, black bars can be used to adjust the aspect ratio of the viewable area of the screen.
The exception to this rule is when the original film was shot at a 1.66:1 aspect ratio. In this case, because the 16:9 screen has a wider aspect ratio than the source material, vertical bars must be used to allow for full viewing of the 1.66:1 image without any distortion.
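The choice between bars at the top and bottom and bars at the sides comes down to comparing the image's aspect ratio with the screen's. The sketch below illustrates that comparison for a 16:9 display; the function name and the returned labels are hypothetical.

```python
import math


def fit_on_screen(image_aspect, screen_aspect=16 / 9):
    """Return (bar placement, fraction of the screen the image fills)."""
    if math.isclose(image_aspect, screen_aspect):
        return "no bars", 1.0
    if image_aspect > screen_aspect:
        # Image is wider than the screen: full width, reduced height.
        return "bars at top and bottom", screen_aspect / image_aspect
    # Image is narrower than the screen: full height, reduced width.
    return "bars at the sides", image_aspect / screen_aspect


for name, aspect in [("4:3", 4 / 3), ("1.66:1", 1.66),
                     ("1.85:1", 1.85), ("2.35:1", 2.35)]:
    placement, filled = fit_on_screen(aspect)
    print(f"{name}: {placement}, image fills {filled:.0%} of the screen")
```

With these inputs, 4:3 and 1.66:1 material gets bars at the sides, while 1.85:1 and 2.35:1 material gets bars at the top and bottom, matching the cases discussed above.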
Displaying Widescreen Images Using Anamorphic Scaling
Earlier in the booklet, we made the point that HDTV is intended to be viewed from a relatively close distance: approximately three times the height of the screen. This is because the wide aspect ratio of HDTV is designed to provide the viewer with a wider angle of viewing that requires the use of more peripheral vision. Our peripheral vision is highly sensitive to motion and the viewing experience becomes much more lifelike when images on the screen are processed by both our "straight-on" and "peripheral" sight. However, if we are to sit that close to the screen, it is imperative that the quality of the displayed image be as crisp and detailed as possible. Any distortions or artifacts will be much more noticeable than if we were sitting eight times the screen height away from the display, the "traditional" distance at which we currently sit for standard 4:3 NTSC or PAL video.
However, even though many films are now being provided on DVD in widescreen format, the DVD makers are still forced to conform with the 4:3 standard when making the conversion from film to video. This is because almost all consumer level DVD players can only read and process standard 4:3 video in traditional NTSC, PAL or SECAM formats. While the most obvious way to record these DVDs would be using the letterbox techniques just described, there is another method, called anamorphic scaling, which allows more vertical detail of the original image to be recorded and therefore allows widescreen displays to provide a higher quality output that shows the director's original intent.
First let's look at what happens if we just use standard letterboxing to record a widescreen film to video. We'll use NTSC for this example, but the same points pertain for PAL and SECAM video using slightly different math calculations.
NTSC video has 483 visible lines. When widescreen images are recorded to video using letterbox formatting, the black bars at the top and bottom of the display take up some of these 483 lines, and only the remaining lines are used to display information about the image.
When these images are fed to a higher resolution device, such as an HDTV display, the video must be scaled to increase the number of lines to match the higher number of lines in the targeted display. There is no single standard established for the number of lines in today's HDTV displays, but two common resolutions are 720 lines and 1080 lines. While the scaling process makes the video appear to be of a higher quality, the fact is that it is originating from source material that is of an even lower vertical resolution than standard NTSC (fewer horizontal lines). The scaling process combined with a high resolution HDTV display can only do so much to disguise this fact.
However, there is a way to improve the quality of the widescreen image when it is recorded and stored in a 4:3 standard video format. We can compress the image horizontally, so that the full height of the 4:3 video frame is used. The following diagram shows how a 16:9 image might be horizontally compressed to fit within the confines of a 4:3 video frame.
Obviously, if this compressed image is displayed on a standard 4:3 monitor, the image will appear distorted, with everything looking taller and thinner than it should be. But what if we uncompress the video back to its original proportions before displaying it on a widescreen HDTV monitor? The image would still need to be scaled to match the number of lines in the HDTV display, but the original source material would now have a full 483 lines instead of only 362, as in the previous letterbox example. This is a full 33% more lines, providing 33% more information and vertical detail about the original material. Obviously this will allow the final output on the HDTV display to be of a much higher quality.
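The 362-line and 33% figures can be reproduced from the aspect ratios alone. A minimal sketch of the arithmetic, assuming the 483 visible NTSC lines used in this article and a 16:9 source image (the function name is hypothetical):

```python
def anamorphic_gain(total_lines=483, frame_aspect=4 / 3, image_aspect=16 / 9):
    """Compare letterboxed and anamorphic storage of a widescreen image.

    Returns (lines used when letterboxed, horizontal squeeze factor,
    fractional gain in lines from using the full frame height).
    """
    letterbox_lines = round(total_lines * frame_aspect / image_aspect)
    squeeze = frame_aspect / image_aspect
    gain = total_lines / letterbox_lines - 1
    return letterbox_lines, squeeze, gain


lines, squeeze, gain = anamorphic_gain()
print(f"letterboxed image: {lines} lines")
print(f"anamorphic squeeze: image stored at {squeeze:.0%} of its true width")
print(f"extra vertical detail: {gain:.0%}")
```

The letterboxed 16:9 image occupies 362 of the 483 lines, while the anamorphic version is stored at 75% of its true width and uses all 483 lines, about 33% more. The display or scaler then stretches the picture back by the inverse of the squeeze factor.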
In conclusion, there are many issues to consider when displaying an image that was originally produced in one aspect ratio, stored and transmitted in another, and ultimately displayed in a third. Unintended geometric distortions of the displayed image can occur if it is not processed properly. But, with knowledge of the image's format and the targeted display, an intelligent video scaler can easily produce the desired results and, ultimately, an enhanced viewing experience.
For information on Communications Specialties' full line of Deuce Intelligent Video Scalers, visit www.commspecial.com or contact CSI headquarters at: Phone: (631)273-0404, Fax: (631)273-1638, Email: firstname.lastname@example.org.
Reprinted from the Advanced Video Scaling EduGuide, copyright 2001 Communications Specialties, Inc.
NTSC, the standard used in North America and Japan, has a refresh rate of 60 Hz. PAL, the standard used in most of the remainder of the world, has a refresh rate of 50 Hz. In this case, the second field appears 1/50th of a second after the first field.
The actual speed is 29.97 frames per second, but we'll use 30 frames per second for the sake of simplicity.