Technical Overview of Video Standards
This appendix explains the basic concepts and usage of video parameters and standards.
Video Standards: NTSC and PAL
The video that you see on your television screen follows standards established in the 1950s when color television was first introduced. The leading formats in use today are NTSC (National Television System Committee) and PAL (Phase Alternating Line). Generally speaking, NTSC is the standard used in the Americas and Japan, whereas PAL is used in Europe, Australia, the Middle East, and Asia.
Neither video standard is optimal for presentation on computer monitors; each poses different challenges when you are trying to optimize video for web delivery:
- Frame Size: NTSC and PAL have different image sizes, both of which differ from the available image sizes of computer monitors.
- Frame Rate: NTSC and PAL have different frame rates for the display of images, both of which differ from those used by computer monitors.
- Pixel Aspect Ratio: NTSC and PAL share a pixel aspect ratio (referred to as D1 Aspect Ratio, which is essentially rectangular), but this ratio is different from that used by computer monitors (which is square).
- Display: NTSC and PAL consist of two separate “interlaced” fields, while computer monitors display “progressive” images.
The following table summarizes these differences:
| | Image Size | Frame Rate | Aspect Ratio | Display |
| --- | --- | --- | --- | --- |
| NTSC | 720 x 480 | 29.97 | D1 | Interlaced |
| PAL | 720 x 576 | 25 | D1 | Interlaced |
Frame Size

Conventional television screens are made up of horizontal scan lines, while computer monitors consist of a grid of discrete pixels. The standard line resolution for an NTSC television signal is 525 lines; for PAL, it is 625 lines. Most modern computer monitors have much higher vertical resolutions (measured in pixels), such as 768 or 1024, requiring vertical upscaling during playback in order to fill the screen.
For NTSC video images, the SMPTE 259M professional standard specifies that the 525-line signal be digitized as 720 x 486—that is, 720 horizontal pixels by 486 vertical pixels. This default video size is commonly known as D1. Capturing footage with most modern video capture cards from a professional BetaSP or Digital Betacam source results in a D1-sized frame. Capturing footage from a DV (digital video) source, however, yields a 720 x 480 frame. The difference between the D1 spec and the DV spec is only 6 vertical pixels. Many compression algorithms, including DV compression, work best with pixel dimensions that are multiples of 16. By removing 6 lines from the D1 resolution, the DV format arrived at a vertical resolution of 480, which is an exact multiple of 16.
For PAL video images, frames are always 720 x 576 pixels, regardless of video source. Because PAL’s vertical resolution, 576, is a multiple of 16, no change is necessary for DV compression.
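The multiple-of-16 relationship between the D1 and DV frame sizes can be verified with a few lines of arithmetic. This is a minimal sketch; the frame sizes are taken from the specifications above, and the helper name is our own:

```python
# Frame sizes from the D1 (SMPTE 259M) and DV specifications.
D1_NTSC = (720, 486)
DV_NTSC = (720, 480)
PAL = (720, 576)

def is_multiple_of_16(width, height):
    """Codecs built on 16x16 macroblocks prefer dimensions divisible by 16."""
    return width % 16 == 0 and height % 16 == 0

print(is_multiple_of_16(*D1_NTSC))  # False: 486 is not divisible by 16
print(is_multiple_of_16(*DV_NTSC))  # True:  480 = 30 * 16
print(is_multiple_of_16(*PAL))      # True:  576 = 36 * 16
```

This shows why DV trimmed exactly 6 lines: 480 is the nearest multiple of 16 below 486, while PAL's 576 already qualifies.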
Frame Rate

Video is essentially a sequence of images flashed on the screen in rapid succession, giving the illusion of motion. The number of frames displayed every second is known as the frame rate, and it is measured in frames per second (fps). The higher the frame rate, the more frames per second will be used to display the sequence of images, resulting in smoother motion. The trade-off, however, is that higher frame rates require a larger amount of data to display the video, and therefore more bandwidth.
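The linear relationship between frame rate and bandwidth is easy to see for uncompressed video. The sketch below assumes 8-bit 4:2:2 sampling (16 bits per pixel), an illustrative figure not taken from the text:

```python
# Uncompressed data rate scales linearly with frame rate.
def data_rate_mbps(width, height, bits_per_pixel, fps):
    """Megabits per second for uncompressed video at the given settings."""
    return width * height * bits_per_pixel * fps / 1e6

# DV-NTSC frame size, 8-bit 4:2:2 sampling (16 bits per pixel, assumed):
print(round(data_rate_mbps(720, 480, 16, 29.97)))  # ~166 Mbps at full rate
print(round(data_rate_mbps(720, 480, 16, 15)))     # ~83 Mbps at half the frames
```

Halving the frame rate halves the raw data rate, which is why frame rate is one of the first levers adjusted when targeting limited bandwidth.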
NTSC video is usually said to run at 30 fps, and PAL at 25 fps. In fact, NTSC runs at 29.97 fps. The reason for the odd frame rate dates back to the transition from black-and-white to color television, when the 29.97 fps rate was chosen to maintain backward compatibility with existing television sets. There are still 30 frames, but they run 0.1 percent slower than real time, resulting in a frame rate of 29.97 fps.
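That 0.1 percent slowdown can be expressed exactly: NTSC color video runs at 30 × 1000/1001 fps. A quick check of the arithmetic:

```python
from fractions import Fraction

# NTSC color frame rate: 30 fps slowed by a factor of 1000/1001.
ntsc_rate = Fraction(30000, 1001)
print(float(ntsc_rate))        # ~29.97002997...

# The slowdown relative to exactly 30 fps is 1/1001, about 0.1 percent:
slowdown = 1 - ntsc_rate / 30
print(float(slowdown) * 100)   # ~0.0999 percent
```

Keeping the rate as an exact fraction (30000/1001) rather than the rounded 29.97 avoids cumulative timing errors in long programs.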
When working with compressed video in a format like Flash video, frame rate can affect the quality of the video in hard-to-predict ways depending on how you encode the video and its specific content. Lower frame rates ostensibly provide less content to encode, which theoretically improves quality or decreases file size. At the same time, however, it makes it more likely that there are noticeable changes from one frame to the next, which require more data to encode. If you lower the frame rate and leave the data rate unchanged, the video may appear to stutter and motion may look less fluid than desired.
Whenever you reduce the frame rate, it is a good idea to use an even divisor of the original frame rate. If your source has a frame rate of 24 fps, reduce it to 12 fps, 8 fps, 6 fps, 4 fps, 3 fps, or 2 fps. If the source frame rate is 30 fps, in most cases you can adjust the frame rate to 30 fps, 15 fps, 10 fps, 6 fps, and so on. If your video is more than 10 minutes long, audio will drift noticeably out of sync unless you adhere to the exact 29.97 fps rate or an exact even division of it for lower frame rates (such as 14.985, which is half of 29.97).
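To see why the exact rate matters, consider what happens when 29.97 fps material is treated as running at exactly 30 fps (the same 0.1 percent error arises from using 15 fps instead of 14.985). A sketch of the arithmetic:

```python
# Drift when 29.97 fps material is timed as if it were exactly 30 fps.
source_fps = 29.97
assumed_fps = 30.0
duration_s = 10 * 60                    # a 10-minute clip

frames = source_fps * duration_s        # frames actually in the clip
video_runtime = frames / assumed_fps    # playback time at the wrong rate
drift = duration_s - video_runtime      # audio keeps real time; video runs fast
print(round(drift, 2))                  # 0.6 seconds of drift after 10 minutes
```

A 0.6-second offset is well past the threshold where lip sync visibly breaks, which is why the divisor must match 29.97 exactly rather than a rounded 30.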
Pixel Aspect Ratio
The D1/DV NTSC and PAL specifications call for non-square pixels (often referred to as the D1 aspect ratio), while computer monitor pixels are square. NTSC D1 pixels are slightly taller than they are wide, and PAL D1 pixels are slightly wider than they are tall. For this reason, when you look at an uncorrected D1 video image on a computer monitor, the proportions appear wrong: an NTSC image looks stretched horizontally, making actors appear shorter and wider. When the same image is displayed on a broadcast monitor, the pixels are drawn at their intended shape and the proportions look normal, as shown in the following image.
For this reason, video images intended for display on computer monitors must be pixel-aspect-corrected by scaling the image to a true 4:3 aspect ratio. For NTSC, the full square-pixel resolution is 720 x 540 (vertical compensation), and for PAL it is 768 x 576 (horizontal compensation). Commonly used final video display resolutions on the Internet include 640 x 480, 512 x 384, 320 x 240, and 160 x 120.
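The correction amounts to scaling whichever dimension falls short of a true 4:3 frame. A minimal sketch of that calculation (the helper name is our own):

```python
# Pixel-aspect correction: scale a D1 frame to square pixels at 4:3.
def square_pixel_size(width, height, target_aspect=4 / 3):
    """Return the square-pixel frame size at the target aspect ratio."""
    if width / height > target_aspect:
        # Frame is wider than 4:3 in square pixels: grow the height (NTSC case).
        return width, round(width / target_aspect)
    # Frame is narrower than 4:3: grow the width (PAL case).
    return round(height * target_aspect), height

print(square_pixel_size(720, 486))  # (720, 540) -- NTSC, vertical compensation
print(square_pixel_size(720, 576))  # (768, 576) -- PAL, horizontal compensation
```

From either square-pixel size, downscaling to the common 4:3 web resolutions (640 x 480, 320 x 240, and so on) preserves correct proportions.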
Most video editing applications compensate for the pixel aspect ratio discrepancy by scaling the video image in real time while rendering it on the computer monitor. This is done because eventually the images are intended to return to television monitors for final display, and scaling the actual pixels in the video file would needlessly introduce a subtle distortion from the scaling operation. However, for web display, this real-time compensation is not a valid approach, given that the video sequence is destined to be displayed on a square pixel monitor, and as such should be hard-rendered to compensate for the discrepancy.
Interlaced and Progressive Video
Video images consist of two interlaced fields that together make up a frame. This approach was introduced when television was invented, due to a technical limitation that prevented a full frame from being drawn "progressively" on the screen (from top to bottom) without noticeable flicker; as an image was displayed, it appeared to be wiped onto the screen. Breaking the image into two fields (halves) and displaying one after the other eliminated this artifact. This legacy technique has been a tremendous obstacle in the digital age of video and computers, and has been eliminated from the newer High Definition television standards, which are progressive (images are drawn in one pass from top to bottom). A single group of lines is known as a field. The two fields are referred to as the upper and lower fields, Field 1 and Field 2, odd and even, or top and bottom; unfortunately, there is no standard nomenclature.
With real video footage, the two interlaced fields often look very similar to each other, and no visible artifacts appear when looking at a video frame on a computer monitor. However, with video footage that includes high motion material that changes quickly (such as movement of the camera or of people in the frame) very noticeable field artifacts appear when the fields are displayed together on a computer monitor, giving the image a ghost-like quality. This is due to the composition of two moments of time together in one frame.
Therefore, to display crisp video on a computer monitor, video frames must be de-interlaced by eliminating one of the fields. Half of each frame's information is discarded, and the remaining lines are doubled or interpolated to restore the full frame height. For NTSC, this results in 30 frames per second, each representing a single point in time.
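The simplest de-interlacing strategy described above, discarding one field and line-doubling the other, can be sketched in a few lines. This is a toy model in which a frame is just a list of scan lines, not a real image-processing routine:

```python
# De-interlacing by field discard and line doubling (minimal sketch).
# A "frame" here is a list of scan lines; even indices form the top field.

def deinterlace(frame):
    """Keep the top field and double each of its lines to restore full height."""
    top_field = frame[0::2]          # discard the bottom field
    result = []
    for line in top_field:
        result.extend([line, line])  # line-double the remaining field
    return result

frame = ["t0", "b0", "t1", "b1"]     # two interlaced fields, 4 lines total
print(deinterlace(frame))            # ['t0', 't0', 't1', 't1']
```

Real encoders typically interpolate between the kept lines rather than duplicating them, which trades a little sharpness for smoother edges.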
Modern digital television standards have eschewed interlacing in favor of progressive scan display. Progressive scan video cameras can usually switch between progressive scan and interlaced recording, and most of these cameras offer a variety of frame rates with and without interlacing. Typical frame rates are described as 60p (60 fps progressive), 30i (30 fps interlaced), 30p (30 fps progressive), and 24p (24 fps progressive). When working with progressive images, there is no need to de-interlace footage before deploying it to the web.