Polytechnic University, Dept. Electrical and Computer Engineering

EE4414 Multimedia Communication System II

Fall 2005, Yao Wang

______

Homework 6 Solution (Video Coding Standards)

Reading Assignment:

·  Lecture slides

·  K. R. Rao, Z. S. Bojkovic, D. A. Milovanovic, Multimedia Communication Systems: Techniques, Standards, and Networks, Prentice Hall PTR, 2002. (Chap.5)

·  Excerpts from EL514-Multimedia Lab Manual on video coding

Written Assignment (Quiz based on selected problems on 11/1)

  1. What is the target application of the H.320 standard? What is the video coding standard used in H.320?

H.320 is the standard for audio-visual conferencing/telephony over ISDN channels. This standard is mainly used for teleconferencing in business and education. At the time H.320 was developed, H.261 was the standard developed for video coding. In newer systems, H.263 can also be used.

  2. What is the target application of the H.323 standard? What is the video coding standard used in H.323?

H.323 is the standard for audio-visual conferencing/telephony over packet-switched networks that do not provide guaranteed quality of service, mainly the Internet. It allows both the H.263 and H.261 standards for video coding, but H.263 is preferred.

  3. What is the target application of the H.324 standard? What is the video coding standard used in H.324?

H.324 is the standard for audio-visual conferencing/telephony over circuit-switched telephone networks, using either wired or wireless phone modems. It allows both the H.263 and H.261 standards for video coding, but H.263 is preferred.

  4. What are the main differences between H.320, H.323 and H.324 applications in terms of available bandwidth and delay variation?

The H.320 and H.324 standards are targeted at circuit-switched networks, which have dedicated channels allocated to a particular communication session. Therefore the available bandwidth is fixed and the delay variation is small. Because of this fixed bandwidth and delay, the quality of the received audio and video stays fairly constant over time. H.320 uses ISDN channels, with rates much higher than those affordable by either the wired or wireless modems used by H.324 systems. The ISDN channels are also very reliable (very low bit error rates). The H.320 system is mainly used within large corporations. The channel quality for H.324 applications depends on the underlying network: wired channels are more reliable than wireless channels. For wireless channels, a large portion of the bandwidth is used for channel error correction, so the bandwidth available for sending audio and video signals is much lower.

The H.323 standard is targeted at packet-switched networks that do not guarantee quality of service, with large variations in available bandwidth and end-to-end delay. Because of this variation, the quality of the received audio and video signals can vary greatly over time.

  5. H.261 and H.263 video coding standards differ mainly in how motion estimation is performed. Describe some of the techniques adopted in H.263 that helped improve its coding efficiency over H.261.

The H.261 standard performs motion estimation at integer-pel accuracy only, with a relatively small search range of (-16, 16). H.263 allows half-pel accuracy motion estimation and a larger maximum search range of (-32, 32). It also allows a 16x16 MB to be divided into four 8x8 blocks for more accurate motion estimation, which is helpful when an MB contains multiple objects with different motions. It further allows, as an option, overlapped block motion compensation, which can suppress blocking artifacts in the predicted images. All of these techniques help to improve the prediction accuracy and consequently the coding efficiency. A sketch of half-pel refinement is given below.
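To make the half-pel idea concrete, here is a minimal Python/NumPy sketch that refines an integer-pel motion vector to half-pel accuracy by bilinear interpolation, using the sum of absolute differences (SAD) as the matching criterion. The function name, the SAD criterion, and the omission of boundary checks are illustrative choices, not requirements of the standard.

```python
import numpy as np

def half_pel_refine(ref, cur_block, top, left, dy, dx):
    """Refine an integer-pel motion vector (dy, dx) to half-pel accuracy.

    ref       : reference frame (2-D float array)
    cur_block : NxN block from the current frame at position (top, left)
    Returns the best motion vector in half-pel units and its SAD.
    Boundary checks are omitted; the search is assumed to stay in-frame.
    """
    N = cur_block.shape[0]
    best = (2 * dy, 2 * dx)
    best_sad = np.abs(ref[top + dy: top + dy + N,
                          left + dx: left + dx + N] - cur_block).sum()
    # Try the 8 half-pel positions around the integer-pel optimum.
    for hy in (-1, 0, 1):
        for hx in (-1, 0, 1):
            if hy == 0 and hx == 0:
                continue
            y, x = top + dy + hy / 2.0, left + dx + hx / 2.0
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            fy, fx = y - y0, x - x0
            # Bilinear interpolation of the reference at half-pel positions.
            patch = ((1 - fy) * (1 - fx) * ref[y0:y0 + N,     x0:x0 + N] +
                     (1 - fy) * fx       * ref[y0:y0 + N,     x0 + 1:x0 + N + 1] +
                     fy * (1 - fx)       * ref[y0 + 1:y0 + N + 1, x0:x0 + N] +
                     fy * fx             * ref[y0 + 1:y0 + N + 1, x0 + 1:x0 + N + 1])
            sad = np.abs(patch - cur_block).sum()
            if sad < best_sad:
                best_sad, best = sad, (2 * dy + hy, 2 * dx + hx)
    return best, best_sad
```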

  6. What is the target application of MPEG-1? What are the different parts of MPEG-1 standard?

MPEG-1 was initially developed to enable storage of a 2-hour movie on a CD, and it is the standard used to produce VCDs. MPEG-1 is now also used to distribute video together with audio over the Internet. The MPEG-1 standard contains several parts, including a video coding part, an audio coding part, and a systems part that deals with how to multiplex and synchronize audio and video.

  7. Describe some of the differences between the MPEG-1 and H.261/H.263 video coding standards.

H.261 and H.263 are targeted at two-way video conferencing/telephony, which has a stringent delay requirement. This low-delay requirement rules out coding a frame as a B-frame, which requires coding a future frame first and introduces a fairly large delay. Both the encoder and the decoder must be able to process video in real time to enable effective communication between people at different locations, so neither can be overly complex. Also, the bit stream should have a fairly constant bit rate, so as not to cause large delay variation during transmission. This requirement forbids the encoder from inserting I-frames periodically; instead, I-blocks are used only when necessary (either for coding efficiency or for error resilience).

MPEG-1, on the other hand, is targeted at viewing video that is either pre-compressed or compressed live, but does not involve two-way communication. The encoder is located at the originator of the video content and can be fairly complex; the viewer just needs a decoder to view the compressed video. The bit rate can have large variation as long as the decoder has a large buffer. The compressed bitstream should, however, allow random access (fast forward, rewind, etc.). In MPEG-1 (and MPEG-2) this is enabled by organizing video frames into groups of pictures (GoPs), with each group starting with an I-frame followed by P-frames. Between P-frames, the encoder can also use B-frames for enhanced coding efficiency and random-access capability; the resulting coding order differs from the display order, as illustrated in the sketch below.
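The following sketch (illustrative only, not from the standard) shows how display order maps to coding order when B-frames are used: each anchor (I- or P-) frame must be coded before the B-frames that reference it, which is exactly the source of the extra delay discussed above. The GoP length and anchor spacing are arbitrary example parameters.

```python
def gop_coding_order(n_frames, m=3, gop=9):
    """Map display order to coding order for an I-B-B-P... GoP structure.

    m   : distance between anchor frames (I or P); m-1 B-frames in between
    gop : frames per group of pictures (first frame of each group is an I)
    Returns a list of (display_index, frame_type) in coding order.
    Trailing B-frames that would need the next GoP's I-frame are omitted
    for simplicity.
    """
    order = []
    for g in range(0, n_frames, gop):
        anchors = list(range(g, min(g + gop, n_frames), m))
        prev = None
        for a in anchors:
            order.append((a, 'I' if a == g else 'P'))  # anchor coded first
            if prev is not None:
                # B-frames between the two anchors are coded afterwards.
                order.extend((b, 'B') for b in range(prev + 1, a))
            prev = a
    return order

print(gop_coding_order(9))
# [(0,'I'), (3,'P'), (1,'B'), (2,'B'), (6,'P'), (4,'B'), (5,'B')]
```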

  8. What is the target application of MPEG-2? What are the different parts of MPEG-2 standard?

When the MPEG-2 standard was first developed, the major target application was to store 2 hours of video at BT.601 resolution (720x480) on a DVD with quality comparable to or better than broadcast TV. Later on, the scope of the standard was expanded to cover broadcasting of video (SD and HD) over the air, over cable, and over other types of networks. The MPEG-2 standard includes a systems part, a video coding part, an audio coding part, and several parts dealing with transmission aspects.

  9. Describe some of the differences between the MPEG-1 and MPEG-2 video coding standards.

The main difference is that MPEG-2 must handle interlaced video. For this purpose, different modes of motion estimation and DCT scanning were developed that handle interlaced sequences more efficiently (see the field-reordering sketch below). In addition, the MPEG-2 standard has options that allow a video to be coded into two layers. It also has a profile dealing with how to code stereo video, or more generally, multi-view video.
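As a rough illustration of one interlace-specific tool, the sketch below shows the line regrouping behind MPEG-2's field-DCT mode: the lines of a macroblock are rearranged so that each 8x8 DCT block contains lines from only one field, which decorrelates moving interlaced content better. This is a simplified view; the standard also specifies field-based motion compensation and an alternate coefficient scan.

```python
import numpy as np

def field_dct_reorder(mb):
    """Reorder a 16x16 macroblock for field-DCT coding.

    In frame-DCT mode each 8x8 block mixes lines from both fields; here
    the macroblock lines are regrouped so the top 8 rows hold the top
    field and the bottom 8 rows hold the bottom field.
    """
    top_field = mb[0::2, :]     # even lines (top field)
    bottom_field = mb[1::2, :]  # odd lines (bottom field)
    return np.vstack([top_field, bottom_field])
```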

  10. What are the different ways that MPEG-2 uses to generate two layer video? Explain each briefly.

MPEG-2 has four ways to generate layered video:

1) Data partitioning: the video is first coded using the conventional method into one bit stream. The bits for each MB are then split between a base layer and an enhancement layer. The base layer includes the header and motion information and the first few low-frequency DCT coefficients; the enhancement layer includes the remaining coefficients. The base layer alone yields a somewhat blurred version of the original video; the enhancement layer carries the detail information and, when added to the base layer, provides a clearer representation.

2) SNR scalability: each frame is first coded using the conventional method but with a large quantization step size; the resulting bits constitute the base layer. The quantization errors of the DCT coefficients are then quantized again using a smaller step size; the resulting bits constitute the enhancement layer. The base layer alone yields a coarsely quantized version of the original video; the enhancement layer together with the base layer yields a more accurate version (a sketch of this two-step quantization follows this list).

3) Spatial scalability: the base layer codes a down-sized version of the original video; the enhancement layer codes the original size, with each frame predicted from either the past coded frame at the original size, or the interpolated version of the current frame produced by the base layer, or a weighted sum of both. The enhancement layer together with the base layer yields the original-size video (with coding artifacts).

4) Temporal scalability: the base layer codes the original video (say 30 frames/s) at a lower frame rate (say 10 frames/s); the enhancement layer codes the skipped frames, using either the coded frames in the base layer or past coded frames in the enhancement layer for motion-compensated temporal prediction.
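Here is a minimal sketch of the two-step quantization behind SNR scalability (item 2 above). The step sizes are arbitrary illustrative values; real MPEG-2 coders apply this per DCT coefficient with quantization matrices, and the sketch only shows the layering principle.

```python
import numpy as np

def snr_scalable_quantize(coeffs, q_base=16, q_enh=4):
    """Two-layer SNR-scalable quantization of DCT coefficients.

    The base layer quantizes the coefficients coarsely; the enhancement
    layer requantizes the base layer's quantization error with a finer
    step size.
    """
    base_idx = np.round(coeffs / q_base)   # indices sent in the base layer
    base_rec = base_idx * q_base           # base-layer reconstruction
    err = coeffs - base_rec                # quantization error
    enh_idx = np.round(err / q_enh)        # indices sent in the enhancement layer
    full_rec = base_rec + enh_idx * q_enh  # two-layer reconstruction
    return base_rec, full_rec

c = np.array([100.0, -37.0, 12.0, 5.0])
base, full = snr_scalable_quantize(c)
# base is a coarse approximation of c; full is much closer to c.
```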

  11. MPEG-4 video coding standard uses the so-called “object-based coding”. Describe what it is and how a receiving user may make use of it? What are the three types of information contained in each object?

With object-based coding, a video sequence is decomposed into multiple objects, and each object is coded separately. This enables the encoder to code different objects with different accuracy; for example, a foreground moving object can be coded more accurately than the background. The receiver can compose the objects as desired: it can choose not to decode certain objects, change the viewing angle of one or several objects when displaying a sequence, or replace an object with some other pre-stored object. The information transmitted for each object includes its shape, its motion, and its texture (the color intensities in the initial frame and the prediction errors in following frames). A sketch of how a receiver might compose decoded objects is given below.
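As a rough sketch of what composition at the receiver might look like, the following code alpha-blends decoded objects (each a texture plus a shape mask) onto a background, in grayscale for simplicity. The function and its interface are hypothetical illustrations, not part of MPEG-4.

```python
import numpy as np

def compose_objects(background, objects):
    """Compose decoded video objects onto a background.

    Each object is a (texture, alpha) pair: texture holds the decoded
    intensity values and alpha is its decoded shape mask (1 inside the
    object, 0 outside), both 2-D arrays the size of the background.
    The receiver may freely drop, reorder, or substitute objects here.
    """
    frame = background.copy()
    for texture, alpha in objects:
        frame = alpha * texture + (1 - alpha) * frame
    return frame
```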

  12. Describe some techniques incorporated in H.264 that helped improve its coding efficiency over H.263/MPEG-4.

1) Intra-prediction: in the prior standards, the pixels in an INTRA-mode block are coded directly through transform coding, without exploiting any spatial correlation that may exist between pixels in the block and pixels in adjacent blocks. Intra-prediction in H.264 makes use of this correlation; different intra-prediction modes are introduced to exploit correlation along different directions.

2) Integer transform: instead of the DCT, H.264 uses an integer approximation of the DCT in which all computations can be done with integer operations. This eliminates any numerical mismatch between the forward transform at the encoder and the inverse transform at the decoder (a sketch follows this list). The transform block size can also be varied from block to block, depending on which size gives the best representation.

3) More accurate motion estimation, with quarter-pel search step size and variable block sizes from 16x16 down to 4x4. Also, instead of using a single reference frame, the encoder can choose among several reference frames, and bidirectional prediction is generalized to allow weighted prediction from two reference frames, both of which may be past frames.

4) More efficient entropy coding, including context-adaptive binary arithmetic coding (CABAC).

5) In-loop deblocking filtering to remove blocking artifacts in reconstructed frames.
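To illustrate item 2, the sketch below applies the 4x4 integer transform core matrix used in H.264. Because all arithmetic is integer, the encoder and decoder transforms match exactly; the normalization that the true DCT would include is omitted here, since in the standard it is folded into quantization.

```python
import numpy as np

# Core matrix of the 4x4 integer transform in H.264, an integer
# approximation of the 4x4 DCT.
H = np.array([[1,  1,  1,  1],
              [2,  1, -1, -2],
              [1, -1, -1,  1],
              [1, -2,  2, -1]])

def forward_transform(block):
    """Apply the 4x4 integer transform Y = H X H^T to a residual block.

    All operations are integer, so there is no encoder/decoder drift,
    unlike a floating-point DCT.  Normalization is absorbed into the
    quantization step and therefore omitted here.
    """
    return H @ block @ H.T

x = np.arange(16).reshape(4, 4)  # example residual block
print(forward_transform(x))
```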