ITU - Telecommunications Standardization Sector
STUDY GROUP 16 Question 6
Video Coding Experts Group (VCEG)
______
14th Meeting: Santa Barbara, CA, USA, 24-27 Sep., 2001 / Document VCEG-N40
Filename: VCEG-N40.doc
Generated: Sept 18, 2001
Question: / Q.6/SG16 (VCEG)
Source: / Markus Flierl
Telecommunications Institute I
University of Erlangen-Nuremberg
Cauerstr. 7
D-91058 Erlangen, Germany /
Tel:
Fax:
Email: /
+149 650511 72362 34765304
+149 650511 724762 36485333

Bernd Girod
Information Systems Laboratory
Stanford University
Stanford, CA 94305, USA / Tel:
Fax:
Email / +1 650 724 6354
+1 650 725 8286

Title: / Multihypothesis Prediction for B frames
Purpose: / Proposal

______

1  Introduction

At present, B frames in TML-8 [1] allow multiple reference frames for forward motion vector data only. The backward motion vectors always refer to one temporally subsequent P frame. In this document, the generic approach to predict from an arbitrary reference frame is applied to backward prediction in B frames in order to improve coding efficiency. Prediction from multiple subsequent pictures also improves error resilience in streaming systems as shown and proposed in [2].

B frames in TML-8 explicitly distinguish between forward and backward prediction types, and the bi-directional prediction type only allows a linear combination of a forward/backward prediction signal pair. With multiple temporally previous and subsequent reference frames, the reference frame parameter implicitly provides the distinction between forward and backward prediction and additionally enables multihypothesis prediction.

This document investigates the coding efficiency of B frames that additionally allow prediction from multiple temporally subsequent pictures. The bi-directional mode is replaced by a multihypothesis mode that allows not only a linear combination of forward and backward prediction signals but also combinations of two forward or two backward prediction signals.

The document is organized as follows: Section 2 discusses prediction from multiple temporally subsequent pictures. Section 3 outlines the differences between the bi-directional macroblock mode in TML-8 and the utilized multihypothesis mode. Section 5 is dedicated to encoder issues and Section 6 captures the simulation conditions for the experimental results in Appendix A.

2  Prediction from Multiple Temporally Subsequent Pictures

Assume a picture type sequence as follows:

P1 / B2 / P3 / B4 / P5 / B6 / P7 / B8 / P9

In TML-8, a forward prediction of B4 might use reference frames P1 and P3, whereas a backward prediction might only stem from P5. Prediction from multiple temporally subsequent pictures will allow backward prediction of B4 by using reference frames P5, P7, and P9.

The particular temporally subsequent picture will be indicated by an additional backward reference frame syntax element:

Code number / Backward reference frame
0 / The next temporally subsequent frame (1 frame forward)
1 / 2 frames forward
2 / 3 frames forward
… / …

As long as the macroblock types for B pictures explicitly distinguish between forward and backward prediction, a backward reference frame syntax element is sufficient. For forward prediction, the forward reference frame syntax element in TML-8 remains unchanged.
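The code number table above can be illustrated with a short Python sketch. The function name `backward_reference` and the representation of pictures by their display index are assumptions made for this example only; they are not part of the TML-8 syntax.

```python
def backward_reference(code_number, current, p_frames):
    """Return the display index of the P frame selected by a backward
    reference frame code number (0 = next temporally subsequent P frame)."""
    # Candidate references are the P frames displayed after the current B frame.
    subsequent = sorted(p for p in p_frames if p > current)
    if not 0 <= code_number < len(subsequent):
        raise ValueError("code number exceeds the available subsequent frames")
    return subsequent[code_number]


# Example sequence P1 B2 P3 B4 P5 B6 P7 B8 P9, current picture B4.
P_FRAMES = [1, 3, 5, 7, 9]
print([backward_reference(c, 4, P_FRAMES) for c in range(3)])  # [5, 7, 9]
```

For B4, code numbers 0, 1, and 2 select P5, P7, and P9, matching the example sequence given above.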

To reduce the complexity of 1/8-pel motion vector (MV) resolution, the complexity of the interpolation process is reduced. New interpolation filters of 6-tap and 8-tap length are therefore introduced for 1/8-pel resolution, each of which directly interpolates a specific subpel position. With the 6-tap filters, the decoder complexity of the 1/8-pel codec is the same as that of the TML-5 (1/4-pel) codec. Nevertheless, a gain of up to 1.0 dB PSNR is obtained with 1/8-pel MV resolution and the introduced interpolation filters compared to 1/4-pel MV resolution in TML-5.

A further significant gain is obtained by increasing the complexity to 8-tap filters and by using a Hamming windowed Sinc filter.

3  Bi-Directional vs. Multihypothesis Mode

In the following section, we will outline the differences between the bi-directional macroblock mode in TML-8 and the utilized multihypothesis mode.

3.1  Bi-Directional Mode

The bi-directional prediction type in TML-8 only allows a linear combination of a forward/backward prediction signal pair. Forward prediction may utilize multiple reference frames, whereas backward prediction is limited to the next temporally subsequent reference frame.

The current implementation of the encoder estimates the forward and backward prediction signals independently. There is room for improvement, as a joint estimation of forward and backward prediction signals increases prediction efficiency, as investigated in [3].

3.2  Multihypothesis Mode

For this mode, we drop the restriction of the bi-directional mode to allow only linear combinations of forward and backward pairs. The additional combinations (forward, forward) and (backward, backward) are obtained by extending the unidirectional picture reference syntax element to a bi-directional picture reference syntax element:

Code number / Reference frame
0 / The temporally last decoded frame (1 frame back)
1 / The temporally subsequent decoded frame (1 frame forward)
2 / 2 frames back
3 / 2 frames forward
4 / 3 frames back
5 / 3 frames forward
... / ...
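The interleaved code numbers in the table above can be sketched as follows. The helper names and the signed-distance convention (negative for a temporally previous frame, positive for a temporally subsequent frame) are assumptions introduced for illustration. Note that a temporally previous reference frame corresponds to forward prediction and a temporally subsequent one to backward prediction.

```python
def decode_reference(code_number):
    """Map a code number to a signed reference distance.

    Negative: frames back (temporally previous, forward prediction).
    Positive: frames forward (temporally subsequent, backward prediction).
    """
    if code_number < 0:
        raise ValueError("code numbers are non-negative")
    distance = code_number // 2 + 1
    return -distance if code_number % 2 == 0 else distance


def encode_reference(offset):
    """Inverse mapping: signed reference distance to code number."""
    if offset == 0:
        raise ValueError("the current picture cannot reference itself")
    return 2 * (abs(offset) - 1) + (1 if offset > 0 else 0)


def pair_type(code_a, code_b):
    """Prediction directions of two macrohypotheses given their code numbers."""
    def direction(code):
        return "forward" if decode_reference(code) < 0 else "backward"
    return direction(code_a), direction(code_b)


print([decode_reference(c) for c in range(6)])  # [-1, 1, -2, 2, -3, 3]
```

With this mapping, the code number pair (0, 1) reproduces the bi-directional case, while (0, 2) and (1, 3) give the additional (forward, forward) and (backward, backward) combinations.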

Utilizing this bi-directional picture reference element, a generic prediction signal, which we will call macrohypothesis, can be formed by signaling the syntax fields Ref_frame, Blk_size, and MVD.

The investigated multihypothesis mode signals exactly two macrohypotheses to form the prediction signal. The structure of the multihypothesis mode is depicted in Figure 1.

Figure 1: Structure of the multihypothesis mode. Two macrohypotheses are signaled.

The multihypothesis mode includes the bi-directional prediction mode when the first macrohypothesis originates from a temporally prior reference frame and the second from the temporally subsequent reference frame. The multihypothesis mode has the advantage that efficient reference frame pairs can be signaled to the decoder. The experiments demonstrate that efficient reference frame pairs improve the coding efficiency of B frames.

4  Interpolation Complexity in the Decoder for 1/4-pel MV Resolution in TML-5

In TML-5 (doc. q15k59), the interpolation process is described as two subsequent interpolations with two different interpolation filters. The first is a 6-tap filter that interpolates the image signal at 1/2-pel positions, and the second is a bilinear filter that interpolates the image signal at 1/4-pel positions. For an upsampling process in which the whole image is upsampled, this representation may be a good choice. But if only a few subpel positions have to be interpolated (e.g. in the decoder), these positions should be interpolated directly in order to reduce the interpolation complexity.

If the decoder receives a displacement vector for a block with fractional-pel resolution (e.g. 1/4-pel or 1/8-pel resolution), the motion compensation in the decoder only has to interpolate the image signal at the specific subpel position the vector refers to. Figure 1 shows a one-dimensional example of subsequent and direct interpolation of a 1/2-pel and a 1/4-pel sample in a 1/4-pel MV resolution scheme.

Figure 1: One-dimensional visualisation of direct interpolation of a 1/2-pel and a 1/4-pel sample.

The upper six arrows denote the 6-tap interpolation with the filter coefficients (a3,a2,a1,a1,a2,a3) for the interpolation of the 1/2-pel sample. The lower two arrows denote the bilinear interpolation with the filter coefficients (b1,b1) for the interpolation of the 1/4-pel sample, which uses the result of the 1/2-pel interpolation with the 6-tap filter. This corresponds to the subsequent interpolation scheme described in TML-5 (doc. q15k59). To reduce the complexity of the 1/4-pel sample interpolation, the interpolation should be done in one step, so that the intermediate 1/2-pel sample does not have to be computed: the 1/4-pel sample can be computed directly from the six fullpel samples. The coefficients of such a filter, expressed in terms of the coefficients of the 6-tap and bilinear filters, are depicted in Figure 1. Table 1 lists the coefficients of the direct interpolation filters for TML-5.

Subpel position / Filter coefficients
1/4 / (1, -5, 52, 20, -5, 1)/64
2/4 / (2, -10, 40, 40, -10, 2)/64
3/4 / (1, -5, 20, 52, -5, 1)/64

Table 1: Filter coefficients for direct interpolation of different 1/4-pel positions in TML-5
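The relation between the rows of Table 1 can be checked with a small Python sketch. It assumes that the bilinear step averages the 1/2-pel sample with the nearest fullpel sample, i.e. (b1,b1) = (1/2,1/2), and it ignores any intermediate rounding an integer implementation would apply. The function names and the sample values are hypothetical.

```python
from fractions import Fraction

# 2/4 row of Table 1: direct 1/2-pel filter over six fullpel samples.
HALF_PEL = [Fraction(c, 64) for c in (2, -10, 40, 40, -10, 2)]


def direct_quarter_filter(nearest_fullpel_tap):
    """Fold the bilinear averaging step into the 6-tap filter.

    nearest_fullpel_tap is the tap index (2 or 3) of the fullpel sample that
    is averaged with the 1/2-pel sample (assumed b1 = 1/2).
    """
    fullpel = [Fraction(int(i == nearest_fullpel_tap)) for i in range(6)]
    return [(h + f) / 2 for h, f in zip(HALF_PEL, fullpel)]


def apply_filter(taps, samples):
    """Weighted sum of six fullpel samples."""
    return sum(t * s for t, s in zip(taps, samples))


samples = [3, 7, 20, 36, 41, 40]  # arbitrary fullpel samples for the check
half = apply_filter(HALF_PEL, samples)
two_step = (samples[2] + half) / 2  # subsequent interpolation
one_step = apply_filter(direct_quarter_filter(2), samples)  # direct interpolation
assert two_step == one_step

print([int(t * 64) for t in direct_quarter_filter(2)])  # [1, -5, 52, 20, -5, 1]
print([int(t * 64) for t in direct_quarter_filter(3)])  # [1, -5, 20, 52, -5, 1]
```

Under these assumptions the one-step filter gives the same result as the two-step scheme, and its scaled coefficients reproduce the 1/4 and 3/4 rows of Table 1.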

To interpolate a two-dimensional image, the one-dimensional interpolation filters are applied first horizontally and then vertically (TML-5, doc. q15k59).
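A minimal sketch of this separable scheme, using the direct filters of Table 1, is given below. The function names, the clamping of coordinates at the picture border, and the omission of intermediate rounding are assumptions made for illustration; they are not taken from TML-5.

```python
from fractions import Fraction

# Direct filters from Table 1, indexed by 1/4-pel phase (0 = fullpel position).
TAPS = {
    0: (0, 0, 64, 0, 0, 0),
    1: (1, -5, 52, 20, -5, 1),
    2: (2, -10, 40, 40, -10, 2),
    3: (1, -5, 20, 52, -5, 1),
}


def filter_1d(samples, phase):
    """Apply the 6-tap direct filter for the given 1/4-pel phase."""
    return sum(Fraction(t, 64) * s for t, s in zip(TAPS[phase], samples))


def interpolate(image, x, y, phase_x, phase_y):
    """Sample the image at (x + phase_x/4, y + phase_y/4)."""
    height, width = len(image), len(image[0])

    def clamp(value, upper):
        # Assumed border handling: repeat the nearest picture sample.
        return max(0, min(value, upper))

    column = []
    for dy in range(-2, 4):
        row = image[clamp(y + dy, height - 1)]
        window = [row[clamp(x + dx, width - 1)] for dx in range(-2, 4)]
        column.append(filter_1d(window, phase_x))  # horizontal pass
    return filter_1d(column, phase_y)  # vertical pass


# A linear ramp is reproduced exactly by these filters.
ramp = [[x + 10 * y for x in range(8)] for y in range(8)]
print(float(interpolate(ramp, 4, 4, 1, 3)))  # 51.75
```

Only the six rows around the target position are filtered horizontally, so the cost per interpolated sample is seven 6-tap filter operations in this sketch.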

5  Encoder

The encoder operates in a rate-distortion efficient setting with rate-constrained motion estimation and mode decision. The estimation of efficient macrohypothesis pairs is accomplished by a low-complexity iterative algorithm that utilizes estimation tools already used in the TML software.

Figure 2: Average number of iterations vs. quantization parameter.

The iterative algorithm performs conditional rate-constrained motion estimation and is initialized with the most efficient single macrohypothesis. The algorithm continues with:

1.  One macrohypothesis is fixed, and reference frame and block size estimation with motion search is applied to the complementary macrohypothesis such that the multihypothesis cost is minimized.

2.  The complementary macrohypothesis is fixed and the first macrohypothesis is optimized.

The two steps are repeated until convergence.
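The two steps can be sketched in Python as an alternating minimization. This is an illustration under stated assumptions, not the TML encoder: each candidate stands for one (reference frame, block size, motion vector) choice with a hypothetical rate, the combined prediction is assumed to be the average of the two macrohypotheses, the cost is a Lagrangian D + lam * R with squared-error distortion, and the stopping threshold `tol` is a free parameter.

```python
def single_cost(block, cand, lam):
    """Cost of predicting the block from one macrohypothesis."""
    distortion = sum((s - p) ** 2 for s, p in zip(block, cand["pred"]))
    return distortion + lam * cand["rate"]


def pair_cost(block, cand_a, cand_b, lam):
    """Cost of predicting the block from the average of two macrohypotheses."""
    distortion = sum((s - (a + b) / 2) ** 2
                     for s, a, b in zip(block, cand_a["pred"], cand_b["pred"]))
    return distortion + lam * (cand_a["rate"] + cand_b["rate"])


def estimate_pair(block, candidates, lam, tol=0.005, max_iter=20):
    # Initialization: the most efficient single macrohypothesis.
    first = min(candidates, key=lambda c: single_cost(block, c, lam))
    second = first
    previous = None
    for iteration in range(1, max_iter + 1):
        # Step 1: fix the first macrohypothesis, optimize the complementary one.
        second = min(candidates, key=lambda c: pair_cost(block, first, c, lam))
        # Step 2: fix the complementary macrohypothesis, optimize the first one.
        first = min(candidates, key=lambda c: pair_cost(block, c, second, lam))
        cost = pair_cost(block, first, second, lam)
        # Stop when the relative cost decrease falls below the threshold.
        if previous is not None and previous - cost <= tol * previous:
            break
        previous = cost
    return first, second, cost, iteration


block = [10, 20, 30, 40]
candidates = [
    {"name": "a", "pred": [8, 18, 28, 38], "rate": 1},
    {"name": "b", "pred": [12, 22, 32, 42], "rate": 1},
    {"name": "c", "pred": [10, 20, 30, 50], "rate": 1},
]
first, second, cost, iterations = estimate_pair(block, candidates, lam=0.0)
print(first["name"], second["name"], cost, iterations)  # a b 0.0 2
```

Because each step minimizes over a set that contains the current choice, the cost cannot increase from one iteration to the next, which is why a relative-decrease threshold is a natural stopping rule for this sketch.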

Figure 2 shows the average number of iterations for multihypothesis motion estimation over the quantization parameter for the CIF sequence Mobile with 30 fps. It takes about 2 iterations to achieve an error smaller than 0.5% relative to the error in the previous iteration. The algorithm converges faster for higher quantization parameter values.

Direct Interpolation for 1/8-pel MV Resolution

For 1/8-pel interpolation, it is proposed to use direct interpolation filters, each of which directly interpolates one specific subpel position. Two different Hamming windowed Sinc filters are tested for the direct interpolation: one with a filter length of 6 taps and one with a filter length of 8 taps.

With the 6-tap versions, the same interpolation complexity is achieved for 1/8-pel interpolation as for 1/4-pel interpolation in the TML. Each of these direct interpolation filters for 1/8-pel resolution has a length of 6 taps, which is exactly the length that direct interpolation filters for the TML would have. Therefore, the complexity of the direct interpolation is the same as in the TML.
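As a hedged illustration of how a Hamming windowed Sinc filter for one subpel position might be constructed, consider the following Python sketch. The window definition, its alignment with the subpel position, the normalization to unit gain, and the function name are all assumptions. The sketch is not claimed to reproduce the coefficients used in the proposal, and integer scaling and rounding are omitted.

```python
import math


def windowed_sinc_taps(phase, num_taps=6, denom=8):
    """Normalized Hamming windowed Sinc taps for subpel position phase/denom."""
    half = num_taps // 2
    offset = phase / denom
    taps = []
    for k in range(-half + 1, half + 1):  # fullpel positions around the sample
        d = k - offset  # distance from the fullpel position to the subpel position
        sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
        window = 0.54 + 0.46 * math.cos(math.pi * d / half)  # assumed Hamming window
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # unit DC gain


for phase in (1, 4, 7):
    print(phase, [round(t, 4) for t in windowed_sinc_taps(phase)])
```

In this construction the filter for position 7/8 is the mirror image of the filter for position 1/8, and the filter for position 4/8 is symmetric, so only half of the filters would need to be stored.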

Table ? lists the direct interpolation filters applied for 1/8-pel resolution.

Subpel-Position / Filter Coefficients
1/8
2/8
3/8
4/8 / (2, -10, 40, 40, -10, 2)/64
5/8
6/8
7/8

6  Simulations

The implementation of the prediction from multiple subsequent pictures and of the multihypothesis mode is based on the TML 8.0 software. This software is also used to generate the anchor data. We insert two B frames between successive P frames. The P frames do not utilize the multihypothesis mode as discussed in [4,5]. Moreover, they are identical to the anchor P frames. Thus, the experimental results report the rate-distortion performance of the B frames only. The simulations are performed with all specified block sizes and 5 temporally prior reference frames. For motion estimation, the search range is 16 and the accuracy is quarter-pel. UMV is enabled. The encoder uses the high-complexity mode for motion estimation and mode decision. So far, UVLC codes are utilized. The sequences are encoded with constant quantizer values (12, 14, 16, 20). Table 2 shows the test sequences that are used for the experimental results.

Table 2: Applied test-sequences.

For each test sequence specified in the recommended simulation conditions for H.26L [6], one rate-distortion plot is given in Appendix A. Results for the sequence Flower Garden are also provided. The rate-distortion plots show luminance PSNR vs. kbits/frame for the B frames and contain the following curves:

·  TML 8, B frames, 1 subseq. picture, bi-dir: Anchor with TML 8.0 software and coded picture sequence IBBP. Only one temporally subsequent frame is used and no backward reference frame element is coded. The bi-directional prediction mode is enabled.

·  TML 8, B frames, 3 subseq. pictures, MH: Investigated extension with 3 temporally subsequent reference pictures. The coded picture sequence is also IBBP. The bi-directional prediction mode is replaced by the multihypothesis mode.

Bit-rate savings of up to 14% over the TML-8 anchor are observed for the sequence Container, up to 13% for the sequence Mobile, up to 9% for the sequence Flower Garden, and up to 8% for the sequence Foreman (News 7%, Paris and Tempete 6%, Silent Voice 5%). Overall, the compression efficiency improves with increasing bit-rate.

·  TML: 1/4-pel with 6-tap and bilinear interpolation filter (62)

·  1/8-pel with 6-tap and bilinear interpolation filter (66b) (same as in q15k21)

·  1/8-pel with 8-tap and bilinear interpolation filter (88b) (same as in q15k21)

·  1/8-pel with direct interpolation and 6-tap filters (6-tap direct interp.)

·  1/8-pel with direct interpolation and 8-tap filters (8-tap direct interp.)

7  Conclusions

This document investigates the coding efficiency of B frames that additionally allow prediction from multiple temporally subsequent pictures. In addition, the bi-directional mode is replaced by a multihypothesis mode that allows not only a linear combination of forward and backward prediction signals but also combinations of two forward or two backward prediction signals. For that, a bi-directional picture reference syntax element is required. Multiple subsequent reference frames also allow efficient macrohypothesis pairs. The experimental results in this document show bit-rate savings of up to 14% with 3 temporally subsequent reference frames and MH mode.

8  Patent Statement

We believe that Netergy Microsystems holds granted patents and pending applications, whose use would be required to implement this multiple reference picture technique.

Netergy Microsystems has indicated that they are prepared to grant a license to an unrestricted number of applicants on a worldwide, non-discriminatory basis and on reasonable terms and conditions to manufacture, use and/or sell implementations of the above proposal.