Multihypothesis Motion Pictures for H.26L

ITU - Telecommunications Standardization Sector
STUDY GROUP 16 Question 6
Video Coding Experts Group (VCEG)
______
Twelfth Meeting: Eibsee, Germany, 9-12 January, 2001 / Document VCEG-L240
Filename: VCEG-L240.doc
Generated: Jan 3, 2001
Question: / Q.6/SG16 (VCEG)
Source: / Markus Flierl
Telecommunications Institute I
University of Erlangen-Nuremberg
Cauerstr. 7
D-91058 Erlangen, Germany /
Tel: +1 650 724 3647
Fax: +1 650 724 3648
Email:

Bernd Girod
Information Systems Laboratory
Stanford University
Stanford, CA 94305, USA /
Tel: +1 650 724 6354
Fax: +1 650 725 8286
Email:

Title: / Multihypothesis Motion Pictures for H.26L
Purpose: / Proposal

______

1 Introduction

Multihypothesis motion pictures are an extension of P pictures such that each block of a macroblock can be compensated by a linear combination of two motion-compensated blocks. Conventional B pictures also employ two linearly combined motion-compensated blocks, but one motion-compensated signal (hypothesis) originates from a future reference frame. In contrast to B pictures, multihypothesis motion pictures (MH pictures) utilize only temporally previous pictures for prediction and cause no extra coding delay. In addition, decoded MH pictures are also used as reference to predict future MH pictures.

Figure 1: Multihypothesis motion pictures: Two blocks of temporally previous pictures are utilized to predict the current MH picture. MH pictures are also used for reference to predict future MH pictures.

2 Multihypothesis Motion Pictures

The proposed multihypothesis pictures extend the P pictures in TML-5 (doc. Q15-K-59d1). MH pictures as well as P pictures use temporally previous pictures for prediction.

2.1 Multihypothesis Macroblock Modes

Seven multihypothesis macroblock types are added to the standardized macroblock types for inter coding. The seven additional types allow multihypothesis motion-compensated prediction for seven different block sizes.

Figure 2: Macroblock modes for multihypothesis motion pictures. For each block a multihypothesis block pattern (MHBP) indicates one or two hypotheses.

For the macroblock types in TML-5, one reference frame parameter is assigned to the entire macroblock. For the additional multihypothesis macroblock types, each block has its own reference frame parameter.

2.2 Multihypothesis Block Pattern

Each block can be compensated by one hypothesis (conventional motion compensation) or two hypotheses. A multihypothesis block pattern (MHBP) for each macroblock mode indicates the number of hypotheses for each block.

3 Syntax

For the multihypothesis pictures, seven multihypothesis macroblock types are added. A multihypothesis block pattern (MHBP) indicates one hypothesis or two hypotheses for each block. The multihypothesis block pattern depends on the macroblock type. When the MHBP indicates one hypothesis for a block in the macroblock, motion vector data and one reference frame parameter are specified. When the MHBP indicates two hypotheses for a block, two motion vectors and two reference frame parameters are specified.

3.1 Additional Inter Macroblock Types

The additional multihypothesis macroblock types use the code numbers of the universal VLC as specified in Table 2.

Table 2: Macroblock types for the multihypothesis pictures

3.2 Multihypothesis Block Pattern

The multihypothesis block pattern utilizes the universal VLC:

Code number / MHBP
0 / One hypothesis
1 / Two hypotheses

Table 3: Multihypothesis block pattern
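
The bit cost of these two code numbers can be illustrated with a small sketch. It assumes that the universal VLC has the usual structure in which code number 0 takes one bit and the codeword grows by two bits each time the covered range of code numbers doubles. The function name `uvlc_length` is hypothetical and is not taken from the TML software.

```python
def uvlc_length(code_number: int) -> int:
    """Codeword length in bits for a code number (assumed UVLC structure)."""
    if code_number < 0:
        raise ValueError("code numbers are non-negative")
    # (code_number + 1).bit_length() - 1 equals floor(log2(code_number + 1)).
    info_bits = (code_number + 1).bit_length() - 1
    # Assumption: each information bit is paired with one structure bit,
    # plus one terminating bit.
    return 2 * info_bits + 1


for code_number, meaning in [(0, "one hypothesis"), (1, "two hypotheses")]:
    print(code_number, meaning, uvlc_length(code_number), "bit(s)")
```

Under this assumption, signaling two hypotheses for a block costs 3 bits, which agrees with the figure quoted in Section 8.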

3.3 Reference Frame

Independent of the block size, each one-hypothesis block has its own reference frame parameter and each two-hypothesis block has its own pair of reference frame parameters. The universal VLC code numbers as specified for TML-5 are used for signaling the reference frames.

4 Encoder

The encoder has to determine the multihypothesis macroblock type and the number of hypotheses for each block. For a one-hypothesis block, the motion vector data and the reference frame parameter are determined by rate-constrained long-term memory motion estimation. An integer-pel accurate estimate for all reference frames is refined to half-pel and quarter-pel accuracy.

For a two-hypothesis block, the motion vectors and reference frame parameters are determined by rate-constrained multihypothesis motion estimation.

4.1 Rate-Constrained Multihypothesis Motion Estimation

Rate-constrained multihypothesis motion estimation is performed by an iterative algorithm. The solution for the one-hypothesis block is used for initialization. The algorithm works as follows:

1. One hypothesis is fixed and long-term memory motion estimation is applied to the complementary hypothesis such that the multihypothesis rate-distortion costs are minimized.

2. The complementary hypothesis is fixed and the first hypothesis is optimized.

The two steps are repeated until convergence. Usually, the algorithm converges within 1-2 iterations.

For the conditional motion estimation, an integer-pel accurate estimate for all reference frames is refined to half-pel and quarter-pel accuracy.
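
The alternating structure of the two steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the TML implementation: the search is run over a small, pre-computed list of candidate prediction blocks instead of a real motion search, the distortion is taken to be the SAD of the averaged prediction, and the names `mh_cost`, `iterative_mh_search`, `candidates`, `rates` and `lam` are hypothetical.

```python
import numpy as np


def mh_cost(target, hyp_a, hyp_b, rate_bits, lam):
    """Assumed cost: SAD of the averaged prediction plus lam times the rate."""
    prediction = (hyp_a.astype(np.int64) + hyp_b.astype(np.int64)) // 2
    sad = int(np.abs(target.astype(np.int64) - prediction).sum())
    return sad + lam * rate_bits


def iterative_mh_search(target, candidates, rates, init_index, lam, max_iter=10):
    """Alternately optimize one hypothesis while the other is held fixed.

    candidates: candidate prediction blocks (one per vector/reference pair)
    rates:      bits needed to signal each candidate (hypothetical values)
    init_index: index of the one-hypothesis solution used for initialization
    """
    def cost(i, j):
        return mh_cost(target, candidates[i], candidates[j],
                       rates[i] + rates[j], lam)

    indices = range(len(candidates))
    first = second = init_index
    best = cost(first, second)
    for _ in range(max_iter):
        # Step 1: fix the first hypothesis, search the complementary one.
        second = min(indices, key=lambda j: cost(first, j))
        # Step 2: fix the complementary hypothesis, search the first one.
        first = min(indices, key=lambda i: cost(i, second))
        new_cost = cost(first, second)
        if new_cost >= best:  # no further improvement: converged
            break
        best = new_cost
    return first, second, best


# Toy example: the target is the average of candidates 0 and 1.
target = np.full((2, 2), 15)
candidates = [np.full((2, 2), v) for v in (10, 20, 40)]
print(iterative_mh_search(target, candidates, rates=[1, 1, 1],
                          init_index=0, lam=0))
```

Because each step keeps the current choice among its candidates, the cost can never increase from one step to the next, which is why a simple "no improvement" test is enough to stop.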

4.2 Hypothesis Mode Decision

A reliable decision whether one or two hypotheses per block are efficient also depends on the encoding of the prediction residual. The cost function for the decision is the sum of the SAD of the reconstructed block and the weighted total bit-rate of the block. The bit-rate is weighted by a Lagrange multiplier that depends on the value QUANT_H.263, which is specified in the TML-5 document.
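
The decision rule can be sketched as follows. The Lagrange multiplier is passed in as a parameter `lam` because its value is not reproduced here, the SAD and bit counts are made-up numbers, and the function names are hypothetical. Preferring one hypothesis on a tie is also an assumption of this sketch.

```python
def block_cost(sad_reconstructed, total_bits, lam):
    """Cost = SAD of the reconstructed block + lam * total bits of the block."""
    return sad_reconstructed + lam * total_bits


def choose_hypothesis_count(one_hyp, two_hyp, lam):
    """Pick the cheaper option.

    one_hyp, two_hyp: (SAD of reconstructed block, total bits) for coding the
    block with one or two hypotheses, including the residual.
    Returns (number of hypotheses, cost).
    """
    cost_one = block_cost(*one_hyp, lam)
    cost_two = block_cost(*two_hyp, lam)
    return (1, cost_one) if cost_one <= cost_two else (2, cost_two)


# Two hypotheses lower the distortion but need more bits for the second
# motion vector and reference frame parameter (illustrative numbers only).
print(choose_hypothesis_count(one_hyp=(400, 60), two_hyp=(310, 85), lam=3.0))
print(choose_hypothesis_count(one_hyp=(400, 60), two_hyp=(310, 85), lam=4.0))
```

With the larger multiplier the extra side information outweighs the distortion gain, so the sketch falls back to a single hypothesis.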

4.3 Prediction of Vector Components

The prediction of vector components is also important for multihypothesis blocks. The median prediction as specified in TML-5 is also used. In the case of two hypotheses we utilize the motion vector data of the second hypothesis. We use the motion vector data independent of the reference frame.

4.4 Computational Complexity

The computational complexity of a two-hypothesis block is just 2-4 times the complexity of a one-hypothesis block, as the iterative algorithm is initialized with the one-hypothesis solution. This complexity can be reduced further by efficient search strategies.

5 Decoder

For decoding multihypothesis pictures, the decoder has to add two motion-compensated signals. When the multihypothesis block pattern indicates a two-hypothesis block, the pixel values of the two motion-compensated blocks are added and divided by 2 (integer division). No additional memory is required.
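
The combination step can be sketched in a few lines. The sketch assumes 8-bit samples held in NumPy arrays; the function name `combine_hypotheses` is hypothetical.

```python
import numpy as np


def combine_hypotheses(block_a, block_b):
    """Add two motion-compensated blocks and divide by 2 (integer division)."""
    # Widen to 16 bits so that the sum of two 8-bit samples cannot overflow.
    total = block_a.astype(np.uint16) + block_b.astype(np.uint16)
    return (total // 2).astype(np.uint8)


hyp_a = np.array([[100, 255], [0, 7]], dtype=np.uint8)
hyp_b = np.array([[101, 255], [1, 8]], dtype=np.uint8)
print(combine_hypotheses(hyp_a, hyp_b))
```

Integer division truncates the half, so the pair (100, 101) yields 100 rather than 101.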

6 Experimental Results

The implementation of the multihypothesis pictures is based on the software TML4.3 (tml4_r2.zip). The software TML4.3 is also used to generate the reference data with best compression effort. The simulations are performed with all specified block sizes and 5 reference frames. The search range for the motion estimation is [-16,…,16]x[-16,…,16]. The test-sequences are encoded with a fixed quantizer for all macroblocks. Up to now, no deblocking filter is used for multihypothesis blocks.

Test-sequence / Res. / Frame rate / No. of frames
Mobile & Calendar / CIF / 30 / 300
Foreman / QCIF / 10 / 100
Mobile & Calendar / QCIF / 10 / 100
News / QCIF / 10 / 100

Table 4: Applied test-sequences.

Table 4 shows the test-sequences that are used for the experimental results. For each sequence one rate-distortion plot is given in the Appendix. The rate-distortion plots contain the following curves:

· TML: The TML4.3 software with best compression effort.

· MH-Pictures: The proposed extension of the TML P pictures.

Bit-rate savings up to 10% are observed for the sequence Mobile & Calendar in QCIF resolution with 10 fps and also in CIF resolution with 30 fps. For the sequences Foreman and News in QCIF resolution with 10 fps, bit-rate savings up to 7% are observed.

7 Conclusions

The proposed multihypothesis pictures extend the H.26L P pictures and achieve additional compression efficiency. For the test-sequences, bit-rate savings of up to 10% are measured. MH pictures utilize temporally previous pictures and therefore cause no extra coding delay. Decoded MH pictures are also used as reference to predict future MH pictures. In addition, multiple reference frames allow efficient hypothesis pairs.

The proposed iterative algorithm for multihypothesis motion estimation has low complexity and applies techniques of standard motion estimation. For a multihypothesis block, the decoder has to compensate and add two blocks.

Subject to standardization is an architecture providing sufficient flexibility for two motion-compensated signals per block at the encoder and means for signaling the multihypothesis data at reasonable costs to the decoder.

8 Further Work

The integration of multihypothesis pictures into the current TML version is necessary, and the deblocking filter for the multihypothesis blocks has to be added. In addition, the following items might improve the coding efficiency:

· Encode pairs of picture references with one universal VLC codeword.

· Efficient encoding of the multihypothesis block pattern (MHBP). Because of the universal VLC, 3 bits are used to signal a multihypothesis block.

· Improved algorithms for multihypothesis motion estimation.