Optimizing Baseline Profile in H.264/AVC Video Coding by Parallel Programming and Fast

TABLE OF ACRONYMS

ASO Arbitrary slice ordering
AVC Advanced video coding
CABAC Context adaptive binary arithmetic coding
CAVLC Context adaptive variable length coding
CBP Coded block pattern
CIF Common intermediate format
DCT Discrete cosine transform
FMO Flexible macro block ordering
IEC International electro technical commission
I-frame Intra frame
ITU-T International telecommunication union
ISO International organization for standardization
JM Joint model
JVT Joint video team
MB Micro block
MPEG Moving picture experts group
MSE Mean square error
NAL Network abstraction layer
PSNR Peak signal to noise ratio
QCIF Quarter common intermediate format
QP Quantization parameter
RDO Rate distortion optimization
RS Redundant slices
SATD Sum of absolute transformed differences
SSIM Structural similarity index metric
VCEG Video coding experts group
VLC Variable length coding

Optimizing Baseline Profile in H.264/AVC Video Coding by Parallel Programming and Fast Intra and Inter predictions

OBJECTIVE:

In this project, thecomputational complexity and encoding time of baseline profile of H.264 are reduced by using parallel programming inencoding video frames[1], [7], instead of sequentially encoding and then by using fast adaptive termination (FAT) algorithmin intra and inter predictions[2][15].

FAT algorithm in intra prediction is executed by using simple directional masks and neighboring modes [2],[8], [15]and in inter prediction mode decision and motion estimation by adapting minimum rate distortion (RD) cost of both skip and non-skip modes and an early-skip mode detection test is proposed for skip mode and a three-stage scheme is proposed to speed up the mode decision process for non-skip mode[3],[9], [15], [20].

INTRODUCTION:

H.264 also known as MPEG(Moving picture experts group) Part10/ AVC“ (MPEG-4’s advanced video coding)” was jointly published in 2003 by International standards bodies - International Telecommunication Union (ITU-T) [17], International Organization for Standardization and International Electro-Technical Commission (ISO / IEC) called as Joint Video Team (JVT) [4].

It has many advantages over previous coding standards MPEG-2 [13] and MPEG-4[14], like significant rate distortion efficiency, achieving higher bit rate reduction, error resilience and most networks friendly compared to other standards.

H.264 - PROFILES:

H.264 has three major profiles which are the baseline, main and extended andin addition to the four high profiles namely High, High 10 [11], High 4:2:2 [11], and High 4:4:4 [5] [11] asgiven in the figure1[5] [11].

-Baseline profile is applicable in real-time conversational services such as video conferencing and videophone. [5] [11]

-Main profile is designed for digital storage media and television broadcasting [5] [11].

-Extended profile targets multimedia services over the internet [5] [11].

-High, High 10, High 4:2:2, and High 4:4:4 [11] are used in the fidelity range extensions for applications such as content-contribution, content-distribution, and studio editing and post-processing respectively [5] [11].

Fig.1: Various profiles of H.264 [5]

Encoder and Decoder of H.264:

A H.264 is a codec i.e., a combination of encoder and decoder complimenting each other to achieve the required compression and better picture quality. H.264 encoder converts the video into a compressed format (.264 formats) and a decoder converts the compressed video back into an uncompressed format with very few losses.

A H.264 video encoder carries out prediction, transform and encoding processes to produce a compressed video form as in the figure2 [6].

Fig. 2 H.264 encoder block diagram [6]

After encoding, the coded video data is organized into network abstraction layer (NAL) containing NAL units, each of which is effectively a packet that contains an integer number of bytes. Each NAL unit is a collection of SLICES which is a group ofmacro blocks(MB) representing MB type, prediction information, coded block pattern (CBP), residual coefficients and quantization parameter (QP) as explained in figure 3 [4].

Fig. 3: NAL unit interface between encoder and decoder [4]

A H.264 video decoder function is to re-produce video sequence by carrying out the complementary functions of the encoder i.e., decoding, inverse transformation and reconstruction as explained in the figure3a[6].

Fig. 3a H.264 decoder block diagram [6]

Prediction Modes:

The prediction modes in H.264 can be categorized as intra prediction (I), inter prediction (P) and their combination.

INTRA PREDICTION MODE:

An intra (I) macro block is a coded reference tothe data only in the current slice. I macro blocks may occur in any slice type. In an intra MB the luma component can be selected in 3 ways, namely 16 × 16,

8 × 8 or 4 × 4. A single prediction block is generated for each chroma componentas shown in table 1 [4]. The modes of prediction for 16 × 16 MB are given in table 2 and figure 4 [4][5].

Table 1: Various intra prediction block sizes and properties. [4]

Table 2: 16x16 luma prediction modes and properties.[4]

Fig 4: 16x16 luma prediction modes, all predicted fron pixelsH and V. [4]

8x8 (for Chroma) –

Mode 0 (DC): mean of upper and left-hand samples (H+V). [5]

Mode 1 (horizontal): extrapolation from left samples (V). [5]

Mode 2 vertical): extrapolation from upper samples (H). [5]

Mode 3 (Plane): a linear “plane” function is fitted to the upper and left-hand samples H and V. This works well in areas of smoothly-varying luminance. [5]The properties of the modes of prediction are given in table 3[4] and the pictorial representation in figure 5 [4Table 3: 4x4 luma prediction modes and properties[4]

Table 3: 4x4 luma prediction modes and properties [4]

Fig.5: 4x4 luma prediction (intra-prediction) modes in H.264[1]

(Pixels A through M which have been coded and reconstructed to form the prediction for the 4 x 4 block.)

INTER PREDICTION MODE:

Inter prediction is the process of predicting a block of luma and chroma samples in a current frane from samples already coded and transmitted from another frame or a reference frame. Initially a prediction region is selected, generating a prediction block and this is removed/subtracted from the original block of samples to form a residual that is then transformed, coded and transmitted along with the sample number or the reference sample[4].

The MB’s are split into four types [4] as shown in the figures 6 and 7.

(a)One 16x16 MB Partition.

(b)Two 8x16 MB Partitions.

(c)Two 16x8 MB Partitions.

(d)Four 8x8 Partitions and

(e)Combination of any of b, c and d.

Fig.6 Macro block partitions: 16x16, 8x16, 16x8, 8x8 [4]

Fig. 7 Macro block sub partitions: 8x8, 4x8, 8x4, 4x4 [4]

Rate Distortion Optimization (RDO) [6],[8],[9],[20]:

Once the prediction is obtained and residual is calculated for all the modes, the best mode among these modes is one which has least residual. The H.264/AVC encoder performs the rate-distortion optimization (RDO) technique for each macro block to obtain the best mode. [2]

Set macro block parameters : Quantization parameter (QP) and Lagrangian multiplier λ

Calculate : 0.85 x 2(QP-12)/3[2]………………………………..(1)

Then calculate the cost, which determines the best mode

Cost = D + λ MODE xR[2],………………………………………(2)

Where D – Distortionand R - Bit rate with given QP

Distortion (D) is obtained by SSD (sum of squared differences) between the original macro block and its reconstructed block.

Bit rate(R) includes the number of bits for the mode information and transforms coefficients for macro block.

Considering the RDO procedure for intra mode selection in H.264/AVC, the number of mode combinations in one macro block is

N8x (16x N4 + N16)=8x(16+16)=592

N8 – number of modes of an 8x8 chroma block

N4 – number of modes of a 4x4 luma block

N16 – number of modes of a 16x16 luma block

The H.264/AVC encoder carries out 592 RDO calculations to choose the best matching MB. As a result, the complexity of the encoder increases extremely[16].

INPUT FORMATS:

H.264 can compress planar and interleaved/packed raw image data (viz., yuv, rgb)and depending upon the video, it converts them into intermediate formats like CIF (common intermediate format), QCIF (quarter common intermediate format), Sub-QCIF and 4 CIF. But mostly CIF and QCIF are used here. The resolutions of the different formats are shown in the table 4 [4]. The resolutions of CIF and QCIF for 4:2:0 sampling are shown in figure 8 [18].

Table 4: Different intermediate formats[4]

Fig. 8 CIF and QCIF resolutions(Y, Cb, Cr), (4:2:0) [8][9]

QUALITY MEASUREMENT:

The major challenge is determining the quality of the image/video obtained as measuring visual quality using objective criteria gives accurate and repeatable results but as yet there are no objective measurement systems that completely reproduce the human visual system [11].

PSNR

Peak signal to noise ratio (PSNR) is measured on a logarithmic scale and depends on the mean squared error (MSE) between an original and a decoded/lossy image or video frame, relative to the square of the highest-possible signal value in the image, where n is the number of bits per image sample [11].

PSNRdB = 10 log10 ((2n − 1)2/ MSE)

This is easy to calculate and is widely used and the most popular measure of quality.

SSIM

The structural similarity index (SSIM) is a method to measure the similarity between two images.But, while calculating SSIM, the reference image used is assumed as a perfectone i.e., the original image without any artifacts. Hence, SSIM is measured by providing the original image or image which is the most close to original one [4].

OPTIMIZATION PROCESS OF BASELINE PROFILE:

H.264 provides the best compression but is computationally much more complex than any of the previous codecs and also time consuming for real time applications. So to make H.264 more adaptable for practical application, the encoding time is to be reduced. In this project, encoding time reduction is achieved by applying following methods simultaneously.

1. Parallel programming in baseline profile [7],

2. Fast algorithm for intra mode selection [8] and

3. Fast algorithm for inter mode selection [9][20].

Baseline profile is selected because of the ease of implementation and the important features of baseline profile are:

a)I and P slice coding.

b)Enhanced error resilience such as flexible macro block ordering (FMO) and arbitrary slice ordering(ASO) and redundant slices (RS).

c)Context adaptive variable length coding (CAVLC)

Baseline profile is primarily used for low-cost applications, for data loss robustness like video conferencing and videophone. The joint model (JM 18.0) implementation of the H.264 encoder is used in this project [10].

1. Parallel Programming in Baseline Profile [7]:

This parallel programming is done by considering several frames together for encoding. This can be achieved by

The strategy adopted for encoding the frames to beparallel is as follows:

Step1. Separate the total number of frames to encode into 2 equal sets.

Ex: If the total number of frames to be encoded is 30, then part ion is done as the frame numbers from 1 to 15into set 1 and frame numbers from 16 to 30into set 2 .

Step2. Perform the parallel intra coding on two frames in both partitions.

Ex:Frame 1 and frame 16 together. Frame 1 can be used as a reference frame for frame 2 and frame 16 can be used as a reference frame for frame 17 and so on.

Step3. Perform inter coding on frame 2 and frame 17 by incorporating changes in the encoding algorithm using OpenMP. Repeat for frame 3 and frame 18 and so on till all the frames are encoded,as given in the figure 9.

Fig.9: Parallel processing of frames to reduce encoding time[7]

2. Fast Algorithm for Intra Mode Selection[8]:

Proposed intra mode selection algorithm for a 4x4 luma block [12], [8]:

In figure 10, black dots indicate positions of the pixels to be computed for investing directional correlation in the 4x4 luma block, and arrows represent the directions of correlation associated with the corresponding mask. Since directions of the H.264/AVC intra-prediction are limited to 8 directions except DC mode, 8 directional masks are proposed instead of a precise edge detector such as Sobel operator [16]. One candidate mode with the minimum difference is selected[8].

Fig.10: The proposed directional masks for a 4.4 luma block. (a) Vertical, (b) Horizontal, (c)

Diagonal down left, (d) Diagonal down right, (e) Vertical right, (f) Horizontal down, (g) Vertical

left, (h) Horizontal up mask [8].

Fig. 11: Pixel indices and modes of adjacent blocks used in the proposed intra mode selection algorithm. (a) Indices used in (3) to (10) for a 4x4 luma block, (b) Modes of upper and left blocks for additional candidate modes [8].

Diff = |a – m| + |b – n| + |c – o| + |d – p|, for vertical direction, (3)

Diff = |a – d| + |e – h| + |i – l| + |m – p|, for horizontal direction, (4)

Diff = |c – i| + 2·|d – m| + |h – n|, for diagonal down left direction, (5)

Diff = |b – l| + 2·|a – p| + |e – o|, for diagonal down right direction, (6)

Diff = |a – n| + 2·|b – o| + |c – p|, for vertical right direction, (7)

Diff = |a – h| + 2·|e – l| + |i – p|,for horizontal down direction, (8)

Diff = |b – m| + 2·|c – n| + |d – o|, for vertical left direction, (9)

Diff = |e – d| + 2·|i – h| + |m – l|, for horizontal up direction, (10)

Where a to p denote the pixels for investing directional correlation associated with the corresponding mask of the indices for pixel positions used in (3) to (10) as shown in figure 10.Diff is used as a criterion for correlation, i.e., the direction with smaller Diff is the more correlated one. From the second observation, additional candidate modes are obtained by using mode information of adjacent blocks, where one is the upper block with the corresponding mode of modeA and the other is the left block with the corresponding mode of modeB, as shown in figure11 [8].

The additional modes are included namely modeA and modeB, to the candidate modes for RDO procedure. Since the directions in the H.264/AVC intraprediction are defined with the directional relation between current block and boundary pixels of adjacent blocks, instead of direction within the current block only. In this case, one mode when modeA and modeB are the same, or two modes when modeA and modeB are different from each other, is included in RDO procedure. [8]

To determine whether DC mode is included in RDO procedure or not, the sum(S) of differencebetween an average of current block to each pixel (pi) is considered (11).

Where the condition is , and pi is each pixel of current block.…….(11)

Condition 1: If S is smaller than a threshold, T1, RDO is carried out for at most 4 candidate modes, i.e., one mode from the proposed masks, at most two modes from adjacent blocks, and DC mode[8].

Condition 2: If S is larger than a threshold, T1, RDO is performed for at most 4 candidate modes, i.e., two modes from the proposed masks (with minimum and second minimum Diff) and at most two modes from adjacent blocks [8].

The proposed intra mode selection algorithm for a 4x4 lumablock is summarized as follows:

Step 1 - For a 4x4 luma block, obtain avg and S by (1). [8]

Step 2a - If S is larger than a threshold, T1, carry out RDO procedure for at most 4candidate modes: two modes with minimum and second minimum Diff by (3) to(10), and at most two modes from adjacent blocks. In this case, DC mode of adjacentblocks is excluded from RDO procedure [8].

Step 2b - If S is smaller than a threshold, T1, carry out RDO procedure for at most 4candidate modes: one mode with minimum Diff by (3) to (10), at most twomodes from adjacent blocks, and DC mode [8].

Proposed intra mode selection algorithm for a 16x16 luma block [12], [8]:

Step 1 - Examine sizes of adjacent blocks: if both blocks (upper block and left block) are 16x16, go to Step 2, otherwise go to Step 4 [8].

Step 2 - Examine modes of adjacent blocks: if both modes are same, go to Step 3, otherwise select the best mode for a 16x16 luma block, which results in the minimum SATD (sum of absolute transformed differences) between two adjacent modes of modeA and modeB [8].

Step 3 - If both adjacent modes are DC mode, go to Step 4, and otherwise select the best mode for a 16x16 luma block, which results in the minimum SATD between the adjacent mode and DC mode [8].

Step 4 - Let ΔV be a vertical difference between upper boundary pixels of the current block and boundary pixels of the upper block, and ΔH be a horizontal difference between left boundary pixels of the current block and boundary pixels of the left block as follows [8].

Where, ΔV = Σ |u(i)-q(i)| for i =0 to 15.

ΔH = Σ |l(i)-r(i)| for i =0 to 15.

u(i) -> upper block boundary pixels,

q(i) -> upper boundary pixels of current block,

l(i) -> boundary pixels of the left block,and

r(i) -> left boundary pixels of the current block.

Fig. 12: Calculation for ΔV and ΔH in 16x16 luma block [2] [8].

Obtain candidate modes by using two difference values, ΔV and ΔH: if |ΔV − ΔH | is smaller than 2xT2, candidate modes are DC mode and plane mode as shown in the figure 12; if (ΔV − ΔH) is larger than T2, candidate modes are DC mode and horizontal mode; if (ΔV − ΔH) is smaller than T2, candidate modes are DC and vertical mode, where T2 is a positive value. The threshold T2 is set equal to 32. Finally, select the best mode between each candidate mode by choosing the mode with minimum SATD.

3. Fast algorithm for Inter Mode Selection [9], [20]:

FAT for mode decision exploits statistical similarity between current macro block and predicted macro block. Predicted mode is obtained from the spatial and temporal macro blocks.

For accuracy, the rate distortion cost is checked against adaptive Threshold I and adaptive Threshold II

Adaptive Threshold I: RD thres = RD pred x (1-8xβ)

Adaptive Threshold II: RD thres = RD pred x (1+10xβ)

Such that

……….(4)

Where, β is the modulator, N is the rows of the image and M is number of columns of N X M MB.If the predicted mode is less than P8 x 8, it is checked if the current macro block is homogeneous or not. Further partitioning is done into 8x4, 4x8 and 4x4 blocks, if the current macro block is not homogenous.

A mode histogram from spatial and temporal neighboring macro blocks is obtained; then the best mode as the index corresponding to the maximum value in the mode histogram is selected.The average rate-distortion cost of each neighboring macro block corresponding to the best mode is then selected as the prediction cost for the current macro block[9], [20].

FAT Algorithm [9][20]:The algorithm is given in figure 13 and is explained below:

Step 1: If current macro block belongs to I slice, check for intra prediction using I4x4 or I16x16, go to step 10 else go to step 2.

Step 2: If a current macro block belongs to the first macro block in P slice check inter and intra prediction modes, go to step 10 else go to step 2.

Step 3: Compute mode histogram from neighboring spatial and temporal macro blocks, go to step 4.

Step 4: Select prediction mode as the index corresponding to maximum in the mode histogram and obtain values of adaptive Threshold I and adaptive Threshold II, go to step 5.

Step 5: Always check over P16x16 mode and check the conditions in the skip mode, if the conditions of skip mode are satisfied go to step 10, otherwise go to step 6.

Step 6: If all left, up , up-left and up-right have skip modes, then check the skip mode against, then check the skip mode against adaptive Threshold I if the rate distortion is less than adaptive Threshold I , the current macro block is labeled as skip mode and go to step 10, otherwise, go to step 7.