MPEG-2 to H.264 Transcoder in the BASELINE PROFILE

Hardeepsinh Jadeja

UTA ID: 1000721847

Abstract: The main aim of this project is to develop a highly efficient MPEG-2 to H.264 transcoder for the baseline profile in the spatial domain. Machine learning tools are used to exploit the correlation between the macroblock (MB) mode decisions of the H.264 video format and the distribution of the motion-compensated residual in MPEG-2 [1]. Moreover, a dynamic motion estimation technique is used to further speed up the decision process, thereby reducing the computational requirements by up to 90% [1] while maintaining the same coding efficiency.

I. Introduction

MPEG-2 [2] is currently the most widely used video compression standard, and one in which a huge amount of money has been invested in infrastructure, e.g., DVD, cable and satellite television, and video-on-demand services [1]. Nevertheless, new and heterogeneous network technologies characterized by a wide range of transmission and error rates have entered the market: wireless local area networks, 3G cellular networks, and more. This has given rise to a new coding standard known as H.264/advanced video coding (AVC) or MPEG-4 Part 10 [3]. Among other things, this new standard can maintain the same quality as a sequence encoded using MPEG-2 while halving the space needed to store the video material and, therefore, the network bandwidth needed to transmit the encoded sequence. Research in video transcoding is particularly interesting because of the widespread use of MPEG-2 at the present time and the clear interest of industry in technologies that facilitate the migration from MPEG-2 to H.264 [5]. However, the large differences between the two standards make this task much more computationally intensive than for other types of transcoders. This project presents an efficient MPEG-2 to H.264 video transcoder for the H.264 baseline profile [1]. The baseline profile is chosen for two main reasons: 1) it is the profile commonly used in most real-time applications, such as video, mobile TV, and video conferencing, among others, and 2) it is one of the most common profiles used in research. The available H.264 profiles are shown in Fig. 1.

Fig. 1. Various profiles of H.264 [7].

II. Basic Transcoder Operation

One of the most basic transcoder structures is an MPEG-2 decoder followed by an H.264 encoder. The input MPEG-2 bit stream is first fully decoded and then passed to the H.264 encoder [1]. Fig. 2 shows the block diagram of this basic transcoder, and Fig. 3 gives an overview of the H.264 encoder.

[Block diagram: input MPEG-2 bit stream → MPEG-2 decoder → decoded stream → H.264 encoder → H.264 encoded stream]

Fig. 2. Transcoder block diagram.

Fig. 3. H.264 encoder basic block structure [4].

A. Motion Estimation in H.264 in the Transcoder

In H.264, inter-frame motion estimation is performed using different block sizes, from 16 × 16 down to 4 × 4 [7]. For each MB (as shown in Fig. 4), all the different sizes are evaluated and the one leading to the minimum rate-distortion (RD) cost is selected [1]. This “try all and select the best” philosophy guarantees the optimal block size for the final encoding, but at the expense of a high computational cost. Within the architecture of an MPEG-2 to H.264 transcoder, the inter prediction therefore needs to be made more efficient. One of the most relevant mechanisms for doing so was introduced by Chen et al. [8].
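The “try all and select the best” decision can be sketched as follows. This is a minimal illustration, not the JM reference encoder's code: it assumes the usual Lagrangian cost J = D + λR, and the function names, the `candidates` mapping, and the numeric distortion and rate values are all hypothetical inputs chosen for the example.

```python
# Illustrative sketch only: names and numbers are hypothetical, not JM values.

# Inter partition sizes defined by H.264, from 16x16 down to 4x4.
PARTITION_SIZES = [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]


def rd_cost(distortion, rate_bits, lagrange_multiplier):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed cost model)."""
    return distortion + lagrange_multiplier * rate_bits


def select_best_partition(candidates, lagrange_multiplier):
    """Evaluate every available partition size and keep the cheapest one.

    candidates maps a partition size to a (distortion, rate_bits) pair that
    a real encoder would obtain by running motion estimation for that size.
    """
    best_size, best_cost = None, float("inf")
    for size in PARTITION_SIZES:
        if size not in candidates:
            continue
        distortion, rate_bits = candidates[size]
        cost = rd_cost(distortion, rate_bits, lagrange_multiplier)
        if cost < best_cost:
            best_size, best_cost = size, cost
    return best_size, best_cost


# Hypothetical measurements for one macroblock.
example_candidates = {
    (16, 16): (1200.0, 40),
    (8, 8): (900.0, 95),
    (4, 4): (850.0, 210),
}
best_size, best_cost = select_best_partition(example_candidates, lagrange_multiplier=4.0)
```

With these made-up inputs the 8 × 8 partition wins with a cost of 1280. The expensive part in a real encoder is producing each `(distortion, rate_bits)` pair, since every candidate size requires its own motion search; that is the cost the transcoder tries to avoid.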

Fig. 4. H.264 encoder with all possible macroblock structures for motion estimation [4].

III. MPEG-2 to H.264 Video Transcoder

In this project, the focus is on the inter prediction process, the most computationally intensive task in transcoding. Macroblock coding mode decisions in H.264 video exhibit a high correlation with the distribution of the motion-compensated residual in MPEG-2 video [1]. Data mining tools are used to exploit this correlation and derive decision trees that classify each incoming MPEG-2 macroblock into one of the H.264 coding modes [10]. The starting point of this process is simply a “look-and-feel” approach to the statistics of the MPEG-2 motion residual information [10].

A. Data Mining: The Decision Tree

Binary trees are more flexible, allowing the development of an H.264 encoder capable of identifying the best macroblock (MB) mode decision in the bin. The JRip rule learner (Weka’s implementation of the RIPPER rule learner [12], proposed by William Cohen), a fast algorithm for learning “IF–THEN” rules, will be used to create the rules for the different nodes of the decision tree. RIPPER, like most classification learners, requires each example to be represented as a vector of relatively simple features for learning and creating the rules.

The MPEG-2 video is decoded, and the decoded MPEG-2 macroblock mode decision, the coded block pattern, and the MPEG-2 motion vectors are gathered. Additional quantities are also needed for training: the mean and the variance of the 4 × 4 sub-blocks of the MPEG-2 motion residual macroblock, together with the variance of the means and the mean of the variances for each group of 4 means or 4 variances, are included among the variables used to create the tree. The class variable used for classifying the samples is the decision made by the H.264 reference software encoder over the training sequence. Fig. 5 shows, step by step, how a decision tree is built from the training sequences.
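The residual statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the text does not say how the sixteen 4 × 4 sub-blocks are grouped into fours, so the sketch assumes each group is one 8 × 8 quadrant of the macroblock. The function names are hypothetical, and the MPEG-2 mode, coded block pattern, and motion vectors that also belong to the feature vector are left out.

```python
from statistics import mean, pvariance


def sub_block_stats(residual_mb):
    """Return (means, variances) of the sixteen 4x4 sub-blocks of a 16x16
    residual macroblock, given as a list of 16 rows of 16 samples.

    Sub-blocks are ordered so that each consecutive group of four covers
    one 8x8 quadrant (an assumption; the grouping is not given in the text).
    """
    means, variances = [], []
    for quad_row in (0, 8):
        for quad_col in (0, 8):
            for off_row in (0, 4):
                for off_col in (0, 4):
                    r0, c0 = quad_row + off_row, quad_col + off_col
                    samples = [residual_mb[r][c]
                               for r in range(r0, r0 + 4)
                               for c in range(c0, c0 + 4)]
                    means.append(mean(samples))
                    variances.append(pvariance(samples))
    return means, variances


def build_feature_vector(residual_mb):
    """16 means, 16 variances, then for each group of four sub-blocks the
    variance of its means and the mean of its variances (40 values)."""
    means, variances = sub_block_stats(residual_mb)
    features = list(means) + list(variances)
    for g in range(0, 16, 4):
        features.append(pvariance(means[g:g + 4]))  # variance of the 4 means
        features.append(mean(variances[g:g + 4]))   # mean of the 4 variances
    return features


# A flat residual macroblock: every sub-block has mean 3 and zero variance.
flat_mb = [[3] * 16 for _ in range(16)]
flat_features = build_feature_vector(flat_mb)
```

Each training sample would pair such a vector with the mode chosen by the H.264 reference encoder for the same macroblock, which is the class label the rule learner is trained on.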

Fig. 5. Training algorithm for building the decision tree [10].

Fig. 6. Decision tree for deciding the macroblock mode of the H.264 encoder [1].

Fig. 6 shows an unbalanced binary decision tree based on a non-pruning JRip algorithm, with two kinds of nodes: blue nodes force the encoder to follow a given path in the encoding process, while white nodes leave the encoder free to choose the mode with the minimum cost under the rate-distortion algorithm implemented in the JM reference software encoder.

B. Dynamic Motion Window

The motion estimation computation at the encoder depends largely on the search range used in the motion estimation process. A smaller search range reduces the computational effort, but it can reduce the video quality when it leads to a sub-optimal prediction. Adapting the search range to the motion vector of the incoming MB can reduce the motion estimation computation without severely affecting the peak signal-to-noise ratio (PSNR). To reduce the search range adaptively for every MB, an approach called the dynamic motion window (DMW), or dynamic search window, is used. The dynamic motion window combines a dynamic search area, sized according to the length of the MPEG-2 motion vectors, with the orientation of those motion vectors.

Fig. 7. Dynamic motion window [11].

The size of the window is proportional to the length of the motion vector. The smallest possible search window, for the smallest motion vector, is a window of 1 for a motion vector of 1, resulting in a square search area of side 2 (Fig. 7).

$$C = \frac{H}{2} = \sqrt{\frac{C_1^2 + C_2^2}{2}} \quad [11]$$

where H is the length of the side of the square.

The value of C gives the dynamic search window for a motion vector. The search window is formed by the square with a diagonal perpendicular to the motion vector and with two sides of the square at 45° to the motion vector. C is given by the equation above, which is derived using the Pythagorean theorem. If C is less than 1, the search window is set to 1. The dynamic search range is thus ±C pixels around the end point of the motion vector.
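The window computation can be sketched as follows. This is a minimal illustration under two assumptions that the text does not state explicitly: the equation is read as C = sqrt((C1² + C2²) / 2), and C1 and C2 are taken to be the horizontal and vertical components of the MPEG-2 motion vector. The function names are hypothetical, and any rounding of C to whole pixels is left to the implementation.

```python
import math


def dynamic_search_window(mv_x, mv_y):
    """Half-width C of the search window for one motion vector.

    Assumes C = sqrt((C1^2 + C2^2) / 2) with (C1, C2) = (mv_x, mv_y),
    and applies the lower bound of 1 described in the text.
    """
    c = math.sqrt((mv_x ** 2 + mv_y ** 2) / 2.0)
    return max(c, 1.0)


def search_bounds(mv_x, mv_y):
    """Search range of +/- C pixels around the end point of the motion vector,
    returned as (x_min, x_max, y_min, y_max)."""
    c = dynamic_search_window(mv_x, mv_y)
    return (mv_x - c, mv_x + c, mv_y - c, mv_y + c)


# A zero or very short motion vector falls back to the minimum window of 1.
window_for_static_mb = dynamic_search_window(0, 0)
# A motion vector of (4, 4) gives C = 4 and a search area of side 8.
bounds_for_diagonal_mv = search_bounds(4, 4)
```

Under this reading, the search area grows with the motion vector length, so macroblocks with little motion in the MPEG-2 stream are searched over only a few pixels.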

IV. Future Work

Future work will use the data mining tool and the dynamic motion window described above to develop an efficient MPEG-2 to H.264 transcoder. The transcoder will then be tested on a series of MPEG-2 sequences, and parameters such as encoding time, PSNR, and bit rate will be measured and compared with the results available from previous work.
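For the quality measurement, PSNR can be computed from the mean squared error between the original and the transcoded samples. The sketch below uses the standard definition for 8-bit video (peak value 255). The function name and the flat-list input format are assumptions made for illustration; a real evaluation would typically compute this per frame on the luma plane and average over the sequence.

```python
import math


def psnr(reference, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Two tiny hypothetical sample lists that differ by one level per sample.
example_psnr = psnr([10, 10, 10, 10], [11, 9, 11, 9])
```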

V. Simulation Results

Table 1 shows the results, in terms of time, PSNR, and bit rate, of the proposed transcoder compared with the cascaded MPEG-2 to H.264 reference transcoder [1]. These results give a good picture of the performance of the proposed transcoder.

Table 1. Time, PSNR, and bit rate for CIF and QCIF sequences [1].

References

[1] G. Fernández-Escribano and H. Kalva, “An MPEG-2 to H.264 video transcoder in the baseline profile,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 5, pp. 763–768, May 2010.
[2] Generic Coding of Moving Pictures and Associated Audio, ISO/IEC 13818-2, MPEG-2 Draft International Standard, 1994.
[3] Advanced Video Coding for Generic Audiovisual Services, ITU-T Rec. H.264, May 2003.
[4] T. Wiegand and G. J. Sullivan, “Overview of the H.264/AVC video coding standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, Jul. 2003.
[5] Reference Software to Committee Draft, JVT-F100 JM14.0, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, 2008.
[6] T. Wiegand et al., “Rate-constrained coder control and comparison of video coding standards,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 688–703, Jul. 2003.
[7] I. E. G. Richardson, “H.264/MPEG-4, part 10,” in H.264 and MPEG-4 Video Compression, Hoboken, NJ: Wiley, pp. 159–223, 2003.
[8] G. Chen et al., “Efficient block size selection for MPEG-2 to H.264 transcoding,” in Proc. 12th Annual ACM International Conference on Multimedia, New York, pp. 300–303, Oct. 10–16, 2004.
[9] G. Fernández-Escribano et al., “A fast MB mode decision algorithm for MPEG-2 to H.264 P-frame transcoding,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 2, pp. 172–185, Feb. 2008.
[10] G. Fernández-Escribano et al., “Low-complexity heterogeneous video transcoding using data mining,” IEEE Transactions on Multimedia, vol. 10, no. 2, pp. 286–299, Feb. 2008.
[11] G. Fernández-Escribano et al., “Reducing motion estimation complexity in MPEG-2 to H.264 transcoding,” in Proc. IEEE International Conference on Multimedia and Expo (ICME), Beijing, China, pp. 440–443, Jul. 2–5, 2007.
[12] W. W. Cohen and Y. Singer, “A simple, fast, and effective rule learner,” in Proc. 16th National Conference on Artificial Intelligence, Orlando, FL, pp. 335–342, Jul. 18–22, 1999.
[13] M. Isnardi, “MPEG-2 video compression,” Sarnoff Corporation, Nov. 29, 1999.