EE 5359 MULTIMEDIA PROCESSING

SPRING 2011

Proposal

FPGA IMPLEMENTATION OF H.264 VIDEO ENCODER

Under guidance of

DR K R RAO
DEPARTMENT OF ELECTRICAL ENGINEERING
UNIVERSITY OF TEXAS AT ARLINGTON


Presented by

KUSHAL KUNIGAL

1000662485

Proposal:

This project presents a detailed study of the H.264 video encoder and of algorithms for evaluating the transform and quantization suitable for high-speed implementation on an FPGA/ASIC. Along with this, detailed architectures of the intra prediction, integer transform and quantization processors are presented.

Overview:

To achieve a real-time H.264 encoding solution, multiple FPGAs and programmable DSPs are often used. Computational complexity alone does not determine whether a functional module should be mapped to hardware or remain in software. The architectural issues that influence the overall design decision are:

Data Locality: In a synchronous design, the ability to access memory in a particular order and granularity, while minimizing the clock cycles lost to latency, bus contention, alignment, DMA transfer rate and the types of memory used, is very important. The data locality issue (Fig. 1) is primarily dictated by the physical interfaces between the data unit and the arithmetic unit (or processing engine) [3].

Fig. 1: H.264 encoder block diagram [3].

Computational Complexity: Programmable DSPs are bounded in computational complexity, as measured by the clock rate of the processor. The signal processing algorithms implemented in the FPGA fabric are typically the computationally intensive ones. By mapping these modules onto the FPGA fabric, the host processor or programmable DSP gains extra cycles for other algorithms. Furthermore, FPGAs can have multiple clock domains in the fabric, so selected hardware blocks can run at separate clock speeds based on their computational requirements [3].

Modules in the H.264 Advanced Video Encoder:

Fig. 2 shows the block diagram of the H.264 encoder. Therein, the modules designed in this work are shown in grey shades. An input frame or field Fn is processed in units of a macroblock. A macroblock consists of 16x16 pixels. Each macroblock is encoded in intra or inter mode, and for each block in the macroblock a prediction P is formed based on reconstructed picture samples. In intra mode, P is formed from samples in the current slice that have been previously reconstructed. In inter mode, P is formed by motion-compensated prediction from one or two reference picture(s) selected from the set of reference pictures.

Fig. 2: Modules in the H.264 video encoder [3].
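Concretely, whichever mode is chosen, the encoder transmits the residual D = current - P, and the decoder reconstructs by adding the decoded residual back to the prediction. As an illustrative sketch (in Python rather than VHDL, with function names of my own choosing):

```python
import numpy as np

def residual(current_mb, predicted_mb):
    # Encoder side: D = current - prediction, with signed 16-bit
    # headroom so 8-bit inputs cannot overflow.
    return current_mb.astype(np.int16) - predicted_mb.astype(np.int16)

def reconstruct(predicted_mb, decoded_residual):
    # Decoder side: prediction + decoded residual, clipped back
    # to the 8-bit sample range.
    return np.clip(predicted_mb.astype(np.int16) + decoded_residual,
                   0, 255).astype(np.uint8)
```

With a lossless residual path, `reconstruct(P, residual(C, P))` returns the current macroblock exactly; in the real encoder the residual passes through transform and quantization first, so the reconstruction is only approximate.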

Prediction modes in the H.264 standard:

H.264, or MPEG-4 Part 10, aims at coding video sequences at approximately half the bit rate of MPEG-2 at the same quality. It also aims at significant improvements in coding efficiency (using the CABAC entropy coder), error robustness and network friendliness. The parameter set concept, arbitrary slice ordering, flexible macroblock structure, redundant pictures, and switched predictive and switched intra pictures contribute to the error resilience/robustness of this standard. Adaptive (directional) intra prediction (Figs. 3a-3i) is one of the factors that contribute to its high coding efficiency [2].

The different modes used for block prediction are shown in Figs. 3a-3i.

Fig. 3a: Mode 0: vertical    Fig. 3b: Mode 1: horizontal

Fig. 3c: Mode 2: DC

Fig. 3d: Mode 3: diagonal down-left    Fig. 3e: Mode 4: diagonal down-right

Fig. 3f: Mode 5: vertical-right    Fig. 3g: Mode 6: horizontal-down

Fig. 3h: Mode 7: vertical-left    Fig. 3i: Mode 8: horizontal-up

Figs. 3a-3i: The different prediction modes used for intra prediction in H.264 [2].

Intra prediction reduces spatial redundancy by exploiting the spatial correlation between adjacent blocks in a given picture. Each picture is divided into 16×16 pixel MBs, and each MB is formed of luma components and chroma components. For the luma component, the 16×16 pixel MB can be partitioned into sixteen 4×4 blocks. The chroma components are predicted as 8×8 blocks with a prediction technique similar to the 16×16 luma prediction. There are 9 prediction modes for 4×4 luma blocks and 4 prediction modes for 16×16 luma blocks. For the chroma components, there are 4 prediction modes that are applied to the two 8×8 chroma blocks (U and V).

A. 4×4 Luma Intra Prediction Modes

The MB is divided into sixteen 4×4 luma blocks, and a prediction for each 4×4 luma block is computed individually. The 4×4 luma intra prediction modes are well suited for coding parts of a picture with significant detail. Nine different prediction modes are supported, as shown in Figs. 3a-3i, where the prediction values for the pixels are calculated from the neighboring boundary pixel values. Each mode is suited to predicting directional structures in the picture at a different angle. Note that DC (Fig. 3c) is a special prediction mode, in which the mean of the left-hand and upper samples is used to predict the entire block. For 8×8 intra prediction, 9 prediction modes are used, the same as those for 4×4 intra prediction. However, the computational complexity of the H.264 encoder increases dramatically with this feature of the extended profile.
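A few of these 4×4 luma modes are simple enough to sketch directly from the description above. The following Python fragment (illustrative only; function names are my own) shows the vertical, horizontal and DC modes, where `top` and `left` are the four reconstructed boundary pixels above and beside the block:

```python
import numpy as np

def intra4x4_vertical(top):
    # Mode 0: every row copies the pixel directly above the block.
    return np.tile(np.asarray(top, dtype=np.uint8), (4, 1))

def intra4x4_horizontal(left):
    # Mode 1: every column copies the pixel to the left of the block.
    return np.tile(np.asarray(left, dtype=np.uint8).reshape(4, 1), (1, 4))

def intra4x4_dc(top, left):
    # Mode 2 (DC): the whole block is the rounded mean of the
    # 8 neighbouring boundary pixels (integer rounding via +4 >> 3).
    s = int(np.sum(top)) + int(np.sum(left))
    return np.full((4, 4), (s + 4) >> 3, dtype=np.uint8)
```

The directional modes 3-8 follow the same pattern but interpolate along the angles shown in Figs. 3d-3i; in hardware each mode reduces to a handful of adds and shifts, which is what makes intra prediction attractive for FPGA implementation.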

B. 16×16 Luma Intra Prediction Modes

The 16×16 luma intra prediction modes are more suitable for coding very smooth areas of a picture, as the prediction covers the whole luma component of an MB. Four different prediction modes are supported: vertical, horizontal, DC and plane prediction. The plane prediction mode uses a linear function of the neighboring samples to the left and to the top in order to predict the current samples.
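The plane mode can be sketched as follows. This is an illustrative Python rendering of the standard's plane formula; the border handling is my reading of the spec and the names are my own. `top` and `left` hold the 16 reconstructed neighbors above and beside the MB, and `top_left` the corner sample:

```python
import numpy as np

def intra16x16_plane(top, left, top_left):
    top = np.asarray(top, dtype=np.int32)
    left = np.asarray(left, dtype=np.int32)
    # Horizontal and vertical gradients estimated from the border samples;
    # index -1 (i.e. 6 - x < 0) falls back to the corner sample.
    h = sum((x + 1) * (top[8 + x] - (top[6 - x] if 6 - x >= 0 else top_left))
            for x in range(8))
    v = sum((y + 1) * (left[8 + y] - (left[6 - y] if 6 - y >= 0 else top_left))
            for y in range(8))
    a = 16 * (top[15] + left[15])        # plane offset
    b = (5 * h + 32) >> 6                # horizontal slope
    c = (5 * v + 32) >> 6                # vertical slope
    y_idx, x_idx = np.mgrid[0:16, 0:16]
    pred = (a + b * (x_idx - 7) + c * (y_idx - 7) + 16) >> 5
    return np.clip(pred, 0, 255).astype(np.uint8)
```

For flat borders the fitted plane degenerates to a constant block, which matches the intuition that this mode targets very smooth areas.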

C. 8×8 Chroma Intra Prediction Modes

The chroma intra prediction of an MB is similar to the 16×16 luma intra prediction because the chroma signals are very smooth in most cases. It is always performed on the chroma blocks using vertical prediction, horizontal prediction, DC prediction or plane prediction.

Concept:

By understanding the ideas behind video compression and their importance, it is possible to implement an efficient, high-performance encoder that consumes less power and takes fewer clock cycles to encode an image frame. The implementation is a simplified version of the H.264 encoder, similar to the MPEG-4 Part 2 digital video codec, which is known to achieve high data compression. The same building blocks used in the H.264 encoder will be used in this lite version, with the exception of a few optimizing modifications.

For example, for the motion estimation algorithm it was initially suggested to use a full-search algorithm, but after some research it was discovered that the motion estimation process consumes 66-94% of the encoder cycles [4]. Therefore, if any optimization is to be made, this is the place to start.
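To see why, a full search exhaustively evaluates the sum-of-absolute-differences (SAD) matching cost at every candidate displacement in a ±R window, i.e. (2R+1)² SAD evaluations per macroblock. A minimal Python sketch (illustrative only; names are my own):

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences: the usual block-matching cost.
    return int(np.sum(np.abs(block_a.astype(np.int32) -
                             block_b.astype(np.int32))))

def full_search(cur, ref, mb_y, mb_x, search_range=8, block=16):
    # Exhaustive search over every displacement in a +/-search_range
    # window; returns the best motion vector (dy, dx) and its cost.
    cur_blk = cur[mb_y:mb_y + block, mb_x:mb_x + block]
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = mb_y + dy, mb_x + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cost = sad(cur_blk, ref[y:y + block, x:x + block])
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best, best_cost
```

Even for a modest ±8 window this is 289 SAD evaluations of 256 pixels each per macroblock, which is why fast search strategies matter so much in both software and hardware encoders.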

Therefore, instead of applying the full-search algorithm, an alternative algorithm was used, which will be discussed in the background section. Motion compensation produces the predicted frame from the motion vectors (from motion estimation) and the reference frame. The residual frame is generated as the difference between the predicted frame and the current frame. To compress the data further, the discrete cosine transform (DCT), a type of linear transform, is applied to the residual frame, followed by quantization. This project will be simulated and synthesized with Xilinx ISE 8.1 (to determine chip size and power consumption) and ModelSim (to simulate and observe waveforms). The purpose of this implementation is to learn about video compression systems, to incorporate other simulation tools with VHDL, and to improve the behavioral VHDL model and the FPGA design process. Hopefully, this implementation of the H.264 encoder will result in an efficient, high-performance hardware design that runs faster, with less power consumption and a smaller area.
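The transform and quantization stage can be prototyped in software before committing it to VHDL. H.264 in fact uses a 4×4 integer approximation of the DCT with the scaling folded into a scalar quantizer (see the Malvar et al. low-complexity transform paper in the reference list); the sketch below follows that scheme, with the published quantizer constants and function names of my own choosing:

```python
import numpy as np

# H.264 4x4 forward core transform matrix Cf (integer DCT approximation).
CF = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

# Quantizer multiplication factors, indexed by QP % 6 and by
# coefficient-position class (a, b, c) as tabulated in the standard.
MF_VALS = [(13107, 5243, 8066), (11916, 4660, 7490), (10082, 4194, 6554),
           (9362, 3647, 5825), (8192, 3355, 5243), (7282, 2893, 4559)]

def mf_matrix(qp):
    a, b, c = MF_VALS[qp % 6]
    m = np.full((4, 4), c)
    m[0::2, 0::2] = a   # positions (0,0), (0,2), (2,0), (2,2)
    m[1::2, 1::2] = b   # positions (1,1), (1,3), (3,1), (3,3)
    return m

def forward_transform_quantize(x, qp, intra=True):
    # Core transform W = Cf X Cf^T, all integer arithmetic.
    w = CF @ x @ CF.T
    # Scalar quantization: Z = sign(W) * ((|W| * MF + f) >> qbits).
    qbits = 15 + qp // 6
    f = (1 << qbits) // 3 if intra else (1 << qbits) // 6
    return np.sign(w) * ((np.abs(w) * mf_matrix(qp) + f) >> qbits)
```

Everything here is adds, shifts and small constant multiplies, which is exactly why this stage maps so cleanly onto FPGA fabric.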

Future work:

Going forward, the motion estimation algorithm will be analyzed from the hardware perspective along with the other modules of the encoder.

References:

[1] T. Wiegand, G. J. Sullivan, G. Bjøntegaard and A. Luthra, "Overview of the H.264/AVC video coding standard", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560-576, July 2003.

[2] AIC website:

[3] Data locality description:

[4] N. Keshaveni, S. Ramachandran and K. S. Gurumurthy, "Design and FPGA implementation of integer transform and quantization processor and their inverses for H.264 video encoder", International Conference on Advances in Computing, Control, and Telecommunication Technologies (ACT 2009), pp. 646-649, July 2009.

[5] I. Richardson, "The H.264 Advanced Video Compression Standard", 2nd edition, Wiley, 2010.

[6] DSP-enabled efficient motion estimation for mobile MPEG-4 video encoding:

[7] T. Wedi and H. G. Musmann, "Motion- and aliasing-compensated prediction for hybrid video coding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 577-586, July 2003.

[8] H. S. Malvar, A. Hallapuro, M. Karczewicz and L. Kerofsky, "Low-complexity transform and quantization in H.264/AVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 598-603, July 2003.

[9] S. Yeping and S. Ting, "Fast multiple reference frame motion estimation for H.264/AVC", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 3, pp. 447-452, June 2006.

[10] ModelSim simulation software: .