H.265 (HEVC) BITSTREAM TO H.264 (MPEG 4 AVC)
BITSTREAM TRANSCODER

by

DEEPAK HINGOLE

Presented to the Faculty of the Graduate School of

The University of Texas at Arlington in Partial Fulfillment

of the Requirements

for the Degree of

MASTER OF SCIENCE IN ELECTRICAL ENGINEERING

THE UNIVERSITY OF TEXAS AT ARLINGTON

December 2015

Copyright © by Deepak Hingole 2015

All Rights Reserved


ACKNOWLEDGEMENTS

I would like to express my heartfelt gratitude to my advisor Dr. K. R. Rao for his unwavering support, encouragement, supervision and valuable inputs throughout this research work. He has been a constant source of inspiration for me to pursue this research work.

I would also like to extend my gratitude to my colleagues in Adobe Systems Incorporated for their invaluable industrial insight and experience to help me understand and grow in the field of digital video processing.

Additionally, I would like to thank Dr. Schizas and Dr. Dillon for serving as members of my graduate committee.

A big thank you to Vasavee, Rohith, Maitri, Shwetha, Srikanth and Uma, my Multimedia Processing Lab mates, for providing valuable suggestions during the course of my research work.

Last but not least, I would like to thank my parents, my siblings and my close friends for believing in me and supporting me in this undertaking. I wish for their continued support in the future.

November 25, 2015

ABSTRACT

H.265 (HEVC) BITSTREAM TO H.264 (MPEG 4 AVC)
BITSTREAM TRANSCODER

Deepak Hingole, MS

The University of Texas at Arlington, 2015

Supervising Professor: K. R. Rao

With every new video coding standard, the general rule of thumb has been to maintain the same video quality at roughly half the bit rate of the previous standard.

H.265 is the latest video coding standard, with support for encoding videos over a wide range of resolutions, from low resolution to beyond High Definition, i.e. 4K or 8K. H.265, also known as HEVC, was preceded by H.264, a well-established and widely used standard in industry with applications in broadcast, storage and multimedia telephony. Currently almost all devices, including low-power handheld mobile devices, can decode H.264-encoded bitstreams.

HEVC achieves high coding efficiency at the cost of increased implementation complexity, and not all devices have hardware powerful enough to decode an HEVC bitstream. For HEVC-coded content to be played on devices that support only H.264, transcoding of the HEVC bitstream to an H.264 bitstream is necessary.

Different transcoding architectures are investigated, and an easy-to-implement scheme is studied as part of this research. The scheme is implemented by reusing the existing reference software for H.264 and H.265. Quality metrics (PSNR, bitrate) are measured for the proposed scheme on different test sequences, and conclusions are drawn from these results on how well the scheme meets quality expectations.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

LIST OF ILLUSTRATIONS

LIST OF TABLES

Chapter 1 Introduction

1.1. Basics of Video Compression

1.2. Need for Video Compression

1.3. Video Coding Standards

1.4. Thesis Outline

Chapter 2 Overview Of H.264

2.1. Introduction

2.2. Profiles and Levels in H.264

2.2.1. Profiles in H.264

2.2.1.1. Baseline profile

2.2.1.2. Main profile

2.2.1.3. Extended profile

2.2.1.4. High profile

2.2.2. Levels in H.264

2.3. H.264 Encoder

2.4. H.264 Decoder

2.5. Summary

Chapter 3 Overview Of HEVC

3.1 Introduction

3.2 Profiles and levels in H.265

3.3 H.265 Encoder and Decoder

3.3.1 Coding Tree Units (CTU) and Coding Tree Blocks (CTB)

3.3.2 Coding Units (CU) and Coding Blocks (CB)

3.3.3 Prediction Units (PU) and Prediction Blocks (PB)

3.3.4 Transform Units (TU) and Transform Blocks (TB)

3.3.5 Motion Vector Signaling

3.3.6 Motion Compensation

3.3.7 Intra-picture prediction

3.3.8 Quantization Control

3.3.9 Entropy Coding

3.3.10 In-loop Deblocking Filtering

3.3.11 Sample Adaptive Offset (SAO)

3.4 High-Level Syntax Architecture

3.4.1 Parameter Set Structure

3.4.2 NAL unit syntax structure

3.4.3 Slices

3.4.4 Supplemental Enhancement Information (SEI) and Video Usability Information (VUI) metadata

3.5 Parallel Processing Features

3.5.1 Tiles

3.5.2 Wavefront Parallel Processing (WPP)

3.5.3 Dependent slices

3.6 Summary

Chapter 4 Transcoding

4.1 Introduction

4.2 Transcoding Architectures

4.2.1. Open Loop Transcoding Architecture

4.2.2. Closed-loop Transcoding Architecture

4.2.3. Cascaded Pixel-domain Architecture

4.2.4. Motion Compensation in the DCT Domain

4.3. Choice of Transcoding Architecture

4.4. Summary

Chapter 5 Results

5.1 Quality Metrics for Cascaded Decoder-Encoder Implementation

5.2 Peak Signal-to-Noise Ratio (PSNR) versus Quantization Parameter (QP)

5.3 Bitrate versus Quantization Parameter

5.4 Rate Distortion (R-D) Plot

5.5 Summary

Chapter 6 Conclusions and Future Work

APPENDIX A Test Sequences

APPENDIX B Test Conditions

Test Environment

APPENDIX C Acronyms

REFERENCES

BIOGRAPHICAL INFORMATION

LIST OF ILLUSTRATIONS

Figure 1—1: I-, P- and B- frames [2]

Figure 1—2: Evolution of video coding standards [45]

Figure 2—1: Different profiles in H.264 with distribution of various coding tools among the profiles [6]

Figure 2—2: H.264 Encoder block diagram [12]

Figure 2—3: Nine prediction modes for 4×4 Luma block [12]

Figure 2—4: H.264 Decoder block diagram [12]

Figure 3—1: 4:2:0 Subsampling [46]

Figure 3—2: Typical HEVC video encoder (with decoder modeling elements shaded in light gray) [13]

Figure 3—3: HEVC Decoder block diagram [43]

Figure 3—4: Integer and fractional sample positions for luma interpolation [13]


Figure 3—5: Modes and directional orientations for intra-picture prediction [13].

Figure 3—6: Subdivision of a picture into slices [13]

Figure 3—7: Subdivision of a picture into tiles [13]

Figure 3—8: Illustration of Wavefront Parallel Processing [13]

Figure 4—1: Open Loop, partial decoding to DCT coefficients then requantize [16]

Figure 4—2: Closed-loop, drift compensation for requantized data [16]

Figure 4—3: Cascade decoder encoder architecture [17]

Figure 4—4: Frame based comparison of open loop, closed loop and cascaded pixel domain architecture [16]

Figure 4—5: General block diagram for proposed transcoding scheme

Figure 5—1: PSNR (dB) versus QP for akiyo_cif.y4m

Figure 5—2: PSNR (dB) versus QP for city_cif.y4m

Figure 5—3: PSNR (dB) versus QP for crew_cif.y4m

Figure 5—4: PSNR (dB) versus QP for flower_cif.y4m

Figure 5—5: PSNR (dB) versus QP for football_cif.y4m

Figure 5—6: Bitrate (kbps) versus QP for akiyo_cif.y4m

Figure 5—7: Bitrate (kbps) versus QP for city_cif.y4m

Figure 5—8: Bitrate (kbps) versus QP for crew_cif.y4m

Figure 5—9: Bitrate (kbps) versus QP for flower_cif.y4m

Figure 5—10: Bitrate (kbps) versus QP for football_cif.y4m

Figure 5—11: R-D plot for akiyo_cif.y4m

Figure 5—12: R-D plot for city_cif.y4m

Figure 5—13: R-D plot for crew_cif.y4m

Figure 5—14: R-D plot for flower_cif.y4m

Figure 5—15: R-D plot for football_cif.y4m

LIST OF TABLES

Table 2-1 Levels in H.264 [44]

Table 3-1 Level limits for the Main profile in HEVC [13]

Table 3-2 Filter Coefficients for Luma Fractional Sample Interpolation [13]

Table 5-1 akiyo_cif.y4m sequence quality metrics

Table 5-2 city_cif.y4m sequence quality metrics

Table 5-3 crew_cif.y4m sequence quality metrics

Table 5-4 flower_cif.y4m sequence quality metrics

Table 5-5 football_cif.y4m sequence quality metrics


Chapter 1 Introduction

1.1. Basics of Video Compression

Like many other recent technological developments, the emergence of video and image coding in the mass market is due to the convergence of a number of areas. Cheap and powerful processors, fast network access, the ubiquitous Internet and a large-scale research and standardization effort have all contributed to the development of image and video coding technologies [1].

Video can be thought of as a series of images displayed at a constant interval. The rate at which these images are displayed, known as the frame rate or frames per second (FPS), is an important factor in video technology [2].

The objective of any compression scheme is to represent data in a compact form. Representing data in a reduced number of bits is achieved by exploiting the various redundancies present in the data. In the case of video, spatial and temporal redundancies exist in addition to statistical and perceptual redundancies. Spatial redundancy arises when a block of pixels in a video frame bears similarities to its neighboring blocks. Similarly, temporal redundancy arises when a frame bears similarities to the frames that precede and/or follow the current frame of interest.

A picture or frame belongs to one of three categories: I-picture, P-picture or B-picture. An I-picture, or intra-predicted frame, is one that is predicted without reference to any other frame. P-pictures and B-pictures are inter-coded using motion-compensated prediction from reference frames. A P-picture uses one reference frame (the P- or I-picture preceding the current P-picture), whereas a B-picture uses two reference frames (the P- and/or I-pictures before and after the current frame). The difference between the predicted frame and the actual frame carries less information and is coded to achieve compression. The three types of frames are shown in Figure 1-1 [2].

Figure 1—1: I-, P- and B- frames [2]
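The idea of coding only the difference between a predicted frame and the actual frame can be sketched numerically. The snippet below uses two hypothetical 4×4 blocks of 8-bit luma samples (the values and block size are illustrative only) and forms the prediction residual a P-picture would code; no motion compensation is applied, so the prediction is simply the co-located reference block.

```python
# Toy illustration of temporal prediction: a P-frame codes the difference
# (residual) from a reference frame, which carries far less information
# than the frame itself. Sample values are hypothetical 8-bit luma data.

reference = [
    [120, 121, 122, 123],
    [120, 121, 122, 123],
    [118, 119, 120, 121],
    [118, 119, 120, 121],
]

# The current frame is almost identical to the reference (slow motion).
current = [
    [120, 121, 123, 124],
    [120, 122, 122, 123],
    [118, 119, 120, 122],
    [119, 119, 120, 121],
]

# Residual = current - prediction (a real codec would motion-compensate
# the reference block first, making the residual smaller still).
residual = [
    [c - r for c, r in zip(cur_row, ref_row)]
    for cur_row, ref_row in zip(current, reference)
]

# Most residual samples are zero or near zero, so they cost far fewer
# bits after transform, quantization and entropy coding.
flat = [abs(v) for row in residual for v in row]
print("max |residual|:", max(flat))
print("zero samples:", flat.count(0), "of", len(flat))
```

Because the residual samples cluster around zero, they have much lower entropy than the raw 8-bit samples, which is exactly what the subsequent transform and entropy coding stages exploit.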

1.2. Need for Video Compression

Is compression really necessary once transmission and storage capacities have increased enough to cope with uncompressed video? It is true that both storage and transmission capacities continue to increase. However, an efficient and well-designed video compression system gives very significant performance advantages for visual communication at both low and high transmission bandwidths.
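A back-of-the-envelope calculation makes the point concrete. The sketch below computes the raw bit rate of uncompressed 4:2:0 video, which carries 1.5 samples per pixel on average (luma plus subsampled chroma); the resolutions and frame rate chosen are illustrative.

```python
# Raw bit rate of uncompressed 4:2:0 video, showing why compression is
# unavoidable even as link speeds grow. 4:2:0 sampling averages 1.5
# samples per pixel (full-resolution Y plus quarter-resolution Cb, Cr).

def raw_bitrate_mbps(width, height, fps, bits_per_sample=8, samples_per_pixel=1.5):
    """Raw bit rate in megabits per second for 4:2:0 video."""
    return width * height * samples_per_pixel * bits_per_sample * fps / 1e6

cif_rate = raw_bitrate_mbps(352, 288, 30)    # CIF resolution at 30 fps
hd_rate = raw_bitrate_mbps(1920, 1080, 30)   # 1080p HD at 30 fps

print(f"CIF   @ 30 fps: {cif_rate:.1f} Mbps uncompressed")
print(f"1080p @ 30 fps: {hd_rate:.1f} Mbps uncompressed")
```

Even modest CIF video needs roughly 36 Mbps uncompressed, and 1080p approaches 750 Mbps, orders of magnitude above typical delivery bandwidths, which is why compression ratios of 100:1 and beyond are routinely required.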

1.3. Video Coding Standards

Several video coding standards have been introduced by organizations such as the International Telecommunication Union - Telecommunication Standardization Sector (ITU-T), the Moving Picture Experts Group (MPEG) and the Joint Collaborative Team on Video Coding (JCT-VC). Each standard is an improvement over the previous one. With every standard, the general rule of thumb has been to retain the same video quality while reducing the bit rate by 50%. Figure 1-2 shows the evolution of video coding standards over the years.

Figure 1—2: Evolution of video coding standards [45]

1.4. Thesis Outline

Chapter 2 provides an overview of H.264, also known as MPEG-4 Part 10/AVC [4]. Similarly, an overview of H.265, also known as High Efficiency Video Coding (HEVC) [35], is given in Chapter 3. Chapter 4 highlights the need for transcoding, explores different transcoding architectures and selects one of them as the preferred transcoding scheme. Chapter 5 summarizes the results of the proposed algorithm, followed by Chapter 6, which discusses how well the proposed algorithm performed, what conclusions can be drawn from it, and future areas of research in the same direction.

Chapter 2 Overview Of H.264

2.1. Introduction

H.264/MPEG-4 Part 10 Advanced Video Coding (AVC) was introduced in 2003 and was developed by the Joint Video Team (JVT), consisting of the Video Coding Experts Group (VCEG) of the International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) and the Moving Picture Experts Group (MPEG) of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) [4].

H.264 can support various interactive (video telephony) and non-interactive applications (broadcast, streaming, storage, video on demand), as it facilitates a network-friendly video representation [7]. It builds on previous coding standards such as MPEG-1, MPEG-2, MPEG-4 Part 2, H.261, H.262 and H.263 [6][8] and adds many other coding tools and techniques that give it superior quality and compression efficiency.

Like previous motion-compensated codecs, it uses the following basic principles of video compression [5]:

  • Transform for reduction of spatial correlation
  • Quantization for control of bitrate
  • Motion compensated prediction for reduction of temporal correlation
  • Entropy coding for reduction in statistical correlation.

The improved coding efficiency of H.264 can be attributed to the additional coding tools and new features. Listed below are some of the new and improved techniques used in H.264 [7]:

  • Adaptive intra-picture prediction
  • Small block size transform with integer precision
  • Multiple reference pictures and generalized B-frames
  • Variable block sizes
  • Quarter pel precision for motion compensation
  • Content adaptive in-loop deblocking filter and
  • Improved entropy coding through context adaptive binary arithmetic coding (CABAC) and context adaptive variable length coding (CAVLC)

The increase in coding efficiency and compression ratio results in greater complexity of the H.264 encoder and decoder algorithms compared to previous coding standards. To provide error resilience for transmission over networks, H.264 supports the following techniques [7]:

  • Flexible macroblock (MB) ordering
  • Switched slice
  • Arbitrary slice order
  • Redundant slice
  • Data partitioning
  • Parameter setting

2.2. Profiles and Levels in H.264

Profiles and levels specify conformance points for implementing the standard in an interoperable way across various applications that have similar functional requirements. A level places constraints on certain key parameters of the bitstream, corresponding to decoder processing load and memory capabilities [13].

2.2.1. Profiles in H.264

A profile defines a set of coding tools or algorithms that can be used to generate a conforming bitstream [13].

The profiles defined for H.264 can be listed as follows [10]:

  1. Baseline profile
  2. Main profile
  3. Extended profile
  4. High Profiles defined in the FRExts amendment

Figure 2-1 illustrates the coding tools for the various profiles of H.264.

Figure 2—1: Different profiles in H.264 with distribution
of various coding tools among the profiles [6]

2.2.1.1. Baseline profile

The tools included in the baseline profile are I (intra-coded) and P (predictive-coded) slice coding, the enhanced error resilience tools of flexible MB ordering, arbitrary slice ordering and redundant slices, and CAVLC. The baseline profile is intended for low-delay applications, applications demanding low processing power, and high-packet-loss environments. This profile has the least coding efficiency among the three profiles.

2.2.1.2. Main profile

The coding tools included in the main profile are I, P and B (bi-directionally prediction-coded) slices, interlace coding, CAVLC and CABAC. The tools not supported by the main profile are the error resilience tools, data partitioning, and switched intra (SI) and switched predictive (SP) coded slices. This profile aims to achieve the highest possible coding efficiency.

2.2.1.3. Extended profile

This profile includes all the tools of the baseline profile. As illustrated in Figure 2-1, it also includes B, SP and SI slices, data partitioning, interlace frame and field coding, picture-adaptive frame/field coding and MB-adaptive frame/field coding. This profile provides better coding efficiency than the baseline profile, but the additional tools result in increased complexity.

2.2.1.4. High profile

In September 2004 the first amendment of the H.264/MPEG-4 AVC video coding standard was released [10]. A new set of coding tools, termed the "Fidelity Range Extensions" (FRExts), was introduced as part of this amendment. The aim of the FRExts is to achieve a significant improvement in coding efficiency for higher-fidelity material. The application areas for the FRExts tools are professional film production, video production and high-definition (HD) TV/DVD.

The FRExts amendment defines four new profiles. A detailed discussion of those profiles is beyond the scope of this document.

2.2.2. Levels in H.264

Level restrictions are established in terms of maximum sample rate, maximum picture size, maximum bit rate, minimum compression ratio, and the capacities of the decoded picture buffer (DPB) and the coded picture buffer (CPB), which holds compressed data prior to its decoding for data flow management purposes [13].

In H.264/AVC, 16 levels are specified; they are listed in Table 2-1. The level '1b' was added in the FRExts amendment.

Table 2-1 Levels in H.264 [44]


2.3. H.264 Encoder

Figure 2-2 illustrates the block diagram of the H.264 encoder. Like most previous-generation codecs, the H.264 encoder operates on MBs and uses motion compensation. Video is formed by a series of picture frames, and each frame is an image that is split into blocks. The block sizes can vary in H.264.

Figure 2—2: H.264 Encoder block diagram [12]

The encoder may perform intra-coding or inter-coding for the MBs of a given picture. Intra-coded frames are encoded and decoded independently; they do not need any reference frames and hence provide access points to the coded sequence where decoding can start.

There are a total of nine optional prediction modes for each 4×4 luma block, four modes for a 16×16 luma block and four modes for the chroma components. Figure 2-3 illustrates the nine prediction modes for a 4×4 luma block [12].

Figure 2—3: Nine prediction modes for 4×4 Luma block [12]
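Three of the nine 4×4 intra prediction modes can be sketched directly: vertical (mode 0), horizontal (mode 1) and DC (mode 2), which predict the block from its previously reconstructed neighbors. The neighbor sample values below are hypothetical; the mode numbering and the DC rounding follow the standard's convention, while the remaining six directional modes are omitted for brevity.

```python
# Sketch of three of the nine 4x4 luma intra prediction modes. A block is
# predicted from the reconstructed samples above it and to its left; the
# encoder picks the mode whose prediction leaves the smallest residual.

top = [100, 102, 104, 106]   # reconstructed row of samples above the block
left = [98, 99, 100, 101]    # reconstructed column of samples left of the block

def predict_4x4(mode):
    if mode == 0:   # vertical: each column copies the sample above it
        return [top[:] for _ in range(4)]
    if mode == 1:   # horizontal: each row copies the sample to its left
        return [[left[r]] * 4 for r in range(4)]
    if mode == 2:   # DC: fill with the rounded mean of all eight neighbors
        dc = (sum(top) + sum(left) + 4) // 8
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("only modes 0-2 are sketched here")

print("vertical, first row:", predict_4x4(0)[0])
print("DC value:", predict_4x4(2)[0][0])
```

For smooth image regions the DC mode already predicts well; the directional modes win when the block contains edges aligned with one of the nine orientations shown in Figure 2-3.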

Inter-coding uses inter-prediction of a given block from previously decoded pictures. The aim of inter-coding is to reduce temporal redundancy by making use of motion vectors, which give the displacement of a particular block between the current frame and a reference frame.

The prediction residuals are obtained and then undergo a transform to remove spatial correlation within the block. The transform coefficients thus obtained undergo quantization. The motion vectors obtained from inter-prediction are combined with the quantized transform coefficient information, and together they are entropy encoded using schemes such as CAVLC or CABAC to reduce statistical redundancies [6].
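The transform and quantization steps can be sketched as follows. The 4×4 matrix C below is the actual H.264 integer core transform, but the scalar quantizer is a simplification: the real standard folds per-position scaling factors and QP-dependent tables into the quantization step, which are omitted here.

```python
# Sketch: H.264 4x4 forward core transform (W = C * X * C^T, integer
# arithmetic only) followed by a simplified uniform quantizer. The real
# quantizer uses QP-driven, per-position scaling; a single step is used
# here purely for illustration.

C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def core_transform(x):
    """Forward core transform W = C * X * C^T."""
    return matmul(matmul(C, x), transpose(C))

def quantize(w, step):
    """Toy uniform quantizer standing in for the QP-driven scaling."""
    return [[round(v / step) for v in row] for row in w]

# A perfectly flat residual block: the transform compacts all of its
# energy into the single DC coefficient, and everything else is zero.
residual = [[2] * 4 for _ in range(4)]
w = core_transform(residual)
q = quantize(w, step=8)
print("transform coefficients:", w)
print("quantized coefficients:", q)
```

The flat block ends up as one nonzero quantized coefficient, which is why transform coding followed by entropy coding is so effective on the low-energy residuals left after prediction.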

There is a local decoder within the H.264 encoder. This local decoder performs inverse quantization and inverse transform to obtain the residual signal in the spatial domain. The prediction signal is added to the residual signal to reconstruct the input frame. This reconstructed frame is fed into the deblocking filter to remove blocking artifacts at the block boundaries, and the output of the deblocking filter is then fed to the inter/intra prediction blocks to generate prediction signals.

2.4. H.264 Decoder

The H.264 decoder operates similarly to the local decoder of the H.264 encoder. Figure 2-4 illustrates the H.264 decoder block diagram [12]. An encoded bitstream is the input to the decoder. Entropy decoding (CABAC or CAVLC) is performed on the bitstream to obtain the transform coefficients, which are then inverse scanned and inverse quantized. This gives the residual block data in the transform domain. An inverse transform is performed to obtain the data in the spatial domain; the resulting output is 4×4 blocks of the residual signal. Depending on whether the block is inter-predicted or intra-predicted, an appropriate prediction signal is added to the residual signal. For an inter-coded block, a prediction block is constructed from the motion vectors, reference frames and previously decoded pictures. This prediction block is added to the residual block to reconstruct the video frame. The reconstructed frames then undergo deblocking before they are stored for future use in prediction or displayed.