A project proposal on

Residual DPCM for improving Inter Prediction in HEVC for Lossless Screen Content Coding

Under the guidance of Dr. K. R. Rao

For the fulfillment of the course Multimedia Processing (EE5359)

Spring 2015

Submitted by

Siddu Basawaraj Pratapur

UTA ID: 1001053422

Email id:

Department of Electrical Engineering

The University of Texas at Arlington

Contents

1. Objective of the project 6

2. Basic Concepts of Video Coding 7

2.1 Color Spaces 7

3. H.265 / High Efficiency Video Coding 9

3.1 Introduction 9

3.2 Encoder and Decoder in HEVC 9

3.3 Features of HEVC 12

3.3.1 Coding tree units and coding tree block (CTB) structure: 13

3.3.2 Coding units (CUs) and coding blocks (CBs): 13

3.3.3 Prediction units and prediction blocks (PBs): 14

3.3.4 TUs and transform blocks: 15

3.3.5 Motion vector signalling: 15

3.3.6 Motion compensation: 15

3.3.7 Intra-picture prediction: 16

3.3.8 Quantization control: 17

3.3.9 Entropy coding: 17

3.3.10 In-loop de-blocking filtering: 18

3.3.11 Sample adaptive offset (SAO): 18

4. Introduction to Screen Content Coding 19

4.1 Introduction 19

4.2 Analysis of HEVC on Screen Content Coding 20

4.3 Angular Intra Prediction 20

4.4 Inter-Prediction 20

4.5 Loop Filters 21

5. Residual DPCM in HEVC inter-prediction 22

5.1 General considerations and the HEVC coding structure 22

5.2 General method for inter RDPCM 22

5.3 Additional tools for inter RDPCM 24

6. Test Configurations 27

6.1 Intra-only configuration 27

6.2 Low-delay configuration 27

6.3 Random-access configuration 28

7. Comparison Metrics 30

7.1 Peak Signal to Noise Ratio 30

7.2 Bjontegaard Delta Bit-rate (BD-BR) and Bjontegaard Delta PSNR (BD-PSNR) 31

7.3 Implementation Complexity 31

8. Test Sequences 32

9. Implementation 34

9.1 Configuration profiles used for comparison 34

9.2 Parameters modified 34

9.3 Sample command line parameters for HM-16.4+SCM-4.0RC1 34

9.3.1 Encoding 34

9.3.2 Decoding 35

9.4 Testing Platform 35

9.5 Tabular columns for test sequence parameters 36

9.6 Graphs for test sequence parameters 37

9.6.1 Bit-rate 37

9.6.2 Size of the binary file 40

9.6.3 %BD Bit-rate 43

9.6.4 BD-PSNR 46

9.6.5 Encoding time 49

9.6.6 Decoding time 52

References 22

List of Acronyms and Abbreviations

AVC: Advanced Video Coding.

B-frame: Bi-predictive frame.

BD-BR: Bjontegaard Delta Bit-rate.

BD-PSNR: Bjontegaard Delta Peak Signal to Noise Ratio.

CABAC: Context Adaptive Binary Arithmetic Coding.

CB: Coding Block.

CTB: Coding Tree Block.

CTU: Coding Tree Unit.

CU: Coding Unit.

DBF: De-blocking Filter.

DCT: Discrete Cosine Transform.

HEVC: High Efficiency Video Coding.

HM: HEVC Test Model.

HP: Hierarchical Prediction.

I-frame: Intra-coded frame.

JCT: Joint Collaborative Team.

JCT-VC: Joint Collaborative Team on Video Coding.

JM: H.264 Test Model.

JPEG: Joint Photographic Experts Group.

MC: Motion Compensation.

ME: Motion Estimation.

MPEG: Moving Picture Experts Group.

MV: Motion Vector.

P-frame: Predicted frame.

PB: Prediction Block.

PC: Prediction Chunking.

PSNR: Peak Signal to Noise Ratio.

PU: Prediction Unit.

QP: Quantization Parameter.

RDPCM: Residual Differential Pulse Code Modulation.

SAO: Sample Adaptive Offset.

TB: Transform Block.

TU: Transform Unit.

VCEG: Video Coding Experts Group.

Abstract

In this project, RDPCM is applied to inter-predicted residuals and tested in the context of the HEVC range extensions development [8]. Video content containing computer-generated objects is usually denoted as screen content and is becoming popular in applications such as desktop sharing and wireless displays. Screen content images and videos are characterized by high-frequency details such as sharp edges and high-contrast areas. In these areas, classical lossy coding tools (spatial transform plus quantization) may significantly compromise quality and intelligibility. Therefore, lossless coding is used instead, and improved coding tools should be devised specifically for screen content. The proposed method exploits the spatial correlation present in blocks containing edges or text, which are poorly predicted by motion compensation. Compared to HEVC lossless coding as specified in Version 1 of the standard, the proposed algorithm is expected to achieve an average bit-rate reduction of up to 8% without increasing the overall decoding complexity [1].

1. Objective of the project

The objective of this project is to introduce inter Residual Differential Pulse Code Modulation (inter RDPCM) applied to motion-compensated residuals in lossless screen content coding (SCC) scenarios. The novelty of this method [1] is twofold. First, the proposed inter RDPCM is applied to the HEVC standard at three different levels of granularity, namely the coding unit (CU), prediction unit (PU) or transform unit (TU) level; at each level, three DPCM prediction modes (vertical, horizontal or no DPCM) are considered independently. Second, two additional tools are proposed for inter RDPCM: Prediction Chunking (PC) and Hierarchical Prediction (HP). PC can be used to improve the overall throughput, thus decreasing complexity, while HP can be used to improve the compression efficiency of the proposed inter RDPCM method.

The project thus uses the inter RDPCM coding tool to improve inter prediction in lossless screen content coding, together with the two complementary tools, Prediction Chunking (PC) [1] and Hierarchical Prediction (HP) [1], proposed to reduce complexity or increase compression efficiency.
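To make the basic operation concrete, the following is a minimal sketch in Python of the three inter RDPCM modes applied to a motion-compensated residual block, together with a simple mode selection rule. The function names are hypothetical and not part of the HM software; moreover, [1] selects the mode by a rate-based criterion inside the encoder, so the sum of absolute values used here is only a stand-in.

    import numpy as np

    def rdpcm_encode(residual, mode):
        # mode 'none':       residual is transmitted unchanged.
        # mode 'vertical':   each row is predicted from the row above it.
        # mode 'horizontal': each column is predicted from the column to its left.
        r = np.asarray(residual, dtype=np.int32)
        out = r.copy()
        if mode == 'vertical':
            out[1:, :] = r[1:, :] - r[:-1, :]
        elif mode == 'horizontal':
            out[:, 1:] = r[:, 1:] - r[:, :-1]
        return out

    def rdpcm_decode(coded, mode):
        # Inverting the DPCM is a cumulative sum along the prediction
        # direction, so the scheme is exactly lossless on integer residuals.
        c = np.asarray(coded, dtype=np.int32)
        if mode == 'vertical':
            return np.cumsum(c, axis=0, dtype=np.int32)
        if mode == 'horizontal':
            return np.cumsum(c, axis=1, dtype=np.int32)
        return c.copy()

    def select_mode(residual):
        # Stand-in cost: total magnitude of the values to be entropy coded.
        return min(('none', 'vertical', 'horizontal'),
                   key=lambda m: int(np.abs(rdpcm_encode(residual, m)).sum()))

Applied at CU, PU or TU granularity, the same operation simply runs over the corresponding partition of the motion-compensated residual.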

The simulations will be conducted with the HM 16.4 software [18] on different video sequences [3], varying the search range, block size and number of frames, and exploiting GPU multi-core computing.

2. Basic Concepts of Video Coding

2.1 Color Spaces

The common color spaces for digital image and video representation are:

·  RGB color space – Each pixel is represented by three numbers indicating the relative proportions of red, green and blue colors

·  YCrCb color space – Y is the luminance component, a monochrome version of the color image. Y is a weighted average of R, G and B:

Y = kr R + kg G + kb B, where kr, kg and kb are the weighting factors.

The color information is represented as color differences, or chrominance components, where each chrominance component is the difference between R, G or B and the luminance Y.

As the human visual system is less sensitive to color than to luminance, YCrCb has an advantage over the RGB space: the amount of data required to represent the chrominance components can be reduced without impairing the visual quality [4].
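As a worked example, the luminance equation above can be evaluated with the ITU-R BT.601 weights kr = 0.299, kg = 0.587 and kb = 0.114 (these specific values are an assumption for illustration; the text above leaves the weights generic):

    # Luminance and color differences for a single RGB sample.
    KR, KG, KB = 0.299, 0.587, 0.114      # assumed BT.601 weights; KR + KG + KB = 1

    def rgb_to_ycrcb(r, g, b):
        y = KR * r + KG * g + KB * b      # weighted average of R, G and B
        cr = r - y                        # chrominance: component minus luminance
        cb = b - y
        return y, cr, cb

    print(rgb_to_ycrcb(255, 0, 0))        # pure red -> Y = 76.245, Cr = 178.755, Cb = -76.245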

The popular patterns of sub-sampling [4] are:

·  4:4:4 – The three components Y, Cr and Cb have the same resolution; that is, for every 4 luminance samples there are 4 Cr and 4 Cb samples.

·  4:2:2 – For every 4 luminance samples in the horizontal direction, there are 2 Cr and 2 Cb samples. This representation is used for high quality video color reproduction.

·  4:2:0 – The Cr and Cb each have half the horizontal and vertical resolution of Y. This is popularly used in applications such as video conferencing, digital television and DVD storage.

Figure 1: 4:2:0 sub-sampling pattern [4]

Figure 2: 4:2:2 sub-sampling and 4:4:4 sampling patterns [4]
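The effect of each of the above patterns on the chroma planes can be sketched as follows (a simplified illustration assuming even frame dimensions and plain averaging as the downsampling filter; practical systems use better filters and standard-defined sample positions):

    import numpy as np

    def subsample_chroma(plane, pattern):
        # plane: one 2-D chroma component (Cr or Cb).
        p = np.asarray(plane, dtype=np.float64)
        if pattern == '4:4:4':                       # same resolution as Y
            return p.copy()
        if pattern == '4:2:2':                       # half horizontal resolution
            return (p[:, 0::2] + p[:, 1::2]) / 2
        if pattern == '4:2:0':                       # half in both directions
            q = (p[:, 0::2] + p[:, 1::2]) / 2
            return (q[0::2, :] + q[1::2, :]) / 2
        raise ValueError('unknown pattern: ' + pattern)

    cr = np.arange(64.0).reshape(8, 8)
    print(subsample_chroma(cr, '4:2:2').shape)       # (8, 4): half the samples
    print(subsample_chroma(cr, '4:2:0').shape)       # (4, 4): a quarter of the samples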

3. H.265 / High Efficiency Video Coding

3.1 Introduction

High Efficiency Video Coding (HEVC) [5] is an international standard for video compression developed by a working group of ISO/IEC MPEG (Moving Picture Experts Group) and ITU-T VCEG (Video Coding Experts Group). The main goal of the HEVC standard is to significantly improve compression performance compared to existing standards (such as H.264/Advanced Video Coding [6]), targeting a bit-rate reduction of about 50% at similar visual quality [7].

HEVC is designed to address the existing applications of H.264/MPEG-4 AVC and to focus on two key issues: increased video resolution and increased use of parallel processing architectures [7]. It primarily targets consumer applications, as pixel formats are limited to 4:2:0 8-bit and 4:2:0 10-bit. The next revision of the standard will enable new use cases by supporting additional pixel formats such as 4:2:2 and 4:4:4, bit depths higher than 10 bits [8], embedded bit-stream scalability and 3D video [9].

3.2 Encoder and Decoder in HEVC

Source video, consisting of a sequence of video frames, is encoded or compressed by a video encoder to create a compressed video bit stream. The compressed bit stream is stored or transmitted. A video decoder decompresses the bit stream to create a sequence of decoded frames [10].

The video encoder performs the following steps:

·  Partitioning each picture into multiple units

·  Predicting each unit using inter or intra prediction, and subtracting the prediction from the unit

·  Transforming and quantizing the residual (the difference between the original picture unit and the prediction)

·  Entropy encoding transform output, prediction information, mode information and headers

The video decoder performs the following steps:

·  Entropy decoding and extracting the elements of the coded sequence

·  Rescaling and inverting the transform stage

·  Predicting each unit and adding the prediction to the output of the inverse transform

·  Reconstructing a decoded video image
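Taken together, the encoder and decoder steps above form the hybrid coding loop. A toy version of this loop is sketched below (hypothetical Python: the FFT is only a stand-in for HEVC's integer DCT-like transform, entropy coding is omitted, and a single uniform quantizer step replaces the QP machinery):

    import numpy as np

    QSTEP = 8.0  # placeholder uniform quantizer step

    def encode_block(block, prediction):
        residual = block - prediction             # subtract the prediction
        coeffs = np.fft.fft2(residual)            # stand-in spatial transform
        return np.round(coeffs / QSTEP)           # quantized levels (entropy coding omitted)

    def decode_block(levels, prediction):
        coeffs = levels * QSTEP                   # rescale (inverse quantization)
        residual = np.real(np.fft.ifft2(coeffs))  # invert the transform stage
        return residual + prediction              # add prediction back to reconstruct

    block = np.random.randint(0, 256, (8, 8)).astype(np.float64)
    pred = np.full((8, 8), block.mean())          # crude stand-in for inter/intra prediction
    recon = decode_block(encode_block(block, pred), pred)
    print(np.abs(recon - block).max())            # small but nonzero: quantization loss

In the lossless configurations studied in this project, the transform and quantization stages are bypassed, so the (RDPCM-processed) residual itself is entropy coded and the reconstruction is exact.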

Figure 3 shows the block diagram of the HEVC CODEC [10]:

Figure 3: Block Diagram of HEVC CODEC [10]


Figure 4 [6] and Figure 5 [11] show the detailed block diagrams of the HEVC encoder and decoder, respectively:

Figure 4: Block Diagram of HEVC Encoder [6]

Figure 5: Block Diagram of HEVC Decoder [11]


3.3 Features of HEVC

The video coding layer of HEVC employs the same hybrid approach (inter-/intra-picture prediction and 2-D transform coding) used in all prior video compression standards. Figure 4 depicts the block diagram of a hybrid video encoder, which can create a bitstream conforming to the HEVC standard, and Figure 5 shows the HEVC decoder block diagram. An encoding algorithm producing an HEVC-compliant bitstream would typically proceed as follows. Each picture is split into block-shaped regions, with the exact block partitioning being conveyed to the decoder. The first picture of a video sequence (and the first picture at each clean random access point in a video sequence) is coded using only intra-picture prediction, which predicts data spatially from region to region within the same picture but has no dependence on other pictures. For all remaining pictures of a sequence, or between random access points, inter-picture temporally predictive coding modes are typically used for most blocks.

The encoding process for inter-picture prediction consists of choosing motion data comprising the selected reference picture and motion vector (MV) to be applied for predicting the samples of each block. The encoder and decoder generate identical inter-picture prediction signals by applying motion compensation (MC) using the MV and mode decision data, which are transmitted as side information. The residual signal of the intra- or inter-picture prediction, which is the difference between the original block and its prediction, is transformed by a linear spatial transform. The transform coefficients are then scaled, quantized, entropy coded and transmitted together with the prediction information. The encoder duplicates the decoder processing loop (see the gray-shaded boxes in Figure 4) such that both will generate identical predictions for subsequent data. Therefore, the quantized transform coefficients are reconstructed by inverse scaling and are then inverse transformed to duplicate the decoded approximation of the residual signal. The residual is then added to the prediction, and the result of that addition may be fed into one or two loop filters to smooth out artifacts induced by block-wise processing and quantization.

The final picture representation (a duplicate of the output of the decoder) is stored in a decoded picture buffer to be used for the prediction of subsequent pictures. In general, the order of encoding or decoding of pictures often differs from the order in which they arrive from the source, necessitating a distinction between the decoding order (i.e., bitstream order) and the output order (i.e., display order) of a decoder. Video material to be encoded by HEVC is generally expected to be input as progressive scan imagery (either because the source video originates in that format or because it is de-interlaced prior to encoding). No explicit coding features are present in the HEVC design to support interlaced scanning, as interlaced scanning is no longer used for displays and is becoming substantially less common for distribution. However, metadata syntax is provided in HEVC that allows an encoder to indicate that interlace-scanned video has been sent, either by coding each field (i.e., the even- or odd-numbered lines of each video frame) as a separate picture or by coding each interlaced frame as an HEVC coded picture. This provides an efficient method of coding interlaced video without burdening decoders with the need to support a special decoding process for it. The various features involved in hybrid video coding using HEVC are highlighted in the following subsections.

3.3.1 Coding tree units and coding tree block (CTB) structure: The core of the coding layer in previous standards was the macroblock, containing a 16×16 block of luma samples and, in the usual case of 4:2:0 color sampling, two corresponding 8×8 blocks of chroma samples; the analogous structure in HEVC is the coding tree unit (CTU), which has a size selected by the encoder and can be larger than a traditional macroblock. The CTU consists of a luma CTB, the corresponding chroma CTBs and syntax elements. The size L×L of a luma CTB can be chosen as L = 16, 32 or 64 samples, with the larger sizes typically enabling better compression. HEVC then supports a partitioning of the CTBs into smaller blocks using a tree structure and quadtree-like signalling [19]. The partitioning of CTBs into CBs, ranging from 64×64 down to 8×8, is shown in Figure 6.

Figure 6: 64×64 CTBs split into CBs [13]

3.3.2 Coding units (CUs) and coding blocks (CBs): The quadtree syntax of the CTU specifies the size and positions of its luma and chroma CBs. The root of the quadtree is associated with the CTU; hence, the size of the luma CTB is the largest supported size for a luma CB. The splitting of a CTU into luma and chroma CBs is signalled jointly. One luma CB and ordinarily two chroma CBs, together with associated syntax, form a coding unit (CU), as shown in Figure 7. A CTB may contain only one CU or may be split to form multiple CUs, and each CU has an associated partitioning into prediction units (PUs) and a tree of transform units (TUs).
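The recursive quadtree partitioning can be sketched as follows (illustrative Python only; in HEVC the split decision at each node is conveyed with a per-CU split flag and chosen by the encoder's rate-distortion search):

    def split_ctb(x, y, size, want_split, min_size=8, leaves=None):
        # want_split(x, y, size) plays the role of the signalled split flag.
        # Leaves are collected as (x, y, size) coding blocks.
        if leaves is None:
            leaves = []
        if size > min_size and want_split(x, y, size):
            half = size // 2
            for dy in (0, half):              # visit the four quadrants in z-scan order
                for dx in (0, half):
                    split_ctb(x + dx, y + dy, half, want_split, min_size, leaves)
        else:
            leaves.append((x, y, size))       # leaf: one coding block
        return leaves

    # Example: split everything larger than 16 -> sixteen 16x16 CBs from a 64x64 CTB.
    print(len(split_ctb(0, 0, 64, lambda x, y, s: s > 16)))   # 16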