FINAL REPORT ON
Topic: Advanced Video Coding
(Comparison of HEVC with H.264 and H.264 with MPEG-2)
A PROJECT UNDER THE GUIDANCE OF DR. K. R. RAO COURSE: EE5359 - MULTIMEDIA PROCESSING, SPRING 2015
SUBMITTED BY JAGRITI DHINGRA
UT ARLINGTON ID: 1001103750
EMAIL ID:
DEPARTMENT OF ELECTRICAL ENGINEERING, THE UNIVERSITY OF TEXAS AT ARLINGTON
TABLE OF ACRONYMS
ATSC: Advanced Television Systems Committee.
AVC: Advanced Video Coding.
BD-BR: Bjontegaard Delta Bitrate.
BD-PSNR: Bjontegaard Delta Peak Signal to Noise Ratio.
CABAC: Context Adaptive Binary Arithmetic Coding.
CTB: Coding Tree Block.
CTU: Coding Tree Unit.
CU: Coding Unit.
DBF: De-blocking Filter.
DCT: Discrete Cosine Transform.
DVB: Digital Video Broadcasting.
HEVC: High Efficiency Video Coding.
HM: HEVC Test Model.
ICME: International Conference on Multimedia and Expo.
IEC: International Electrotechnical Commission.
ISDB: Integrated Services Digital Broadcasting.
ISO: International Organization for Standardization.
ITU-T: International Telecommunication Union- Telecommunication Standardization Sector.
JCT: Joint Collaborative Team.
JCT-VC: Joint Collaborative Team on Video Coding.
JM: H.264 Test Model.
JPEG: Joint Photographic Experts Group.
MC: Motion Compensation.
ME: Motion Estimation.
MPEG: Moving Picture Experts Group.
MSE: Mean Square Error.
PB: Prediction Block.
PSNR: Peak Signal to Noise Ratio.
QP: Quantization Parameter
SAO: Sample Adaptive Offset.
SSIM: Structural Similarity Index.
TB: Transform Block.
TU: Transform Unit.
VCEG: Video Coding Experts Group.
ADVANCED VIDEO CODING AND ITS COMPARISON WITH OTHER VIDEO CODING STANDARDS
Objective:
This project proposes to study advanced video coding (AVC), i.e., H.264. Coding simulations will be performed on various sets of test images. This report covers the essentials of H.264, including its features, applications, and versions.
The report also compares H.264 with other video coding standards, namely HEVC and MPEG-2.
The main objectives of the H.264/AVC standard are focused on coding efficiency, architecture, and functionalities. More specifically, an important objective was the achievement of a substantial increase (roughly a doubling) of coding efficiency over MPEG-2 Video for high delay applications and over H.263 version 2 for low-delay applications, while keeping implementation costs within an acceptable range. Doubling coding efficiency corresponds to halving the bit rate necessary to represent video content with a given level of perceptual picture quality. It also corresponds to doubling the number of channels of video content of a given quality within a given limited bit-rate delivery system such as a broadcast network. The architecture-related objective was to give the design a “network-friendly” structure, including enhanced error/loss robustness capabilities, in particular, which could address applications requiring transmission over various networks under various delay and loss conditions.[3]
Introduction:
H.264, or MPEG-4 Part 10, Advanced Video Coding (MPEG-4 AVC), is a video compression format that is currently one of the most commonly used formats for the recording, compression, and distribution of video content.
H.264 technology aims to provide good video quality at considerably low bit rates and at a reasonable level of complexity, while providing flexibility for a wide range of applications.[1]
The MPEG-2 video coding standard (also known as ITU-T H.262) [2], which was developed about ten years ago primarily as an extension of prior MPEG-1 video capability with support of interlaced video coding, was an enabling technology for digital television systems worldwide. It is widely used for the transmission of standard definition (SD) and high definition (HD) TV signals over satellite, cable, and terrestrial emission and the storage of high-quality SD video signals onto DVDs.
H.264/AVC has achieved a significant improvement in compression performance compared to prior standards, and it provides a network-friendly representation of the video that addresses both conversational (video telephony) and non-conversational (storage, broadcast, or streaming) applications.[3]
H.264 is typically used for lossy compression in the strict mathematical sense, although the amount of loss may sometimes be imperceptible. It is also possible to create truly lossless encodings using it, for example, to have localized lossless-coded regions within lossy-coded pictures or to support rare use cases in which the entire encoding is lossless.
H.264/MPEG-4 AVC is a block-oriented, motion-compensation-based video compression standard developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC JTC1 Moving Picture Experts Group (MPEG). The project partnership effort is known as the Joint Video Team (JVT). The ITU-T H.264 standard and the ISO/IEC MPEG-4 AVC standard (formally, ISO/IEC 14496-10 – MPEG-4 Part 10, Advanced Video Coding) are jointly maintained so that they have identical technical content.
H.264 is perhaps best known as being one of the video encoding standards for Blu-ray Discs; all Blu-ray Disc players must be able to decode H.264. It is also widely used by streaming internet sources, such as videos from Vimeo, YouTube, and the iTunes Store, web software such as the Adobe Flash Player and Microsoft Silverlight, and also various HDTV broadcasts over terrestrial (ATSC, ISDB-T, DVB-T, or DVB-T2), cable (DVB-C), and satellite (DVB-S and DVB-S2).[4]
Overview of the H.264/AVC Video Coding Standard
The intent of the H.264/AVC project was to create a standard capable of providing good video quality at substantially lower bit rates than previous standards (i.e., half or less the bit rate of MPEG-2, H.263, or MPEG-4 Part 2), without increasing the complexity of design so much that it would be impractical or excessively expensive to implement. An additional goal was to provide enough flexibility to allow the standard to be applied to a wide variety of applications on a wide variety of networks and systems, including low and high bit rates, low and high resolution video, broadcast, DVD storage, RTP/IP packet networks, and ITU-T multimedia telephony systems.[4]
Video coding for telecommunication applications has evolved through the development of the ITU-T H.261, H.262 (MPEG-2), and H.263 video coding standards (and later enhancements of H.263 known as H.263+ and H.263++), and has diversified from ISDN and T1/E1 service to embrace PSTN, mobile wireless networks, and LAN/Internet network delivery. Throughout this evolution, continued efforts have been made to maximize coding efficiency while dealing with the diversification of network types and their characteristic formatting and loss/error robustness requirements.[5]
In December 2001, VCEG and the Moving Picture Experts Group (MPEG, ISO/IEC JTC 1/SC 29/WG 11) formed a Joint Video Team (JVT), with the charter to finalize the draft new video coding standard for formal approval as H.264/AVC [1] in March 2003.
The scope of the standardization is illustrated in Fig. 1, which shows the typical video coding/decoding chain (excluding the transport or storage of the video signal).
Fig. 1. Scope of video coding standardization.[5]
The standardization of the first version of H.264/AVC was completed in May 2003. In the first project to extend the original standard, the JVT then developed what was called the Fidelity Range Extensions (FRExt). These extensions enabled higher quality video coding by supporting increased sample bit depth precision and higher-resolution color information, including sampling structures known as Y'CbCr 4:2:2 (=YUV 4:2:2) and Y'CbCr 4:4:4. Several other features were also included in the Fidelity Range Extensions project, such as adaptive switching between 4×4 and 8×8 integer transforms, encoder-specified perceptual-based quantization weighting matrices, efficient inter-picture lossless coding, and support of additional color spaces. The design work on the Fidelity Range Extensions was completed in July 2004, and the drafting work on them was completed in September 2004.[4]
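The difference between the sampling structures named above can be illustrated with a small sketch: 4:2:0 halves the chroma resolution in both dimensions, 4:2:2 halves it horizontally only, and 4:4:4 keeps full chroma resolution. The helper below is illustrative only, not part of any reference codec.

```python
# Chroma plane dimensions for the Y'CbCr sampling structures discussed
# above. Each picture carries one luma plane plus two chroma planes
# (Cb and Cr) of the size computed here.

def chroma_plane_size(width, height, sampling):
    """Return (chroma_width, chroma_height) for one chroma plane."""
    if sampling == "4:2:0":
        return width // 2, height // 2   # halved both ways
    if sampling == "4:2:2":
        return width // 2, height        # halved horizontally only
    if sampling == "4:4:4":
        return width, height             # full resolution
    raise ValueError(f"unknown sampling structure: {sampling}")

# For a 1920x1080 picture:
for s in ("4:2:0", "4:2:2", "4:4:4"):
    print(s, chroma_plane_size(1920, 1080, s))
```

For 1920×1080 video, each chroma plane is 960×540 in 4:2:0, 960×1080 in 4:2:2, and 1920×1080 in 4:4:4, which is why the FRExt structures carry progressively more color information.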
Further recent extensions of the standard then included adding five other new profiles intended primarily for professional applications, adding extended-gamut color space support, defining additional aspect ratio indicators, defining two additional types of "supplemental enhancement information" (post-filter hint and tone mapping), and deprecating one of the prior FRExt profiles that industry feedback indicated should have been designed differently.
The next major feature added to the standard was Scalable Video Coding (SVC). Specified in Annex G of H.264/AVC, SVC allows the construction of bitstreams that contain sub-bitstreams that also conform to the standard, including one such bitstream known as the "base layer" that can be decoded by an H.264/AVC codec that does not support SVC.
The next major feature added to the standard was Multiview Video Coding (MVC). Specified in Annex H of H.264/AVC, MVC enables the construction of bitstreams that represent more than one view of a video scene. An important example of this functionality is stereoscopic 3D video coding.[4]
Applications:
- The H.264 video format has a very broad application range that covers all forms of digital compressed video from low bit-rate Internet streaming applications to HDTV broadcast and Digital Cinema applications with nearly lossless coding. With the use of H.264, bit rate savings of 50% or more are reported. For example, H.264 has been reported to give the same Digital Satellite TV quality as current MPEG-2 implementations with less than half the bitrate, with current MPEG-2 implementations working at around 3.5 Mbit/s and H.264 at only 1.5 Mbit/s.[6]
- To ensure compatibility and problem-free adoption of H.264/AVC, many standards bodies have amended or added to their video-related standards so that users of these standards can employ H.264/AVC.
- The Digital Video Broadcast project (DVB) approved the use of H.264/AVC for broadcast television in late 2004.
- The Advanced Television Systems Committee (ATSC) standards body in the United States approved the use of H.264/AVC for broadcast television in July 2008, although the standard is not yet used for fixed ATSC broadcasts within the United States.[7][8] It has also been approved for use with the more recent ATSC-M/H (Mobile/Handheld) standard, using the AVC and SVC portions of H.264.[9]
- Conversational services over ISDN, Ethernet, LAN, DSL, wireless and mobile networks, modems, etc. or mixtures of these.
- Multimedia messaging services (MMS) over ISDN, DSL, Ethernet, LAN, wireless and mobile networks, etc.
- Video-on-demand or multimedia streaming services over ISDN, cable modem, DSL, LAN, wireless networks, etc.[5]
- The CCTV (Closed Circuit TV) and video surveillance markets have included the technology in many products.
- Canon and Nikon DSLRs use H.264 video wrapped in QuickTime MOV containers as the native recording format.
- AVCHD is a high-definition recording format designed by Sony and Panasonic that uses H.264 (conforming to H.264 while adding additional application-specific features and constraints).
- AVC-Intra is an intraframe-only compression format, developed by Panasonic.
Design Feature Highlights:
To address the need for flexibility and customizability, the H.264/AVC design covers a video coding layer (VCL), which is designed to efficiently represent the video content, and a network abstraction layer (NAL), which formats the VCL representation of the video and provides header information in a manner appropriate for conveyance by a variety of transport layers or storage media.
Fig. 2. Structure of H.264/AVC video encoder[5]
Some highlighted features of the design that enable enhanced coding efficiency include the following enhancements of the ability to predict the values of the content of a picture to be encoded.
• Variable block-size motion compensation with small block sizes: This standard supports more flexibility in the selection of motion compensation block sizes and shapes than any previous standard, with a minimum luma motion compensation block size as small as 4×4.
• Quarter-sample-accurate motion compensation: Most prior standards enable half-sample motion vector accuracy at most. The new design improves upon this by adding quarter-sample motion vector accuracy, as first found in an advanced profile of the MPEG-4 Visual (part 2) standard, but further reduces the complexity of the interpolation processing compared to the prior design.
• Motion vectors over picture boundaries: While motion vectors in MPEG-2 and its predecessors were required to point only to areas within the previously-decoded reference picture, the picture boundary extrapolation technique first found as an optional feature in H.263 is included in H.264/AVC.
• In-the-loop deblocking filtering: Block-based video coding produces artifacts known as blocking artifacts. These can originate from both the prediction and residual difference coding stages of the decoding process. Application of an adaptive deblocking filter is a well-known method of improving the resulting video quality, and when designed well, this can improve both objective and subjective video quality. Building further on a concept from an optional feature of H.263, the deblocking filter in the H.264/AVC design is brought within the motion-compensated prediction loop, so that this improvement in quality can be used in inter-picture prediction to improve the ability to predict other pictures as well.
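Two of the prediction features above, quarter-sample-accurate motion compensation and motion vectors over picture boundaries, can be sketched in simplified one-dimensional form. The filter taps (1, −5, 20, 20, −5, 1)/32 are the standard's half-sample luma interpolation taps; the function names and the edge-clamping helper are illustrative, not reference code, and the real standard operates on two-dimensional blocks.

```python
# 1-D sketch of H.264/AVC luma sub-sample interpolation with
# picture-boundary extrapolation (modeled by clamping coordinates
# to the edge of the sample row).

def sample(row, i):
    """Fetch a sample, extrapolating past the row boundary by clamping."""
    return row[max(0, min(i, len(row) - 1))]

def half_pel(row, i):
    """Half-sample value between positions i and i+1: 6-tap FIR,
    then rounding and clipping to the 8-bit sample range."""
    acc = (sample(row, i - 2) - 5 * sample(row, i - 1)
           + 20 * sample(row, i) + 20 * sample(row, i + 1)
           - 5 * sample(row, i + 2) + sample(row, i + 3))
    return max(0, min(255, (acc + 16) >> 5))

def quarter_pel(row, i):
    """Quarter-sample value: rounded average of the full-sample and
    half-sample neighbors, as in the standard's final averaging step."""
    return (sample(row, i) + half_pel(row, i) + 1) >> 1

row = [10, 10, 10, 200, 200, 200]
print(half_pel(row, 2))      # interpolated value across the edge -> 105
print(quarter_pel(row, 2))   # -> 58
```

The clamping in `sample` is what lets a motion vector reference positions outside the decoded picture, the boundary-extrapolation behavior described in the third bullet above.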
NAL
The NAL is designed to provide "network friendliness", enabling simple and effective customization of the use of the VCL for a broad variety of systems. The NAL facilitates the ability to map H.264/AVC VCL data to transport layers such as:
• RTP/IP for any kind of real-time wire-line and wireless Internet services (conversational and streaming);
• File formats, e.g., ISO MP4 for storage and MMS;
• H.32X for wireline and wireless conversational services;
• MPEG-2 systems for broadcasting services, etc.
The full degree of customization of the video content to fit the needs of each particular application is outside the scope of the H.264/AVC standardization effort, but the design of the NAL anticipates a variety of such mappings. Some key concepts of the NAL are NAL units, byte stream, and packet format uses of NAL units, parameter sets, and access units.[5]
- NAL Units
The coded video data is organized into NAL units, each of which is effectively a packet that contains an integer number of bytes. The first byte of each NAL unit is a header byte that contains an indication of the type of data in the NAL unit, and the remaining bytes contain payload data of the type indicated by the header. The payload data in the NAL unit is interleaved as necessary with emulation prevention bytes, which are bytes inserted with a specific value to prevent a particular pattern of data called a start code prefix from being accidentally generated inside the payload. The NAL unit structure definition specifies a generic format for use in both packet-oriented and bitstream-oriented transport systems, and a series of NAL units generated by an encoder is referred to as a NAL unit stream.
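The header byte and emulation prevention mechanism described above can be sketched as follows. The header fields (a 1-bit forbidden_zero_bit, 2-bit nal_ref_idc, and 5-bit nal_unit_type) follow the standard's layout; the helper function itself is illustrative, not reference decoder code.

```python
# Sketch of unpacking one NAL unit: split the header byte into its
# three fields and strip emulation prevention bytes (each 0x03 that
# follows two consecutive zero bytes) from the payload.

def parse_nal_unit(nal):
    header = nal[0]
    forbidden_zero_bit = header >> 7        # must be 0 in a valid stream
    nal_ref_idc = (header >> 5) & 0x3       # reference importance
    nal_unit_type = header & 0x1F           # payload type indication
    payload = bytearray()
    zeros = 0
    for b in nal[1:]:
        if zeros >= 2 and b == 0x03:
            zeros = 0                       # drop emulation prevention byte
            continue
        payload.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type, bytes(payload)

# Header 0x67 = 0b0_11_00111: ref_idc 3, type 7 (a sequence parameter set).
f, ref, typ, rbsp = parse_nal_unit(bytes([0x67, 0x00, 0x00, 0x03, 0x01, 0x42]))
print(f, ref, typ, rbsp.hex())   # 0 3 7 00000142
```

Note how the 0x03 byte is removed on decoding, restoring the two zero bytes that would otherwise have risked forming a start code prefix inside the payload.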
- NAL Units in Byte-Stream Format Use
Some systems (e.g., H.320 and MPEG-2/H.222.0 systems) require delivery of the entire or partial NAL unit stream as an ordered stream of bytes or bits within which the locations of NAL unit boundaries need to be identifiable from patterns within the coded data itself. For use in such systems, the H.264/AVC specification defines a byte stream format. In the byte stream format, each NAL unit is prefixed by a specific pattern of three bytes called a start code prefix. The boundaries of the NAL unit can then be identified by searching the coded data for the unique start code prefix pattern. The use of emulation prevention bytes guarantees that start code prefixes are unique identifiers of the start of a new NAL unit.
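The byte-stream parsing just described can be sketched as a scan for the three-byte start code prefix 00 00 01, splitting the stream into NAL units at each occurrence. This is an illustrative helper, not reference code; real streams may also use a four-byte 00 00 00 01 form, whose extra leading zero is absorbed here when trailing zero bytes are trimmed from the preceding unit.

```python
# Split an H.264/AVC byte stream into NAL units at start code prefixes.

def split_byte_stream(stream):
    prefix = b"\x00\x00\x01"
    starts = []
    i = stream.find(prefix)
    while i != -1:
        starts.append(i + 3)                 # NAL unit begins after prefix
        i = stream.find(prefix, i + 3)
    units = []
    for n, s in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        units.append(stream[s:end].rstrip(b"\x00"))  # drop trailing zeros
    return units

# Two units, the second prefixed with the four-byte start code form:
data = b"\x00\x00\x01\x67\xAA" + b"\x00\x00\x00\x01\x68\xBB"
print([u.hex() for u in split_byte_stream(data)])   # ['67aa', '68bb']
```

Because emulation prevention bytes guarantee that 00 00 01 never occurs inside a NAL unit's payload, this simple scan is sufficient to locate unit boundaries.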
- NAL Units in Packet-Transport System Use
In other systems (e.g., internet protocol/RTP systems), the coded data is carried in packets that are framed by the system transport protocol, and identification of the boundaries of NAL units within the packets can be established without use of start code prefix patterns. In such systems, the inclusion of start code prefixes in the data would be a waste of data carrying capacity, so instead the NAL units can be carried in data packets without start code prefixes.
- VCL and Non-VCL NAL Units
NAL units are classified into VCL and non-VCL NAL units. The VCL NAL units contain the data that represents the values of the samples in the video pictures, and the non-VCL NAL units contain any associated additional information such as parameter sets (important header data that can apply to a large number of VCL NAL units) and supplemental enhancement information (timing information and other supplemental data that may enhance usability of the decoded video signal but are not necessary for decoding the values of the samples in the video pictures).
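The classification above follows directly from the nal_unit_type field: in H.264/AVC, types 1 through 5 carry coded slice data (VCL), while types such as 6 (SEI), 7 (SPS), and 8 (PPS) are non-VCL. The table below lists only a common subset of the type values as an illustration.

```python
# Classify NAL units as VCL or non-VCL from nal_unit_type
# (a subset of the H.264/AVC type values).

NAL_TYPES = {
    1: ("coded slice, non-IDR picture", True),
    5: ("coded slice, IDR picture", True),
    6: ("supplemental enhancement information (SEI)", False),
    7: ("sequence parameter set (SPS)", False),
    8: ("picture parameter set (PPS)", False),
}

def is_vcl(nal_unit_type):
    """VCL NAL units are nal_unit_type 1 through 5."""
    return 1 <= nal_unit_type <= 5

for t, (name, _) in NAL_TYPES.items():
    print(t, name, "VCL" if is_vcl(t) else "non-VCL")
```

A decoder or network element can thus tell, from the header byte alone, whether a unit carries picture sample data or associated side information such as parameter sets.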
- Parameter Sets
A parameter set contains information that is expected to change rarely and that applies to the decoding of a large number of VCL NAL units. There are two types of parameter sets:
• sequence parameter sets, which apply to a series of consecutive coded video pictures called a coded video sequence;
• picture parameter sets, which apply to the decoding of one or more individual pictures within a coded video sequence.
Parameter sets can be conveyed "in-band", i.e., within the channel that carries the VCL NAL units themselves. In other applications (Fig. 3), it can be advantageous to convey the parameter sets "out-of-band" using a more reliable transport mechanism than the video channel itself.