EE5359 Project Report Sumedha Phatak
Project REPORT
EE5359– Multimedia processing
(Spring 2012)
Study and Implementation of Video Compression Standards
(H.264/AVC and Dirac)
Submitted By:
Sumedha Phatak
UTA ID: 1000731131
Objective:
This project studies, implements and compares the baseline profiles of H.264/AVC and Dirac in terms of video quality, bit rate, compression ratio, complexity and performance. Different test video sequences are used to compare the two standards on the quality metrics SSIM [13], MSE [13] and PSNR [13] at various bit rates.
Important Video Quality Measurement Terms:
Structural similarity index (SSIM) [13]: SSIM is a method for measuring the similarity between two frames. It is a full-reference metric: image quality is measured using an initial uncompressed, distortion-free frame as the reference.
SSIM was designed to improve on methods such as peak signal-to-noise ratio (PSNR) and mean squared error (MSE), which have proved to be inconsistent with human visual perception.
SSIM treats image degradation as a perceived change in structural information. Structural information reflects the fact that pixels have strong inter-dependencies, especially when they are spatially close.
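The SSIM index between two signals x and y is computed as
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}$$
where
x and y correspond to two different signals that need to be compared for similarity, i.e. two different blocks in two separate images; $\mu_x$ and $\mu_y$ are their means, $\sigma_x^2$ and $\sigma_y^2$ their variances and $\sigma_{xy}$ their covariance; $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are small constants that stabilise the division when the denominator is close to zero, where L is the dynamic range of the pixel values (255 for 8-bit data) and, by default, $k_1 = 0.01$ and $k_2 = 0.03$.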
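The SSIM definition can be sketched in plain Python from the means, variances and covariance of the two frames and the usual constants c1 = (0.01 L)² and c2 = (0.03 L)². This is a minimal sketch under stated assumptions: frames are given as lists of rows of luma values, and, as a simplification, a single SSIM value is computed from statistics over the whole frame, whereas the standard index averages SSIM over local (typically 11 × 11 Gaussian-weighted) windows. Its values should therefore not be expected to match the SSIM figures in the results tables. The function name global_ssim is hypothetical.

```python
def global_ssim(x, y, L=255, k1=0.01, k2=0.03):
    """Single-window SSIM of two equal-sized frames given as lists of rows.

    Simplification: means, variances and covariance are computed over the
    whole frame instead of over local sliding windows, and only the total
    pixel count of the two frames is checked.
    """
    xs = [p for row in x for p in row]
    ys = [p for row in y for p in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("frames must be non-empty and of equal size")
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in xs) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in ys) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(xs, ys)) / n
    c1 = (k1 * L) ** 2  # stabilises the luminance term
    c2 = (k2 * L) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


ref = [[52, 55, 61], [59, 79, 61], [85, 76, 62]]
noisy = [[54, 53, 63], [57, 81, 60], [83, 78, 64]]
print(global_ssim(ref, ref))    # identical frames give 1.0
print(global_ssim(ref, noisy))  # just below 1.0
```

SSIM never exceeds 1 and equals 1 only for identical inputs, so the second value drops below 1 as soon as any noise is added.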
Mean squared error (MSE) [13]: The MSE represents the average squared error between the compressed and the original image, whereas PSNR represents a measure of the peak error. The lower the MSE, the lower the error. The MSE is computed by averaging the squared intensity differences between the pixels of the distorted and the reference image/frame:
$$\mathrm{MSE} = \frac{1}{MN}\sum_{m=1}^{M}\sum_{n=1}^{N}\left[I_1(m,n) - I_2(m,n)\right]^2$$
where M and N are the numbers of rows and columns of the images, and $I_1$ and $I_2$ are the reference and distorted images. Two distorted images with the same MSE may have very different types of errors, some of which are much more visible than others.
Peak signal-to-noise ratio (PSNR) [13]: PSNR is the peak signal-to-noise ratio, in decibels, between two images. It is often used as a quality measurement between the original and a compressed image, and it is the most common measure of the reconstruction quality of compression codecs. The signal in this case is the maximum possible pixel value and the noise is the error introduced by compression. The higher the PSNR, the better the quality of the compressed or reconstructed image:
$$\mathrm{PSNR} = 10\log_{10}\left(\frac{R^2}{\mathrm{MSE}}\right)$$
In the previous equation, R is the maximum fluctuation in the input image data type, e.g. R = 255 for 8-bit unsigned integer samples.
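The MSE and PSNR definitions translate directly into code. The sketch below assumes frames are lists of rows of pixel values and uses r_max for R (255 for 8-bit data). The function names are hypothetical, and identical frames are reported as infinite PSNR because the MSE in the denominator is zero.

```python
import math


def mse(ref, dist):
    """Mean squared error between two equal-sized frames (lists of rows)."""
    if len(ref) != len(dist) or any(len(r) != len(d) for r, d in zip(ref, dist)):
        raise ValueError("frames must have the same dimensions")
    squared = [(a - b) ** 2 for r_row, d_row in zip(ref, dist)
               for a, b in zip(r_row, d_row)]
    if not squared:
        raise ValueError("frames must not be empty")
    return sum(squared) / len(squared)


def psnr(ref, dist, r_max=255):
    """PSNR in dB; r_max plays the role of R (255 for 8-bit samples)."""
    err = mse(ref, dist)
    if err == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10 * math.log10(r_max ** 2 / err)


ref = [[100, 100], [100, 100]]
dist = [[101, 99], [101, 99]]
print(mse(ref, dist))   # 1.0
print(psnr(ref, dist))  # about 48.13 dB
```

Because PSNR is logarithmic in 1/MSE, halving the MSE raises the PSNR by about 3 dB (10·log10 2 ≈ 3.01).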
Introduction:
Digital video compression techniques have played an important role in the world of telecommunication and multimedia systems where bandwidth is still a valuable commodity. [1]
In general, data compression or video/image compression means bit-rate reduction: it involves encoding information using fewer bits than the original representation. Compression can be either lossless or lossy [1]. Lossless compression reduces bits by eliminating statistical redundancy, so no information is lost, whereas lossy compression reduces bits by identifying and removing marginally important information [9].
Most video compression techniques are lossy. In this type of compression there is a tradeoff between video quality, the cost of processing the compression and decompression, and system requirements [1].
Video compression uses modern coding techniques to reduce redundancy in video data. Most video compression algorithms and codecs [1] combine spatial image compression and temporal motion compensation [9].
Today, nearly all commonly used video compression methods (e.g., those in standards developed by the ITU-T or ISO) [3] apply a discrete cosine transform (DCT) [20] for spatial redundancy reduction. Other methods, such as fractal compression, matching pursuit and the use of a discrete wavelet transform (DWT) [11], have been the subject of some research, but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation) [18].
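To make the role of the DCT in spatial redundancy reduction concrete, the following sketch computes an orthonormal 2-D DCT-II of a small block in plain Python. It is a naive O(N^4) textbook version for illustration only; H.264 itself uses a small integer approximation of the DCT rather than this floating-point form. The function name dct_2d is hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (naive O(N^4) sketch)."""
    n = len(block)

    def alpha(k):
        # Normalisation that makes the transform orthonormal
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


# A smooth 8x8 ramp: almost all of its energy ends up in a few
# low-frequency coefficients, so the remaining coefficients can be
# quantized coarsely at little cost in quality.
ramp = [[16 * x + 2 * y for y in range(8)] for x in range(8)]
c = dct_2d(ramp)
energy = sum(v * v for row in c for v in row)
low = sum(c[u][v] ** 2 for u in range(2) for v in range(2))
print(f"share of energy in the 2x2 low-frequency corner: {low / energy:.4f}")
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the pixels; compression comes from how unevenly that energy is distributed.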
Table 1 [1] shows the evolution of the various video compression standards.
Table 1: History of video compression standards [1]
Figure 1 [2] presents the same evolution graphically, separating the ITU-T and ISO-based [15] compression standards:
Figure 1: Evolution of video compression standards [2]
H.264/AVC [3], Dirac [6] and AVS China [3] are among the latest video coding standards, developed by ITU-T and ISO/IEC, the BBC, and China's Audio Video coding Standard (AVS) working group, respectively [3].
H.264 [3]
H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding), is a standard for video compression [3] and is currently one of the most commonly used formats for the recording, compression and distribution of high-definition video. Its main goal was to provide good video quality at substantially lower bit rates than previous standards (i.e., half or less the bit rate of MPEG-2, H.263 or MPEG-4 Part 2) [4], without significantly increasing the complexity of the design [3].
H.264 Profiles and levels [6]
Figure 2 shows the H.264 profiles and levels.
Figure 2 [3]: H.264 profiles and levels [6]
Baseline profile [3]:
Apart from the common features, the Baseline profile includes error resilience tools such as flexible macroblock ordering (FMO), arbitrary slice ordering (ASO) and redundant slices. It was designed for low-delay applications, for platforms with low processing power, and for environments with high packet loss. Of the three original profiles (Baseline, Main and Extended), it offers the lowest coding efficiency. The baseline profile
Extended profile [3]:
The Extended profile is a superset of the Baseline profile [3]. Besides the Baseline tools, it includes B-, SP- and SI-slices, data partitioning and interlaced coding tools [21]. SP and SI slices are specially coded P and I slices that allow efficient switching between streams of different bit rates in streaming applications. However, it does not include context-adaptive binary arithmetic coding (CABAC) [7]. It is therefore more complex than the Baseline profile but also provides better coding efficiency. It was intended for video streaming over the Internet.
Main profile [6]:
In addition to the common features, the Main profile includes CABAC for entropy coding and B-slices. It does not include error resilience tools such as FMO [17]. Like the Extended profile, it also contains interlaced coding tools [3]. The Main profile is used in broadcast television and in high-resolution video storage and playback.
High profile [6]:
The High profiles are supersets of the Main profile [7]. They add tools such as adaptive transform block size and quantization scaling matrices. The High profiles are used for applications such as content contribution, content distribution, and studio editing and post-processing [17].
H.264 architecture, encoder and decoder block diagrams [3]
Figure 3 [2] shows the H.264 architecture.
Figure 3: H.264 architecture [2]
Figures 4 [4] and 5 [4] show the block diagrams for the H.264 encoder and H.264 decoder respectively.
Figure 4: H.264 encoder block diagram [4]
Figure 5: H.264 decoder block diagram [4]
Advantages of H.264:
H.264 delivers high video quality at both low and high bit rates. It is error resilient and can cope with packet losses in packet networks as well as bit errors in error-prone wireless networks. It has a wide range of applications, including streaming, mobile TV, HDTV over IP, and extended PVR and storage options for home users.
Dirac [6]
Dirac is an open and free video compression format developed by BBC Research. It is mainly intended to provide high-quality video compression for applications such as Ultra HDTV, and it competes with existing standards such as H.264 [5] and VC-1 [12].
Dirac [6] is a hybrid video codec because it uses both a transform and motion compensation: motion compensation removes temporal redundancy in the data and the transform removes spatial redundancy [10]. Dirac uses modern techniques such as the wavelet transform and arithmetic coding for entropy coding [16]. The image motion is tracked and the motion information is used to predict a later frame. A wavelet transform [16] is applied to the prediction residual (the difference between the frame and its prediction), and the transform coefficients are quantized and entropy coded [10]. Owing to its flexibility, Dirac's applications range from high-definition television (HDTV) to web streaming: it can compress pictures from QCIF resolution (176 × 144 pixels) up to HDTV (1920 × 1080). Dirac also promises improvements in quality and significant savings in data rate over other codecs such as H.264 and VC-1 [6].
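The motion tracking mentioned above can be illustrated with the simplest form of block-matching motion estimation: an exhaustive (full) search that picks the displacement minimising the sum of absolute differences (SAD). This is a generic textbook sketch, not the actual method of Dirac (which uses overlapped block motion compensation) or of JM; the search_range argument plays a role similar to the SearchRange parameter in the JM configuration listing below. Frames are assumed to be lists of rows of luma samples, and the function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """SAD between the bs x bs block at (bx, by) in cur and the block
    displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(bs)
        for x in range(bs)
    )


def full_search(cur, ref, bx, by, bs=8, search_range=4):
    """Return (best_sad, dx, dy) for the block at (bx, by) in cur."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidate blocks that would fall outside the reference frame
            if not (0 <= by + dy <= h - bs and 0 <= bx + dx <= w - bs):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Reference frame with unique pixel values; the current frame is the
# reference moved up by 1 row and left by 2 columns (edges clamped).
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 1, 15)][min(x + 2, 15)] for x in range(16)] for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, bs=4))  # (0, 2, 1)
```

The cost of a full search grows with the square of the search range, which is why practical encoders use faster search strategies.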
Figures 6 [5] and 7 [5] show the Dirac encoder and decoder block diagrams respectively.
Figure 6: Dirac encoder block diagram [5]
Figure 7: Dirac decoder block diagram [5]
Dirac applies the wavelet transform to the entire picture at once, which gives it the flexibility to operate over a range of resolutions. At each stage, the wavelet filters split the signal into four frequency sub-bands: LL (Low-Low), LH (Low-High), HL (High-Low) and HH (High-High). For our sequences the filter is applied both horizontally and vertically. Since the LL sub-band contains the most significant information, only the LL band is decomposed further at each subsequent stage, while the LH, HL and HH bands of each stage are kept for quantization and coding. This decomposition is carried out for up to 4 stages. The discrete wavelet transform retains finer details while de-correlating the data in a roughly frequency-sensitive manner. Choosing filters with compact impulse responses is essential to reduce the ringing artifacts caused by wavelets. Daubechies wavelet filters are therefore used to transform and divide the data into sub-bands, which are then quantized with the corresponding RDO (rate-distortion optimization) parameters and entropy coded. At the decoder these stages are reversed.
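The multi-stage decomposition described above (split into LL, LH, HL and HH, then split only LL again) can be sketched with the simplest wavelet, the Haar filter. This is an illustrative assumption: Dirac uses longer filters such as the Daubechies filters mentioned above, texts differ on which of LH/HL denotes horizontal detail, and the averaging normalisation here is chosen only for readability. QCIF frames are 176 × 144 and both dimensions are divisible by 2^4 = 16, so four levels fit exactly. All function names are hypothetical.

```python
def haar_level(img):
    """One 2-D Haar analysis step on an image with even height and width.

    Returns (LL, LH, HL, HH); here HL is high-pass horizontally and
    low-pass vertically, LH the opposite.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")

    def empty():
        return [[0.0] * (w // 2) for _ in range(h // 2)]

    ll, lh, hl, hh = empty(), empty(), empty(), empty()
    for i in range(h // 2):
        for j in range(w // 2):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 4
            hl[i][j] = (a - b + c - d) / 4
            lh[i][j] = (a + b - c - d) / 4
            hh[i][j] = (a - b - c + d) / 4
    return ll, lh, hl, hh


def haar_inverse(ll, lh, hl, hh):
    """Exact inverse of haar_level."""
    h2, w2 = len(ll), len(ll[0])
    img = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, hx, vy, dd = ll[i][j], hl[i][j], lh[i][j], hh[i][j]
            img[2 * i][2 * j] = s + hx + vy + dd
            img[2 * i][2 * j + 1] = s - hx + vy - dd
            img[2 * i + 1][2 * j] = s + hx - vy - dd
            img[2 * i + 1][2 * j + 1] = s - hx - vy + dd
    return img


def decompose(img, levels=4):
    """Split only the LL band at each level; keep every level's detail bands."""
    details = []
    ll = img
    for _ in range(levels):
        ll, lh, hl, hh = haar_level(ll)
        details.append((lh, hl, hh))
    return ll, details


def reconstruct(ll, details):
    """Undo decompose(), starting from the coarsest level."""
    for lh, hl, hh in reversed(details):
        ll = haar_inverse(ll, lh, hl, hh)
    return ll


frame = [[(7 * y + 3 * x) % 256 for x in range(176)] for y in range(144)]
ll, details = decompose(frame, levels=4)
print(len(ll), len(ll[0]))  # 9 11  (144/16 rows, 176/16 columns)
restored = reconstruct(ll, details)
print(max(abs(a - b) for ra, rb in zip(frame, restored) for a, b in zip(ra, rb)))  # 0.0
```

The detail bands of every level are needed for reconstruction; compression comes from quantizing them (many are near zero in smooth regions), not from discarding them.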
H.264 implementation using JM 17.2 [22] software:
The JM 17.2 [22] reference software is used for the H.264 implementation. After downloading and installing the software, the solution can be built and run in Microsoft Visual Studio. This generates the lencod and ldecod executable files (the encoder and decoder, respectively). The lencod/ldecod executables are then run from the command prompt to obtain output metrics such as PSNR, SSIM, MSE and bit rate.
The input video, the number of frames to be encoded, the frame rate, the quantization parameter and the H.264 profile to be used can be changed in the configuration file inside the 'bin' folder.
As the implementation uses the baseline profile, the encoder_baseline.cfg file is used here. The required parameters, the input file and the output destination should be set in this file to obtain output at the desired settings.
Encoder configuration for foreman_qcif.yuv:
##############################################################################
# Files
##############################################################################
InputFile = "foreman_qcif.yuv" # Input sequence
InputHeaderLength = 0 # If the input file has a header, state its length in bytes here
StartFrame = 0 # Start frame for encoding. (0-N)
FramesToBeEncoded = 3 # Number of frames to be coded
FrameRate = 100 # Frame Rate per second (0.1-100.0)
SourceWidth = 176 # Source frame width
SourceHeight = 144 # Source frame height
SourceResize = 0 # Resize source size for output
OutputWidth = 176 # Output frame width
OutputHeight = 144 # Output frame height
TraceFile = "trace_enc.txt" # Trace file
ReconFile = "test_rec_foreman.yuv" # Reconstruction YUV file
OutputFile = "test_foreman.264" # Bitstream
StatsFile = "stats.dat" # Coding statistics file
##############################################################################
# Encoder Control
##############################################################################
ProfileIDC = 66 # Profile IDC (66=baseline, 77=main, 88=extended; FREXT Profiles: 100=High, 110=High 10, 122=High 4:2:2, 244=High 4:4:4, 44=CAVLC 4:4:4 Intra)
IntraProfile = 0 # Activate Intra Profile for FRExt (0: false, 1: true)
# (e.g. ProfileIDC=110, IntraProfile=1 => High 10 Intra Profile)
LevelIDC = 40 # Level IDC (e.g. 20 = level 2.0)
IntraPeriod = 0 # Period of I-pictures (0=only first)
IDRPeriod = 0 # Period of IDR pictures (0=only first)
AdaptiveIntraPeriod = 1 # Adaptive intra period
AdaptiveIDRPeriod = 0 # Adaptive IDR period
IntraDelay = 0 # Intra (IDR) picture delay (i.e. coding structure of PPIPPP... )
EnableIDRGOP = 0 # Support for IDR closed GOPs (0: disabled, 1: enabled)
EnableOpenGOP = 0 # Support for open GOPs (0: disabled, 1: enabled)
QPISlice = 50 # Quant. param for I Slices (0-51)
QPPSlice = 50 # Quant. param for P Slices (0-51)
FrameSkip = 0 # Number of frames to be skipped in input (e.g 2 will code every third frame).
# Note that this now excludes intermediate (i.e. B) coded pictures
ChromaQPOffset = 0 # Chroma QP offset (-51..51)
DisableSubpelME = 0 # Disable Subpixel Motion Estimation (0=off/default, 1=on)
SearchRange = 32 # Max search range
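To run a QP sweep such as the one in the results below without editing encoder_baseline.cfg by hand for every run, the encoder can be driven from a script. This is a hedged sketch: it assumes lencod's command-line convention of -d <config file> to select the configuration and -p Name=Value to override individual parameters (check the usage text printed by lencod -h for your build), and it assumes the executable is named lencod.exe and can be found from the working directory. The helper names lencod_command and run_sweep are hypothetical.

```python
import subprocess


def lencod_command(cfg, qp, exe="lencod.exe", frames=None):
    """Build a JM encoder command line that sets the I- and P-slice QP."""
    if not 0 <= qp <= 51:
        raise ValueError("H.264 QP must lie in the range 0..51")
    cmd = [exe, "-d", cfg, "-p", f"QPISlice={qp}", "-p", f"QPPSlice={qp}"]
    if frames is not None:
        cmd += ["-p", f"FramesToBeEncoded={frames}"]
    return cmd


def run_sweep(cfg, qps=(0, 10, 25, 40, 50), exe="lencod.exe", frames=100):
    """Run the encoder once per QP; JM prints its statistics to the console."""
    for qp in qps:
        print(f"--- QP {qp} ---")
        subprocess.run(lencod_command(cfg, qp, exe, frames), check=True)


if __name__ == "__main__":
    print(lencod_command("encoder_baseline.cfg", 25, frames=100))
    # run_sweep("encoder_baseline.cfg")  # uncomment on a machine with JM built
```

Each run overwrites the files named by OutputFile and ReconFile, so the console statistics for each QP should be recorded (or the output files renamed) before the next run.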
Figure 8 shows a snapshot of the command prompt after running the encoder executable (lencod) built from the JM software [22].
Figure 8: Command prompt snapshot
YUV File: news_qcif.yuv
Figure 9: news_qcif.yuv original file
Figure 10: Video quality at various QP values (QP = 0, QP = 25, QP = 50)
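The QP values compared in these figures act through the quantizer step size. In H.264 the nominal step size (Qstep) for QP 0 to 5 is 0.625, 0.6875, 0.8125, 0.875, 1.0 and 1.125, and it doubles for every increase of 6 in QP. The sketch below tabulates this relation. It illustrates the standard's nominal values only, since the JM encoder realises quantization with integer multiplications and shifts rather than by dividing by these numbers; the name qstep is hypothetical.

```python
# Nominal H.264 quantizer step sizes for QP = 0..5
QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Nominal quantizer step size for an H.264 QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in the range 0..51")
    # The step size doubles every 6 QP steps
    return QSTEP_BASE[qp % 6] * 2 ** (qp // 6)


for qp in (0, 10, 25, 40, 50):
    print(f"QP {qp:2d}: Qstep = {qstep(qp)}")
```

For the QPs used in the tables, the step size grows from 0.625 at QP 0 to 208 at QP 50, which is consistent with the steep drop in bit rate and PSNR across each table.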
Results:
QCIF sequence: news_qcif.yuv
Width: 176, Height: 144
Total no. of frames: 300
Frames used: 100
Original File size: 3713KB
Frame Rate = 25 fps
QP / Bitrate (kbps) / MSE (Y-component) / PSNR (Y-component) in dB / SSIM (Y-component)
0 / 227.80 / 0.0129 / 65.567 / 0.999
10 / 89.42 / 0.49 / 52.24 / 0.997
25 / 14.25 / 8.01 / 40.54 / 0.987
40 / 2.22 / 103.55 / 29.11 / 0.843
50 / 0.723 / 433.89 / 20.66 / 0.610
Table 2: Results for news_qcif.yuv
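The objective also lists compression ratio, which can be derived from the bit rates in Tables 2 to 4. The sketch below assumes 8-bit planar YUV 4:2:0 input (one byte per sample, chroma planes at half resolution in each direction) and 1 kbps = 1000 bit/s. Under these assumptions one QCIF frame occupies 176 × 144 × 1.5 = 38,016 bytes, so 100 frames occupy 3,801,600 bytes (about 3713 KiB), in line with the original file size listed for news_qcif.yuv. The function names are hypothetical.

```python
def raw_yuv420_bytes(width, height, frames):
    """Size in bytes of an uncompressed 8-bit planar YUV 4:2:0 sequence."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)  # Cb and Cr planes
    return (luma + chroma) * frames


def compression_ratio(width, height, frames, fps, bitrate_kbps):
    """Raw size divided by coded size for a sequence encoded at bitrate_kbps."""
    raw_bits = 8 * raw_yuv420_bytes(width, height, frames)
    duration_s = frames / fps
    coded_bits = bitrate_kbps * 1000 * duration_s  # assumes 1 kbps = 1000 bit/s
    return raw_bits / coded_bits


print(raw_yuv420_bytes(176, 144, 1))    # 38016 bytes per QCIF frame
print(raw_yuv420_bytes(176, 144, 100))  # 3801600 bytes for 100 frames
```

Substituting a row's bit rate into compression_ratio(176, 144, 100, 25, bitrate_kbps) gives the compression ratio for that row, since all three sequences were coded with 100 frames at 25 fps.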
Figure 11: news_qcif.yuv-MSE vs. bitrate
Figure 12: news_qcif.yuv-PSNR vs. bitrate
Figure 13: news_qcif.yuv-SSIM vs. bitrate
YUV file: foreman_qcif.yuv
Figure 14: foreman_qcif.yuv original file
Figure 15: Video quality at various QP values (QP = 0, QP = 25, QP = 50)
Results:
QCIF sequence: foreman_qcif.yuv
Width: 176, Height: 144
Total no. of frames: 300
Frames used: 100
Original File size: 14850 KB
Frame Rate = 25 fps
QP / Bitrate (kbps) / MSE (Y-component) / PSNR (Y-component) in dB / SSIM (Y-component)
0 / 388.48 / 0.0079 / 69.182 / 0.999
10 / 172.25 / 0.4573 / 51.563 / 0.9975
25 / 23.794 / 7.999 / 39.134 / 0.9708
40 / 3.896 / 90.264 / 28.609 / 0.8457
50 / 1.1475 / 430.874 / 21.821 / 0.6112
Table 3: Results for foreman_qcif.yuv
Figure 16: foreman_qcif.yuv-MSE (Y-component) vs. bitrate
Figure 17: foreman_qcif.yuv-PSNR (Y-component) vs. bitrate
Figure 18: foreman_qcif.yuv- SSIM (Y-component) vs. bitrate
YUV file: container_qcif.yuv
Figure 19: container_qcif.yuv original file
Figure 20: Video quality at various QP values (QP = 0, QP = 25, QP = 50)
Results:
QCIF sequence: container_qcif.yuv
Width: 176, Height: 144
Total no. of frames: 300
Frames used: 100
Original File size: 14850KB
Frame Rate = 25 fps
QP / Bitrate (kbps) / MSE (Y-component) / PSNR (Y-component) in dB / SSIM (Y-component)
0 / 3124.45 / 0.00805 / 68.126 / 0.999
10 / 1093.6 / 0.511 / 51.295 / 0.9964
25 / 196.73 / 16.111 / 37.776 / 0.9698
40 / 42.88 / 93.767 / 28.4307 / 0.852
50 / 4.22 / 363.973 / 22.783 / 0.6889
Table 4: Results for container_qcif.yuv
Figure 21: container_qcif.yuv- MSE (Y-component) vs. bitrate