FAST INTER AND INTRA MODE DECISION ALGORITHM
BASED ON THREAD-LEVEL PARALLELISM IN H.264 VIDEO CODING
Project Guide – Dr. K. R. Rao
Tejas Sathe (1000731145)
Objective:
To reduce H.264 video encoder complexity by incorporating a fast inter and intra mode decision algorithm that uses thread-level parallelism.
Motivation:
The most recent advances in microprocessor design for desktop computers involve putting multiple processors on a single computer chip. These multicore designs are completely replacing the traditional single core designs that have been the foundation of desktop computers.
The primary problem is that regular software has not been designed to take advantage of the new multicore architectures. In fact, to see any real speedup from the new multicore architectures, currently used software will have to be redesigned.
In the H.264 encoder, most of the complexity lies in the motion estimation block. With thread-level parallelization, hardware resources can be used efficiently and encoding can be sped up significantly.
Introduction:
1. H.264 CODEC standard:
H.264/MPEG-4 Part 10, or AVC (Advanced Video Coding), is a standard developed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). It is used for video compression and is currently one of the most commonly used formats for the recording, compression, and distribution of high-definition video.
H.264 is a new video compression scheme that is becoming the worldwide digital video standard for consumer electronics and personal computers. H.264/AVC has achieved a significant improvement in the rate-distortion efficiency providing, typically, a factor of two in bit-rate savings when compared with existing standards such as MPEG-2. In particular, H.264 has already been selected as a key compression scheme (codec) for the next generation of optical disc formats, HD-DVD and Blu-ray disc.
H.264 has the following profiles, as shown in Fig.1 [1]:
1. Baseline Profile: Real-time conversational services, e.g. video conferencing and videophone.
2. Main Profile: Digital storage media and television broadcasting.
3. Extended Profile: Multimedia services over the Internet.
4. Four High Profiles: Content contribution, content distribution, and studio editing and post-processing.
Fig.1. H.264 Profiles [1]
Fig.2. Block diagram of H.264 algorithm [1]
As shown in Fig.2, the encoder performs inter and intra prediction to remove temporal and spatial redundancy, respectively, in the frames. For this, mode selection algorithms are used to select the best prediction mode for the current macroblock within a frame.
To select the best mode for one macroblock in intra prediction, the H.264/AVC encoder carries out 592 rate-distortion optimization (RDO) calculations, which greatly increases encoder complexity.
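The figure of 592 can be reproduced from the intra prediction mode counts defined by the standard: 9 modes for each of the sixteen 4x4 luma blocks, 4 modes for the 16x16 luma block, and 4 modes for the 8x8 chroma blocks. The short sketch below only shows the arithmetic. The names are illustrative, and the counting rule assumed is the commonly cited one in which every chroma mode is combined with every luma mode.

```python
# Intra prediction mode counts for one macroblock in H.264/AVC.
LUMA_4X4_MODES = 9      # candidate modes per 4x4 luma block
LUMA_4X4_BLOCKS = 16    # number of 4x4 luma blocks in a 16x16 macroblock
LUMA_16X16_MODES = 4    # candidate modes for the whole 16x16 luma block
CHROMA_8X8_MODES = 4    # candidate modes for the 8x8 chroma blocks


def intra_rdo_calculations():
    # Assumed rule: each chroma mode is paired with every luma mode.
    luma = LUMA_4X4_MODES * LUMA_4X4_BLOCKS + LUMA_16X16_MODES
    return CHROMA_8X8_MODES * luma


print(intra_rdo_calculations())  # 4 * (9 * 16 + 4) = 592
```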
This project focuses on reducing encoder complexity using thread-level parallelism.
2. Thread-level Parallelism:
To exploit multicore hardware, the focus of software design and development has to shift from sequential programming techniques to parallel and multithreaded programming techniques.
3. Multicore [6]:
A multicore is an architecture design that places multiple processors on a single die (computer chip). Each processor is called a core. As chip capacity increased, placing multiple processors on a single chip became practical.
These designs are known as Chip Multiprocessors (CMPs) because they allow for single-chip multiprocessing.
Multicore architectures are now center stage in terms of improving overall system performance.
CMPs come in multiple flavors: two processors (dual core), four processors (quad core), and eight processors (octa-core) configurations.
When implemented properly, threading can enhance performance by making better use of hardware resources.
To take advantage of multicore processors, one needs to know the details of the software threading model as well as the capabilities of the platform hardware.
4. Thread [7]:
A thread can be defined from both the hardware and the software points of view.
In software, a thread is a discrete sequence of related instructions that is executed independently of other instruction sequences.
Every program has at least one thread, called the main thread, which can in turn create other threads.
At the hardware level, a thread is an execution path that remains independent of other hardware execution paths.
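The software view described above can be illustrated with a minimal sketch in which the main thread creates several worker threads and waits for them to finish. The function names `worker` and `run_workers` and the toy workload are assumptions made for this example only. Note that in standard CPython, CPU-bound threads do not run simultaneously on several cores, so the sketch shows the structure of a threaded program rather than a speedup.

```python
import threading


def worker(index, results):
    # Each thread writes only to its own slot, so no lock is needed here.
    results[index] = sum(range(index * 1000, (index + 1) * 1000))


def run_workers(count):
    # The calling (main) thread creates the workers and waits for all of them.
    results = [None] * count
    threads = [
        threading.Thread(target=worker, args=(i, results))
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_workers(4))
```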
Goal:
Although thread-level parallelization can run some of the threads within a process in parallel, the major hurdle in implementing it is the complicated data dependences in multimedia applications.
The H.264 encoder has data dependences between the inter mode decision and the intra mode decision, especially when rate-distortion optimization (RDO) is used.
The goal is to implement an RDO mode decision algorithm based on thread-level parallelization for the H.264 encoder using the JM reference software (version 17.2), one that can efficiently resolve these dependences and exploit thread-level parallelism for fast mode decision.
The challenge in the project is to reduce the total encoding time without PSNR loss or bit-rate increase.
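As a rough illustration of the idea, the sketch below evaluates the inter and intra candidates for one macroblock in two separate threads and then keeps the mode with the lowest Lagrangian cost J = D + lambda * R. It is not the algorithm proposed here and not JM code: the mode labels, distortion and rate values, and the multiplier are hypothetical, and the sketch treats the two decisions as independent, which sidesteps the data dependences that the real encoder has to resolve.

```python
from concurrent.futures import ThreadPoolExecutor

LAMBDA = 0.5  # hypothetical Lagrange multiplier, chosen only for illustration

# Hypothetical (mode label, distortion, rate in bits) candidates for one macroblock.
INTER_CANDIDATES = [("P16x16", 120.0, 40), ("P16x8", 110.0, 56), ("P8x8", 95.0, 100)]
INTRA_CANDIDATES = [("I16x16", 150.0, 30), ("I4x4", 100.0, 110)]


def rd_cost(distortion, rate, lam=LAMBDA):
    # Lagrangian rate-distortion cost J = D + lambda * R.
    return distortion + lam * rate


def best_mode(candidates):
    # Return (cost, mode label) for the cheapest candidate in one family.
    return min((rd_cost(d, r), name) for name, d, r in candidates)


def decide_mode(inter, intra):
    # Evaluate the two families in separate threads, then compare the winners.
    with ThreadPoolExecutor(max_workers=2) as pool:
        inter_future = pool.submit(best_mode, inter)
        intra_future = pool.submit(best_mode, intra)
        return min(inter_future.result(), intra_future.result())


print(decide_mode(INTER_CANDIDATES, INTRA_CANDIDATES))  # (138.0, 'P16x8')
```

In a C implementation such as the JM software, native threads would be used in place of the Python thread pool, and the intra thread would need access to whatever inter results it depends on.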
References:
[1] Soon-kak Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264/MPEG-4 part 10”, Video/Image Processing and Multimedia Communications, 2003.
[2] T. Wiegand et al., “Overview of the H.264/AVC video coding standard”, IEEE Trans. on Circuits and Systems for Video Technology, vol. 13, pp. 560-576, July 2003.
[3] D. Marpe, T. Wiegand and G. J. Sullivan, “The H.264/MPEG-4 AVC standard and its applications”, IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[4] J. Kim, D. Kim and J. Jeong, “Complexity reduction algorithm for intra mode selection in H.264/AVC video coding”, in J. Blanc-Talon et al. (Eds.): ACIVS 2006, LNCS 4179, pp. 454-465, Springer-Verlag Berlin Heidelberg, 2006.
[5] Ju-Ho Hyun, “Fast mode decision algorithm based on thread-level parallelization and thread slipstreaming in H.264 video coding”, Multimedia and Expo (ICME), 2010 IEEE International Conference.
[6] Cameron Hughes and Tracey Hughes, “Professional Multicore Programming: Design and Implementation for C++ Developers”, Wiley 2010.
[7] Shameem Akhter and Jason Roberts, “Multi-Core Programming: Increasing Performance through Software Multi-threading”, Intel Press 2006.
[8] Eric Q. Li and Yen-Kuang Chen, “Implementation of H.264 Encoder on General-Purpose Processors with Hyper-Threading Technology”, Visual Communications and Image Processing 2004, edited by Sethuraman Panchanathan, Bhaskaran Vasudev, Proc. of SPIE-IS&T Electronic Imaging, SPIE Vol. 5308.
[9] Bongsoo Jung et al., “Adaptive Slice-Level Parallelism for Real-Time H.264/AVC Encoder with Fast Inter Mode Selection”, Multimedia Systems and Applications X, edited by Susanto Rahardja, JongWon Kim, Jiebo Luo, Proc. of SPIE Vol. 6777, 67770J, 2007.
[10] I. E. Richardson, “The H.264 advanced video compression standard”, 2nd Edition, Wiley 2010.