Multiplexing H.264/AVC Video Codec with AAC Audio Codec


Proposal:

The objective of this thesis is to multiplex an H.264 elementary video stream (Baseline profile) with an AAC elementary audio stream (Low Complexity profile), then to de-multiplex the combined stream and achieve lip-sync between the corresponding video and audio.

H.264/AVC codec:

H.264, also known as MPEG-4 Part 10 or AVC (Advanced Video Coding), is a digital video coding standard noted for achieving very high data compression. It was developed by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership effort known as the Joint Video Team (JVT) [1].

The H.264/AVC standard defines several profiles, including the following four:

Baseline Profile offers I/P-frames, supports progressive video and CAVLC entropy coding only.
Extended Profile offers I/P/B/SP/SI-frames, supports progressive and interlaced video, and CAVLC only.
Main Profile offers I/P/B-frames, supports progressive and interlaced video, and offers CAVLC or CABAC.
High Profile (introduced by the Fidelity Range Extensions, FRExt [2]) adds to Main Profile: 8x8 transform and 8x8 intra prediction and custom quantization matrices; the higher FRExt profiles add lossless video coding and more chroma formats (4:2:2, 4:4:4).

Fig 1: Various profiles of H.264 [3]

The block diagram of the H.264 encoder is shown in Fig 2.

Fig 2: Encoder block diagram of H.264 [1]

Advantages of H.264:

Up to 50% bit-rate savings: compared with H.263v2 (H.263+) or MPEG-4 [4] Simple Profile, H.264 permits a bit-rate reduction of up to 50% for a similar degree of encoder optimization at most bit rates.
High quality video: H.264 offers consistently good video quality at both high and low bit rates.
Error resilience: H.264 provides the tools necessary to deal with packet loss in packet networks and bit errors in error-prone wireless networks.
Network friendliness: through the Network Abstraction Layer (NAL), H.264 bit streams can be easily transported over different networks.
Wide range of applications: streaming, mobile TV, HDTV over IP, extended PVR and storage options for the home user.
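Network friendliness also matters for the multiplexer, because frame boundaries must be found before each video frame can be given a header and a time stamp. The MATLAB sketch below assumes the encoder writes the Annex B byte-stream format, in which every NAL unit is preceded by a 0x000001 start code (emulation prevention ensures this pattern does not occur inside a NAL unit). It locates the start codes and reads nal_unit_type from the low five bits of each NAL unit header. The byte vector is a hypothetical, truncated example (SPS, PPS and one IDR slice), not real encoder output.

```matlab
% Hypothetical Annex B input: SPS (type 7), PPS (type 8), IDR slice (type 5),
% payloads shortened. With a real file:
%   fid = fopen(videofile,'rb'); b = fread(fid)'; fclose(fid);
b = [0 0 0 1 103 66 0 30  0 0 0 1 104 206 60 128  0 0 1 101 136 132 0];

% Positions of the 3-byte start code prefix 0x000001
% (a 4-byte 0x00000001 start code contains it one byte later)
sc  = find(b(1:end-2) == 0 & b(2:end-1) == 0 & b(3:end) == 1);
hdr = sc + 3;                          % index of each NAL unit header byte

% NAL unit header: forbidden_zero_bit (1) | nal_ref_idc (2) | nal_unit_type (5)
nal_type    = bitand(b(hdr), 31);      % 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS
nal_ref_idc = bitshift(b(hdr), -5);    % forbidden_zero_bit is 0, so this leaves 2 bits

% Assuming one slice per picture, every slice NAL unit starts a new frame
nframes = sum(nal_type == 1 | nal_type == 5);
```

Counting slices as frames holds only when each picture is coded as a single slice; with several slices per picture, the first_mb_in_slice field of the slice header would also have to be parsed.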

Because of these advantages and its wide range of applications, the H.264 codec is chosen for this thesis, using its Baseline profile.

Advanced Audio Coding (AAC):

AAC is a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to convey high-quality digital audio:

Signal components that are perceptually irrelevant are discarded.
Redundancies in the coded audio signal are eliminated.

In addition, the signal is processed by a modified discrete cosine transform (MDCT) whose block length is switched according to the signal's character (long blocks for stationary signals, short blocks for transients), and error-protection information can optionally be added.
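To make the MDCT step concrete, here is a MATLAB sketch of the forward and inverse MDCT for one long block (N = 1024 coefficients from a 2048-sample window), followed by an overlap-add check of time-domain aliasing cancellation. It uses the direct matrix form and a sine window for clarity; an actual AAC encoder uses a fast FFT-based algorithm, may use a Kaiser-Bessel-derived window, and switches to short blocks (128 coefficients) for transients, none of which is shown here.

```matlab
N = 1024;                                     % coefficients per long block
n = (0:2*N-1)';                               % time index inside the 2N-sample window
k = 0:N-1;                                    % frequency index
C = cos(pi/N * (n + 0.5 + N/2) * (k + 0.5));  % 2N x N cosine basis (outer product)
w = sin(pi * (n + 0.5) / (2*N));              % sine window: w(n)^2 + w(n+N)^2 = 1

mdct  = @(x) C' * (w .* x);                   % 2N windowed samples -> N coefficients
imdct = @(X) w .* (C * X) / N;                % N coefficients -> 2N windowed, aliased samples

% 50%-overlapped frames: the aliasing of neighbouring frames cancels on overlap-add
x = randn(6*N, 1);
y = zeros(size(x));
for s = 1:N:numel(x) - 2*N + 1
    seg = s:s + 2*N - 1;
    y(seg) = y(seg) + imdct(mdct(x(seg)));
end

% The first and last N samples have no overlapping partner frame
err = max(abs(y(N+1:end-N) - x(N+1:end-N)));  % round-off level only
```

Each block of 2N input samples produces only N coefficients, yet the overlapping halves of consecutive blocks restore the input exactly, which is why the MDCT is critically sampled.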

The AAC encoder block diagram is given in Fig 3.

Fig 3: Encoder block diagram of AAC [9]

AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance and the acceptable output, implementers may choose a profile, i.e. the specific set of tools to use for a particular application. The standard offers the following three default profiles:

Main Profile. Uses all tools except the gain control module. Provides the highest quality for applications where the amount of random access memory (RAM) needed is not constrained.
Low Complexity (LC) Profile. Deletes the prediction tool and reduces the complexity of the temporal noise shaping (TNS) tool. This is the most widely used profile.
Scalable Sampling Rate (SSR) Profile. Adds the gain control tool to the Low Complexity profile. Allows the least complex decoder.
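Whether an encoded file really uses the LC profile can be checked from its frame headers. The sketch below assumes the encoder output is wrapped in ADTS headers (an ADIF-formatted file would need a different parser). It walks the stream frame by frame using the 13-bit frame_length field and reads the 2-bit profile field (0 = Main, 1 = LC, 2 = SSR) and the sampling-frequency index. The two-frame input is synthetic; the same frame offsets also give the audio frame boundaries that the multiplexer needs.

```matlab
% Synthetic input: two ADTS frames, each a 7-byte header (LC, 44.1 kHz, stereo,
% frame_length = 10) followed by 3 dummy payload bytes. With a real file:
%   fid = fopen(audiofile,'rb'); bytes = fread(fid); fclose(fid);
hdr   = [255 241 80 128 1 95 252];
bytes = [hdr 1 2 3 hdr 4 5 6];

rates = [96000 88200 64000 48000 44100 32000 24000 22050 16000 12000 11025 8000];
names = {'Main', 'LC', 'SSR', 'reserved'};
offsets = []; lengths = []; profile = {}; fs = [];
p = 1;
while p + 6 <= numel(bytes)
    % 12-bit syncword 0xFFF
    if bytes(p) ~= 255 || floor(bytes(p+1)/16) ~= 15
        error('No ADTS syncword at byte %d', p);
    end
    % frame_length (13 bits, header included) spans header bytes 4 to 6
    len = mod(bytes(p+3), 4)*2048 + bytes(p+4)*8 + floor(bytes(p+5)/32);
    if len < 7
        error('Corrupt ADTS frame length at byte %d', p);
    end
    offsets(end+1) = p;
    lengths(end+1) = len;
    profile{end+1} = names{floor(bytes(p+2)/64) + 1};        % top 2 bits of byte 3
    fs(end+1) = rates(mod(floor(bytes(p+2)/4), 16) + 1);      % next 4 bits
    p = p + len;
end
```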

Depending on the AAC profile and the MP3 encoder, 96 kbit/s AAC can give nearly the same or better perceptual quality than 128 kbit/s MP3.

Advantages:

AAC is the first codec system to fulfill the ITU-R/EBU requirements for indistinguishable quality at 128 kbit/s for stereo [15]. It has approximately 100% more coding power than MPEG-1 Layer II and 30% more than the former MPEG performance leader, MPEG-1 Layer III.
Sampling frequencies from 8 kHz to 96 kHz (official MP3: 16 kHz to 48 kHz)
Up to 48 channels
Higher efficiency and simpler filterbank (hybrid → pure MDCT)
Higher coding efficiency for stationary signals (block size: 576 → 1024 samples)
Higher coding efficiency for transient signals (block size: 192 → 128 samples)
Much better handling of frequencies above 16 kHz
More flexible joint stereo (separate for every scalefactor band)
Superior performance at bit rates above 64 kbit/s and at bit rates as low as 16 kbit/s

Because of these advantages, the AAC audio codec is chosen. The Low Complexity profile is used because of its wide application and, as the name suggests, its low complexity.

Proposed method of implementation:

Multiplexing:

The elementary video bit stream is obtained from the H.264/AVC encoder (JM 10.2 software [12]) and the elementary audio bit stream from the AAC encoder (PsyTEL software [11]). A system clock with a frequency of 90 kHz is then generated, and the time stamps for video and audio are derived from this clock. Each video/audio frame is given a header and a time stamp, and the two streams are multiplexed together. The single multiplexed stream contains the video and audio streams with the corresponding headers and time stamps.
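As a worked example of the time stamps, the MATLAB lines below compute presentation times in 90 kHz ticks. The frame rate (25 frame/s) and sampling rate (44.1 kHz) are assumed values; each AAC LC frame carries 1024 samples per channel, so one audio frame lasts 1024 × 90000 / 44100 ≈ 2089.8 ticks. Computing each stamp from the frame index, rather than repeatedly adding a rounded increment, keeps the error below one tick instead of letting it accumulate.

```matlab
clk = 90000;      % system clock frequency (Hz)
fps = 25;         % assumed video frame rate (frame/s)
fs  = 44100;      % assumed audio sampling rate (Hz)
spf = 1024;       % samples per channel in one AAC LC frame

video_pts = round((0:4) * clk / fps);        % -> [0 3600 7200 10800 14400]
audio_pts = round((0:4) * spf * clk / fs);   % -> [0 2090 4180 6269 8359]
```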

Demultiplexing:

The single stream is then de-multiplexed into separate video and audio streams by detecting the corresponding video and audio headers. The two streams are fed to their respective decoders while the time stamps of the video and audio frames are preserved. At the end of this process, the decoded video and audio streams and the time stamps of the video and audio frames are obtained.

Lip-sync:

Synchronization between video and audio is achieved by presenting each audio and video frame only after comparing its time stamp with the system clock reference (SCR). SCR values are carried in the multiplexed stream and are used to regenerate, on the de-multiplexer side, a clock that is synchronized with the system clock on the multiplexer side. Because both streams are presented against this common clock, lip-sync between video and audio is achieved. The time stamps also make it possible to keep lip-sync when playback is started from an arbitrary frame.
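A minimal sketch of the presentation decision, following the MPEG-2 systems model in which a frame is presented once the receiver clock reaches its time stamp. The frame rates are the same assumed values as before (25 frame/s video, 1024-sample AAC frames at 44.1 kHz), and the start frame and clock instants are chosen only for illustration.

```matlab
clk = 90000;                                      % system clock frequency (Hz)
video_pts = round((0:9) * clk / 25);              % assumed 25 frame/s video
audio_pts = round((0:14) * 1024 * clk / 44100);   % assumed 44.1 kHz AAC frames

% Playback starts at an arbitrary frame (here the 4th video frame): the receiver
% clock is initialised to that frame's time stamp, and audio begins with the
% first audio frame whose time stamp has not yet passed
stc = video_pts(4);                               % 10800 ticks
a_first = find(audio_pts >= stc, 1, 'first');     % -> 7 (time stamp 12539)

% At any later clock value, the frame due for presentation in each stream is
% the latest one whose time stamp has been reached
stc = stc + 4500;                                 % 50 ms later: 15300 ticks
v_now = find(video_pts <= stc, 1, 'last');        % -> 5 (time stamp 14400)
a_now = find(audio_pts <= stc, 1, 'last');        % -> 8 (time stamp 14629)
```

Because both decisions use the same clock value, neither stream can drift relative to the other, regardless of where playback started.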

Software used:

H.264/AVC codec: JM 10.2 software [12].

AAC codec: PsyTEL software [11].

The whole process is shown as a block diagram in Fig 4.

Fig 4 : Block diagram of proposed implementation method

Current Progress:

I used the software mentioned above to obtain the encoded audio and video bit streams. So far I have successfully multiplexed and de-multiplexed the video and audio bit streams by assigning and detecting unique headers for the two bit streams. The multiplexing and de-multiplexing code is written in MATLAB and is listed below.

Multiplexer code:

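clear all
clc
% Read both elementary streams as vectors of byte values (0..255)
videofile = input('enter the file name of the encoded video bitstream: ','s');
fid = fopen(videofile,'rb');
video = fread(fid);
fclose(fid);
audiofile = input('enter the file name of the encoded audio bitstream: ','s');
fid = fopen(audiofile,'rb');
audio = fread(fid);
fclose(fid);
% 9-bit packet headers. Every data byte is followed by a '0' stuffing bit,
% so a 9-bit data chunk always ends in '0' and can never equal a header
vhead = '111111111';
ahead = '111111101';
lvideo = length(video);
laudio = length(audio);
% One row of eight '0'/'1' characters per byte
video = dec2bin(video,8);
audio = dec2bin(audio,8);
v = 1;
a = 1;
stream = '';
while (v <= lvideo || a <= laudio)
    % video packet: header + 4 bytes (zero bytes once the video data is used up)
    stream = [stream vhead];
    for i = 1:4
        if (v <= lvideo)
            stream = [stream video(v,:) '0'];
            v = v + 1;
        else
            stream = [stream dec2bin(0,8) '0'];
        end
    end
    % audio packet: header + 3 bytes (zero bytes once the audio data is used up)
    stream = [stream ahead];
    for i = 1:3
        if (a <= laudio)
            stream = [stream audio(a,:) '0'];
            a = a + 1;
        else
            stream = [stream dec2bin(0,8) '0'];
        end
    end
end
% Each bit is stored as one ASCII character ('0' or '1')
fid = fopen('stream.txt','wb');
fwrite(fid,stream);
fclose(fid);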

Demultiplexer code:

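clc
infile = input('enter the filename of the stream: ','s');
fid = fopen(infile,'rb');
stream = fread(fid,Inf,'*char')';   % row vector of '0'/'1' characters
fclose(fid);
l = length(stream);
vhead = '111111111';
ahead = '111111101';
n = 1;
video = '';
audio = '';
current = '';   % stream that the following data chunks belong to
% Scan in 9-bit steps: a header switches the target stream, any other
% chunk is 8 data bits followed by the '0' stuffing bit
while (n + 8 <= l)
    chunk = stream(n:n+8);
    if strcmp(chunk,vhead)
        current = 'video';
    elseif strcmp(chunk,ahead)
        current = 'audio';
    elseif strcmp(current,'video')
        video = [video chunk(1:8)];
    elseif strcmp(current,'audio')
        audio = [audio chunk(1:8)];
    end
    n = n + 9;
end
% Convert the bit strings back to byte values. The zero bytes used to pad
% the shorter stream in the multiplexer are still included at its end.
video = bin2dec(reshape(video,8,[])');
audio = bin2dec(reshape(audio,8,[])');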

Possible improvements in the code:

Although the code successfully multiplexes and de-multiplexes the two streams, the multiplexing process takes a long time, mainly because the output string is grown by repeated concatenation. The structure will be modified during the later stages of the thesis in order to improve efficiency. Also, the size of the multiplexed bit stream file is not optimized: every bit is stored as a one-byte character and each data byte carries an extra stuffing bit, so the file is more than nine times larger than the two input streams combined. The header structure can be modified to minimize the number of extra bits added for headers and time stamps.
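One possible direction for both points is sketched below in MATLAB, as an assumption rather than a final design: keep the data as uint8 bytes instead of '0'/'1' characters, and replace the bit-stuffed 9-bit headers with a 3-byte packet header made of a stream id and a 16-bit payload length. The packet format is hypothetical; the id values 0xE0/0xC0 are only borrowed from MPEG-2 PES stream_id conventions. Packets are collected in a cell array and concatenated once, which avoids copying the growing output on every append.

```matlab
% Hypothetical packet format: [id (1 byte), payload length (2 bytes, big-endian), payload]
video = uint8(1:10)';            % stand-ins for fread(fid, Inf, '*uint8') of the two files
audio = uint8(101:106)';
VID = uint8(224); AUD = uint8(192);
vchunk = 4; achunk = 3;          % payload bytes per packet (real packets would be larger)

% --- multiplex: build every packet in a cell array, concatenate once ---
nv = ceil(numel(video) / vchunk);
na = ceil(numel(audio) / achunk);
parts = cell(nv + na, 1);
c = 0;
for p = 1:max(nv, na)
    if p <= nv
        pay = video((p-1)*vchunk + 1 : min(p*vchunk, numel(video)));
        c = c + 1;
        parts{c} = [VID; uint8(floor(numel(pay)/256)); uint8(mod(numel(pay), 256)); pay];
    end
    if p <= na
        pay = audio((p-1)*achunk + 1 : min(p*achunk, numel(audio)));
        c = c + 1;
        parts{c} = [AUD; uint8(floor(numel(pay)/256)); uint8(mod(numel(pay), 256)); pay];
    end
end
stream = vertcat(parts{:});      % uint8 column; write with fwrite(fid, stream, 'uint8')

% --- de-multiplex: jump from header to header using the length field ---
vparts = {}; aparts = {};
q = 1;
while q + 2 <= numel(stream)
    len = double(stream(q+1))*256 + double(stream(q+2));
    pay = stream(q+3 : q+2+len);
    if stream(q) == VID
        vparts{end+1} = pay;
    elseif stream(q) == AUD
        aparts{end+1} = pay;
    end
    q = q + 3 + len;
end
video_out = vertcat(vparts{:});
audio_out = vertcat(aparts{:});
```

Because the parser never searches the payload for header patterns, no stuffing bits are needed, the shorter stream needs no zero padding, and the overhead is three bytes per packet.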

Future work:

Future work will focus mainly on achieving lip-sync between the video and audio streams. The next step is to embed time stamps at the beginning of the video and audio frames during multiplexing and to detect them during de-multiplexing. This requires generating a system clock for reference. These time stamps will then be used to achieve lip-sync as described above.
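For embedding the time stamps, one option is to reuse the 33-bit PTS field layout of MPEG-2 PES packet headers (ISO/IEC 13818-1): five bytes made of a '0010' prefix, the PTS split into 3 + 15 + 15 bits, and a marker bit '1' after each part. A 33-bit count of a 90 kHz clock wraps only after about 26.5 hours. The MATLAB sketch below encodes and decodes that field; using it inside this thesis's own packet header is an assumption, not yet part of the proposal.

```matlab
% Byte layout of the 5-byte PTS field (PTS-only case):
%   byte 1: 0010 | PTS[32..30] | 1      byte 2: PTS[29..22]
%   byte 3: PTS[21..15] | 1             byte 4: PTS[14..7]
%   byte 5: PTS[6..0] | 1
% Plain double arithmetic is exact here because 2^33 is far below 2^53.
pts_encode = @(pts) uint8([32 + 2*mod(floor(pts/2^30), 8) + 1, ...
    mod(floor(pts/2^22), 256), ...
    2*mod(floor(pts/2^15), 128) + 1, ...
    mod(floor(pts/2^7), 256), ...
    2*mod(pts, 128) + 1]);

pts_decode = @(b) mod(floor(double(b(1))/2), 8)*2^30 + double(b(2))*2^22 + ...
    floor(double(b(3))/2)*2^15 + double(b(4))*2^7 + floor(double(b(5))/2);

bytes = pts_encode(90000);    % 1 s at 90 kHz -> [33 0 5 191 33]
back  = pts_decode(bytes);    % -> 90000
```

The marker bits guarantee that no run of 23 or more zero bits can appear inside the field, which keeps it from imitating a start-code-like pattern.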

Timeframe:

The proposed work is planned to be completed by the end of the spring semester.

Reference books and papers:

[1] S.-K. Kwon, A. Tamhankar and K. R. Rao, "Overview of H.264/MPEG-4 Part 10", J. Visual Communication and Image Representation, vol. 17, pp. 183-552, April 2006.

[2] G. Sullivan, P. Topiwala and A. Luthra, "The H.264/AVC video coding standard: overview and introduction to the fidelity range extensions", SPIE Conference on Applications of Digital Image Processing XXVII, vol. 5558, pp. 53-74, August 2004.

[3] A. Puri, X. Chen and A. Luthra, "Video coding using H.264/MPEG-4 AVC compression standard", Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.

[4] J. Watkinson, "The MPEG Handbook", Second Edition, Oxford; Burlington, MA: Elsevier/Focal Press, 2004.

[5] P. D. Symes, "Digital video compression", New York, NY: McGraw-Hill, c2004.

[6] C. Wootton, "Practical guide to video and audio compression: from sprockets and rasters to macro blocks", Oxford: Focal, 2005.

[7] H. Kalva et al., "Implementing multiplexing, streaming, and server interaction for MPEG-4", IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, Dec. 1999.

[8] M. Bosi and R. E. Goldberg, "Introduction to digital audio coding and standards", Boston: Kluwer Academic Publishers, 2003.

Reference Websites :

[9] Audio coding website

[10] MPEG official website

[11]AAC software :

[12] 10.2JM H.264 software

[13] AAC reference :

[14] Website with source codes

[15] Forum for AAC

[16] Forum for MPEG

[17] JVT documents :

[18] Audio test files

[19] Reference for H.264

[20] Free H.264 software

THESIS PROPOSAL

- Harishankar Murugan

Student I.D.: 000-05-5244

Date : Aug 9, 2006